I ran into similar issues with Protobuf-Java in a different use case, and 
ended up maintaining a zero-allocation fork of JavaNano for a few years. 
However, between needing to support additional features, and Google 
abandoning the project, I decided to write a new Protobuf library that's 
entirely Java based.

It's allocation free in steady state, can be used with off-heap memory, and 
for most use cases has roughly 2x the throughput of Protobuf-Java.

https://github.com/HebiRobotics/QuickBuffers
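
To make "allocation free in steady state" concrete, here is a minimal sketch of the general reuse pattern in plain Java. It is not QuickBuffers' actual API: `ReusableRecord`, `ReuseDemo`, and the fixed 16-byte record layout are hypothetical names and assumptions made up for illustration. One mutable object is allocated up front and overwritten for every record, and the data sits in a direct (off-heap) buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical record type for illustration; not part of QuickBuffers or Protobuf-Java.
final class ReusableRecord {
    long id;
    double value;

    // Overwrites this instance's fields in place instead of allocating a new object.
    ReusableRecord readFrom(ByteBuffer in) {
        id = in.getLong();
        value = in.getDouble();
        return this;
    }

    // Appends the record as 16 bytes: an 8-byte id followed by an 8-byte value.
    void writeTo(ByteBuffer out) {
        out.putLong(id).putDouble(value);
    }
}

final class ReuseDemo {
    // Sums the values of `count` records stored back to back in `buffer`.
    static double sumValues(ByteBuffer buffer, int count) {
        ReusableRecord record = new ReusableRecord(); // allocated once, outside the loop
        double sum = 0;
        for (int i = 0; i < count; i++) {
            sum += record.readFrom(buffer).value; // no per-record allocation
        }
        return sum;
    }

    // Builds a direct buffer, which lives outside the Java heap, holding `count` records.
    static ByteBuffer sampleBuffer(int count) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(count * 16).order(ByteOrder.LITTLE_ENDIAN);
        ReusableRecord record = new ReusableRecord();
        for (int i = 0; i < count; i++) {
            record.id = i;
            record.value = i * 0.5;
            record.writeTo(buffer);
        }
        buffer.flip(); // switch from writing to reading
        return buffer;
    }
}
```

Calling `ReuseDemo.sumValues(ReuseDemo.sampleBuffer(4), 4)` decodes four records while creating only a single `ReusableRecord`. The trade-off is that a decoded record is only valid until the next `readFrom` call, so anything that must outlive the loop iteration has to be copied out explicitly.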

I know it's a bit late for this thread, but I couldn't share the previous 
version. Maybe it still helps.

Florian


On Tuesday, February 12, 2019 at 10:31:45 PM UTC+1, Muruga Prasath Ganesan 
wrote:
>
>
> Shevek,
> Can you please tell us the final solution that you implemented to fix the 
> issue?
>
> On Sunday, January 20, 2019 at 2:10:51 AM UTC-7, Shevek wrote:
>>
>> This project is very much in-progress. 
>>
>> We need to sort about 1e13 records, several terabytes when compressed, 
>> sort-merge, and end up with about 1e10 in sqlite. Right now, we are 
>> running sqlite with 1e9 objects, and it isn't an issue. sqlite is much 
>> better than one would naively believe it to be, if used appropriately. 
>> Oddly enough, its VM is several times faster than pg, for IO-free raw 
>> mathematical computation, too. 
>>
>> Our current bottleneck is the serialization and allocation overhead of 
>> protobuf. Many of the serializers recommended on this list can only 
>> serialize fixed-size structures, but we're working on an implementation 
>> with flatbuf right now. Thank you, Georges. flatbuf will also permit us 
>> to avoid having a separate serialized copy of the sort-key. 
>>
>> We are going to experiment with reading files via mmap rather than I/O, 
>> but we have not yet done so. It's tempting to find some way to call 
>> madvise(SEQUENTIAL) on the mmap. We are not sure what the other effects 
>> are likely to be, but it may help us keep most/all of the data 
>> effectively off-heap during the merge phase. 
>>
>> We have mastered all the (currently known) GC issues, thank you, Gil. 
>>
>> Accessing the objects fast by id is not currently possible, although 
>> it's definitely an angle we could pursue. A major purpose of this sort 
>> is to merge identical objects, or data under the same key, so even if we 
>> did store by id, it would have to be a mutable store, which would have 
>> its own issues. 
>>
>> We started with https://github.com/cowtowncoder/java-merge-sort and 
>> assumed that, given how simple that implementation is, it would be 
>> easy to do better. It turns out that its simplicity is not actually a 
>> significant limiting factor. Once the serialization is done, however, a 
>> custom version of Guava's Iterators.mergeSorted() is somewhat better. 
>>
>> S. 
>>
>>
>> On 1/19/19 3:28 PM, Steven Stewart-Gallus wrote: 
>> > I'm really confused. 
>> > 
>> > You're talking about putting the data into sqlite which suggests there 
>> > really isn't so much log data and it could be filtered with a hacky 
>> > shell script. But then you're talking about a lot of heavy optimisation 
>> > which suggests you really may need to put in custom effort. Precisely 
>> > how much log data really needs to be filtered? You're unlikely to be 
>> > able to filter much of the data faster than the system utilities which 
>> > are often very old and well-optimised C code. I'm reminded of the old 
>> > story of the McIlroy and Knuth word count programs. 
>> > 
>> > Anyway while this is a very enlightening discussion it is probably 
>> > worthwhile to reuse as much existing system utilities and code as you 
>> > can instead of writing your own. 
>> > 
>>
>
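
The sort-merge step described in the quoted message, where data under the same key gets merged, can be sketched as a k-way merge over already-sorted runs. This is only an illustration under stated assumptions, not Shevek's implementation and not Guava's `Iterators.mergeSorted()`: the class name `SortedRunMerger` is hypothetical, keys are assumed to be strings, and "merging" equal keys is assumed to mean summing a `long` value. A real pipeline would stream from files rather than collect into a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: merges runs that are each sorted by key, combining equal keys.
final class SortedRunMerger {

    // One cursor per run: the current head element plus the rest of the run.
    private static final class Cursor {
        Map.Entry<String, Long> head;
        final Iterator<Map.Entry<String, Long>> rest;

        Cursor(Iterator<Map.Entry<String, Long>> rest) {
            this.rest = rest;
            this.head = rest.next();
        }

        // Moves to the next element; returns false once the run is exhausted.
        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            head = rest.next();
            return true;
        }
    }

    static List<Map.Entry<String, Long>> merge(List<List<Map.Entry<String, Long>>> runs) {
        // Min-heap ordered by the key at the head of each run.
        PriorityQueue<Cursor> heap =
                new PriorityQueue<Cursor>(Comparator.comparing((Cursor c) -> c.head.getKey()));
        for (List<Map.Entry<String, Long>> run : runs) {
            Iterator<Map.Entry<String, Long>> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it));
            }
        }

        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String currentKey = null;
        long currentSum = 0;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            String key = c.head.getKey();
            long value = c.head.getValue();
            if (key.equals(currentKey)) {
                currentSum += value; // same key as the previous element: combine (assumed: sum)
            } else {
                if (currentKey != null) {
                    out.add(new SimpleEntry<>(currentKey, currentSum));
                }
                currentKey = key;
                currentSum = value;
            }
            if (c.advance()) {
                heap.add(c); // re-insert the run under its new head key
            }
        }
        if (currentKey != null) {
            out.add(new SimpleEntry<>(currentKey, currentSum));
        }
        return out;
    }
}
```

Because every run is sorted, equal keys always leave the heap consecutively, so combining them needs only the one pending key and sum rather than a mutable keyed store.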

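For the idea in the quoted message of reading files via mmap rather than regular I/O, a minimal Java sketch could look like the following. The class `MappedScan` and the file layout (a flat sequence of 8-byte big-endian longs) are assumptions for illustration only. As far as I know, the standard `java.nio` API has no direct equivalent of `madvise(SEQUENTIAL)`, so this sketch just scans the mapping front to back and leaves read-ahead to the OS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; the record layout is an assumption.
final class MappedScan {

    // Sums all 8-byte longs in the file by scanning a read-only mapping sequentially.
    static long sumLongs(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes;
            // larger files would need several mapped windows.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (map.remaining() >= Long.BYTES) {
                sum += map.getLong(); // the data is paged in by the OS, not copied onto the heap
            }
            return sum;
        }
    }

    // Small self-contained demo: writes three longs to a temp file and sums them back.
    static long demo() {
        try {
            Path file = Files.createTempFile("mapped-scan", ".bin");
            file.toFile().deleteOnExit();
            ByteBuffer data = ByteBuffer.allocate(3 * Long.BYTES);
            data.putLong(1L).putLong(2L).putLong(3L);
            Files.write(file, data.array());
            return sumLongs(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The mapping stays valid after the channel is closed and is only released when the buffer is garbage collected, which is worth keeping in mind when many large files are mapped during a merge.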
-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mechanical-sympathy+unsubscr...@googlegroups.com.
To view this discussion on the web, visit 
https://groups.google.com/d/msgid/mechanical-sympathy/615dd13d-eebd-42c0-a72b-c7eefb586f6f%40googlegroups.com.
