I ran into similar issues with Protobuf-Java in a different use case, and ended up maintaining a zero-allocation fork of JavaNano for a few years. However, between needing to support additional features, and Google abandoning the project, I decided to write a new Protobuf library that's entirely Java based.
It's allocation free in steady state, can be used with off-heap memory, and for most use cases has roughly 2x the throughput of Protobuf-Java. https://github.com/HebiRobotics/QuickBuffers I know it's a bit late for this thread, but I couldn't share the previous version. Maybe it still helps. Florian On Tuesday, February 12, 2019 at 10:31:45 PM UTC+1, Muruga Prasath Ganesan wrote: > > > shevek, > Can you Please tell us the final solution that you implemented to fix the > issue? > > On Sunday, January 20, 2019 at 2:10:51 AM UTC-7, Shevek wrote: >> >> This project is very much in-progress. >> >> We need to sort about 1e13 records, several terabytes when compressed, >> sort-merge, and end up with about 1e10 in sqlite. Right now, we are >> running sqlite with 1e9 objects, and it isn't an issue. sqlite is much >> better than one would naively believe it to be, if used appropriately. >> Oddly enough, its VM is several times faster than pg, for IO-free raw >> mathematical computation, too. >> >> Our current bottleneck is the serialization and allocation overhead of >> protobuf. Many of the serializers recommended on this list can only >> serialize fixed-size structures, but we're working on an implementation >> with flatbuf right now. Thank you, Georges. flatbuf will also permit us >> to avoid having a separate serialized copy of the sort-key. >> >> We are going to experiment with reading files via mmap rather than I/O, >> but we have not yet done so. It's tempting to find some way to call >> madvise(SEQUENTIAL) on the mmap. Not sure what the other effects are >> likely to be, however, but it may help us keep most/all of the data >> effectively off-heap during the merge phase. >> >> We have mastered all the (currently known) GC issues, thank you, Gil. >> >> Accessing the objects fast by id is not currently possible, although >> it's definitely an angle we could pursue. A major purpose of this sort >> is to merge identical objects, or data under the same key, so even if we >> did store by id, it would have to be a mutable store, which would have >> its own issues. >> >> We started with https://github.com/cowtowncoder/java-merge-sort and >> assumed that due to the simplicity of that implementation, it would be >> easy to do better, but it turns out that the simplicity of that >> particular implementation is not actually a significant limiting factor. >> However, it turns out that once one has done the serialization, a custom >> version of Guava's Iterators.mergeSorted() is somewhat better. >> >> S. >> >> >> On 1/19/19 3:28 PM, Steven Stewart-Gallus wrote: >> > I'm really confused. >> > >> > You're talking about putting the data into sqlite which suggests there >> > really isn't so much log data and it could be filtered with a hacky >> > shell script. But then you're talking about a lot of heavy optimisation >> > which suggests you really may need to put in custom effort. Precisely >> > how much log data really needs to be filtered? You're unlikely to be >> > able to filter much of the data faster than the system utilities which >> > are often very old and well-optimised C code. I'm reminded about the >> old >> > story of the McIlroy and Knuth word count programs. >> > >> > Anyway while this is a very enlightening discussion it is probably >> > worthwhile to reuse as much existing system utilities and code as you >> > can instead of writing your own. >> > >> > -- >> > You received this message because you are subscribed to the Google >> > Groups "mechanical-sympathy" group. >> > To unsubscribe from this group and stop receiving emails from it, send >> > an email to mechanical-sympathy+unsubscr...@googlegroups.com >> > <mailto:mechanical-sympathy+unsubscr...@googlegroups.com>. >> > For more options, visit https://groups.google.com/d/optout. >> > -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com. To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/615dd13d-eebd-42c0-a72b-c7eefb586f6f%40googlegroups.com.