Re: Sorting a very large number of objects

2019-11-22 Thread Florian Enner
I ran into similar issues with Protobuf-Java in a different use case, and 
ended up maintaining a zero-allocation fork of JavaNano for a few years. 
However, between needing to support additional features and Google 
abandoning the project, I decided to write a new Protobuf library that's 
entirely Java-based.

It's allocation-free in steady state, can be used with off-heap memory, and 
for most use cases has roughly 2x the throughput of Protobuf-Java.

https://github.com/HebiRobotics/QuickBuffers

I know it's a bit late for this thread, but I couldn't share the previous 
version. Maybe it still helps.

Florian




Re: Sorting a very large number of objects

2019-02-12 Thread Muruga Prasath Ganesan

Shevek,
Can you please tell us the final solution that you implemented to fix the 
issue?



Re: Sorting a very large number of objects

2019-01-20 Thread Shevek

This project is very much in progress.

We need to sort about 1e13 records (several terabytes when compressed), 
sort-merge them, and end up with about 1e10 in sqlite. Right now we are 
running sqlite with 1e9 objects, and it isn't an issue; sqlite is much 
better than one would naively believe it to be, if used appropriately. 
Oddly enough, its VM is also several times faster than pg for I/O-free raw 
mathematical computation.


Our current bottleneck is the serialization and allocation overhead of 
protobuf. Many of the serializers recommended on this list can only 
serialize fixed-size structures, but we're working on an implementation 
with flatbuf right now. Thank you, Georges. flatbuf will also permit us 
to avoid having a separate serialized copy of the sort-key.


We are going to experiment with reading files via mmap rather than I/O, 
but we have not yet done so. It's tempting to find some way to call 
madvise(SEQUENTIAL) on the mmap. We're not sure what the other effects are 
likely to be, but it may help us keep most/all of the data effectively 
off-heap during the merge phase.
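
A minimal sketch of the mmap idea, assuming a hypothetical length-prefixed 
spill-file layout and files under 2 GB per mapping. Note the JDK does not 
expose madvise(); an MADV_SEQUENTIAL hint would need JNI or the 
foreign-function interface.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: map a spill file read-only and walk it sequentially during the merge.
final class MappedSpillFile {

    static void scan(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            while (buf.remaining() >= Integer.BYTES) {
                int recordLength = buf.getInt();        // hypothetical length prefix
                int start = buf.position();
                // ... hand buf[start, start + recordLength) to the merger ...
                buf.position(start + recordLength);     // skip to the next record
            }
        }
    }
}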


We have mastered all the (currently known) GC issues, thank you, Gil.

Accessing the objects fast by id is not currently possible, although 
it's definitely an angle we could pursue. A major purpose of this sort 
is to merge identical objects, or data under the same key, so even if we 
did store by id, it would have to be a mutable store, which would have 
its own issues.


We started with https://github.com/cowtowncoder/java-merge-sort and 
assumed that, given the simplicity of that implementation, it would be 
easy to do better, but the simplicity of that particular implementation 
turns out not to be a significant limiting factor. However, once the 
serialization is done, a custom version of Guava's 
Iterators.mergeSorted() is somewhat better.
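
A minimal sketch of that merge phase, assuming hypothetical record, 
comparator, combiner and sink types (not the actual implementation):

import com.google.common.collect.Iterators;
import com.google.common.collect.PeekingIterator;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;

// Sketch of the merge phase: a k-way merge over already-sorted spill-file
// iterators, collapsing neighbouring records that share a key before emitting.
final class SpillMerge {

    static <R> void mergeAndCombine(List<Iterator<R>> sortedFiles,
                                    Comparator<R> byKey,
                                    BinaryOperator<R> combiner,
                                    Consumer<R> sink) {
        PeekingIterator<R> merged =
                Iterators.peekingIterator(Iterators.mergeSorted(sortedFiles, byKey));
        while (merged.hasNext()) {
            R current = merged.next();
            // Equal keys are adjacent after the merge, so combine them here.
            while (merged.hasNext() && byKey.compare(current, merged.peek()) == 0) {
                current = combiner.apply(current, merged.next());
            }
            sink.accept(current);
        }
    }
}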


S.




Re: Sorting a very large number of objects

2019-01-19 Thread Greg Young
Given that the list has people with many levels of experience/age/etc., it's 
probably worth expounding on "I'm reminded about the old story of the 
McIlroy and Knuth word count programs":

https://franklinchen.com/blog/2011/12/08/revisiting-knuth-and-mcilroys-word-count-programs/

-- 
Studying for the Turing test



Re: Sorting a very large number of objects

2019-01-19 Thread Steven Stewart-Gallus
I'm really confused.

You're talking about putting the data into sqlite, which suggests there 
really isn't that much log data and it could be filtered with a hacky shell 
script. But then you're talking about a lot of heavy optimisation, which 
suggests you really may need to put in custom effort. Precisely how much 
log data really needs to be filtered? You're unlikely to be able to filter 
much of the data faster than the system utilities, which are often very old 
and well-optimised C code. I'm reminded of the old story of the McIlroy 
and Knuth word count programs.

Anyway, while this is a very enlightening discussion, it is probably 
worthwhile to reuse as many existing system utilities and as much existing 
code as you can instead of writing your own.



Re: Sorting a very large number of objects

2018-11-14 Thread Shevek
It's a batch job. We are taking log files from N days, merging records 
by name, and building a sqlite database out of them. So the time to 
first record isn't important at all; it's entirely about time to 
completion for the whole dataset. What we have is effectively an N-way 
map-reduce in Java.


We need another two orders of magnitude in performance, but I can see a 
set of paths which ought to get us at least one.


There is also the option to swap out sqlite for C* or scylla or 
something, if we think that the distributed update will be better than a 
local sort-merge, but this isn't currently clear, and it isn't going to 
happen this cycle.


S.

Re: Sorting a very large number of objects

2018-11-14 Thread Michael Meehan
Is the thing that's consuming your list a stream-consumer? Does the
consumer for the next operation need the whole sorted list, or just the
first element of the sorted list (the largest/smallest)? If both of those
are true, it's most important that you deliver the first element quickly,
and achieve reasonable throughput (not slower than the next consumer in the
stream can process) after. In that case, you don't want to take a lot of
time to generate a fully sorted list, you want to take a very small time to
find the first element to pass on!



Re: Sorting a very large number of objects

2018-11-12 Thread Shevek

We are doing sorting by proxy. Right now I have a byte[] serialized as:

[sort-key0, data0, sort-key1, data1, ...]

and a comparator which can compare either key-bytes or values-by-key. We 
then sort a separate integer array which points to each sort-key's 
offset in the underlying array, then we emit by walking the integer array.


The challenge is to merge duplicate neighbouring objects by key, then 
emit in order, so any proxy method which requires me to do an 
index-lookup for an object on disk will die horribly in seek. So we are 
sorting each block in a (now fairly appropriately-sized) byte array, 
dumping that to file, then stream-merging files.
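
A minimal sketch of that sort-by-proxy step, with a hypothetical 
[keyLen][keyBytes][data...] record layout; a real implementation would sort 
a primitive int[] of offsets rather than boxed Integers.

import java.util.Arrays;

// Sketch: records live in one byte[], and we sort an array of record offsets
// by comparing key bytes in place, never materialising the objects.
final class OffsetSort {

    static void sortOffsets(byte[] block, Integer[] offsets) {
        Arrays.sort(offsets, (a, b) -> compareKeys(block, a, b));
    }

    private static int compareKeys(byte[] block, int offA, int offB) {
        int lenA = readInt(block, offA);
        int lenB = readInt(block, offB);
        int n = Math.min(lenA, lenB);
        for (int i = 0; i < n; i++) {
            int c = (block[offA + 4 + i] & 0xff) - (block[offB + 4 + i] & 0xff); // unsigned bytes
            if (c != 0) return c;
        }
        return Integer.compare(lenA, lenB);
    }

    private static int readInt(byte[] b, int off) {   // big-endian length prefix
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }
}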


We still need another two orders of magnitude of performance, but at 
least we're back to the profiler for a new hypothesis now.


I think to improve from here, we will need to (a) use EWMA to compute 
appropriate buffer sizes per buffer, since throughput is not uniform, 
and (b) use an interface to front a set of 4 MB byte[] arrays 
rather than a single 1 GB array, so that (b1) we can exceed 2 GB, 
and (b2) we can allocate and free at a finer granularity.
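
A minimal sketch of point (b): a byte store addressed by a long offset and 
backed by 4 MB chunks, so it can grow past 2 GB and shrink by dropping whole 
chunks (error handling and chunk reuse omitted; the interface is hypothetical).

import java.util.ArrayList;
import java.util.List;

// Sketch of a chunked byte store fronting many small byte[] arrays.
final class ChunkedBytes {
    private static final int CHUNK_SHIFT = 22;                 // 4 MB chunks
    private static final int CHUNK_SIZE = 1 << CHUNK_SHIFT;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;
    private final List<byte[]> chunks = new ArrayList<>();
    private long size;

    long append(byte b) {
        int chunk = (int) (size >>> CHUNK_SHIFT);
        if (chunk == chunks.size()) chunks.add(new byte[CHUNK_SIZE]);
        chunks.get(chunk)[(int) (size & CHUNK_MASK)] = b;
        return size++;
    }

    byte get(long offset) {
        return chunks.get((int) (offset >>> CHUNK_SHIFT))[(int) (offset & CHUNK_MASK)];
    }

    void truncate(long newSize) {                              // free tail chunks individually
        int keep = (int) ((newSize + CHUNK_MASK) >>> CHUNK_SHIFT);
        while (chunks.size() > keep) chunks.remove(chunks.size() - 1);
        size = newSize;
    }
}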


The garbage collector is no longer a significant participant in our 
computation, but protobuf still has disappointingly bad mechanical 
sympathy for stream processing, and the cost of constructing protobuf 
objects is currently an unavoidably large percentage of runtime.


S.



Re: Sorting a very large number of objects

2018-11-12 Thread Mindaugas Žakšauskas
Hi,

Do you require the entire object to be loaded into memory in order to 
compare it with another object? Do these objects have IDs, and could they 
be accessed by ID quickly after sorting? If so, you could derive a 
lightweight proxy containing only a few attributes of each object and work 
with those, reducing the amount of heap needed. After the lightweights are 
sorted, you would know the order number of each one and, in turn, of its 
parent.

If you can't extract a lightweight attribute subset, perhaps you can come 
up with some sort of universal object score for each object and work with 
that?
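
A minimal sketch of the lightweight-proxy idea; the SortProxy shape (id plus 
a String key) is a hypothetical stand-in for whatever attribute subset applies.

import java.util.Comparator;
import java.util.List;

// Sketch: sort small (id, key) pairs instead of the full objects, then fetch
// or reorder the heavy objects by id afterwards.
record SortProxy(long id, String key) {

    static void sortByKey(List<SortProxy> proxies) {
        proxies.sort(Comparator.comparing(SortProxy::key));
        // proxies.get(i).id() is now the id of the i-th object in sorted order.
    }
}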

m.




Re: Sorting a very large number of objects

2018-11-11 Thread Gil Tene


On Sunday, November 11, 2018 at 9:53:45 PM UTC-8, Shevek wrote:
>
> I'm using the serialized approach with an ergonomics approach to heap 
> management. protobuf really sucks for a lot of reasons which are easily 
> remediable if one is a protobuf maintainer, but I'm not and I don't care 
> to be, and I don't have the time or inclination to write a new serializer. 
>
> However, and to Gil's point, following the ergonomics approach, when I 
> resize an array, I drop a garbage-collectable array which I no longer 
> "own", so from the PoV of the memory management code, it forms part of 
> the "other" part of the heap, i.e. I have to subtract it from the total 
> memory before computing the proportion of the heap I'm allowed to use. 
> Over time, given that GC isn't timely, I can lose all my heap to these 
> collectable arrays, and if I follow the rule about "Never call 
> System.gc()" this actually isn't a very efficient approach either.


The importance or impact of timeliness of GC is often misunderstood. In 
fact, from a GC cpu overhead perspective, the less timely it is, the better.

Once your code no longer holds any references to an array or object, you 
shouldn't count it as part of the live set, and shouldn't treat it as 
something that holds down any memory or that you are "losing all your heap 
to". From a GC efficiency (and GC cpu cost) perspective, if the array is no 
longer reachable, it is empty (non-live) heap regardless of whether or not 
GC collects it in a timely manner. The GC will get to recycling the memory 
it used to be in eventually, and the more of it there is (as a % of Xmx 
when GC actually triggers) the more efficient that GC will be. The thing to 
keep in mind when imagining/modeling this is that most GC algorithms work 
hard to avoid spending CPU cycles on dead stuff. It's the live (reachable) 
stuff that costs (and causes GC cpu spend). The larger the empty (dead when 
GC is triggered) parts of the heap are as a % of the overall Xmx, the less 
often that CPU cost (of dealing with the live stuff) needs to be paid.

When trying to evaluate the amount of empty heap available, in order to 
e.g. figure out how to size your temporary structures or caches, the things 
to estimate are not the "current" amount of heap use or "current" amount of 
empty heap. They are the live set (the sum of getUsed() calls on 
MemoryPoolMXBean.getCollectionUsage() for the various pools in the 
collector) and the empty heap after collection (the sum of the differences 
between the getMax() and getUsed() on the same), which will best 
approximate what you need to consider. It is critical to look at the values 
that reflect use levels immediately after GC is performed by using values 
obtained from MemoryPoolMXBean.getCollectionUsage() *as opposed to* using 
MemoryPoolMXBean.getUsage(). The latter simply show up as random numbers 
that fall somewhere between the live set and Xmx, depending on when you 
measure them. They make for nice plots for tracking GC activity but are 
generally useless for estimating available empty heap when trying to make 
sizing choices.
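
A minimal sketch of that estimate using only the standard 
java.lang.management API; how the two numbers are then used for sizing is up 
to the caller.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Sketch: sum getCollectionUsage() (post-GC values) across heap pools to
// approximate the live set and the empty heap after collection, instead of
// sampling getUsage() at arbitrary points in time.
final class PostGcHeapEstimate {

    static void print() {
        long liveSet = 0, emptyAfterGc = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage afterGc = pool.getCollectionUsage();   // may be null for some pools
            if (afterGc == null) continue;
            liveSet += afterGc.getUsed();
            long max = afterGc.getMax() >= 0 ? afterGc.getMax() : afterGc.getCommitted();
            emptyAfterGc += max - afterGc.getUsed();
        }
        System.out.printf("approx live set = %d bytes, empty heap after GC = %d bytes%n",
                liveSet, emptyAfterGc);
    }
}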
 

Re: Sorting a very large number of objects

2018-11-11 Thread Shevek
I'm using the serialized approach with an ergonomics approach to heap 
management. protobuf really sucks for a lot of reasons which are easily 
remediable if one is a protobuf maintainer, but I'm not and I don't care 
to be, and I don't have the time or inclination to write a new serializer.


However, and to Gil's point, following the ergonomics approach, when I 
resize an array, I drop a garbage-collectable array which I no longer 
"own", so from the PoV of the memory management code, it forms part of 
the "other" part of the heap, i.e. I have to subtract it from the total 
memory before computing the proportion of the heap I'm allowed to use. 
Over time, given that GC isn't timely, I can lose all my heap to these 
collectable arrays, and if I follow the rule about "Never call 
System.gc()" this actually isn't a very efficient approach either.


The next best approach is to allocate all my memory in 1 MB blocks and 
back each memory "region" with a list of these blocks, so I never 
actually hand them to the GC, and just reduce my overall memory usage by 
dropping blocks rather than resizing whole byte arrays.
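
A minimal sketch of that block-recycling scheme, under the stated assumption 
of fixed 1 MB blocks; the pooling policy here is hypothetical.

import java.util.ArrayDeque;

// Sketch: hand out fixed 1 MB byte[] blocks from a free list and return them
// to it, so shrinking a "region" means releasing blocks back to the pool
// rather than making arrays collectable.
final class BlockPool {
    static final int BLOCK_SIZE = 1 << 20;                    // 1 MB
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    byte[] acquire() {
        byte[] b = free.pollFirst();
        return b != null ? b : new byte[BLOCK_SIZE];
    }

    void release(byte[] block) {
        free.addFirst(block);                                 // kept strongly reachable, reused later
    }
}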


My main disbelief is that I'm inventing all of this from scratch. It HAS 
to have been done before, right? All I want to do is sort a load of 
objects...!


S.


Re: Sorting a very large number of objects

2018-11-10 Thread Gil Tene


On Friday, November 9, 2018 at 7:08:23 AM UTC-8, Shevek wrote:
>
> Hi, 
>
> I'm trying to sort/merge a very large number of objects in Java, and 
> failing more spectacularly than normal. The way I'm doing it is this: 
>
> * Read a bunch of objects into an array. 
> * Sort the array, then merge neighbouring objects as appropriate. 
> * Re-fill the array, re-sort, re-merge until compaction is "not very 
> successful". 
> * Dump the array to file, repeat for next array. 
> * Then stream all files through a final merge/combine phase. 
>
> This is failing largely because I have no idea how large to make the 
> array. Estimating the ongoing size using something like JAMM is too 
> slow, and my hand-rolled memory estimator is too unreliable. 
>
> The thing that seems to be working best is messing around with the array 
> size in order to keep some concept of runtime.maxMemory() - 
> runtime.totalMemory() + runtime.freeMemory() within a useful bound. 
>
> But there must be a better solution. I can't quite think a way around 
> this with SoftReference because I need to dump the data to disk when the 
> reference gets broken, and defeating me right now. 
>
> Other alternatives would include keeping all my in-memory data 
> structures in serialized form, and paying the ser/deser cost to compare, 
> but that's expensive - my main overhead right now is gc. Serialization 
> is protobuf, although that's changeable, since it's annoying the hell 
> out of me (please don't say thrift - but protobuf appears to have no way 
> to read from a stream into a reusable object - it has to allocate the 
> world every single time). 
>

In general, whenever I see "my overhead is gc" and "unknown memory size" 
together, I see it as a sign of someone pushing heap utilization high and 
getting into the inefficient GC state. Simplistically, you should be able 
to drop the GC cost to an arbitrary % of overall computation cost by 
increasing the amount (or relative portion) of empty heap in your set. So 
GC should never be "a bottleneck" from a throughput point of view unless 
you have constraints (such as a minimum required live set and a maximum 
possible heap size) that force you towards a high utilization of the heap 
(in terms of LiveSet/HeapSize). The answer to such a situation is generally 
"get some more RAM for this problem" rather than "put in tons of work to 
fit this in".

For most GC algorithms, or at least for the more costly parts of such 
algorithms, GC efficiency is roughly proportional to EmptyHeap/LiveSet. 
Stated otherwise, GC cost grows with LiveSet/EmptyHeap or 
LiveSet/(HeapSize-LiveSet). As you grow the amount you try to cram into a 
heap of a given size, you increase the GC cost with the square of your 
cramming efforts. And for every doubling of the empty heap [for a given 
live set] you will generally halve the GC cost.
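
A toy rendering of that rule of thumb; the numbers are made up, only the 
ratios matter.

// Sketch: relative GC cost modelled as liveSet / (heapSize - liveSet).
// Doubling the empty heap roughly halves the modelled cost.
final class GcCostModel {

    static double relativeCost(double liveSetGb, double heapGb) {
        return liveSetGb / (heapGb - liveSetGb);
    }

    public static void main(String[] args) {
        double live = 8.0;                                    // 8 GB live set
        System.out.println(relativeCost(live, 12.0));         // 8/4  = 2.0
        System.out.println(relativeCost(live, 16.0));         // 8/8  = 1.0  (empty heap doubled)
        System.out.println(relativeCost(live, 24.0));         // 8/16 = 0.5  (doubled again)
    }
}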

This should [hopefully] make it obvious why using SoftReferences is a 
generally terrible idea.
 

>
> Issues: 
> * This routine is not the sole tenant of the JVM. Other things use RAM. 
>
 
You can try to establish what an "efficient enough" heap utilization level 
is for your use case (a level that keeps overall GC work as a % of CPU 
spend to e.g. below 10%), and keep your heap use to a related fraction of 
whatever heap size you get to have on the system you land on.
 

> * This has to be deployed and work on systems whose memory config is 
> unknown to me. 
>
> Can anybody please give me pointers? 
>
> S. 
>



Re: Sorting a very large number of objects

2018-11-09 Thread Alex Kashchenko

Hi,


It may help to move all data processing off-heap. With unsafe-tools [1] 
you can load the data into an OffHeapStructArrayList, sort it with 
OffHeapStructSorter (DualPivotQuicksort), and do a binary search over the 
sorted array if needed.


As mentioned in a neighbouring comment, you also need a serialization 
format that allows accessing the data without deserialising (some kind 
of packed structs).


On GC pressure: it may be better to avoid heap allocation in all 
processing routines that are called for every array element.



[1] 
https://groups.google.com/d/msg/mechanical-sympathy/_miYYok2Rrg/vRVerQqFCgAJ
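
A generic sketch of the off-heap, fixed-width approach (this is not the 
unsafe-tools API; the record width and key position are hypothetical):

import java.nio.ByteBuffer;

// Sketch: fixed-width records packed into a direct ByteBuffer, searched by a
// long key stored at the start of each record. A data set over 2 GB would
// need multiple buffers (or MemorySegment) because ByteBuffer indexing is int.
final class OffHeapRecords {

    static final int RECORD_SIZE = 32;   // bytes per record, key in the first 8

    // Binary search over records already sorted by key; returns the record
    // index, or -1 if the key is absent.
    static long binarySearch(ByteBuffer records, long recordCount, long key) {
        long lo = 0, hi = recordCount - 1;
        while (lo <= hi) {
            long mid = (lo + hi) >>> 1;
            long midKey = records.getLong((int) (mid * RECORD_SIZE));
            if (midKey < key) lo = mid + 1;
            else if (midKey > key) hi = mid - 1;
            else return mid;
        }
        return -1;
    }
}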


--
-Alex



Re: Sorting a very large number of objects

2018-11-09 Thread Georges Gomes
Hi

That's a nice problem :)

I would probably use a serialised form that allows accessing the data 
without deserialising, like SBE. This way there is no extra allocation, 
just the byte stream.

Then I would maybe try to divide and conquer: shard objects into buckets. 
For example, in the case of strings: a bucket for strings starting with A, 
a bucket for strings starting with B, etc.
Then you can sort inside the buckets (in parallel?) and finish by 
concatenating the buckets.
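
A minimal sketch of that bucketing idea, using plain String keys purely for 
illustration:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: shard into 256 buckets by the first character (capped at 255 so
// bucket order matches sort order), sort each bucket in parallel, concatenate.
final class BucketSort {

    static List<String> sort(List<String> keys) {
        List<List<String>> buckets = new ArrayList<>(256);
        for (int i = 0; i < 256; i++) buckets.add(new ArrayList<>());
        for (String k : keys) {
            int b = k.isEmpty() ? 0 : Math.min(k.charAt(0), 255);
            buckets.get(b).add(k);
        }
        buckets.parallelStream().forEach(Collections::sort);    // independent lists, safe in parallel
        List<String> out = new ArrayList<>(keys.size());
        for (List<String> bucket : buckets) out.addAll(bucket); // buckets are already in global order
        return out;
    }
}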

My 2 cents
Good luck
Georges





Sorting a very large number of objects

2018-11-09 Thread Shevek

Hi,

I'm trying to sort/merge a very large number of objects in Java, and 
failing more spectacularly than normal. The way I'm doing it is this:


* Read a bunch of objects into an array.
* Sort the array, then merge neighbouring objects as appropriate.
* Re-fill the array, re-sort, re-merge until compaction is "not very 
successful".
* Dump the array to file, repeat for next array.
* Then stream all files through a final merge/combine phase.

This is failing largely because I have no idea how large to make the 
array. Estimating the ongoing size using something like JAMM is too 
slow, and my hand-rolled memory estimator is too unreliable.


The thing that seems to be working best is messing around with the array 
size in order to keep some concept of runtime.maxMemory() - 
runtime.totalMemory() + runtime.freeMemory() within a useful bound.
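
That heuristic, written out as a sketch; the spill threshold is hypothetical.

// Sketch: an estimate of currently-free heap, used to decide when the staging
// array has grown "large enough" to dump to file.
final class FreeHeapEstimate {

    static long estimateFreeBytes() {
        Runtime rt = Runtime.getRuntime();
        // max - total is heap the JVM could still expand into; free is unused
        // space inside the currently committed heap.
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }

    static boolean shouldSpill(long safetyMarginBytes) {
        return estimateFreeBytes() < safetyMarginBytes;   // hypothetical spill trigger
    }
}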


But there must be a better solution. I can't quite think of a way around 
this with SoftReference, because I need to dump the data to disk when the 
reference gets broken, and that's defeating me right now.


Other alternatives would include keeping all my in-memory data 
structures in serialized form, and paying the ser/deser cost to compare, 
but that's expensive - my main overhead right now is gc. Serialization 
is protobuf, although that's changeable, since it's annoying the hell 
out of me (please don't say thrift - but protobuf appears to have no way 
to read from a stream into a reusable object - it has to allocate the 
world every single time).


Issues:
* This routine is not the sole tenant of the JVM. Other things use RAM.
* This has to be deployed and work on systems whose memory config is 
unknown to me.


Can anybody please give me pointers?

S.
