Re: Sorting a very large number of objects

2018-11-11 Thread Gil Tene


On Sunday, November 11, 2018 at 9:53:45 PM UTC-8, Shevek wrote:
>
> I'm using the serialized approach with an ergonomics approach to heap 
> management. protobuf really sucks for a lot of reasons which are easily 
> remediable if one is a protobuf maintainer, but I'm not and I don't care 
> to be, and I don't have the time or inclination to write a new serializer. 
>
> However, and to Gil's point, following the ergonomics approach, when I 
> resize an array, I drop a garbage-collectable array which I no longer 
> "own", so from the PoV of the memory management code, it forms part of 
> the "other" part of the heap, i.e. I have to subtract it from the total 
> memory before computing the proportion of the heap I'm allowed to use. 
> Over time, given that GC isn't timely, I can lose all my heap to these 
> collectable arrays, and if I follow the rule about "Never call 
> System.gc()" this actually isn't a very efficient approach either.


The importance or impact of timeliness of GC is often misunderstood. In 
fact, from a GC cpu overhead perspective, the less timely it is, the better.

Once your code no longer holds any references to an array or object, you 
shouldn't count it as part of the live set, and shouldn't treat it as 
something that holds down any memory or that you are "losing all your heap 
to". From a GC efficiency (and GC cpu cost) perspective, if the array is no 
longer reachable, it is empty (non-live) heap regardless of whether or not 
GC collects it in a timely manner. The GC will get to recycling the memory 
it used to be in eventually, and the more of it there is (as a % of Xmx 
when GC actually triggers) the more efficient that GC will be. The thing to 
keep in mind when imagining/modeling this is that most GC algorithms work 
hard to avoid spending CPU cycles on dead stuff. It's the live (reachable) 
stuff that costs (and causes GC cpu spend). The larger the empty (dead when 
GC is triggered) parts of the heap are as a % of the overall Xmx, the less 
often that CPU cost (of dealing with the live stuff) needs to be paid.
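
To put toy numbers on it (purely as an illustration, not from any 
measurement): with a 2 GB live set and an 8 GB Xmx, roughly 6 GB of 
allocation can happen between collections, and each collection pays to 
trace the same ~2 GB of live data. With a 16 GB Xmx, roughly 14 GB of 
allocation fits between collections, so that same per-cycle cost is paid 
less than half as often, even though far more dead stuff is lying around 
at any given moment.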

When trying to evaluate the amount of empty heap available, in order to 
e.g. figure out how to size your temporary structures or caches, the things 
to estimate are not the "current" amount of heap use or "current" amount of 
empty heap. They are the live set (the sum of getUsed() calls on 
MemoryPoolMXBean.getCollectionUsage() for the various pools in the 
collector) and the empty heap after collection (the sum of the differences 
between the getMax() and getUsed() on the same), which will best 
approximate what you need to consider. It is critical to look at the values 
that reflect use levels immediately after GC is performed, by using values 
obtained from MemoryPoolMXBean.getCollectionUsage() *as opposed to* using 
MemoryPoolMXBean.getUsage(). The latter simply show up as random numbers 
that fall somewhere between the live set and Xmx, depending on when you 
measure them. They make for nice plots for tracking GC activity but are 
generally useless for estimating available empty heap when trying to make 
sizing choices.
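
For concreteness, here is a quick sketch of that measurement (my 
illustration only, names made up; getCollectionUsage() returns null for 
pools that don't support it, may read as zero before a pool's first 
collection, and how getMax() is reported per pool varies by collector):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryType;
    import java.lang.management.MemoryUsage;

    public final class PostGcHeapEstimate {
        public static void main(String[] args) {
            long liveSet = 0;       // sum of used bytes as of each pool's last collection
            long emptyAfterGc = 0;  // sum of (max - used) for pools that report a max
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() != MemoryType.HEAP) continue;   // skip Metaspace, code cache, etc.
                MemoryUsage afterGc = pool.getCollectionUsage();
                if (afterGc == null) continue;                     // pool doesn't support collection usage
                liveSet += afterGc.getUsed();
                if (afterGc.getMax() >= 0) {                       // getMax() is -1 when undefined
                    emptyAfterGc += afterGc.getMax() - afterGc.getUsed();
                }
            }
            System.out.printf("approx live set: %d MB, empty heap after GC: %d MB%n",
                    liveSet >> 20, emptyAfterGc >> 20);
        }
    }

Those two sums (as opposed to anything derived from getUsage()) are the 
numbers to feed into sizing decisions.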
 

>
>
> The next best approach is to allocate all my memory in 1Mb blocks, make 
> each memory "region" be backed by some list of these blocks, so I never 
> actually hand them to the GC, and just reduce my overall memory usage by 
> dropping blocks, rather than totally resizing byte arrays. 
>
> My main disbelief is that I'm inventing all of this from scratch. It HAS 
> to have been done before, right? All I want to do is sort a load of 
> objects...! 
>
> S. 
>
> The presumed solution to this 
>
> On 11/10/18 8:51 AM, Gil Tene wrote: 
> > 
> > 
> > On Friday, November 9, 2018 at 7:08:23 AM UTC-8, Shevek wrote: 
> > 
> > Hi, 
> > 
> > I'm trying to sort/merge a very large number of objects in Java, and 
> > failing more spectacularly than normal. The way I'm doing it is 
> this: 
> > 
> > * Read a bunch of objects into an array. 
> > * Sort the array, then merge neighbouring objects as appropriate. 
> > * Re-fill the array, re-sort, re-merge until compaction is "not very 
> > successful". 
> > * Dump the array to file, repeat for next array. 
> > * Then stream all files through a final merge/combine phase. 
> > 
> > This is failing largely because I have no idea how large to make the 
> > array. Estimating the ongoing size using something like JAMM is too 
> > slow, and my hand-rolled memory estimator is too unreliable. 
> > 
> > The thing that seems to be working best is messing around with the 
> > array 
> > size in order to keep some concept of runtime.maxMemory() - 

Re: Sorting a very large number of objects

2018-11-11 Thread Shevek
I'm using the serialized approach with an ergonomics approach to heap 
management. protobuf really sucks for a lot of reasons which are easily 
remediable if one is a protobuf maintainer, but I'm not and I don't care 
to be, and I don't have the time or inclination to write a new serializer.


However, and to Gil's point, following the ergonomics approach, when I 
resize an array, I drop a garbage-collectable array which I no longer 
"own", so from the PoV of the memory management code, it forms part of 
the "other" part of the heap, i.e. I have to subtract it from the total 
memory before computing the proportion of the heap I'm allowed to use. 
Over time, given that GC isn't timely, I can lose all my heap to these 
collectable arrays, and if I follow the rule about "Never call 
System.gc()" this actually isn't a very efficient approach either.


The next best approach is to allocate all my memory in 1Mb blocks, make 
each memory "region" be backed by some list of these blocks, so I never 
actually hand them to the GC, and just reduce my overall memory usage by 
dropping blocks, rather than totally resizing byte arrays.
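
Roughly what I have in mind, as a sketch (names and structure here are 
just illustrative, not my actual code):

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // A pool of fixed-size blocks plus a "region" backed by a list of them:
    // shrinking a region hands blocks back to the pool instead of dropping a
    // large array for the GC to reclaim whenever it gets around to it.
    final class BlockPool {
        static final int BLOCK_SIZE = 1 << 20; // 1 MB
        private final Deque<byte[]> free = new ArrayDeque<>();

        byte[] acquire() {
            byte[] b = free.pollFirst();
            return b != null ? b : new byte[BLOCK_SIZE];
        }

        void release(byte[] block) {
            free.addFirst(block); // stays reachable, so it's reused rather than collected
        }
    }

    final class Region {
        private final BlockPool pool;
        private final List<byte[]> blocks = new ArrayList<>();

        Region(BlockPool pool) { this.pool = pool; }

        // Grow or shrink to the requested capacity, in whole blocks.
        void resize(long bytes) {
            int wanted = (int) ((bytes + BlockPool.BLOCK_SIZE - 1) / BlockPool.BLOCK_SIZE);
            while (blocks.size() < wanted) blocks.add(pool.acquire());
            while (blocks.size() > wanted) pool.release(blocks.remove(blocks.size() - 1));
        }

        long capacity() { return (long) blocks.size() * BlockPool.BLOCK_SIZE; }
    }

That way the accounting stays under my control rather than depending on 
when the collector gets around to the dropped arrays.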


My main disbelief is that I'm inventing all of this from scratch. It HAS 
to have been done before, right? All I want to do is sort a load of 
objects...!


S.

The presumed solution to this

On 11/10/18 8:51 AM, Gil Tene wrote:



On Friday, November 9, 2018 at 7:08:23 AM UTC-8, Shevek wrote:

Hi,

I'm trying to sort/merge a very large number of objects in Java, and
failing more spectacularly than normal. The way I'm doing it is this:

* Read a bunch of objects into an array.
* Sort the array, then merge neighbouring objects as appropriate.
* Re-fill the array, re-sort, re-merge until compaction is "not very
successful".
* Dump the array to file, repeat for next array.
* Then stream all files through a final merge/combine phase.
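
In outline, it's something like this (a heavily simplified sketch, with 
Strings standing in for my objects and the neighbour-combining step 
reduced to a comment - not my real code):

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    final class ExternalSortSketch {

        // Phase 1: fill a bounded batch, sort (and combine) it, spill it as a sorted run.
        static List<Path> spillSortedRuns(Iterator<String> input, int batchSize) throws IOException {
            List<Path> runs = new ArrayList<>();
            List<String> batch = new ArrayList<>(batchSize);
            while (input.hasNext()) {
                batch.add(input.next());
                if (batch.size() == batchSize || !input.hasNext()) {
                    Collections.sort(batch);               // sort, then merge neighbouring entries here
                    Path run = Files.createTempFile("run", ".txt");
                    Files.write(run, batch);
                    runs.add(run);
                    batch.clear();
                }
            }
            return runs;
        }

        private static final class RunCursor {             // current head of one sorted run
            final BufferedReader reader;
            String head;
            RunCursor(BufferedReader reader, String head) { this.reader = reader; this.head = head; }
        }

        // Phase 2: stream all runs through a final merge/combine pass.
        static void mergeRuns(List<Path> runs, Path out) throws IOException {
            PriorityQueue<RunCursor> heap =
                    new PriorityQueue<>(Comparator.comparing((RunCursor c) -> c.head));
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run);
                String first = r.readLine();
                if (first != null) heap.add(new RunCursor(r, first));
                else r.close();
            }
            try (BufferedWriter w = Files.newBufferedWriter(out)) {
                while (!heap.isEmpty()) {
                    RunCursor c = heap.poll();
                    w.write(c.head);                        // combine with equal neighbours here instead
                    w.newLine();
                    c.head = c.reader.readLine();
                    if (c.head != null) heap.add(c);
                    else c.reader.close();
                }
            }
        }
    }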

This is failing largely because I have no idea how large to make the
array. Estimating the ongoing size using something like JAMM is too
slow, and my hand-rolled memory estimator is too unreliable.

The thing that seems to be working best is messing around with the array 
size in order to keep some concept of runtime.maxMemory() - 
runtime.totalMemory() + runtime.freeMemory() within a useful bound.
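
In code, the number I'm trying to keep bounded is just:

    // Sampled whenever it's convenient, i.e. between GCs, so it's noisy:
    Runtime rt = Runtime.getRuntime();
    long roughFree = rt.maxMemory() - rt.totalMemory() + rt.freeMemory();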

But there must be a better solution. I can't quite think of a way around 
this with SoftReference, because I need to dump the data to disk when the 
reference gets broken, and that's defeating me right now.

Other alternatives would include keeping all my in-memory data 
structures in serialized form, and paying the ser/deser cost to compare, 
but that's expensive - my main overhead right now is gc. Serialization 
is protobuf, although that's changeable, since it's annoying the hell 
out of me (please don't say thrift - but protobuf appears to have no way 
to read from a stream into a reusable object - it has to allocate the 
world every single time).


In general, whenever I see "my overhead is gc" and "unknown memory size" 
together, I see it as a sign of someone pushing heap utilization high 
and getting into the inefficient GC state. Simplistically, you should be 
able to drop the GC cost to an arbitrary % of overall computation cost 
by increasing the amount (or relative portion) of empty heap in your 
setup. So GC should never be "a bottleneck" from a throughput point of 
view unless you have constraints (such as a minimum required live set 
and a maximum possible heap size) that force you towards a high 
utilization of the heap (in terms of LiveSet/HeapSize). The answer to 
such a situation is generally "get some more RAM for this problem" 
rather than "put in tons of work to fit this in".


For most GC algorithms, or at least for the more costly parts of such 
algorithms, GC efficiency is roughly linear to EmptyHeap/LiveSet. Stated 
otherwise, GC cost grows with LiveSet/EmptyHeap or 
LiveSet/(HeapSize-LiveSet). As you grow the amount you try to cram into 
a heap of a given size, you increase the GC cost to the square of your 
cramming efforts. And for every doubling of the empty heap [for a given 
live set] you will generally halve the GC cost.
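
Rough numbers, just to illustrate: with a 4 GB live set, a 6 GB heap 
leaves 2 GB empty, for a relative cost of 4/2 = 2.0; an 8 GB heap leaves 
4 GB empty, for 4/4 = 1.0. Double the empty heap, half the GC cost, for 
the exact same program.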


This should [hopefully] make it obvious why using SoftReferences is a 
generally terrible idea.



Issues:
* This routine is not the sole tenant of the JVM. Other things use RAM.

You can try to establish what an "efficient enough" heap utilization 
level is for your use case (a level that keeps overall GC work as a % of 
CPU spend to e.g. below 10%), and keep your heap use to a related 
fraction of whatever heap size you get to have on the system you land on.
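
As a purely illustrative back-of-envelope (the ratio is a placeholder 
you'd pick for your own acceptable GC overhead, not a recommendation), 
the LiveSet/(HeapSize-LiveSet) relation above can be turned around: pick 
the worst ratio you're willing to pay for, and derive how much live data 
to allow yourself on whatever heap you land on:

    // Sketch only: cap r = liveSet / (heapSize - liveSet) and solve for liveSet:
    //   live / (heap - live) <= r   =>   live <= heap * r / (1 + r)
    final class HeapBudget {
        static long maxLiveSetFor(long heapSizeBytes, double maxLiveToEmptyRatio) {
            return (long) (heapSizeBytes
                    * (maxLiveToEmptyRatio / (1.0 + maxLiveToEmptyRatio)));
        }
    }

For example, a ratio of 1.0 means keeping the live set under half of 
whatever Xmx the deployment ends up with.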


* This has to be deployed and work on systems whose memory config is
unknown to me.

Can anybody please give me pointers?

S.
