On Saturday, 5 July 2014 at 16:03:17 UTC, Dmitry Olshansky wrote:
> There are trade-offs. The world is not black and white, and I don't follow 'one rule everywhere'.

This is not a trade-off at all. You suggested keeping database records in linear order, with space gaps between records to support "tiny updates". Without serious updates, that is a major waste of space. With them, your design won't save the day either, because every gap eventually gets consumed, and without fragmentation/reordering the storage fails.

FYI, today's popular databases aren't designed this way, exactly for the reasons I described above. All databases I have worked with (MySQL, Oracle and Firebird) store records very compactly, without single-byte gaps, with support for fragmentation, and without total physical ordering. So at least the designers of these DBMSs agree that your design is not practical.
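To make that concrete, here is a toy Java sketch (my own illustration, not any real engine's code) of the "slotted page" idea such systems are built on: record bytes are packed back to back from the end of a page, and a small directory of offsets maps stable record ids to current positions, so records can be compacted or relocated within the page without invalidating their ids.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Toy slotted page: no per-record gaps; compaction would just move
// bytes and rewrite offsets, while record ids (slot indexes) stay stable.
class SlottedPage {
    private final ByteBuffer page;
    private final List<Integer> offsets = new ArrayList<>(); // id -> offset
    private final List<Integer> lengths = new ArrayList<>(); // id -> length
    private int freeEnd; // record data grows from the back of the page

    SlottedPage(int size) {
        page = ByteBuffer.allocate(size);
        freeEnd = size;
    }

    // Returns a stable record id, or -1 if the page has no room left.
    int insert(byte[] record) {
        if (record.length > freeEnd) return -1;
        freeEnd -= record.length;
        page.position(freeEnd);
        page.put(record);
        offsets.add(freeEnd);
        lengths.add(record.length);
        return offsets.size() - 1;
    }

    byte[] read(int id) {
        byte[] out = new byte[lengths.get(id)];
        page.position(offsets.get(id));
        page.get(out);
        return out;
    }
}

(For brevity the directory lives in Java lists here; a real page keeps it in the page header. The point is the same: compact storage plus an indirection table, not linear records with reserved gaps.)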

> Pointers are perfectly fine as long as there is no pointer arithmetic.

Wrong. Merely holding a pointer (i.e. a physical address) is already unsafe. Serialize such a struct non-deeply, or "preserve" it in any other way, and the GC can no longer keep track of the pointer. The GC then moves the object around or deletes it, and you have a tiny black hole in your app.

> Which is hopelessly admitting defeat. A pair may have a non-trivial comparator. And take a minor step beyond that, such as a pair of double and integer, and it fails completely.

I said it above: in any non-trivial case, use classes instead of overly clever structures. And if you really, really need premature optimization, there are java.nio and buffers. Create a ByteBuffer (a direct one if you need a super-optimized solution) and treat slices of it as "structs". That's possible and easy to implement, but rarely needed in practice, because all you get is about 0.1% memory saving and no gain in speed.
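A minimal sketch of what I mean (the class and method names are mine, invented for illustration; the only real API here is java.nio.ByteBuffer): a "struct" of one double and one int, stored as 12-byte slices of a single direct buffer.

import java.nio.ByteBuffer;

// An "array of structs" emulated over one contiguous buffer.
// Each pseudo-struct is a (double, int) pair, 12 bytes, at index * STRIDE.
class PairBuffer {
    private static final int STRIDE = Double.BYTES + Integer.BYTES; // 12

    private final ByteBuffer buf;

    PairBuffer(int count) {
        // allocateDirect gives off-heap storage the GC never moves
        buf = ByteBuffer.allocateDirect(count * STRIDE);
    }

    void set(int i, double d, int n) {
        buf.putDouble(i * STRIDE, d);
        buf.putInt(i * STRIDE + Double.BYTES, n);
    }

    double getDouble(int i) { return buf.getDouble(i * STRIDE); }

    int getInt(int i) { return buf.getInt(i * STRIDE + Double.BYTES); }
}

The accessors are plain offset arithmetic over one contiguous allocation, which is exactly what an array of structs buys you, and why the saving is so marginal.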

> And there Java-style indirection-happy classes "shine". For instance, modeling any complex stuff with classes alone would lead to things like:
>
> House--reference-->Bedroom--reference-->Bed--reference-->Pillow
>
> Which is an incredible waste of memory and speed on something so simple.

That's not complex. That's very simple. It would become slightly more complex if all house parts implemented the same root interface and basic operations: examining what's around, finding an object by its path or properties, or throwing an extra pillow or two onto the same bed. All of that is just a dream for structs.
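Something like this hypothetical sketch (all names invented for illustration):

import java.util.ArrayList;
import java.util.List;

// One root interface for every house part.
interface Part {
    String name();
    List<Part> children();

    // Find a nested part by path, e.g. "bedroom/bed/pillow".
    default Part find(String path) {
        Part cur = this;
        for (String step : path.split("/")) {
            cur = cur.children().stream()
                     .filter(p -> p.name().equals(step))
                     .findFirst().orElse(null);
            if (cur == null) return null;
        }
        return cur;
    }
}

class SimplePart implements Part {
    private final String name;
    private final List<Part> children = new ArrayList<>();

    SimplePart(String name) { this.name = name; }
    public String name() { return name; }
    public List<Part> children() { return children; }
    void add(Part p) { children.add(p); } // one more pillow on the bed
}

With that in place, house.find("bedroom/bed") and bed.add(new SimplePart("pillow")) fall out for free. With bare structs, growing the bed's pillow list means reallocating and copying the enclosing values.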

> Copy constructors, or in D parlance, postblits. There is also CoW and whatnot. In fact, swap doesn't create a copy in D, it just does a bitwise swap in place.

And here we have a tough choice for structs with pulled-in subdata "for efficiency": either the assignment operator copies that data too (making routines like sort perform horribly slowly), or it performs only a shallow copy, causing undue sharing of data and violating the whole "value" philosophy of structs.
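In Java terms the dilemma looks like this (an illustrative toy, not anyone's real code):

import java.util.Arrays;

// A "value" type whose payload was pulled out-of-line "for efficiency".
class Blob {
    int id;
    int[] payload;

    // Option 1: deep copy. Safe value semantics, but every assignment
    // (and thus every sort swap) costs O(payload.length).
    Blob deepCopy() {
        Blob b = new Blob();
        b.id = id;
        b.payload = Arrays.copyOf(payload, payload.length);
        return b;
    }

    // Option 2: shallow copy. Cheap, but now two "values" share one
    // payload, and a write through either copy is visible through both.
    Blob shallowCopy() {
        Blob b = new Blob();
        b.id = id;
        b.payload = payload; // undue sharing
        return b;
    }
}

Pick option 1 and sorting crawls; pick option 2 and your "value" isn't a value anymore.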

> Disagree - the moment a cache line is loaded, it doesn't matter how much of it you read. Even a tiny read that misses the cache is paid for in full.

But the number of these missed reads is low, so less cache gets invalidated. A CPU cache is not a single page that is invalidated as a whole; it is more like many small subpages (cache lines), each of which is treated individually.

If you're really into low-level mumbo jumbo, here are two more aspects for you to consider.

1. Since indirection does not require object contents to be moved around while sorting, these objects can be read into the cache once and never invalidated later - unlike "right in place" structs, which invalidate CPU cache lines every time a swap is performed. That is a huge cache win already.
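For example (a trivial sketch; Person is a made-up type):

import java.util.Arrays;
import java.util.Comparator;

class SortDemo {
    static class Person {
        int age;
        Person(int a) { age = a; }
    }

    public static void main(String[] args) {
        Person[] people = { new Person(42), new Person(7), new Person(19) };
        // Only the references inside `people` get permuted here; each
        // Person's bytes stay where they were allocated. An array of
        // structs would have to move the element bytes on every swap.
        Arrays.sort(people, Comparator.comparingInt((Person p) -> p.age));
    }
}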

2. Don't forget that new objects in Java are created in the eden space, which is essentially a stack-like area: very fast, compact, sequential memory. If your array fits in eden (and that's surely true for the forest problem this thread started from), then allocating an array of objects is essentially the same as allocating an array of structs: the object contents aren't "scattered in memory" but sit one after another without gaps between them. That greatly aids caching as well. The main difference between eden and an "array of structs" is that allocation in the former never fails (assuming there's enough memory) and only gets slightly slower in the worst case, while allocation of the latter can fail with a "stack overflow error" or a "too much memory fragmentation, oops", even when there's enough free memory in total.

> And the only thing worse than copying a large file is copying a large file fragmented into tiny splinters.

You surely meant "reading" here, not "copying". Copying a large fragmented file is about as slow as copying a large unfragmented file, because the write operations are the bottleneck.
