On Saturday, 5 July 2014 at 16:03:17 UTC, Dmitry Olshansky wrote:
> There are trade-offs. The world is not black and white, and I don't follow 'one rule everywhere'.

This is not a trade-off at all. You suggested keeping database records in linear order, with space gaps between records to support "tiny updates". Without serious updates, that is a major waste of space. With them, your design won't save the day either, because every gap eventually gets consumed, and without fragmentation/reordering the storage fails.

FYI, today's popular databases aren't designed this way, exactly for the reasons I described above. All databases I have worked with (MySQL, Oracle and Firebird) store records very compactly, without single-byte gaps, with support for fragmentation, and without total physical ordering. So at least the designers of these DBMSs agree that your design is not practical.
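To make that concrete, here is a toy Java sketch (my own illustration, not any real engine's code) of the "slotted page" idea such systems are built on: record bytes are packed back to back from the end of a page, and a small directory of offsets maps stable record ids to current positions, so records can be compacted or relocated within the page without invalidating their ids.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Toy slotted page: no per-record gaps; compaction would just move
// bytes and rewrite offsets, while record ids (slot indexes) stay stable.
class SlottedPage {
    private final ByteBuffer page;
    private final List<Integer> offsets = new ArrayList<>(); // id -> offset
    private final List<Integer> lengths = new ArrayList<>(); // id -> length
    private int freeEnd; // record data grows from the back of the page

    SlottedPage(int size) {
        page = ByteBuffer.allocate(size);
        freeEnd = size;
    }

    // Returns a stable record id, or -1 if the page has no room left.
    int insert(byte[] record) {
        if (record.length > freeEnd) return -1;
        freeEnd -= record.length;
        page.position(freeEnd);
        page.put(record);
        offsets.add(freeEnd);
        lengths.add(record.length);
        return offsets.size() - 1;
    }

    byte[] read(int id) {
        byte[] out = new byte[lengths.get(id)];
        page.position(offsets.get(id));
        page.get(out);
        return out;
    }
}

(For brevity the directory lives in Java lists here; a real page keeps it in the page header. The point is the same: compact storage plus an indirection table, not linear records with reserved gaps.)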

> Pointers are perfectly fine as long as there is no pointer arithmetic.

Wrong. Merely holding a pointer (i.e. a physical address) is already unsafe. Serialize such a struct non-deeply, or "preserve" it in any other way, and the GC can no longer keep track of the pointer. The GC then moves the object around or deletes it, and you have a tiny black hole in your app.

> Which is hopelessly admitting defeat. A pair may have a non-trivial comparator. And take a minor step beyond that, such as a pair of double and integer, and it fails completely.

I said it above: in any non-trivial case, use classes instead of overly clever structures. And if you really, really need premature optimization, there are java.nio and buffers. Create a ByteBuffer (a direct one if you need a super-optimized solution) and treat slices of it as "structs". That's possible and easy to implement, but rarely needed in practice, because all you get is about 0.1% memory saving and no gain in speed.
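A minimal sketch of what I mean (the class and method names are mine, invented for illustration; the only real API here is java.nio.ByteBuffer): a "struct" of one double and one int, stored as 12-byte slices of a single direct buffer.

import java.nio.ByteBuffer;

// An "array of structs" emulated over one contiguous buffer.
// Each pseudo-struct is a (double, int) pair, 12 bytes, at index * STRIDE.
class PairBuffer {
    private static final int STRIDE = Double.BYTES + Integer.BYTES; // 12

    private final ByteBuffer buf;

    PairBuffer(int count) {
        // allocateDirect gives off-heap storage the GC never moves
        buf = ByteBuffer.allocateDirect(count * STRIDE);
    }

    void set(int i, double d, int n) {
        buf.putDouble(i * STRIDE, d);
        buf.putInt(i * STRIDE + Double.BYTES, n);
    }

    double getDouble(int i) { return buf.getDouble(i * STRIDE); }

    int getInt(int i) { return buf.getInt(i * STRIDE + Double.BYTES); }
}

The accessors are plain offset arithmetic over one contiguous allocation, which is exactly what an array of structs buys you, and why the saving is so marginal.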

> And there Java-style indirection-happy classes "shine". For instance, modeling any complex stuff with classes alone would lead to things like:
>
> House--reference-->Bedroom--reference-->Bed--reference-->Pillow
>
> Which is an incredible waste of memory and speed on something so simple.

That's not complex. That's very simple. It would become slightly more complex if all house parts implemented the same root interface and basic operations: examining what's around, finding an object by its path or properties, or throwing an extra pillow or two onto the same bed. All of that is just a dream for structs.
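Something like this hypothetical sketch (all names invented for illustration):

import java.util.ArrayList;
import java.util.List;

// One root interface for every house part.
interface Part {
    String name();
    List<Part> children();

    // Find a nested part by path, e.g. "bedroom/bed/pillow".
    default Part find(String path) {
        Part cur = this;
        for (String step : path.split("/")) {
            cur = cur.children().stream()
                     .filter(p -> p.name().equals(step))
                     .findFirst().orElse(null);
            if (cur == null) return null;
        }
        return cur;
    }
}

class SimplePart implements Part {
    private final String name;
    private final List<Part> children = new ArrayList<>();

    SimplePart(String name) { this.name = name; }
    public String name() { return name; }
    public List<Part> children() { return children; }
    void add(Part p) { children.add(p); } // one more pillow on the bed
}

With that in place, house.find("bedroom/bed") and bed.add(new SimplePart("pillow")) fall out for free. With bare structs, growing the bed's pillow list means reallocating and copying the enclosing values.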

> Copy constructors, or in D parlance, postblits. There is also CoW and whatnot. In fact, swap doesn't create a copy in D, it just does a bitwise swap in place.

And here we have a tough choice for structs with pulled-in subdata "for efficiency": either the assignment operator copies that data too (making routines like sort perform horribly slowly), or it performs only a shallow copy, causing undue sharing of data and violating the whole "value" philosophy of structs.
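In Java terms the dilemma looks like this (an illustrative toy, not anyone's real code):

import java.util.Arrays;

// A "value" type whose payload was pulled out-of-line "for efficiency".
class Blob {
    int id;
    int[] payload;

    // Option 1: deep copy. Safe value semantics, but every assignment
    // (and thus every sort swap) costs O(payload.length).
    Blob deepCopy() {
        Blob b = new Blob();
        b.id = id;
        b.payload = Arrays.copyOf(payload, payload.length);
        return b;
    }

    // Option 2: shallow copy. Cheap, but now two "values" share one
    // payload, and a write through either copy is visible through both.
    Blob shallowCopy() {
        Blob b = new Blob();
        b.id = id;
        b.payload = payload; // undue sharing
        return b;
    }
}

Pick option 1 and sorting crawls; pick option 2 and your "value" isn't a value anymore.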

> Disagree - the moment a cache line is loaded, it doesn't matter how much of it you read. Even a tiny read that misses the cache is paid for in full.

But the number of these missed reads is low, so less cache gets invalidated. A CPU cache is not a single page that is invalidated as a whole; it is more like many small subpages (cache lines), each of which is treated individually.

If you're really into low-level mumbo jumbo, here are two more aspects for you to consider.

1. Since indirection does not require object contents to be moved around while sorting, these objects can be read into the cache once and never invalidated later - unlike "right in place" structs, which invalidate CPU cache lines every time a swap is performed. That is a huge cache win already.
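For example (a trivial sketch; Person is a made-up type):

import java.util.Arrays;
import java.util.Comparator;

class SortDemo {
    static class Person {
        int age;
        Person(int a) { age = a; }
    }

    public static void main(String[] args) {
        Person[] people = { new Person(42), new Person(7), new Person(19) };
        // Only the references inside `people` get permuted here; each
        // Person's bytes stay where they were allocated. An array of
        // structs would have to move the element bytes on every swap.
        Arrays.sort(people, Comparator.comparingInt((Person p) -> p.age));
    }
}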

2. Don't forget that new objects in Java are created in the eden space, which is essentially a stack-like area: very fast, compact, sequential memory. If your array fits in eden (and that's surely true for the forest problem this thread started from), then allocating an array of objects is essentially the same as allocating an array of structs: the object contents aren't "scattered in memory" but sit one after another without gaps between them. That greatly aids caching as well. The main difference between eden and an "array of structs" is that allocation in the former never fails (assuming there's enough memory) and only gets slightly slower in the worst case, while allocation of the latter can fail with a "stack overflow error" or a "too much memory fragmentation, oops", even when there's enough free memory in total.

> And the only thing worse than copying a large file is copying a large file fragmented into tiny splinters.

You surely meant "reading" here, not "copying". Copying a large fragmented file is about as slow as copying a large unfragmented file, because the write operations are the bottleneck.
