dweiss commented on a change in pull request #460:
URL: https://github.com/apache/lucene/pull/460#discussion_r754613632
##########
File path: lucene/core/src/java/org/apache/lucene/util/fst/FST.java
##########
@@ -1000,6 +1027,98 @@ private void writePresenceBits(
assert bytePos - dest == numPresenceBytes;
}
+ private long estimateNodeAddress(
Review comment:
Thanks. I got the intuition right but some parts of this code were
written while I was... away (including byte reversals during serialization),
hence the uncertainty. I keep wondering if there is any other way to get those
deltas... or make the deltas refer to a different placeholder. I recall tricks
like this done back in assembly days on Amigas; basically you had a two-level
data structure - a stream of bytes + known-size "placeholders" for compacting.
    data stream:          byte1 byte2 byte3 ... byteN byteN+1 ...
    address placeholders: (data stream@N), (data stream@M), ...
Each placeholder is a fixed-size offset - the compacting routine receives
the full data stream + placeholders so it has the ability to compute deltas
(from left to right or from right to left) and shift-compact the data stream
without knowing anything about the other data bytes. This way you sort of
postpone the delta-offset computation until you know all of the data and the
upper bound for its size, then reduce.
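
To make the idea concrete, here is a minimal sketch of such a compacting pass. It is not Lucene code and not what the PR does. The class name, the 4-byte big-endian placeholder width, the vint encoding, and the restriction that every placeholder points backwards (to an offset before itself, and never into the middle of another placeholder) are all assumptions made for illustration. With backward-only references a single left-to-right pass is enough, because the compacted address of every target is already known when its placeholder is reached.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical illustration only; not part of Lucene's FST. */
final class PlaceholderCompaction {
  /** Fixed placeholder width in bytes (an assumption of this sketch). */
  static final int SLOT = 4;

  /** Writes v as a variable-length int: 7 bits per byte, low bits first. */
  static void writeVInt(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  /** Reads a 4-byte big-endian int starting at pos. */
  static int readInt(byte[] b, int pos) {
    return ((b[pos] & 0xFF) << 24)
        | ((b[pos + 1] & 0xFF) << 16)
        | ((b[pos + 2] & 0xFF) << 8)
        | (b[pos + 3] & 0xFF);
  }

  /**
   * Compacts data in one left-to-right pass.
   *
   * @param data  the full stream; each placeholder holds an absolute target offset
   * @param slots start offsets of the placeholders, in ascending order
   * @return the stream with every placeholder replaced by a vint-encoded
   *         backward delta, measured in compacted addresses
   */
  static byte[] compact(byte[] data, int[] slots) {
    int[] newAddr = new int[data.length + 1]; // old offset -> compacted offset
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int next = 0; // index of the next placeholder to expect
    int pos = 0;
    while (pos < data.length) {
      newAddr[pos] = out.size();
      if (next < slots.length && slots[next] == pos) {
        int target = readInt(data, pos);
        if (target < 0 || target >= pos) {
          throw new IllegalArgumentException("not a backward reference at " + pos);
        }
        // The target was already emitted, so its compacted address is final.
        writeVInt(out, out.size() - newAddr[target]);
        pos += SLOT;
        next++;
      } else {
        out.write(data[pos++]); // opaque data byte, copied without inspection
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // Two data bytes, a placeholder -> offset 0, one data byte, a placeholder -> offset 2.
    byte[] data = {10, 20, 0, 0, 0, 0, 30, 0, 0, 0, 2};
    byte[] packed = compact(data, new int[] {2, 7});
    System.out.println(Arrays.toString(packed)); // prints [10, 20, 2, 30, 2]
  }
}
```

The routine never interprets the data bytes; it only needs the stream and the placeholder offsets. Forward references would make delta sizes depend on bytes not yet compacted, which this sketch does not handle.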
I'm sure you gave it some thought too - it's just what popped into my head
immediately when I saw your code. The requirement for those two extra methods
on outputs, plus the need to measure each node (I'd name the method explicitly:
computeNodeAddress), is a bit worrying... but then, if there is no visible
slowdown, perhaps I'm just overreacting...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]