Hello!

A few quick answers:

In theory, an FST node can have inCount > 1 (when it is in the "suffix"
part of the FST), but it sounds like maybe you found an exciting bug if
indeed FST Builder fails to ever increment it.  E.g. try building some FSTs
here: http://examples.mikemccandless.com/fst.py the default FST there
already shows 2 nodes with inCount = 2.

Note that the pack method has been deleted in newer Lucene versions,
because it was sizable complexity, not very well tested and not so big
gains in practice.

I'm not sure about question 2)

Question 3) does sound like a bug, because as the code now stands it means
we cannot reference an address below nodeRefToAddress.size() which seems
like it might be problematic.

These might be real bugs!  I would strongly suggest ensuring pack() really
is working for your use case before trusting it!

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 7, 2021 at 10:14 PM guohuawu227 <[email protected]> wrote:

> I am a software developer from China. Recently I am reading the source of
> lucene of 6.6.0. I can not understand some code of FST. List them below.
>
> 1 . There is a field called 'inCounts' in the class of FST. But in my
> opinion every node of FST has only one arc pointing to it. If a node is not
> shared, it is certain it has only one arc pointing to it since it is like a
> tree. If a node is shared,it is returned directly from NodeHash so the
> number of it's coming in arc is not updated. So I just can't understand
> this attribute though it is used is 'pach' method.
>
> 2. Is about the method of 'pach' of class FST. This line 'long delta =
> newNodeAddress.get((int) arc.target) + addressError - writer.getPosition()
> - 2;' . I know it is comparing which one is smaller but I just can't
> understand why 2 is minused.
>
> 3. If a FST is packed,then there are 3 ways to save the target of an arc,
> first is the difference of the target and this arc ,second is the ordinal
> number from a hashmap if the target is among those having top N incoming
> arcs, third is absolute position of the node. But in the method
> 'readNextRealArc' , I can not understand the process of the absolute
> position while reading. Below is the code.
>
> if (packed) {
> final long pos = in.getPosition();
> final long code = in.readVLong();
> if (arc.flag(BIT_TARGET_DELTA)) {
> // record the difference
> } else if (code < nodeRefToAddress.size()) {
> // record the ordinal number from a hashmap
> } else {// Absolute position
> arc.target = code;//I can not understa this.In opinion,
> nodeRefToAddress.size() should be minused because it is summed into the
> absolute value in 'pack' method.
> }
> } else {
>
> -----------------------
> Please pardon me if I use some wrong words ,my English is not good.Thanks
> for reading the email. Looking forward to the reply.
> --------------------------------------------------------------------- To
> unsubscribe, e-mail: [email protected] For additional
> commands, e-mail: [email protected]

Reply via email to