[ 
https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293100#comment-13293100
 ] 

Adrien Grand commented on LUCENE-4120:
--------------------------------------

bq. It seems sort of odd to have the new .save method on ReaderImpl... can it 
be on Mutable/Impl instead, or, maybe FST does its own saving or something?

My first intent was to add this method to {{Mutable}}. The problem is that 
{{nodeRefToAddress}} needs to be a reader since it may be instantiated through 
{{PackedInts.getReader}}, but it may also need to be serialized by the 
{{save}} method. This is why I added this method to {{Reader}}. I can move it 
to {{Mutable}}, but then it would no longer be possible to {{save}} an {{FST}} 
that was read from disk (maybe not a problem?). Another solution would be to 
move the serialization logic to {{FST}}, but this would require exposing some 
internals of the packed integer arrays in order to select the right format 
({{PACKED}} or {{PACKED_SINGLE_BLOCK}}, depending on whether the 
reader/mutable is an instance of {{Packed64SingleBlock}}), and I would really 
like to avoid this as long as possible.
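
To illustrate the trade-off, here is a minimal, self-contained sketch. It does 
not use the real Lucene classes: {{SaveOnReaderSketch}}, its nested {{Reader}} 
and {{Mutable}} interfaces, {{readOnly}} and {{saveToBytes}} are hypothetical 
names, and the on-disk layout (an int count followed by plain longs) is a 
placeholder rather than the packed format. It only shows why declaring 
{{save}} on the read-only type keeps instances loaded from disk serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch; these are not the Lucene PackedInts types. */
final class SaveOnReaderSketch {

    /** Read-only view over a sequence of values. */
    interface Reader {
        long get(int index);

        int size();

        /** Declared here so that read-only instances can be written back. */
        default void save(DataOutputStream out) throws IOException {
            // Placeholder layout: value count, then one long per value.
            out.writeInt(size());
            for (int i = 0; i < size(); i++) {
                out.writeLong(get(i));
            }
        }
    }

    /** If save were declared here instead, readOnly(...) results could not be saved. */
    interface Mutable extends Reader {
        void set(int index, long value);
    }

    /** Stands in for loading from disk: the caller only gets a Reader. */
    static Reader readOnly(long... values) {
        final long[] copy = values.clone();
        return new Reader() {
            @Override
            public long get(int index) {
                return copy[index];
            }

            @Override
            public int size() {
                return copy.length;
            }
        };
    }

    /** Serializes any Reader, mutable or not, into a byte array. */
    static byte[] saveToBytes(Reader reader) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            reader.save(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Reader nodeRefToAddress = readOnly(3, 17, 42);
        System.out.println(saveToBytes(nodeRefToAddress).length + " bytes written");
    }
}
```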

bq. In all the places we now pass random.nextFloat() for 
acceptableOverheadRatio (to FST.pack or MemoryPostingsFormat), shouldn't it be 
COMPACT .. FASTEST instead of 0.0 .. 1.0?

0..1 gives the different implementations a better chance of being selected. 
{{FASTEST=7}} is only useful for {{bitsPerValue=1}}, so that a {{Direct8}} is 
instantiated. If we used a uniformly distributed float between {{COMPACT=0}} 
and {{FASTEST=7}}, a {{Direct*}} implementation would be used more than 6/7 of 
the time when {{bitsPerValue>=4}}. For example, if {{bitsPerValue=15}}, a 
{{Direct16}} will be instantiated if {{acceptableOverheadRatio>=1/15}} (about 
0.07) and a {{Packed64}} otherwise. A lower upper bound for 
{{acceptableOverheadRatio}} makes the latter case more likely.
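
The arithmetic above can be checked with a small simulation. This is a 
simplified model, not the actual {{PackedInts}} code: it assumes that a 
{{Direct*}} implementation is chosen whenever {{bitsPerValue}} plus the 
truncated overhead ({{acceptableOverheadRatio * bitsPerValue}}) reaches the 
next directly addressable width, and that everything else falls back to 
{{Packed64}}. The class and method names are hypothetical.

```java
import java.util.Random;

/** Simplified model of the selection described above; not the PackedInts code. */
final class OverheadRatioSketch {

    /** Smallest directly addressable width (8, 16, 32 or 64) that holds bitsPerValue. */
    static int directWidth(int bitsPerValue) {
        for (int width : new int[] {8, 16, 32, 64}) {
            if (bitsPerValue <= width) {
                return width;
            }
        }
        throw new IllegalArgumentException("bitsPerValue must be at most 64");
    }

    /** Assumed rule: go direct when the wasted bits fit in ratio * bitsPerValue. */
    static String choose(int bitsPerValue, float acceptableOverheadRatio) {
        int maxBitsPerValue = bitsPerValue + (int) (acceptableOverheadRatio * bitsPerValue);
        int width = directWidth(bitsPerValue);
        return maxBitsPerValue >= width ? "Direct" + width : "Packed64";
    }

    /** Fraction of ratios drawn uniformly from [0, upperBound) that select Direct*. */
    static double directFraction(int bitsPerValue, float upperBound, long seed, int samples) {
        Random random = new Random(seed);
        int direct = 0;
        for (int i = 0; i < samples; i++) {
            float ratio = random.nextFloat() * upperBound;
            if (choose(bitsPerValue, ratio).startsWith("Direct")) {
                direct++;
            }
        }
        return (double) direct / samples;
    }

    public static void main(String[] args) {
        for (int bitsPerValue : new int[] {4, 15}) {
            System.out.printf(
                "bitsPerValue=%d: ratio in [0,1) -> %.3f direct, ratio in [0,7) -> %.3f direct%n",
                bitsPerValue,
                directFraction(bitsPerValue, 1f, 42L, 100_000),
                directFraction(bitsPerValue, 7f, 42L, 100_000));
        }
    }
}
```

Under this model, drawing the ratio from 0..7 selects a {{Direct*}} 
implementation for nearly every draw once {{bitsPerValue}} is large, while 
0..1 leaves more room for {{Packed64}}.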

bq. [kuromoji], [getWriterByFormat], [javadocs]

Agreed, working on it.


                
> FST should use packed integer arrays
> ------------------------------------
>
>                 Key: LUCENE-4120
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4120
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-4120.patch
>
>
> There are some places where an int[] could be advantageously replaced with a 
> packed integer array.
> I am thinking (at least) of:
>  * FST.nodeAddress (GrowableWriter)
>  * FST.inCounts (GrowableWriter)
>  * FST.nodeRefToAddress (read-only Reader)
> The serialization/deserialization methods should be modified too in order to 
> take advantage of PackedInts.get{Reader,Writer}.
