[ 
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804072#action_12804072
 ] 

Paul Elschot edited comment on LUCENE-1990 at 1/23/10 11:50 AM:
----------------------------------------------------------------

As to whether to use int or long in the unsigned packed int interface, the only 
numbers that will probably need to be long in the foreseeable future are 
docids. However, this change can be delayed by not allowing an index segment to 
grow beyond 2^32 or 2^31-1 docs, and by only implementing the long docids for 
multiple index segments.
So as long as it is ok to assume that an index segment can have MAXINT docs at 
most, we could use an int interface here.
Do Nutch and/or Solr already have long docids implemented on multiple index 
readers/writers or segments?
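
To illustrate the idea of keeping int docids inside a segment and using long docids only across segments, here is a minimal sketch. The class and method names (GlobalDocIds, segmentBases, toGlobal) are hypothetical and are not taken from Lucene, Nutch or Solr; the sketch only assumes that each segment holds at most MAXINT (Integer.MAX_VALUE) docs.

```java
// Hypothetical sketch, not Lucene API: int docids per segment, long docids across segments.
final class GlobalDocIds {

    // Computes the first global docid of every segment from the per-segment doc counts.
    // Each count is an int, so a single segment never exceeds Integer.MAX_VALUE docs.
    static long[] segmentBases(int[] segmentSizes) {
        long[] bases = new long[segmentSizes.length];
        long next = 0L;
        for (int i = 0; i < segmentSizes.length; i++) {
            if (segmentSizes[i] < 0) {
                throw new IllegalArgumentException("negative segment size");
            }
            bases[i] = next;
            next += segmentSizes[i]; // accumulated in a long, so the sum cannot overflow int
        }
        return bases;
    }

    // Maps a segment-local int docid to a global long docid.
    static long toGlobal(long[] bases, int segment, int localDocId) {
        if (localDocId < 0) {
            throw new IllegalArgumentException("negative local docid");
        }
        return bases[segment] + localDocId;
    }
}
```

With this split, only the code that combines segments would need to handle long values; everything inside a segment could stay on an int interface.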

The other limit is the max size of a document field. If that goes beyond 
MAXINT, the positions and maybe even the frequencies would need to be changed 
from int to long. But for now I can't think of a real use case with a document 
field that has more than MAXINT positions. That would be like a book with ten 
million pages of text. Did anyone ever run into this limitation?


  
> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990-te20100122.patch, LUCENE-1990.patch, 
> LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
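
As a rough illustration of what the issue description above asks for, here is a minimal sketch of a fixed-width packed unsigned array backed by a long[]. It is not the attached patch and not the eventual Lucene implementation. The class name SimplePackedInts is hypothetical, and the int index follows the suggestion in the comment rather than the long index in the proposed interface.

```java
// Hypothetical sketch of a packed unsigned int array; names are illustrative only.
final class SimplePackedInts {
    private final long[] blocks;    // values are packed back to back into 64-bit blocks
    private final int bitsPerValue; // between 1 and 63, derived from maxValue
    private final long mask;        // the lowest bitsPerValue bits set
    private final int count;

    SimplePackedInts(int count, long maxValue) {
        if (count < 0 || maxValue < 0) {
            throw new IllegalArgumentException("count and maxValue must be non-negative");
        }
        this.count = count;
        // Number of bits needed to represent maxValue, with a minimum of one bit.
        this.bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) count * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    int size() {
        return count;
    }

    long get(int index) {
        checkIndex(index);
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // The value straddles two blocks: take the remaining high bits from the next one.
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    void set(int index, long value) {
        checkIndex(index);
        // The check uses the bit width, so values up to the mask are accepted,
        // which may be slightly more than the maxValue given to the constructor.
        if (value < 0 || value > mask) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // Write the high bits that did not fit into the first block.
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsInFirst))
                    | (value >>> bitsInFirst);
        }
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + count);
        }
    }
}
```

For example, new SimplePackedInts(20, 1000) would use 10 bits per value, so 20 values take 200 bits and fit in four longs instead of twenty. A generated implementation, as suggested in the description, could specialize get and set per bit width instead of computing shifts at run time.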


