[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Toke Eskildsen updated LUCENE-1990:
-----------------------------------

    Attachment: LUCENE-1990-te20100212.patch

I've read through the comments on LUCENE-1990 and implemented most of what has 
been suggested. The attached patch contains implementations for all the 
variants we've talked about, including aligned. There's a known bug in 
persistence for aligned64 (and probably also for aligned32) that I haven't 
stomped yet. There's also a clear need for a more elaborate unit-test with 
regard to persistence.
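
To make the difference between the packed and the aligned variants concrete, here is a minimal sketch of one common way to lay the values out in a long[]. It is an illustration only: the class name PackedSketch and all method names are made up for this example, and the code in the attached patch may well be organised differently. In the packed layout a value may straddle two 64-bit blocks; in the aligned layout each block holds floor(64 / bitsPerValue) whole values and the remaining bits are left unused.

```java
// Hypothetical sketch: names and layout details are assumptions for
// illustration, not the code from the attached patch.
final class PackedSketch {
    private PackedSketch() {}

    /** Returns a mask with the lowest bitsPerValue bits set (1..64). */
    static long mask(int bitsPerValue) {
        return bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
    }

    /** Packed layout: values are stored back to back and may straddle two blocks. */
    static long getPacked(long[] blocks, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // bitPos / 64
        int offset = (int) (bitPos & 63);   // bitPos % 64
        long value = blocks[block] >>> offset;
        int bitsRead = 64 - offset;
        if (bitsRead < bitsPerValue) {
            // The upper bits of the value live at the start of the next block.
            value |= blocks[block + 1] << bitsRead;
        }
        return value & mask(bitsPerValue);
    }

    static void setPacked(long[] blocks, int bitsPerValue, int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long mask = mask(bitsPerValue);
        value &= mask;
        blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
        int bitsWritten = 64 - offset;
        if (bitsWritten < bitsPerValue) {
            // Write the remaining upper bits into the next block.
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsWritten))
                    | (value >>> bitsWritten);
        }
    }

    /** Aligned layout: no value ever crosses a block boundary. */
    static long getAligned64(long[] blocks, int bitsPerValue, int index) {
        int valuesPerBlock = 64 / bitsPerValue;
        int block = index / valuesPerBlock;
        int offset = (index % valuesPerBlock) * bitsPerValue;
        return (blocks[block] >>> offset) & mask(bitsPerValue);
    }

    static void setAligned64(long[] blocks, int bitsPerValue, int index, long value) {
        int valuesPerBlock = 64 / bitsPerValue;
        int block = index / valuesPerBlock;
        int offset = (index % valuesPerBlock) * bitsPerValue;
        long mask = mask(bitsPerValue);
        blocks[block] = (blocks[block] & ~(mask << offset)) | ((value & mask) << offset);
    }
}
```

With 7 bits per value, the aligned sketch stores 9 values per block and leaves 1 bit unused, whereas the packed sketch uses every bit but needs the extra branch for values that cross a block boundary.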

Other outstanding issues, as I see them, are whether or not mutable packed 
arrays should be requestable (as general purpose data structures) and how the 
factory for creating a writer should work. I have added a getMutable-method to 
the factory and left the return type of the getReader-method as Reader. 
That way read-only users will not be tempted to try and update the received 
structure. As for the arguments to the factory, Michael McCandless suggested 
that the preferences should be expressed with (packed | aligned32 | aligned64 | 
auto). As far as I can see, this should work. However, I've only just reached 
this conclusion and haven't had the time to implement it.
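
The split between a read-only and a mutable view, together with the suggested preference argument, could look roughly like the sketch below. Everything in it is hypothetical: the signatures, the Preference enum, the DirectLong placeholder and the getReader(Mutable) overload (which stands in for reading a persisted structure) are assumptions made for illustration and are not taken from the patch.

```java
// Hypothetical API sketch: every name and signature below is an assumption
// made for illustration, not the code in the attached patch.
final class PackedIntsSketch {
    private PackedIntsSketch() {}

    /** Read-only view: exposes no way to modify the values. */
    interface Reader {
        long get(int index);
        int size();
    }

    /** Read-write view for callers that explicitly ask for mutability. */
    interface Mutable extends Reader {
        void set(int index, long value);
    }

    /** The four layout preferences suggested for the factory arguments. */
    enum Preference { PACKED, ALIGNED32, ALIGNED64, AUTO }

    /** Placeholder backing store that spends a full long on every value. */
    private static final class DirectLong implements Mutable {
        private final long[] values;
        DirectLong(int valueCount) { values = new long[valueCount]; }
        public long get(int index) { return values[index]; }
        public int size() { return values.length; }
        public void set(int index, long value) { values[index] = value; }
    }

    static Mutable getMutable(int valueCount, int bitsPerValue, Preference preference) {
        if (valueCount < 0 || bitsPerValue < 1 || bitsPerValue > 64 || preference == null) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // A real factory would select a packed, aligned or direct
        // implementation here based on bitsPerValue and preference.
        return new DirectLong(valueCount);
    }

    /** Returns the same structure typed as Reader, so set(...) is not visible. */
    static Reader getReader(Mutable source) {
        return source;
    }
}
```

A caller holding only a Reader cannot call set without an explicit cast, which is the point of keeping the two return types apart.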

A speed-test has been added and the results from my machine can be seen below. 
In order for it to be really usable, it should be tried on other machines too.

I won't touch the code before sometime next week, but I'll keep an eye on 
LUCENE-1990 comments until then.

{code}
bitsPerValue  valueCount  getCount  PackedDirectByte  PackedDirectShort  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
           1        1000  10000000               167                141       258              242              172       264              242               183
           1     1000000  10000000               224                232       266              233              246       262              238               338
           1    10000000  10000000               359                469       280              278              508       278              272               551
           3        1000  10000000               168                166       265              241              163       262              243               166
           3     1000000  10000000               227                226       261              251              239       274              249               330
           3    10000000  10000000               406                476       301              304              522       300              308               547
           4        1000  10000000               167                168       266              239              164       285              239               169
           4     1000000  10000000               228                231       294              274              262       291              269               314
           4    10000000  10000000               385                480       308              333              514       331              315               557
           7        1000  10000000               172                174       278              248              162       271              238               177
           7     1000000  10000000               224                236       289              281              272       278              277               345
           7    10000000  10000000               405                473       389              447              516       399              402               553
           8        1000  10000000               192                171       268              242              174       291              240               163
           8     1000000  10000000               226                232       291              284              286       274              265               314
           8    10000000  10000000               381                467       406              428              512       422              419               580

bitsPerValue  valueCount  getCount  PackedDirectShort  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
           9        1000  10000000                166       274              241              170       261              237               163
           9     1000000  10000000                229       299              273              250       284              275               327
           9    10000000  10000000                483       443              477              519       438              455               568
          15        1000  10000000                170       265              239              174       264              235               162
          15     1000000  10000000                232       285              274              240       278              269               339
          15    10000000  10000000                473       518              524              523       519              521               550
          16        1000  10000000                166       263              236              172       264              235               160
          16     1000000  10000000                229       285              278              244       293              272               332
          16    10000000  10000000                470       513              517              509       534              529               548

bitsPerValue  valueCount  getCount  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
          17        1000  10000000       262              255              177       260              234               160
          17     1000000  10000000       290              306              273       304              290               320
          17    10000000  10000000       532              572              533       529              556               551
          28        1000  10000000       269              256              187       267              238               163
          28     1000000  10000000       293              295              253       293              296               312
          28    10000000  10000000       542              567              501       548              567               542
          31        1000  10000000       260              235              177       266              232               158
          31     1000000  10000000       292              294              244       296              297               328
          31    10000000  10000000       552              563              516       562              568               548

bitsPerValue  valueCount  getCount  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
          32        1000  10000000              172       263              241               166
          32     1000000  10000000              241       291              297               320
          32    10000000  10000000              519       556              573               546

bitsPerValue  valueCount  getCount  Packed64  PackedAligned64  PackedDirectLong
          33        1000  10000000       264              239               159
          33     1000000  10000000       293              374               319
          33    10000000  10000000       559              595               552
          47        1000  10000000       264              242               164
          47     1000000  10000000       319              369               322
          47    10000000  10000000       577              601               548
          49        1000  10000000       261              243               162
          49     1000000  10000000       323              413               319
          49    10000000  10000000       584              610               551
          63        1000  10000000       269              235               161
          63     1000000  10000000       396              369               313
          63    10000000  10000000       592              596               559
{code}

(Java 1.6.0_15-b03, default settings on a Dell Precision M6500: Intel i7 Q 820 
@ 1.73GHz, 8 MB level 2 cache,  dual-channel PC 1333 RAM, running Ubuntu Karmic)
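
For anyone who wants to try something similar on another machine, a measurement of this kind can be sketched as below: getCount random-access reads against a structure holding valueCount values, timed with System.nanoTime(). This is a hypothetical harness, not the speed-test from the patch; the names GetSpeedSketch, Getter and measure are made up for this example, and the timing includes the cost of generating the random indexes. The checksum is returned only so that the JIT cannot discard the reads.

```java
import java.util.Random;

// Hypothetical harness: the speed-test in the patch may be structured differently.
final class GetSpeedSketch {
    private GetSpeedSketch() {}

    /** Minimal read interface so any implementation can be plugged in. */
    interface Getter {
        long get(int index);
    }

    /**
     * Performs getCount reads at pseudo-random positions below valueCount.
     * Returns a two-element array: {elapsed milliseconds, checksum of the values read}.
     */
    static long[] measure(Getter getter, int valueCount, int getCount, long seed) {
        Random random = new Random(seed);
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < getCount; i++) {
            checksum += getter.get(random.nextInt(valueCount));
        }
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        return new long[] { elapsedMillis, checksum };
    }
}
```

A few warm-up rounds before the measured run would be needed to get stable numbers, since the first iterations run before the JIT has compiled the loop.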

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990-te20100122.patch, 
> LUCENE-1990-te20100210.patch, LUCENE-1990-te20100212.patch, 
> LUCENE-1990.patch, LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
