[
https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Toke Eskildsen updated LUCENE-1990:
-----------------------------------
Attachment: LUCENE-1990-te20100212.patch
I've read through the comments on LUCENE-1990 and implemented most of what has
been suggested. The attached patch contains implementations for all the
variants we've talked about, including aligned. There's a known bug in
persistence for aligned64 (and probably also for aligned32) that I haven't
stomped yet. There's also a clear need for a more elaborate unit-test with
regard to persistence.
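For readers who have not followed the earlier comments, the difference between the packed and the aligned variants can be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from the attached patch: the class names Packed64Sketch and Aligned64Sketch are hypothetical, and "aligned" is assumed to mean that a value never crosses a 64-bit block boundary (at the cost of some unused bits per block).

```java
// Hypothetical sketch, not code from the attached patch.
// Packed: values are stored back to back and may span two 64-bit blocks.
final class Packed64Sketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    Packed64Sketch(int valueCount, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bitsPerValue must be 1..63");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) valueCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // bitPos / 64
        int offset = (int) (bitPos & 63);   // bitPos % 64
        long value = blocks[block] >>> offset;
        int bitsInFirst = 64 - offset;
        if (bitsInFirst < bitsPerValue) {   // the value continues in the next block
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    void set(int index, long value) {
        value &= mask;                      // bits that do not fit are dropped
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
        int bitsInFirst = 64 - offset;
        if (bitsInFirst < bitsPerValue) {   // write the remaining high bits
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsInFirst))
                    | (value >>> bitsInFirst);
        }
    }
}

// Aligned (assumed meaning): each block holds floor(64 / bitsPerValue) values,
// so a get or set touches exactly one block.
final class Aligned64Sketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final int valuesPerBlock;
    private final long mask;

    Aligned64Sketch(int valueCount, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bitsPerValue must be 1..63");
        }
        this.bitsPerValue = bitsPerValue;
        this.valuesPerBlock = 64 / bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(valueCount + valuesPerBlock - 1) / valuesPerBlock];
    }

    long get(int index) {
        int shift = (index % valuesPerBlock) * bitsPerValue;
        return (blocks[index / valuesPerBlock] >>> shift) & mask;
    }

    void set(int index, long value) {
        int block = index / valuesPerBlock;
        int shift = (index % valuesPerBlock) * bitsPerValue;
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
    }
}
```

The packed layout wastes no bits but needs the extra branch for values that straddle two blocks; the aligned layout trades some space for a simpler access path.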
Other outstanding issues, as I see them, are whether or not mutable packed
arrays should be requestable (as general purpose data structures) and how the
factory for creating a writer should work. I have added a getMutable-method to
the factory and not touched the return type Reader for the getReader-method.
That way read-only users will not be tempted to try and update the received
structure. As for the arguments to the factory, Michael McCandless suggested
that the preferences should be expressed with (packed | aligned32 | aligned64 |
auto). As far as I can see, this should work. However, I've only just reached
this conclusion and haven't had the time to implement it.
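The split between a read-only Reader and a mutable structure, together with the suggested preference values, could look something like the sketch below. Only the names Reader, getReader and getMutable and the four preference values come from the discussion; the method signatures, the Mutable and Preference types, the PackedFactorySketch class and the long[]-backed DirectLong implementation are assumptions made for illustration.

```java
// Hypothetical sketch of the Reader/Mutable split; all signatures are assumptions.
interface Reader {
    long get(int index);
    int size();
}

interface Mutable extends Reader {
    void set(int index, long value);
}

// The four preferences suggested in the comments.
enum Preference { PACKED, ALIGNED32, ALIGNED64, AUTO }

final class PackedFactorySketch {
    private PackedFactorySketch() {}

    // For callers that want a general-purpose, updatable structure.
    static Mutable getMutable(int valueCount, int bitsPerValue, Preference preference) {
        if (preference == null) {
            throw new NullPointerException("preference");
        }
        // A real factory would choose an implementation from bitsPerValue and
        // preference; this placeholder always uses one long per value.
        return new DirectLong(valueCount);
    }

    // For read-only callers: the static type exposes no set method.
    static Reader getReader(long[] values, Preference preference) {
        Mutable mutable = getMutable(values.length, 64, preference);
        for (int i = 0; i < values.length; i++) {
            mutable.set(i, values[i]);
        }
        return mutable;
    }

    private static final class DirectLong implements Mutable {
        private final long[] values;

        DirectLong(int valueCount) {
            this.values = new long[valueCount];
        }

        public long get(int index) { return values[index]; }
        public int size() { return values.length; }
        public void set(int index, long value) { values[index] = value; }
    }
}
```

Returning the Reader type is only a compile-time nudge, since a determined caller could still cast, but it keeps set out of sight for read-only users.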
A speed-test has been added and the results from my machine can be seen below.
In order for it to be really usable, it should be tried on other machines too.
I won't touch the code before sometime next week, but I'll keep an eye on
LUCENE-1990 comments until then.
{code}
bitsPerValue  valueCount  getCount  PackedDirectByte  PackedDirectShort  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
1             1000        10000000  167               141                258       242              172              264       242              183
1             1000000     10000000  224               232                266       233              246              262       238              338
1             10000000    10000000  359               469                280       278              508              278       272              551
3             1000        10000000  168               166                265       241              163              262       243              166
3             1000000     10000000  227               226                261       251              239              274       249              330
3             10000000    10000000  406               476                301       304              522              300       308              547
4             1000        10000000  167               168                266       239              164              285       239              169
4             1000000     10000000  228               231                294       274              262              291       269              314
4             10000000    10000000  385               480                308       333              514              331       315              557
7             1000        10000000  172               174                278       248              162              271       238              177
7             1000000     10000000  224               236                289       281              272              278       277              345
7             10000000    10000000  405               473                389       447              516              399       402              553
8             1000        10000000  192               171                268       242              174              291       240              163
8             1000000     10000000  226               232                291       284              286              274       265              314
8             10000000    10000000  381               467                406       428              512              422       419              580
bitsPerValue  valueCount  getCount  PackedDirectShort  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
9             1000        10000000  166                274       241              170              261       237              163
9             1000000     10000000  229                299       273              250              284       275              327
9             10000000    10000000  483                443       477              519              438       455              568
15            1000        10000000  170                265       239              174              264       235              162
15            1000000     10000000  232                285       274              240              278       269              339
15            10000000    10000000  473                518       524              523              519       521              550
16            1000        10000000  166                263       236              172              264       235              160
16            1000000     10000000  229                285       278              244              293       272              332
16            10000000    10000000  470                513       517              509              534       529              548
bitsPerValue  valueCount  getCount  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
17            1000        10000000  262       255              177              260       234              160
17            1000000     10000000  290       306              273              304       290              320
17            10000000    10000000  532       572              533              529       556              551
28            1000        10000000  269       256              187              267       238              163
28            1000000     10000000  293       295              253              293       296              312
28            10000000    10000000  542       567              501              548       567              542
31            1000        10000000  260       235              177              266       232              158
31            1000000     10000000  292       294              244              296       297              328
31            10000000    10000000  552       563              516              562       568              548
bitsPerValue  valueCount  getCount  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
32            1000        10000000  172              263       241              166
32            1000000     10000000  241              291       297              320
32            10000000    10000000  519              556       573              546
bitsPerValue  valueCount  getCount  Packed64  PackedAligned64  PackedDirectLong
33            1000        10000000  264       239              159
33            1000000     10000000  293       374              319
33            10000000    10000000  559       595              552
47            1000        10000000  264       242              164
47            1000000     10000000  319       369              322
47            10000000    10000000  577       601              548
49            1000        10000000  261       243              162
49            1000000     10000000  323       413              319
49            10000000    10000000  584       610              551
63            1000        10000000  269       235              161
63            1000000     10000000  396       369              313
63            10000000    10000000  592       596              559
{code}
(Java 1.6.0_15-b03, default settings on a Dell Precision M6500: Intel i7 Q 820
@ 1.73GHz, 8 MB level 2 cache, dual-channel PC 1333 RAM, running Ubuntu Karmic)
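For anyone who wants to try a comparable measurement on another machine, a get-loop of roughly the following shape should be enough to get started. This is a hedged sketch and not the speed-test from the patch: the class GetSpeedSketch and its timeGets method are hypothetical, random access order is an assumption, a plain long[] stands in for the packed structure, and the table above does not state its unit, so reporting milliseconds here is a guess.

```java
import java.util.Random;

// Hypothetical micro-benchmark sketch; not the speed-test from the patch.
final class GetSpeedSketch {
    private GetSpeedSketch() {}

    /**
     * Performs getCount reads at pseudo-random positions.
     * Returns {elapsed milliseconds, checksum}; the checksum is returned so
     * that the JIT cannot discard the reads as dead code.
     */
    static long[] timeGets(long[] values, int getCount, long seed) {
        Random random = new Random(seed);
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < getCount; i++) {
            checksum += values[random.nextInt(values.length)];
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        return new long[] { elapsedMs, checksum };
    }
}
```

Calling timeGets once as a warm-up and discarding that result before the measured run reduces the influence of JIT compilation. Note that the cost of Random.nextInt is included in the measured time, which matters when comparing fast implementations.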
> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
> Key: LUCENE-1990
> URL: https://issues.apache.org/jira/browse/LUCENE-1990
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-1990-te20100122.patch,
> LUCENE-1990-te20100210.patch, LUCENE-1990-te20100212.patch,
> LUCENE-1990.patch, LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl. EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage. FieldCache.StringIndex could as well. And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting. If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
> PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
