[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Toke Eskildsen updated LUCENE-1990:
-----------------------------------

    Attachment: LUCENE-1990-te20100212.patch

I've read through the comments on LUCENE-1990 and implemented most of what has been suggested. The attached patch contains implementations of all the variants we've talked about, including the aligned ones. There's a known bug in persistence for aligned64 (and probably also for aligned32) that I haven't stomped yet. There's also a clear need for a more elaborate unit test of persistence.

Other outstanding issues, as I see them, are whether mutable packed arrays should be requestable (as general-purpose data structures) and how the factory for creating a writer should work. I have added a getMutable method to the factory and left the return type Reader of the getReader method untouched. That way read-only users will not be tempted to try to update the received structure.

As for the arguments to the factory, Michael McCandless suggested that the preferences should be expressed as (packed | aligned32 | aligned64 | auto). As far as I can see, this should work. However, I've only just reached this conclusion and haven't had time to implement it.

A speed test has been added and the results from my machine can be seen below. For it to be really useful, it should be run on other machines too.

I won't touch the code before sometime next week, but I'll keep an eye on LUCENE-1990 comments until then.
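To make the packed versus aligned distinction concrete, here is a minimal sketch. It is not taken from the patch: the class names (PackedSketch, Packed64Sketch, Aligned64Sketch) and the blockCount helper are made up for illustration, and it assumes that "aligned" means a value never crosses a 64-bit block boundary, at the cost of some unused bits per block.

```java
/** Illustrative sketch only; these are NOT the classes from the LUCENE-1990 patch. */
public class PackedSketch {

    /** Values are stored back to back, so one value may straddle two 64-bit blocks. */
    public static final class Packed64Sketch {
        private final long[] blocks;
        private final int bitsPerValue; // assumed to be in the range 1..64
        private final long mask;

        public Packed64Sketch(int valueCount, int bitsPerValue) {
            this.bitsPerValue = bitsPerValue;
            this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
            long totalBits = (long) valueCount * bitsPerValue;
            this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
        }

        public long get(int index) {
            long bitPos = (long) index * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long value = blocks[block] >>> shift;
            int bitsInFirst = 64 - shift;
            if (bitsInFirst < bitsPerValue) {
                // The high bits of the value live in the next block.
                value |= blocks[block + 1] << bitsInFirst;
            }
            return value & mask;
        }

        public void set(int index, long value) {
            value &= mask;
            long bitPos = (long) index * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
            int bitsInFirst = 64 - shift;
            if (bitsInFirst < bitsPerValue) {
                // Write the remaining high bits into the start of the next block.
                blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsInFirst))
                        | (value >>> bitsInFirst);
            }
        }

        public int blockCount() {
            return blocks.length;
        }
    }

    /** Assumption: each block holds floor(64 / bitsPerValue) values; leftover bits stay unused. */
    public static final class Aligned64Sketch {
        private final long[] blocks;
        private final int bitsPerValue; // assumed to be in the range 1..64
        private final int valuesPerBlock;
        private final long mask;

        public Aligned64Sketch(int valueCount, int bitsPerValue) {
            this.bitsPerValue = bitsPerValue;
            this.valuesPerBlock = 64 / bitsPerValue;
            this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
            this.blocks = new long[(valueCount + valuesPerBlock - 1) / valuesPerBlock];
        }

        public long get(int index) {
            int shift = (index % valuesPerBlock) * bitsPerValue;
            return (blocks[index / valuesPerBlock] >>> shift) & mask;
        }

        public void set(int index, long value) {
            int block = index / valuesPerBlock;
            int shift = (index % valuesPerBlock) * bitsPerValue;
            blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
        }

        public int blockCount() {
            return blocks.length;
        }
    }

    public static void main(String[] args) {
        Packed64Sketch packed = new Packed64Sketch(100, 7);
        Aligned64Sketch aligned = new Aligned64Sketch(100, 7);
        packed.set(42, 99);
        aligned.set(42, 99);
        System.out.println("packed:  value=" + packed.get(42) + " blocks=" + packed.blockCount());
        System.out.println("aligned: value=" + aligned.get(42) + " blocks=" + aligned.blockCount());
    }
}
```

In this sketch, 100 values of 7 bits take 11 longs in the packed layout and 12 in the aligned one, since only 9 seven-bit values fit in a block without crossing a boundary. The aligned get avoids the second array read and the extra branch, which is the trade-off the speed test is meant to measure.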
{code}
bitsPerValue  valueCount  getCount  PackedDirectByte  PackedDirectShort  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
1             1000        10000000  167               141                258       242              172              264       242              183
1             1000000     10000000  224               232                266       233              246              262       238              338
1             10000000    10000000  359               469                280       278              508              278       272              551
3             1000        10000000  168               166                265       241              163              262       243              166
3             1000000     10000000  227               226                261       251              239              274       249              330
3             10000000    10000000  406               476                301       304              522              300       308              547
4             1000        10000000  167               168                266       239              164              285       239              169
4             1000000     10000000  228               231                294       274              262              291       269              314
4             10000000    10000000  385               480                308       333              514              331       315              557
7             1000        10000000  172               174                278       248              162              271       238              177
7             1000000     10000000  224               236                289       281              272              278       277              345
7             10000000    10000000  405               473                389       447              516              399       402              553
8             1000        10000000  192               171                268       242              174              291       240              163
8             1000000     10000000  226               232                291       284              286              274       265              314
8             10000000    10000000  381               467                406       428              512              422       419              580

bitsPerValue  valueCount  getCount  PackedDirectShort  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
9             1000        10000000  166                274       241              170              261       237              163
9             1000000     10000000  229                299       273              250              284       275              327
9             10000000    10000000  483                443       477              519              438       455              568
15            1000        10000000  170                265       239              174              264       235              162
15            1000000     10000000  232                285       274              240              278       269              339
15            10000000    10000000  473                518       524              523              519       521              550
16            1000        10000000  166                263       236              172              264       235              160
16            1000000     10000000  229                285       278              244              293       272              332
16            10000000    10000000  470                513       517              509              534       529              548

bitsPerValue  valueCount  getCount  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
17            1000        10000000  262       255              177              260       234              160
17            1000000     10000000  290       306              273              304       290              320
17            10000000    10000000  532       572              533              529       556              551
28            1000        10000000  269       256              187              267       238              163
28            1000000     10000000  293       295              253              293       296              312
28            10000000    10000000  542       567              501              548       567              542
31            1000        10000000  260       235              177              266       232              158
31            1000000     10000000  292       294              244              296       297              328
31            10000000    10000000  552       563              516              562       568              548

bitsPerValue  valueCount  getCount  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
32            1000        10000000  172              263       241              166
32            1000000     10000000  241              291       297              320
32            10000000    10000000  519              556       573              546

bitsPerValue  valueCount  getCount  Packed64  PackedAligned64  PackedDirectLong
33            1000        10000000  264       239              159
33            1000000     10000000  293       374              319
33            10000000    10000000  559       595              552
47            1000        10000000  264       242              164
47            1000000     10000000  319       369              322
47            10000000    10000000  577       601              548
49            1000        10000000  261       243              162
49            1000000     10000000  323       413              319
49            10000000    10000000  584       610              551
63            1000        10000000  269       235              161
63            1000000     10000000  396       369              313
63            10000000    10000000  592       596              559
{code}

(Java 1.6.0_15-b03, default settings on a Dell Precision M6500: Intel i7 Q 820 @ 1.73GHz, 8 MB level 2 cache, dual-channel PC 1333 RAM, running Ubuntu Karmic)

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990-te20100122.patch,
> LUCENE-1990-te20100210.patch, LUCENE-1990-te20100212.patch,
> LUCENE-1990.patch, LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl. EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage. FieldCache.StringIndex could as well. And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting. If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
> PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org