[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636799#action_12636799 ]
Michael McCandless commented on LUCENE-1410:
--------------------------------------------

bq. As for decompression speed, please remember that the patching code (that decodes the exceptions into the output) has not yet been optimized at all.

But this is what I find so weird: for prx, if I fix the encoding at 6 bits, which generates a zillion exceptions, we are ~31% faster than decoding vInts, and the pfor file is much bigger (847 MB vs 621 MB) than vInt. But if instead I use the bit size returned by getNumFrameBits(), which has far fewer exceptions, we are 9.0% slower and the file size is a bit smaller than vInt. Either exception processing is very fast, or it's the non-unrolled path (ForDecompress.decodeAnyFrame) that's slow -- which could very well be.

bq. The lucene .frq file contains the docids as deltas and the frequencies but with a special encoding in case the frequency is one. I'd rather try compressing the real delta docids and the real frequencies than the encoded versions.

I'll try that. I bet if we had two separate streams (one for the docIDs and one for the corresponding freqs) we'd get even better pFor compression.

If we did that "for real" in Lucene, it'd also make it fast to use a normally indexed field for pure boolean searching (you wouldn't have to index two different fields, as you do today, to gain that performance at search time). In fact, for AND queries on a normal Lucene index this should also result in faster searching, since when intersecting the docIDs you don't initially need the freq data.

Marvin has been doing some performance testing recently and seems to have concluded that keeping prx and frq as separate files (as Lucene does today but KinoSearch does not) gives better performance. Pushing that same trend further, I think it may very well make sense to separate docID and frq as well.

bq. A: there is also the influence of the reduction of data to be fetched (via various caches) from the index.
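To make the tradeoff being measured above concrete, here is a minimal sketch (illustrative only -- names like pickFrameBits are not Lucene's actual API): fixing the frame at 6 bits keeps the inline width small but turns every value >= 64 into an exception, while a getNumFrameBits()-style heuristic grows the width until the exception rate drops under a threshold (the PFor papers suggest roughly 10%):

```java
// Illustrative PFor frame-width selection; not the actual patch code.
public class FrameBits {

    // Count how many values in the frame exceed what numBits can hold
    // inline; these would be patched back in as exceptions at decode time.
    static int countExceptions(int[] frame, int numBits) {
        int limit = 1 << numBits;
        int exceptions = 0;
        for (int v : frame) {
            if (v >= limit) exceptions++;
        }
        return exceptions;
    }

    // Smallest bit width whose exception rate stays at or under
    // maxExceptionRate (e.g. 0.10 for the ~10% heuristic).
    static int pickFrameBits(int[] frame, double maxExceptionRate) {
        for (int b = 1; b < 31; b++) {
            if (countExceptions(frame, b) <= frame.length * maxExceptionRate) {
                return b;
            }
        }
        return 31;
    }
}
```

A fixed 6-bit frame trades a larger exception count (and file size) for a cheaper inline decode, which is consistent with the numbers above only if exception patching is cheap relative to the generic decode loop.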
bq. As reported in the articles, PFor strikes a nice optimum between decompression speed and fetching speed from (fast) disks.

True, but we are seeing only a small compression gain over Lucene's current encoding. I think splitting frq out from docID may show better compression.

{quote}
>: I was thinking local CPU's native asm.
A: I'd try a C version first. Iirc gcc has a decent optimizer for bit ops, but it's been a while since I used C.
{quote}

Well... that would just make me depressed (if the CPU-level benefits of pFor don't really "survive" from Javaland, but do from C-land) ;) But yes, I agree.

bq. For the record, to show the variation in decompression speeds for different numbers of frame bits without exceptions, here is my current output from TestPFor:

I have similar results (up to 7 bits -- can you post your new TestPFor.java?). The even-byte sizes (16, 24) have very sizable leads over the others. The 9-bit size is fantastically slow; it's insane that unrolling it isn't helping. Seems like we will need to peek at the asm to understand what's going on at the "bare metal" level....

> PFOR implementation
> -------------------
>
>                 Key: LUCENE-1410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1410
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-1410b.patch, TestPFor2.java, TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

--
This message is automatically generated by JIRA.
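For reference, a minimal generic (non-unrolled) frame decoder in the spirit of ForDecompress.decodeAnyFrame -- a sketch under my own assumptions, not the code from the patch. Each value extraction computes a word index, a shift, and possibly a second-word fetch when the value straddles a 32-bit boundary; byte-aligned widths like 16 straddle never, and 24 straddles on a short, predictable cycle, which is one plausible reason they decode so much faster than an odd width like 9:

```java
// Sketch of a generic bit-packer/unpacker (numBits in 1..31);
// illustrative, not the actual ForDecompress code from the patch.
public class GenericFrameDecoder {

    // Pack values, each numBits wide, LSB-first into 32-bit words.
    static int[] encode(int[] values, int numBits) {
        int[] out = new int[(values.length * numBits + 31) / 32];
        long bitPos = 0;
        for (int v : values) {
            int word = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            out[word] |= v << shift;
            if (shift + numBits > 32) {
                out[word + 1] |= v >>> (32 - shift);  // spill into next word
            }
            bitPos += numBits;
        }
        return out;
    }

    // Generic decode: per value, locate the word, shift, and mask --
    // with an extra fetch whenever the value crosses a word boundary.
    static void decodeAnyFrame(int[] input, int n, int numBits, int[] out) {
        long bitPos = 0;
        int mask = (1 << numBits) - 1;
        for (int i = 0; i < n; i++) {
            int word = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            long bits = input[word] & 0xFFFFFFFFL;
            if (shift + numBits > 32) {
                bits |= (input[word + 1] & 0xFFFFFFFFL) << 32;
            }
            out[i] = (int) ((bits >>> shift) & mask);
            bitPos += numBits;
        }
    }
}
```

An unrolled decoder for a fixed width replaces the per-value word/shift arithmetic and the boundary branch with straight-line constant shifts, which is where the per-width speed differences would come from.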