[jira] Updated: (LUCENE-2662) BytesHash
[ https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2662:
-------------------------------------

    Affects Version/s:     (was: Realtime Branch)
        Fix Version/s:     (was: Realtime Branch)

> BytesHash
> ---------
>
>                 Key: LUCENE-2662
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2662
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 4.0
>            Reporter: Jason Rutherglen
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2662) BytesHash
Simon Willnauer updated LUCENE-2662:
------------------------------------

    Attachment: LUCENE-2662.patch

This patch fixes the nulling-out of byte blocks in RecyclingByteBlockAllocator that were recycled but not reused. I think we are ready to go; I will commit to trunk soon. I don't think we need a CHANGES.TXT entry here; at least, I cannot find a section this refactoring would fit into.

simon
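To illustrate why nulling out recycled-but-not-reused blocks matters, here is a minimal sketch of a recycling byte-block allocator. The class `SketchRecyclingAllocator` and its method names are hypothetical (loosely modeled on the names used in this thread, not copied from the patch), and the exact change in the patch is not shown here. The idea it demonstrates: every array slot that no longer owns a block is set to null, so a block that is handed out, dropped, or trimmed is not kept reachable by a stale reference.

```java
/**
 * Hypothetical sketch of a recycling byte-block allocator (not the committed
 * Lucene RecyclingByteBlockAllocator). Freed blocks are buffered for reuse up
 * to maxBufferedBlocks; every slot that stops owning a block is nulled.
 */
final class SketchRecyclingAllocator {
  private final int blockSize;
  private final int maxBufferedBlocks;
  private final byte[][] buffered; // recycled blocks available for reuse
  private int bufferedCount = 0;

  SketchRecyclingAllocator(int blockSize, int maxBufferedBlocks) {
    this.blockSize = blockSize;
    this.maxBufferedBlocks = maxBufferedBlocks;
    this.buffered = new byte[maxBufferedBlocks][];
  }

  /** Hands out a buffered block if one is available, otherwise allocates a new one. */
  synchronized byte[] getByteBlock() {
    if (bufferedCount == 0) {
      return new byte[blockSize];
    }
    byte[] block = buffered[--bufferedCount];
    buffered[bufferedCount] = null; // the free list must not keep referencing a block in use
    return block;
  }

  /** Takes back blocks[start, end): keeps as many as fit and nulls every caller slot. */
  synchronized void recycleByteBlocks(byte[][] blocks, int start, int end) {
    int keep = Math.min(maxBufferedBlocks - bufferedCount, end - start);
    int stop = start + keep;
    for (int i = start; i < stop; i++) {
      buffered[bufferedCount++] = blocks[i];
      blocks[i] = null;
    }
    // Blocks that did not fit are not reused: drop the caller's references
    // so they become garbage instead of lingering in the caller's array.
    for (int i = stop; i < end; i++) {
      blocks[i] = null;
    }
  }

  /** Trims the buffer down to at most num blocks; returns how many were released. */
  synchronized int freeBlocks(int num) {
    int released = 0;
    while (bufferedCount > num) {
      buffered[--bufferedCount] = null; // release the block for GC
      released++;
    }
    return released;
  }

  synchronized int numBufferedBlocks() {
    return bufferedCount;
  }
}
```

Without the nulling loops, the caller's `blocks` array and the trimmed free-list slots would keep references that prevent those blocks from being garbage collected.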
[jira] Updated: (LUCENE-2662) BytesHash
Simon Willnauer updated LUCENE-2662:
------------------------------------

    Attachment: LUCENE-2662.patch

Next iteration; this seems to be very close! I have applied the following changes:

* introduced an AtomicLong to track bytesUsed in DocumentsWriter, TermsHashPerField, BytesRefHash and RecyclingByteBlockAllocator
* factored out a BytesStartArray class from BytesRefHash that manages the int[] holding the bytesStart offsets. TermsHashPerField subclasses it and manages the ParallelPostingsArray through it.
* removed the remaining nocommits
* made RecyclingByteBlockAllocator synchronized by default (it now uses synchronized methods)

I ran a quick 100k-document Wikipedia benchmark comparing trunk and LUCENE-2662, and the results are promising: the patch indexes about 5% more records per second and uses slightly less memory on average.

|version|rec/sec|elapsed sec|avgUsedMem|
|LUCENE-2662|717.30|139.41|536,682,592|
|trunk|682.66|146.49|546,065,344|

I will run the 10M benchmark once I get back to this.
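A minimal sketch of the first two changes listed above: a shared AtomicLong that every component charges its allocations to, and an abstract class that owns the bytesStart int[] so that a subclass can grow other per-term parallel arrays alongside it. All class names here (`SketchBytesStartArray`, `PostingsSketchBytesStartArray`), the per-slot byte accounting, and the roughly 1.5x growth policy are assumptions for illustration, not the API in the patch.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a BytesStartArray-style abstraction (not Lucene's code). */
abstract class SketchBytesStartArray {
  /** Allocates and returns the initial bytesStart array. */
  abstract int[] init();
  /** Enlarges the bytesStart array, preserving existing entries. */
  abstract int[] grow();
  /** Releases the array and returns null. */
  abstract int[] clear();
  /** Counter shared with other components (allocator, hash, writer). */
  abstract AtomicLong bytesUsed();
}

/**
 * Subclass in the spirit of TermsHashPerField: the bytesStart offsets live next
 * to other per-term data (here a hypothetical termFreqs array) and grow with it.
 */
final class PostingsSketchBytesStartArray extends SketchBytesStartArray {
  // Bytes charged per term slot: one int for textStart plus one int for termFreq.
  private static final int BYTES_PER_SLOT = 2 * Integer.BYTES;

  private final int initSize;
  private final AtomicLong bytesUsed;
  int[] textStarts; // bytesStart offsets, indexed by term ordinal
  int[] termFreqs;  // parallel per-term data, same length as textStarts

  PostingsSketchBytesStartArray(int initSize, AtomicLong bytesUsed) {
    this.initSize = initSize;
    this.bytesUsed = bytesUsed;
  }

  @Override int[] init() {
    textStarts = new int[initSize];
    termFreqs = new int[initSize];
    bytesUsed.addAndGet((long) BYTES_PER_SLOT * initSize);
    return textStarts;
  }

  @Override int[] grow() {
    int oldLen = textStarts.length;
    int newLen = oldLen + (oldLen >> 1) + 1; // grow by roughly 1.5x
    textStarts = Arrays.copyOf(textStarts, newLen);
    termFreqs = Arrays.copyOf(termFreqs, newLen);
    bytesUsed.addAndGet((long) BYTES_PER_SLOT * (newLen - oldLen));
    return textStarts;
  }

  @Override int[] clear() {
    if (textStarts != null) {
      bytesUsed.addAndGet(-((long) BYTES_PER_SLOT * textStarts.length));
      textStarts = null;
      termFreqs = null;
    }
    return null;
  }

  @Override AtomicLong bytesUsed() {
    return bytesUsed;
  }
}
```

Because the counter is shared, a top-level component could read `bytesUsed.get()` for the total without polling each component, and `AtomicLong` keeps the updates safe if several threads allocate concurrently.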
[jira] Updated: (LUCENE-2662) BytesHash
Simon Willnauer updated LUCENE-2662:
------------------------------------

    Attachment: LUCENE-2662.patch

We are almost there. I factored BytesRefHash out of TermsHashPerField; only two "nocommit" parts are left in the code that I need to find a solution for:

* there needs to be a way to communicate the byte usage up to DocumentsWriter, which I haven't explored yet
* textStarts in ParallelPostingsArray needs to be replaced, since it is already maintained in BytesRefHash. I will need to look into that more closely, but suggestions are welcome. One way to do it would be to attach a reference to BRH (BytesRefHash) instead of the textStart, but that is a naive suggestion since I haven't looked into it in more detail.

All tests are passing so far, and TermsHashPerField looks somewhat cleaner. I will work on fixing those nocommits and run some indexing performance tests against the patch.
[jira] Updated: (LUCENE-2662) BytesHash
Simon Willnauer updated LUCENE-2662:
------------------------------------

    Attachment: LUCENE-2662.patch

Attaching my current state for feedback and iteration.

* factored ByteBlockAllocator out of DocumentsWriter
* moved ByteBlockPool to o.a.l.util
* added RecyclingByteBlockAllocator, which can be used with or without synchronization. IMO the DummyConcurrentLock will be optimized away, so this might be very low cost; feedback on that would be more than welcome.
* addressed all the comments from Mike; thanks again
* added more tests
* cut over constants from DocumentsWriter to ByteBlockPool

TermsHashPerField is next. Feedback welcome.

simon
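One way to make a component optionally synchronized, sketched below under the assumption that it guards its state with a `java.util.concurrent.locks.Lock`: pass a real `ReentrantLock` when thread safety is needed and a no-op lock otherwise. `NoOpLock` and `GuardedBlockCounter` are hypothetical stand-ins for the DummyConcurrentLock idea mentioned above. Whether the JIT actually eliminates the empty calls depends on the JVM and on how many Lock implementations reach a given call site, so it should be measured rather than assumed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical no-op Lock: every operation succeeds immediately and does nothing. */
final class NoOpLock implements Lock {
  static final NoOpLock INSTANCE = new NoOpLock();

  private NoOpLock() {}

  @Override public void lock() {}
  @Override public void lockInterruptibly() {}
  @Override public boolean tryLock() { return true; }
  @Override public boolean tryLock(long time, TimeUnit unit) { return true; }
  @Override public void unlock() {}

  @Override public Condition newCondition() {
    throw new UnsupportedOperationException("no conditions on a no-op lock");
  }
}

/** Hypothetical component whose thread safety is chosen at construction time. */
final class GuardedBlockCounter {
  private final Lock lock;
  private int buffered;

  GuardedBlockCounter(boolean threadSafe) {
    this.lock = threadSafe ? new ReentrantLock() : NoOpLock.INSTANCE;
  }

  void add(int n) {
    lock.lock();
    try {
      buffered += n;
    } finally {
      lock.unlock();
    }
  }

  int get() {
    lock.lock();
    try {
      return buffered;
    } finally {
      lock.unlock();
    }
  }
}
```

A later patch on this issue made RecyclingByteBlockAllocator use synchronized methods by default instead, which sidesteps the question of whether the no-op calls are optimized away.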
[jira] Updated: (LUCENE-2662) BytesHash
Simon Willnauer updated LUCENE-2662:
------------------------------------

    Attachment: LUCENE-2662.patch

This patch contains a slightly different version of BytesHash (I renamed it to BytesRefHash, but that is open for discussion; while writing this, I actually think BytesHash is the better name). BytesRefHash is now final and no longer creates Entry objects. Internally it maintains two integer arrays: one acts as the hash buckets and the other contains the bytes-start offsets into the ByteBlockPool. Each added entry is assigned an increasing ordinal, since that is what Entry is used for in almost all use cases (in CSF, though). For TermsHashPerField this is also "native", since it uses the same kind of referencing system. These changes keep this class as efficient as possible: they keep GC costs low and allow the JIT to do better optimizations. IMO this class is extremely performance-critical, and since we recently refactored indexing towards parallel arrays, adding another "object" array might not be the way to go anyway.

I also incorporated Robert's comments; thanks for the review.

I guess this is the first step towards factoring it out of TermsHashPerField. The next question is: should we do that in a separate issue and get this committed first? Comments / review welcome!

One more thing: I did not move ByteBlockPool to o.a.l.util, but I think it belongs there. Thoughts?
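The design described above (no per-entry objects, one int[] of hash buckets holding ordinals, and one int[] mapping each ordinal to its start offset in a byte pool) can be sketched as follows. This is a simplified, hypothetical `SketchBytesRefHash`, not the Lucene class: it uses a single growable byte[] instead of ByteBlockPool, stores a one-byte length prefix (so entries are limited to 255 bytes), uses linear probing with a load factor of at most 0.5, and returns `-ord - 1` when the bytes are already present.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a BytesRefHash-style table (not Lucene's code).
 * ords: open-addressing buckets holding ordinals (-1 = empty).
 * bytesStart[ord]: offset of that entry in pool, stored as [len][bytes...].
 */
final class SketchBytesRefHash {
  private byte[] pool = new byte[64];
  private int poolUpto = 0;
  private int[] ords = new int[16];      // hash buckets; length is a power of two
  private int[] bytesStart = new int[8]; // ordinal -> offset into pool
  private int count = 0;

  SketchBytesRefHash() {
    Arrays.fill(ords, -1);
  }

  int size() {
    return count;
  }

  /** Adds bytes; returns the new ordinal, or -(existing ordinal) - 1 if already present. */
  int add(byte[] bytes) {
    if (bytes.length > 255) throw new IllegalArgumentException("sketch limit: 255 bytes");
    int slot = findSlot(bytes, ords);
    if (ords[slot] != -1) return -ords[slot] - 1;
    int needed = poolUpto + 1 + bytes.length;
    if (needed > pool.length) pool = Arrays.copyOf(pool, Math.max(pool.length * 2, needed));
    if (count == bytesStart.length) bytesStart = Arrays.copyOf(bytesStart, count * 2);
    bytesStart[count] = poolUpto;            // ordinal -> start of its bytes
    pool[poolUpto++] = (byte) bytes.length;  // length prefix
    System.arraycopy(bytes, 0, pool, poolUpto, bytes.length);
    poolUpto += bytes.length;
    ords[slot] = count++;                    // ordinals are dense and increasing
    if (count * 2 > ords.length) rehash();   // keep load factor <= 0.5
    return count - 1;
  }

  /** Returns a copy of the bytes stored for ord. */
  byte[] get(int ord) {
    int start = bytesStart[ord];
    int len = pool[start] & 0xFF;
    return Arrays.copyOfRange(pool, start + 1, start + 1 + len);
  }

  private int findSlot(byte[] bytes, int[] table) {
    int mask = table.length - 1;
    int slot = Arrays.hashCode(bytes) & mask;
    while (table[slot] != -1 && !equalsAt(table[slot], bytes)) {
      slot = (slot + 1) & mask; // linear probing
    }
    return slot;
  }

  private boolean equalsAt(int ord, byte[] bytes) {
    int start = bytesStart[ord];
    int len = pool[start] & 0xFF;
    if (len != bytes.length) return false;
    for (int i = 0; i < len; i++) {
      if (pool[start + 1 + i] != bytes[i]) return false;
    }
    return true;
  }

  /** Doubles the bucket array; only ordinals move, the pool is untouched. */
  private void rehash() {
    int[] newOrds = new int[ords.length * 2];
    Arrays.fill(newOrds, -1);
    for (int ord = 0; ord < count; ord++) {
      newOrds[findSlot(get(ord), newOrds)] = ord;
    }
    ords = newOrds;
  }
}
```

Because ordinals are dense and increasing, any per-entry data (for example, postings in TermsHashPerField) can live in parallel int[] arrays indexed by ordinal, which is the point of dropping the Entry objects.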
[jira] Updated: (LUCENE-2662) BytesHash
Simon Willnauer updated LUCENE-2662:
------------------------------------

        Fix Version/s: 4.0
    Affects Version/s: 4.0
[jira] Updated: (LUCENE-2662) BytesHash
Jason Rutherglen updated LUCENE-2662:
-------------------------------------

    Attachment: LUCENE-2662.patch

We need unit tests and a base implementation as BytesHash is abstract...
[jira] Updated: (LUCENE-2662) BytesHash
Jason Rutherglen updated LUCENE-2662:
-------------------------------------

    Priority: Minor  (was: Major)