[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061836#comment-13061836 ] Simon Willnauer commented on LUCENE-2793: - I fixed the two minor things from above, created two followup issues (LUCENE-3292 & LUCENE-3293) for the remaining TODOs and will go ahead reintegrating the branch now. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793_final.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061439#comment-13061439 ] Michael McCandless commented on LUCENE-2793: Looks good! +1 to land it! Just a few things: * Shouldn't WindowsDirectory also call BII.bufferSize(context) and do the same Math.max it used to do? * Should VarGapTermsIndexReader should pass READONCE context down when it opens/reads the FST? Hmm, though, it should just replace the ctx passed in, ie if we are merging vs reading we want to differentiate. Let's open separate issue for this and address post merge? * Can you open an issue for this one: "// TODO: context should be part of the key used to cache that reader in the pool."? This is pretty important, else you can get NRT readers with too-large buffer sizes because the readers had been opened for merging first. * Extra space in SegmentInfo.java: IOContext.READONCE ); > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793_final.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060538#comment-13060538 ] Michael McCandless commented on LUCENE-2793: +1 > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060537#comment-13060537 ] Simon Willnauer commented on LUCENE-2793: - bq. Made the necessary changes and hopefully addressed all the nocommits. varun, I still see lots of nocommits here. Would be good if you could address them this week. You don't need to solve them but discuss them here with us. you can do that in a patch and add your comments to the parts where you are not sure how to resolve. I would like to commit the patches this week so we can merge to trunk soonish. Simon > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059527#comment-13059527 ] Michael McCandless commented on LUCENE-2793: I think BufferedIndexInput doesn't need a set/getMergeBufferSize? Ie, BII only knows its bufferSize, regardless of the context from its parent. Otherwise I think your patch is good: today on trunk we hardwire the 4 KB buffer size for merges, which is the same thing your patch is doing; the only difference is the constant MERGE_BUFFER_SIZE has moved from IW to BII, and each Dir impl now has the "if". As a future improvement we can add a set/getMergeBufferSize to each Dir impl... > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058116#comment-13058116 ] Michael McCandless commented on LUCENE-2793: To address the nocommits about losing the larger buffer size during merging, should we add set/getMergeBufferSize and set/getDefaultBufferSize to those Dir impls that do buffering? (And default to what they are today on trunk, I think 1 KB and 4 KB?) > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057911#comment-13057911 ] Simon Willnauer commented on LUCENE-2793: - Varun, the latest patch is committed. I added some minor cleanups and removed invalid nocommits. We are in good shape already > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057903#comment-13057903 ] Simon Willnauer commented on LUCENE-2793: - Varun this patch looks great. I am about to commit it. Can you now work through the nocommits, fix them or post questions here? simon > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056449#comment-13056449 ] Simon Willnauer commented on LUCENE-2793: - Varun, patch looks great! here are some comments: * in LuceneTestCase#newIOContext(Random) your switch statement misses necessary break; statements no matter what you will always get a flush context. I think you should also randominze the numbers for numDocs and sizeInBytes. nobody should rely on these number they are just best effort. * in o.a.l.i.values.Bytes#getValues, o.a.l.i.StoredFieldsWriter#initFieldsWriter, o.a.l.i.SegmentMerger#mergeVectors, MemoryCodec#fieldsProducer, MockSingleIntIndexOutput#MockSingleIntIndexOutput you can remove the nocommit * the public members in FlushInfo should be final * the MergeInfo ctors javadoc should either describe the arguments or omit them. In this case I think you can simply omit them since they are pretty self explained. I think what we need on MergeInfo as well as on FlushInfo is a javadoc comment that says that the values are estimates not necessarily real / correct values. * on private IOContext (Context context, MergeInfo mergeInfo ) I don't understand the assert why is there a context != Flush? * in MockDirectoryWrapper there is a nocommit that says randomize the IOContext. Maybe we should put the newIOContext to _TestUtils and delegate to this from LuceneTestCase? Overall I think we are really close here. Once those comments are fixed I will commit the patch to the branch and we go through the remaining nocommits. Once this is done we should close this and create a new issue to fix all the buffer sizes just like LUCENE-3248. good job varun > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055649#comment-13055649 ] Simon Willnauer commented on LUCENE-2793: - bq. Should IOContext and MergeInfo be in oal.store not .index? +1 bq. I think SegmentMerger should receive an IOCtx from its caller, and yeah I think we should pass the IOContext in via the ctor. Yet, for IW#addIndexes you can simply build a best effort IOContext like: {code} for (IndexReader indexReader : readers) { numDocs += indexReader.numDocs(); } final IOContext context = new IOContext(new MergeInfo(numDocs, -1, true, false)); } bq. I think on flush IOContext should include num docs and estimated +1 I think that is good no? bq. Somehow, lucene/contrib/demo/data is deleted on the branch. We should check if anything else is missing! oh man... I will check you use new IOContext(Context.FLUSH) and new IOContext(Context.READ) in your patch but we have some static like IOContext.READ maybe we need FLUSH too? for the tests I think we should start randomizing the IOContext. I think you should add a newIOContext(Random random) to LuceneTestCase and get the context from there in a unit test. At the end of the day we should see same behavior whatever context you pass in right? simon > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054359#comment-13054359 ] Michael McCandless commented on LUCENE-2793: I took quick look @ the branch -- it's looking good! Some small stuff: * Should IOContext and MergeInfo be in oal.store not .index? * I think SegmentMerger should receive an IOCtx from its caller, and then apss that to all the IO ops it invokes? But the code has a nocommit about tripping an assert -- which one? * I think on flush IOContext should include num docs and estimated segment size (we can roughly pull this from RAM used for the segment), but we should include comment that this is only approx. * Somehow, lucene/contrib/demo/data is deleted on the branch. We should check if anything else is missing! > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051857#comment-13051857 ] Simon Willnauer commented on LUCENE-2793: - I created a branch for this issue and follow up issues here: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE2793/ > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051665#comment-13051665 ] Simon Willnauer commented on LUCENE-2793: - hey varun thanks for the new patch, some comments: * the nocommit in IW you should maybe add a second ctor to MergeInfo that takes arguments and then use something like this in IW: final IOContext context = new IOContext(new MergeInfo(info.docCount, info.sizeInBytes(true), true, false)); * It seems kind of odd to always prefix MergeInfo with OneMerge so maybe move it into its own file * regarding the read buffer problem, can you simply use BufferedIndexInput.BUFFER_SIZE to initialize it and put a TODO / nocommit on top * You should look into contrib/misc there are some compile errors in AppendingCodec as well as in solr land. * I think in Directory the openInput method should be abstract: public abstract IndexInput openInput(String name, IOContext context) throws IOException; and FSDirectory should not specify this method at all. Currently your code would produce a stack overflow since it calls itself. If I fix the nocommits in your patch test-core passes without problems. Looking good man we are getting closer! Simon > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050444#comment-13050444 ] Simon Willnauer commented on LUCENE-2793: - hey varun patch looks close! here are some comments: * the assert context == Context.MERGE should be assert context != Context.MERGE || mergeInfo != null; * can you move that assert into IOContext(Context, MergeInfo) and let other related constructors call this(context, mergeInfo) instead of initializing all members themself? * I think there should be a public static final IOContext READONCE = new IOContext(true); then you can make the corresponding constructor private. I think the context should be Context.READ instead of default in that case right? * IOContext(MergePolicy.OneMerge) seems to be unnecessary. I think you should add a method to OneMerge to get a MergeInfo from it and only have a MergeInfo ctor. Then you can move MergeInfo into OneMerge too. * PerFieldCodecWrapper still seems to be deleted * In IndexReader IOContext context=null; should be IOContext context= new IOContext(READ); no? * no commit should be nocommit - we have a script on jenkins that checks this :) * I still see some whitespace problems in SegmentWriteState.java * I think IOContext.DEFAULT_IOCONTEXT should be IOContext.DEFAULT since IOContext is implicit I am waiting for you fixing the tests before I review further. Yet, what is missing is still the decision what buffer size to used down in direcotries etc. good work so far! > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049293#comment-13049293 ] Michael McCandless commented on LUCENE-2793: LUCENE-3203 is another example where a Dir needs the IOContext so it can optionally rate limit the bytes/second if it's a merge. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048285#comment-13048285 ] Michael McCandless commented on LUCENE-2793: bq. Can OneMerge extend MergeInfo? I think this is risky because it then makes IOContext an unexpectedly heavy object, since OneMerge holds SegmentReaders used for merging. There are already some places that save away an IOContext instance as a member (to be used later), and if something ever does this for a MERGE then we can unexpectedly hold keep a lot of garbage alive. I think, instead, when the context is MERGE, we should make a new (lightweight) MergeInfo instance that pulls the necessary details from the OneMerge, ie fully decouples from the OneMerge instance. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048283#comment-13048283 ] Michael McCandless commented on LUCENE-2793: Patch is looking good! * I thinkt the Context enum values should be ALL_CAPS * FieldsWriter's ctor should take an IOContext? Generally low level places shouldn't create a new IOContext; they should be passed one (though there are exceptions... eg I think it's fine that SegmentInfos.run uses DEFAULT). * Same for values/IntsImpl, values/Bytes, values/Floats, SegmentReader.get, TermVectorsTermsReader, TermVectorsWriter, DefaultSegmentInfosWriter, DefaultSegmentInfosReader, FixedGapTermsIndexReader, VariableGapTermsIndexReader, NormsWriter, BitVector (read & write) * I think MP.OneMerge should not extend the new MergeInfo; I'm worried about cases where classes hang onto the IOContext (eg, various places save the IOContxt away as a member) because this could then risk holding refs to the SegmentReaders created for merging. I think it's less risky to fully decouple the MergeInfo from OneMerge. Then, maybe MergeInfo should be a static inner class inside IOContext? * IndexWriter.ReaderPool.get should pass the IOCtx down to SegmentReader.get * IndexWriter.prepareFlushedSegment should be FLUSH not DEFAULT context. * Can you add assert inside IOContext: if the ctx is MERGE then the MergeInfo must not be null (ie in the ctor that takes only a Context). * MergeInfo needs a few more fields (eg optimize, isExternal) * When IndexWriter.addIndexes makes the MERGE context it should pass a MergeInfo with isExternal true * In IndexWriter.mergeMiddle, you should make a single IOCtx(MERGE) up front and pass to all readerPool.get calls * At the end of IndexWriter.mergeMiddle, when we open the mergedReader... we should not pass MERGE context here, somehow, because this open is very different than when we open the to-be-merged segments. IE, we are opening the merged segment. Hmm, maybe, we can add a new flag to the MergeInfo, "isMergedSegment"? Alternatively... we use Context.READ, but then I think we need something else that states what "READ" this really is -- eg NRT reader, merged segment reader, "normal" (IR.open) reader? * TermVectorsTermsWriter should be a Context.FLUSH * SegmentWriteState.context can be final * Somehow, PerFieldCodecWrapper.java shows as deleted... * MergeInfo.java needs copygirht header & javadoc. * Need whitespace around '=', eg "this.context=context;" should be "this.context = context;" > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047970#comment-13047970 ] Simon Willnauer commented on LUCENE-2793: - hi varun, looks good, some quick comments, * Can OneMerge extend MergeInfo? * Can we introduce a static IOContext instance like public static final IOContext DEFAULT = new IOContext() and use this everywhere where we need to use the default. All those object creations are unnecessary. * Can IOContext() call this(false) instead of manually setting the values * I think IOContext should take MergeInfo instead of OneMerge if you extend MergeInfo in OneMerge you can just pass OneMerge and you are done. I will have a closer look tomorrow simon > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047234#comment-13047234 ] Simon Willnauer commented on LUCENE-2793: - Varun, your patch doesn't apply cleanly to my latest trunk. i think you should update your local copy again! I have trouble to understand how you use MergeInfo etc. I figure there might be a misunderstanding so here is what I had in mind roughly: {code} Index: lucene/src/java/org/apache/lucene/index/MergePolicy.java === --- lucene/src/java/org/apache/lucene/index/MergePolicy.java(revision 1134335) +++ lucene/src/java/org/apache/lucene/index/MergePolicy.java(working copy) @@ -64,7 +64,7 @@ * subset of segments to be merged as well as whether the * new segment should use the compound file format. */ - public static class OneMerge { + public static class OneMerge extends MergeInfo { SegmentInfo info; // used by IndexWriter boolean optimize; // used by IndexWriter @@ -72,25 +72,26 @@ long mergeGen; // used by IndexWriter boolean isExternal; // used by IndexWriter int maxNumSegmentsOptimize; // used by IndexWriter -public long estimatedMergeBytes; // used by IndexWriter List readers;// used by IndexWriter List readerClones; // used by IndexWriter -public final List segments; -public final int totalDocCount; +public final List segments = new ArrayList(); boolean aborted; Throwable error; boolean paused; public OneMerge(List segments) { + super(getSegments(segments)); +} + +private static int getSegments(List segments) { if (0 == segments.size()) throw new RuntimeException("segments must include at least one segment"); // clone the list, as the in list may be based off original SegmentInfos and may be modified - this.segments = new ArrayList(segments); int count = 0; for(SegmentInfo info : segments) { count += info.docCount; } - totalDocCount = count; + return count; } /** Record that an exception occurred while executing {code} > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046635#comment-13046635 ] Michael McCandless commented on LUCENE-2793: bq. It seems that we don't need to provide IOContext to FieldInfos and SegmentInfo since we are reading them into memory anyway. I think you can just use a default context here without changing the constructors. Same is true for SegmentInfo I think we should pass down "readOnce=true" for these cases? EG some kind of caching dir (or something) would know not to bother caching such files... Same for del docs, terms index, doc values (well, sometimes), etc. bq. it seems that we should communicate the IOContext to the codec somehow. I suggest we put IOContext to SegmentWriteState and SegmentReadState that way we don't need to change the Codec interface and clutter it with internals. This would also fix mikes comment for FieldsConsumer etc. +1 that's great. bq. I really don't like OneMerge I think we should add an abstract class (maybe MergeInfo) that exposes the estimatedMergeBytes, totalDocCount for now. If we can't include OneMerge, and I agree it'd be nice not to, I think we should try hard to pull stuff out of OneMerge that may be of interest to a Dir impl? Maybe: * estimatedTotalSegmentSizeBytes * docCount * optimize/expungeDeletes * isExternal (so Dir can know if this is addIndexes vs "normal" merging) bq. Regarding the IOContext class I think we should design for what we have right now and since SegementInfo is not used anywhere (as far as I can see) we should add it once we need it. OneMerge should not go in there but rather the interface / abstract class I talked about above. I agree, let's wait until we have a need. In fact... SegmentInfo for flush won't work: we go and open all files for flushing, write to them, close them, and only then do we make the SegmentInfo. So it seems like we should also have some abtracted stuff about the to-be-flushed segment? Maybe for starters the estimatedSegmentSizeBytes? EG, NRTCachingDir could use this to decide whether to cache the new segment (today it fragile-ly relies on the app to open new NRT reader frequently enough). > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046403#comment-13046403 ] Simon Willnauer commented on LUCENE-2793: - Hey varun, here are some more comments for the latest complete patch: * We should have a static instance for IOContext with Context.Other which you can use in BitVector / CheckIndex for instance Maybe IOContext#DEFAULT_CONTEXT * It seems that we don't need to provide IOContext to FieldInfos and SegmentInfo since we are reading them into memory anyway. I think you can just use a default context here without changing the constructors. Same is true for SegmentInfos * This is unrelated to your patch but in PreFlexFields we should use IndexFileNames.segmentFileName(info.name, "", PreFlexCodec.FREQ_EXTENSION) and IndexFileNames.segmentFileName(info.name, "", PreFlexCodec.PROX_EXTENSION) instead of info.name + ".frq" and info.name + ".prx" * it seems that we should communicate the IOContext to the codec somehow. I suggest we put IOContext to SegmentWriteState and SegmentReadState that way we don't need to change the Codec interface and clutter it with internals. This would also fix mikes comment for FieldsConsumer etc. * TermVectorsWriter is only used in Merges so maybe it should also get a Context.Merge for consistency? * I really don't like OneMerge :) I think we should add an abstract class (maybe MergeInfo) that exposes the estimatedMergeBytes, totalDocCount for now. * small typo in RamDirectory, there is a space missing after the second file here: dir.copy(this, file, file,context); * SegmentReader should also use the static Default IOContext - make sure its used where needed :) Regarding the IOContext class I think we should design for what we have right now and since SegementInfo is not used anywhere (as far as I can see) we should add it once we need it. OneMerge should not go in there but rather the interface / abstract class I talked about above. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045455#comment-13045455 ] Michael McCandless commented on LUCENE-2793: {quote} I think we'll need a NativeMMapDir as well as NativeDir (or NativeUnix/WindowsDir), because mmap can also take flags giving hints about access patterns. I'll open a new issue... {quote} I opened LUCENE-3178. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045354#comment-13045354 ] Michael McCandless commented on LUCENE-2793: Patch looks great! Comments: * I think IOCtx should have ctor taking only OneMerge, which would set the OneMerge and set the context as Merge? * Likewise, a ctor taking a SegmentInfo to mean context = Flush? * And finally a default ctor that maps to Other (or Default or Unknown or Unspecified or something) * You don't need to create the IOCtx in IW.maybeMerge? * Maybe we want a "readOnce" boolean in IOCtx? When we read del docs, norms, terms index, doc values, segments file / segments.gen, we would set this? (And UnixDir would send eg NO_REUSE down to the OS). * I think we'll need a NativeMMapDir as well as NativeDir (or NativeUnix/WindowsDir), because mmap can also take flags giving hints about access patterns. I'll open a new issue... * Why does SegmentCoreReaders hang onto the IOCtx? Seems like classes shouldn't hang onto it... (also: PreFlexFields). * Hmm does createOutput even need an IOCtx...? What would a dir do with this? I suppose if it's a merge and we had io prioritization (someday) we could set lower prio... OK let's keep it. * I think Codec.fieldsProducer/Consumer should take an IOCtx? * Still need to fix IW's ReaderPool to key off of IOCtx.Context plus the info. Maybe put a // nocommit in there so we remember...? Eg, where you commented out the readBufferSize = ... inside ReaderPool.get is a good place. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045322#comment-13045322 ] Simon Willnauer commented on LUCENE-2793: - Very very quick review: * I think OneMerge should not be required for createing a IOContext we maybe should add a default ctor. * IOContext.Other is confusing I think. If a IOContext doesn't make lots of sense somewhere we should not need to pass it in. Can't we simply have overloaded methods? maybe I just don't like the name, maybe use DEFAULT? * IOContext seem to pretty straight forward you either read or write but it seems to be confused with high level operations like Merge and Flush. Either with go on a high level or we only have read and write here. Since read / write is implicit (you either pull input or output) we should make this high-level only. So maybe we have Query or Search instead of Read here? Maybe it makes sense to specify stuff like Consume or Sequential here too some high level APIs define sequential access so I think it does not conflict? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042824#comment-13042824 ] Michael McCandless commented on LUCENE-2793: Patch is looking good! Some feedback: * I think CompoundFileReader should not store the IOCtx passed to the ctor? It should just fwd to the dir.openInput...? It used to have to store the readBufferSize in case openInput was called w/o a readBufferSize, but we now require IOCtx to openInput. * Is IOCtx never allowed to be null? (I think so?). In which case we should jdoc this, and Dir impls can rely on it. * I don't think NRTCachingDir.unCache should take an IOCtx? * I think NRTCachineDir.doCacheWrite should take the IOCtx, and, we should remove all the merge thread hacks that it now does. * The super(bufferSize) call is needed in SimpleFSDir (and others). I think what must happen here is SimpleFSDir look @ the IOCtx and decides what buffer size to use (the logic in IW should be moved down into here), with getters/setters to be able to change read/write buffer size for merging vs reading/flushing. * IOContext.Context should also have Flush? And maybe rename Search -> Read? * IndexWriter ctor shouldn't take an incoming IOCtx; instead, IndexWriter should create a new IOCtx each time it opens reader / flushes segments / does merging. * The ReaderPool.get* methods (inside IndexWriter) should take an IOCtx, and the .context from that IOCtx should be part of the key used to cache that reader in the pool. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041642#comment-13041642 ] Chris Male commented on LUCENE-2793: {quote} So, if we really don't want to include these classes, and instead pick & choose what "digested" properties to include... then we risk some Dir impl not having access to something "interesting" because we forgot to include it, which means they'll have to hack something up, like NRTCachingDir does now in coupling threads to OneMerge instances. But, maybe the restrictions won't be so bad in practice... if we can try to brainstorm what Dirs may want to "know" about the full context and include that in IOCtx. {quote} I think the alternatives to this are definitely worse. I've been trying to think of some sort of intermediary component that could decouple the interests of Directories from the sources of that info, but it just seems to make everything way more complicated. I agree it seems best to stick to what Dirs want to know, present that as the IOCtx, and then work out the sources of that information. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041617#comment-13041617 ] Michael McCandless commented on LUCENE-2793: bq. If bean or copy-on-write builder I need to see what properties should be on it. so lets keep that for later. OK let's defer this decision for now. I just see "builder pattern" being too much of a hammer in search of nails... {quote} I really don't like the idea of exploiting OneMerge to the Directory. We might introduce some other internface that holds enough metadata to get the info and let OneMerge implement this but we should not pass it down to the dir. Same is true for SI I don't want directory mess around with those classes. {quote} I agree, it's spooky letting such a low level class (Dir) have access to high level stuff (OneMerge, SegmentInfo)... but it's really the only way to ensure we don't "miss" useful context about this file? Ie these high level classes already encompass the full context for a given file. So, if we really don't want to include these classes, and instead pick & choose what "digested" properties to include... then we risk some Dir impl not having access to something "interesting" because we forgot to include it, which means they'll have to hack something up, like NRTCachingDir does now in coupling threads to OneMerge instances. But, maybe the restrictions won't be so bad in practice... if we can try to brainstorm what Dirs may want to "know" about the full context and include that in IOCtx. For example, just by adding estimatedFullSegmentSizeBytes, for both Merge and Write (hmm: maybe Flush?), NRTCachingDir can cutover to IOCtx. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041560#comment-13041560 ] Simon Willnauer commented on LUCENE-2793: - bq. Why not use a normal struct/bean like class here? What I had in mind here is a more sophisticated IOContext. I'd like to see some default IOCtx for Merge, Search, Write etc. that folks can simply modify if needed. For that to work it should be immutable. The question here is if we go the high level way like you suggested and make the most of the properties private to the dir (like buffersize etc) or is we should make it available to the user of the API. Some Dirs might have different defaults than others though. I can see why this should be private to the directory and it might make more sense to simply indicate what the input / output is used for and let the directory figure out the details. So yeah +1 to keep it high level. If bean or copy-on-write builder I need to see what properties should be on it. so lets keep that for later. {quote}Ideally, I would also like to see details about the merge/reader/writer "context", eg for merging I'd like to see the OneMerge instance, for Reader/Writer maybe a SegmentInfo instance? {quote} I really don't like the idea of exploiting OneMerge to the Directory. We might introduce some other internface that holds enough metadata to get the info and let OneMerge implement this but we should not pass it down to the dir. Same is true for SI I don't want directory mess around with those classes. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041248#comment-13041248 ] Michael McCandless commented on LUCENE-2793: bq. I removed bufferSize from all the Directory implementations and made it part of IOContext. I think it shouldn't even be part of IOCtx? Ie this is private to the Dir impl? Another thing we have to do is add IOCtx into the cache key used in IndexWriter's ReaderPool inner class. Ie, a reader opened for merging cannot be later shared with a reader opened for searching. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041225#comment-13041225 ] Michael McCandless commented on LUCENE-2793: bq. Some kind of a copy-on-write builder pattern Why not use a normal struct/bean like class here? bq. I wonder if it makes sense to bind the IO Context instance to the directory and add a factory to the directory like Directory#getIOContext(Merge|Search|Index) to enable the directory impl to set platform specific defaults Hmm, but, if the IOCtxt is 'high level', and then details (bufferSize, seq/direct flags, etc.) are private to the dir impl, I think we (Lucene) can create the IOCtx and pass it down? Ie this is a one-way communication, I think (Lucene -> Dir impl). > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041224#comment-13041224 ] Michael McCandless commented on LUCENE-2793: Great to see your patch here Varun! I think we should start with only high level details in the IOContext? Ie, the Merge/Search(Reader?)/Writer is great, but I think low level stuff (bufferSize, sequential/direct) should stay private to the Dir impl? Ideally, I would also like to see details about the merge/reader/writer "context", eg for merging I'd like to see the OneMerge instance, for Reader/Writer maybe a SegmentInfo instance? This would then make Dir impls like NRTCachingDirectory (LUCENE-3092) "clean" (vs the sneaky ConcurrentMergeScheduler entangling it now must do), though, in that particular case we could accomplish this by only adding an estimatedSegmentSizeBytes to the IOCtx. We should remove the bufferSize that now plumbs all up and down the APIs, and replace it with IOCtx? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040906#comment-13040906 ] Simon Willnauer commented on LUCENE-2793: - bq. I already posted a patch to this issue a while back... Jason I appreciate that you are still around on this issue. Varun is doing his GSoC project on this issue and others so there might be some duplication here and there or similarities with you patch but in this case this is ok though. He needs to get started as well as getting a feeling how things work here. So I hope you don't mind. I can only encourage you to help reviewing and commenting! bq. If this is the right way to go ? Thanks for the patch man! Good to get started eventually. here are some comments for you patch: * When I look at IOContext I think it should be a little more sophisticated. What I have in mind is something similar to ScorerContext in org.apache.lucene.search.Weight.java. Some kind of a copy-on-write builder pattern that lets us provide some defaults for Merging, Searching and Indexing so by default we can reuse the same instance in many places. If we follow a this copy on write model we could also make this class non-final to let people customize the context if they needs to. * I am not sure if we really need the enumeration in IO Context unless me make decisions based on what an input / output is used for. IMO it might make more sense to have default instances for Searching Indexing and Merging and set the flags like SEQUENTIAL and the buffers to good defaults and only tweak them when we are aware of the context ie. when we pull the input / output. The most of the usecases need good defaults and only some need tweaks but if we hold the use-case information on IOContext some directories might want to be very smart and this might duplicate code. * I wonder if it makes sense to bind the IO Context instance to the directory and add a factory to the directory like Directory#getIOContext(Merge|Search|Index) to enable the directory impl to set platform specific defaults. I think that would make things easier and customization would be straight forward. * You should add a space after a comma in the arguments list like here: int bufferSize,IOContext context) * Enum elements should be CamelCase and start with a leading Uppercase character > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040850#comment-13040850 ] Varun Thacker commented on LUCENE-2793: --- I should have seen the patch! I have taken this up as a GSoC project. I'll try to use that patch too if it's ok. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040845#comment-13040845 ] Jason Rutherglen commented on LUCENE-2793: -- I already posted a patch to this issue a while back, https://issues.apache.org/jira/secure/attachment/12468030/LUCENE-2793.patch It seems we're looping here. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033279#comment-13033279 ] Earwin Burrfoot commented on LUCENE-2793: - As mentioned @LUCENE-3092, it would be nice not to include the OneMerge, but some meaningful value like 'expectedSize', 'expectedSegmentSize' or whatnot, that would work both for merges *and* flushes, and also won't introduce needless dependency on MergePolicy. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032992#comment-13032992 ] Michael McCandless commented on LUCENE-2793: For LUCENE-3092, it would be nice if the IOContext would optionally include the OneMerge if in fact the file is being opened for merging... this lets the Dir impl be smart if it wants to customize its behavior according to overall properties of the merge (eg total merged bytes). > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015790#comment-13015790 ] Simon Willnauer commented on LUCENE-2793: - bq. Hi. I would be interested in taking this up as a GSOC project . Are there any resource that I can read to understand the problem in depth ? Hey, I just marked it as GSoC-able :) so you are free to take it. I would also volunteer to mentor on this issue if mike doesn't want to take it though. Let me try to explain you quickly what this issue is about: Lucene uses Directory as an abstraction on top of the filesystem or RAM or any other storage to write the index data. Yet, Lucene has different "stages" where we have different requirements to the underlying storage / directory. When you index documents you eventually flush the index to disk and continue indexing until you created enough "segments" on disk that you need to merged them. This operations should if possible not pollute the FS cache since its really just housekeeping. With Java such its currently not possible to use some flags like DIRECT / SEQUENTIAL, this is what we have the native DirectIODirectory implementations in contrib for. Yet, for reading stuff from the index while searching we want to have the FS cache helping us as much as possible so this has again different requriements. What we currently do is that we pass different read buffer sizes to the directory to improve performance. All those kinds of information should be passed to the directory on a IndexInput / IndexOutput (similar to Input and OutputStream just tailored & enhanced for Lucene) basis and this is what this issue is about. hope that helps. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015785#comment-13015785 ] Varun Thacker commented on LUCENE-2793: --- Hi. I would be interested in taking this up as a GSOC project . Are there any resource that I can read to understand the problem in depth ? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980740#action_12980740 ] Robert Muir commented on LUCENE-2793: - {quote} I'm perfectly OK with that approach (having some module FSDir checks). I also feel uneasy having JNI in core. What I don't want to see, is Directory impls that you can't use on their own. If you can only use it for merging, then it's not a Directory, it breaks the contract! - move the code elsewhere. {quote} Right, i think we all agree we want to fix the DirectIOLinuxDirectory into being a 'real' directory? As i said before, from a practical perspective, it could be named LinuxDirectory, extend NIOFS, and when openInput(IOContext=Merge) it opens its special input. but personally i don't care how we actually implement it 'becoming a real directory'. this is another issue, unrelated to this one really. this issue is enough and should stand on its own... we should be able to do enough nice things here without dealing with JNI: improving our existing directory impls to use larger buffer sizes by default when merging, etc (like in your example). > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980736#action_12980736 ] Earwin Burrfoot commented on LUCENE-2793: - {quote} As I said before though, i wouldn't mind if we had something more like a 'modules/native' and FSDirectory checked, if this was available and automagically used it... but I can't see myself thinking that we should put this logic into fsdir itself, sorry. {quote} I'm perfectly OK with that approach (having some module FSDir checks). I also feel uneasy having JNI in core. What I don't want to see, is Directory impls that you can't use on their own. If you can only use it for merging, then it's not a Directory, it breaks the contract! - move the code elsewhere. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980732#action_12980732 ] Earwin Burrfoot commented on LUCENE-2793: - bq. Because in your example code above, it looks like it's added to Directory itself. bq. My problem with your sample code is that it appears that the .setBufferSize method is on Directory itself. Ohoho. My fault, sorry. It should look like: {code} RAMDirectory ramDir = new RAMDirectory(); ramDir.setBufferSize(whatever) // Compilation error! ramDir.createIndexInput(name, context); NIOFSDirectory fsDir = new NIOFSDirectory(); fsDir.setBufferSize(IOContext.NORMAL_READ, 1024); fsDir.setBufferSize(IOContext.MERGE, 4096); fsDir.createIndexInput(name, context) {code} > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980731#action_12980731 ] Robert Muir commented on LUCENE-2793: - bq. The proper way is to test out the things, and then move DirectIO code to the only place it makes sense in - FSDir? Probably make it switch on/off-able, maybe not. I'm not sure it should be there... at least not soon. its not even something you can implement in pure java? we definitely have to keep it still simple and possible for people to use the java library in a platform-indepedent way. its also a bit dangerous, whenever JNI is involvedeven if its working. So I think its craziness, to put this direct-io stuff in fsdirectory itself. As I said before though, i wouldn't mind if we had something more like a 'modules/native' and FSDirectory checked, if this was available and automagically used it... but I can't see myself thinking that we should put this logic into fsdir itself, sorry. bq. Sample code My problem with your sample code is that it appears that the .setBufferSize method is on Directory itself. Again i disagree with this because: * its useless to certain directories like MMapDirectory * its dangerous in the direct-io case (different platforms have strict requirements that things be sector-aligned etc, see the mac case where it actually 'works' if the buffer isnt, but is just slow). I definitely don't like the confusion regarding buffersizes now. A very small % of the time its actually meaningful and should be respected, but most of the time the value is completely bogus. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980727#action_12980727 ] Shai Erera commented on LUCENE-2793: I assume you mean setBufferSize(IOContext, size) should be added to specific Directory impls, and not Directory? Because in your example code above, it looks like it's added to Directory itself. Though we can add it to Directory as well, and do nothing there. It simplifies matters as you don't need to check whether the Dir you receive supports setting buffer size (in case you're not the one creating it). At any rate, this looks like it can work too. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980649#action_12980649 ] Earwin Burrfoot commented on LUCENE-2793: - What's with ongoing crazyness? :) bq. DirectIOLinuxDirectory First you introduce a kind of directory that is utterly useless except certain special situations. Then, instead of fixing the directory/folding its code somewhere normal, you try to workaround by switching between directories. What's the point of using abstract classes or interfaces, if you leak their implementation's logic all over the place? Or making DIOLD wrap something. Yeah! Wrap my RAMDir! bq. bufferSize This value is only meaningful to a certain subset of Directory implementations. So the only logical place we want to see this value set - is these very impls. Sample code: {code} Directory ramDir = new RAMDirectory(); ramDir.createIndexInput(name, context); // See, ma? No bufferSizes, they are pointless for RAMDir Directory fsDir = new NIOFSDirectory(); fsDir.setBufferSize(IOContext.NORMAL_READ, 1024); fsDir.setBufferSize(IOContext.MERGE, 4096); fsDir.createIndexInput(name, context) // See, ma? The only one who's really concerned with 'actual' buffer size is this concrete Directory impl // All client code is only concerned with the context. // It's NIOFSDirectory's business to give meaningful interpretation for IOContext and assign the buffer sizes. {code} You don't need custom Directory impls to make DIOLD work, you should freakin' fix it. The proper way is to test out the things, and then move DirectIO code to the only place it makes sense in - FSDir? Probably make it switch on/off-able, maybe not. You don't need custom Directory impls to set buffer sizes (neither cast to BufferedIndexInput!), you should add the setting to these Directories, which make sense of it. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980625#action_12980625 ] Shai Erera commented on LUCENE-2793: bq. No you dont... you can also call the .setBufferSize like the skiplist reader does. I'd still need to impl my Directory though right? That's the overkill I'm trying to avoid. But, I've been thinking about how would a merge/search code, which can only tell the Directory it's in SEARCH / MERGE context, get the buffer size the application wanted to use in that context. I don't think it has a way to do so without either using a static class, something we try to avoid, or propagating those settings down everywhere, which does not make sense either. So, Robert, I think you're right -- bufferSize should not exist on IOContext. A custom Directory impl seems unavoided. So I think it'd be good if we can create a BufferedDirectoryWrapper which wraps a Directory and offers a BufferedIndexInput/OutputWrapper which delegate the necessary calls to the wrapped Directory. We've found ourselves needing to implement that kind of Directory several times already, and it'd be good if Lucene can offer one. That Directory would then take IOContext -> bufferSize map/properties/whatever and can take that into account in openInput/createOutput. If users will need to impl Directory wrapping, if they want to control buffer sizes, I suggest we make that as painless as possible. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980564#action_12980564 ] Robert Muir commented on LUCENE-2793: - bq. But your code example tells me that if I want to control the buffer size used by Lucene (whether it's done intelligently or not today is what we try to fix here), I need to create my own Dir impl. That seems an overkill to me, for just controlling the bufferSize !? No you dont... you can also call the .setBufferSize like the skiplist reader does. {noformat} IndexInput input = dir.openInput() if (input instanceof BufferedIndexInput) ...setBufferSize(whatever_you_want); {noformat} Because after all, buffersize only makes sense on Buffered*. It makes no sense to be on Directory's openInput, e.g. it makes no sense for directories like MMapDirectory. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980561#action_12980561 ] Shai Erera commented on LUCENE-2793: Robert, nothing is too difficult to understand. But your code example tells me that if I want to control the buffer size used by Lucene (whether it's done intelligently or not today is what we try to fix here), I need to create my own Dir impl. That seems an overkill to me, for just controlling the bufferSize !? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980559#action_12980559 ] Shai Erera commented on LUCENE-2793: bq. This way the bufferSize is settable by the user, which I don't think is easily possible today (perhaps the main motivation for this issue?) Exactly ! That's what I thought of this issue in the first place - allow the app to more easily control the size of the buffers used by Lucene. That that IOContext will allow me to set different bufferSizes per the work that Lucene's doing (i.e. merge, search) is a bonus for me :). I still think bufferSize can be used as a hint by the specific Directory impl, but it should be there. If we want to allow control of different buffer sizes per operation, then maybe we can do the following: * Set a default buffer size on IOContext. * Add a setBufferSize(OpType type, int/long bufferSize) * Add getBufferSize(OpType) -- returns the set buffer size, or the default if not set. I don't think it's too complicated and gives enough control to the application? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980557#action_12980557 ] Robert Muir commented on LUCENE-2793: - bq. If we drop bufferSize, how will I be able to tell Lucene I'm willing to spare, say, 1 MB buffer for this IndexInput/Output? Am I supposed to create my own Directory and open II/IO, depending on the IOContext.Mode and decide on the buffer size then? with .setBufferSize. The only place in lucene that intelligently uses buffersize (Skiplist reading) sets it this way, not with the ctor param. otherwise, i think you are confused if you think lucene actually passes any intelligent value for this ctor parameter. it doesn't... it usually passes the static BufferedIndexInput.BUFFER_SIZE bq. If so, then I don't understand how this would work (think that's what Jason is asking) - if I impl MyDirectory (extending Directory? FSDirectory?) which is probably going to be a wrapper Directory, what II/IO impl should I invoke? I'll need to extend BufferedIndexInput/Output, impl its abstract methods, just for delegating to the wrapped Directory's II/IO? I don't understand whats so terribly difficult about: {noformat} if (context == MERGING) return mySpecialmergingInput(); else return super.openInput(...); {noformat} > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980555#action_12980555 ] Jason Rutherglen commented on LUCENE-2793: -- In the patch the bufferSize is an int member of IOContext. We could feasibly go to a map of attributes model if we think there'll be a lot more options in the future? Or the IOContext class can be subclassed. Also, in the patch the IOContext for searching and merging is set in IWC, then should be reused. This way the bufferSize is settable by the user, which I don't think is easily possible today (perhaps the main motivation for this issue?). > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980551#action_12980551 ] Shai Erera commented on LUCENE-2793: If we drop bufferSize, how will I be able to tell Lucene I'm willing to spare, say, 1 MB buffer for this IndexInput/Output? Am I supposed to create my own Directory and open II/IO, depending on the IOContext.Mode and decide on the buffer size then? If so, then I don't understand how this would work (think that's what Jason is asking) - if I impl MyDirectory (extending Directory? FSDirectory?) which is probably going to be a wrapper Directory, what II/IO impl should I invoke? I'll need to extend BufferedIndexInput/Output, impl its abstract methods, just for delegating to the wrapped Directory's II/IO? If I misunderstood the intentions, I'd appreciate if you can clarify them. But if not, I think IOContext should include a bufferSize hint. If the Directory does not intend to do anything special for 'merging', then it can take bufferSize into account. If however it's a Linux Directory that wants to set the O_DIRECT, then it can ignore bufferSize. But I think it's a useful hint. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980546#action_12980546 ] Jason Rutherglen commented on LUCENE-2793: -- bq. The directory is the factory for opening the correct openInput/createOutput based on its parameters, this just needs to be one. Right, however which Directory impl is going to do this? Today they're tied to their specific implementations of IndexInputs and IndexOutputs, eg, NIOFSIndexInput, DirectIOLinuxIndexInput, etc. In order to get an NIOFSIndexInput we need to instantiate NIOFSDirectory, however if we also want to use DirectIOLinuxDirectory, then (in today's model) we need to instantiate it as well. To create an index or output, all we need is a parent FSDIrectory and the rest is encapsulated in the underlying input/output implementation. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980543#action_12980543 ] Robert Muir commented on LUCENE-2793: - bq. That's how the patch works. If that doesn't look good, the other option is placing the logic in FSDirectory and removing the FSDirectory subclasses? We don't need to make any changes to Directory itself here, other than openInput/createOutput taking IOContext instead of bufferSize. The directory is the factory for opening the correct openInput/createOutput based on its parameters, this just needs to be one. As Mike said, the important thing is to not re-use an input that was opened with IOContext=Merging for search. If you ask the directory to openInput with IOContext=merging, thats all it should be used for, because that Directory could be implementing that with O_DIRECT or similar, which is slow for searching. we can muck with DirectIOLinuxDirectory later on another issue (e.g. make it a directory wrapper that just overrides openInput and only does something interesting when IOContext=Merging, or NIOFSDirectory subclass, or whatever we end up deciding). > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980533#action_12980533 ] Jason Rutherglen commented on LUCENE-2793: -- Ok, lets remove bufferSize. bq. its the directories job to then take that IOContext and create an appropriate IndexOutput. That's how the patch works. If that doesn't look good, the other option is placing the logic in FSDirectory and removing the FSDirectory subclasses? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980470#action_12980470 ] Robert Muir commented on LUCENE-2793: - bq. In fact, I suggest dropping bufferSize altogether. +1 https://issues.apache.org/jira/browse/LUCENE-2793?focusedCommentId=12966963&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12966963 bq. So my bet is - introduce IOContext as a simple Enum, change bufferSize parameter on createInput/Output to IOContext, done. I agree, its the directories job to then take that IOContext and create an appropriate IndexOutput. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980458#action_12980458 ] Earwin Burrfoot commented on LUCENE-2793: - In fact, I suggest dropping bufferSize altogether. As far as I can recall, it was introduced as a precursor to IOContext and can now be safely replaced. Even if we want to give user control over buffer size for all streams, or only those opened in specific IOContext, he can pass these numbers as config parameters to his Directory impl. That makes total sense, as: 1. IndexWriter/IndexReader couldn't care less about buffer sizes, it just passes them to the Directory. It's not their concern. 2. A bunch of Directories doesn't use said bufferSize at all, making this parameter not only private Directory affairs, but even further - implementation-specific. So my bet is - introduce IOContext as a simple Enum, change bufferSize parameter on createInput/Output to IOContext, done. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980454#action_12980454 ] Earwin Burrfoot commented on LUCENE-2793: - {quote} bq. You get IOFactory from Directory That's for the default, the main use is the static IOFactory class. {quote} You lost me here. If you got A from B, you don't have to pass B again to invoke A, if you do - that's 99% a design mistake. But still, my point was that you don't need IOFactory at all. bq. Right, however we're basically trying to intermix Directory's, which doesn't work when pointed at the same underlying File. I thought about a meta-Directory that routes based on the IOContext, however we'd still need a way to create an IndexInput and IndexOutput, from different Directory implementations. What Directories are you trying to intermix? What for? I thought the only thing done in that issue is an attempt to give Directory hints as to why we're going to open its streams. A simple enum IOContext and extra parameter on createOutput/Input would suffice. But with Lucene's micromanagement attitude, an enum turns into slightly more complex thing, with bufferSizes and whatnot. Still - no need for mixing Directories. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980435#action_12980435 ] Jason Rutherglen commented on LUCENE-2793: -- bq. You get IOFactory from Directory That's for the default, the main use is the static IOFactory class. bq. we already have an IOFactory, it's called Directory Right, however we're basically trying to intermix Directory's, which doesn't work when pointed at the same underlying File. I thought about a meta-Directory that routes based on the IOContext, however we'd still need a way to create an IndexInput and IndexOutput, from different Directory implementations. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980400#action_12980400 ] Earwin Burrfoot commented on LUCENE-2793: - Looks crazy. In a -bad- tangled way. You get IOFactory from Directory, put into IOContext, and then invoke it, passing it (wow!) an IOContext and a Directory. What if you pass totally different Directory? Different IOContext? It blows up eerily. And there's no justification for this - we already have an IOFactory, it's called Directory! It just needs an extra parameter on its factory methods (createInput/Output), that's all. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979740#action_12979740 ] Jason Rutherglen commented on LUCENE-2793: -- bq. Basically the IOContext needs to become part of the cache key uses in IW's ReaderPool? Great, I'll implement this. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979738#action_12979738 ] Michael McCandless commented on LUCENE-2793: Sorry, I think we should also do that as part of this issue. Basically the IOContext needs to become part of the cache key uses in IW's ReaderPool? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979736#action_12979736 ] Jason Rutherglen commented on LUCENE-2793: -- {quote}this issue only adds the IOContext, threading it down to when you open an input / create an output{quote} Does this mean we're not implementing this part? {quote}This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice/versa. Really, it's only all the open file handles that need to be different - we could in theory share del docs, norms, etc, if that were somehow possible.{quote} > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979731#action_12979731 ] Michael McCandless commented on LUCENE-2793: Yes please! But, this issue only adds the IOContext, threading it down to when you open an input / create an output. That context should hold enough "information" to allow the Dir impl to make decisions like buffer sizes and avoiding buffer cache, etc. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979728#action_12979728 ] Jason Rutherglen commented on LUCENE-2793: -- Shall I take this one? With the plan being to add config options to IWC so that IW uses the DirectIOLinuxDirectory (and it's variants) only for merging? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970530#action_12970530 ] Jason Rutherglen commented on LUCENE-2793: -- Perhaps we can add the ability to throttle Directory level IO to this issue. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966963#action_12966963 ] Robert Muir commented on LUCENE-2793: - There is another problem we should solve here, and that is the buffersize problem. This is totally broken at the moment for custom directories, here's an example. I wanted to set the buffersize by default to 4096 (since i measured this is like a 20% improvement for my directory impl). looking at the apis you would think that you simply override the openInput that takes no buffer size like this: {noformat} @Override public IndexInput openInput(String name) throws IOException { return openInput(name, 4096); } {noformat} unfortunately this doesnt work at all! instead you have to do something like this for it to actually "work": {noformat} @Override public IndexInput openInput(String name, int bufferSize) throws IOException { ensureOpen(); return new IndexInput(name, Math.max(bufferSize, 4096)); } {noformat} The problem is, throughout lucene's APIs, the directory's "default" is never used, instead the static BufferedIndexInput.BUFFER_SIZE is used everywhere... eg SegmentReader.get: {noformat} public static SegmentReader get(boolean readOnly, SegmentInfo si, int termInfosIndexDivisor) throws CorruptIndexException, IOException { return get(readOnly, si.dir, si, BufferedIndexInput.BUFFER_SIZE, true, termInfosIndexDivisor); } {noformat} So I think lucene's apis should never specify buffersize, we should remove it completely from the codecs api, and it should be *replaced* with IOContext. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966566#action_12966566 ] Uwe Schindler commented on LUCENE-2793: --- And MMapDir should return a simple/optimized direct FSIndexInput also. Because MMapping for merging is just wasted resources/address space. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966564#action_12966564 ] Robert Muir commented on LUCENE-2793: - Sounds great, at least in theory we could do the DIRECT thing on windows too for this case. and we can certainly give the sequential hint. in general hints like this are probably good for custom dir impls too, we never know what they are doing but its good to somehow inform them of our intentions so they have a chance to be optimal. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org