[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966963#action_12966963
]
Robert Muir commented on LUCENE-2793:
-------------------------------------
There is another problem we should solve here: the buffer size
problem.
This is totally broken at the moment for custom directories. Here's an example.
I wanted to set the default buffer size to 4096 (I measured roughly a
20% improvement for my directory impl).
Looking at the APIs, you would think that you simply override the openInput that
takes no buffer size, like this:
{noformat}
@Override
public IndexInput openInput(String name) throws IOException {
  return openInput(name, 4096);
}
{noformat}
Unfortunately this doesn't work at all! Instead you have to do something like
this for it to actually "work":
{noformat}
@Override
public IndexInput openInput(String name, int bufferSize) throws IOException {
  ensureOpen();
  return new IndexInput(name, Math.max(bufferSize, 4096));
}
{noformat}
The problem is that, throughout Lucene's APIs, the directory's "default" is never
used; instead the static BufferedIndexInput.BUFFER_SIZE is used everywhere,
e.g. SegmentReader.get:
{noformat}
public static SegmentReader get(boolean readOnly, SegmentInfo si,
    int termInfosIndexDivisor) throws CorruptIndexException, IOException {
  return get(readOnly, si.dir, si, BufferedIndexInput.BUFFER_SIZE, true,
      termInfosIndexDivisor);
}
{noformat}
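To make the failure mode concrete, here is a minimal, self-contained sketch. It assumes nothing about Lucene beyond what is described above: ToyDirectory, ToyInput, the file name, and the value 1024 for the static default are hypothetical stand-ins, not Lucene classes or constants. It only shows that a caller which always passes the static constant bypasses an override of the no-size overload, while clamping in the two-argument overload takes effect.

```java
// Toy model of the dispatch problem; none of these classes are Lucene code.
class BufferSizeDemo {
    // Stand-in for BufferedIndexInput.BUFFER_SIZE (the value is assumed for illustration).
    static final int BUFFER_SIZE = 1024;

    /** Hypothetical input that only remembers the buffer size it was opened with. */
    static class ToyInput {
        final int bufferSize;
        ToyInput(int bufferSize) { this.bufferSize = bufferSize; }
    }

    /** Hypothetical directory with the same two openInput overloads. */
    static class ToyDirectory {
        ToyInput openInput(String name) {
            return openInput(name, BUFFER_SIZE);
        }
        ToyInput openInput(String name, int bufferSize) {
            return new ToyInput(bufferSize);
        }
    }

    /** First attempt: override only the overload that takes no buffer size. */
    static class DefaultOverridingDirectory extends ToyDirectory {
        @Override
        ToyInput openInput(String name) {
            return openInput(name, 4096);
        }
    }

    /** Workaround: clamp the requested size in the two-argument overload. */
    static class ClampingDirectory extends ToyDirectory {
        @Override
        ToyInput openInput(String name, int bufferSize) {
            return super.openInput(name, Math.max(bufferSize, 4096));
        }
    }

    /** Mimics a caller that always passes the static constant explicitly. */
    static int sizeSeenByCaller(ToyDirectory dir) {
        return dir.openInput("some-file", BUFFER_SIZE).bufferSize;
    }

    public static void main(String[] args) {
        // The no-size override is never reached, so the static default wins.
        System.out.println(sizeSeenByCaller(new DefaultOverridingDirectory())); // prints 1024
        // The clamp sits on the path the caller actually takes.
        System.out.println(sizeSeenByCaller(new ClampingDirectory()));          // prints 4096
    }
}
```

The clamp only works because it sits in the overload that callers actually invoke. It also cannot distinguish "the caller wants the default" from "the caller really wants a small buffer".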
So I think Lucene's APIs should never specify a buffer size: we should remove it
completely from the codecs API, and it should be *replaced* with IOContext.
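As a rough illustration of what that could look like, here is a hypothetical sketch. The IOContext shape, the Usage enum, the ToyDirectory and ToyInput classes, and the sizes 4096 and 8192 are all assumptions made for this example, not an existing or proposed Lucene API. The only point is that callers describe what they are doing and the directory picks its own buffer size.

```java
// Hypothetical sketch: every name and value here is illustrative, not Lucene API.
class IOContextSketch {
    /** What the caller is opening the file for. */
    enum Usage { SEARCH, MERGE }

    /** Describes the intended I/O; deliberately carries no buffer size. */
    static final class IOContext {
        static final IOContext SEARCH = new IOContext(Usage.SEARCH, false, false);
        static final IOContext MERGE = new IOContext(Usage.MERGE, true, true);

        final Usage usage;
        final boolean direct;     // DIRECT flag: bypass the OS buffer cache
        final boolean sequential; // SEQUENTIAL flag: access-pattern hint

        IOContext(Usage usage, boolean direct, boolean sequential) {
            this.usage = usage;
            this.direct = direct;
            this.sequential = sequential;
        }
    }

    /** Toy input that records the decisions the directory made. */
    static final class ToyInput {
        final int bufferSize;
        final boolean direct;
        ToyInput(int bufferSize, boolean direct) {
            this.bufferSize = bufferSize;
            this.direct = direct;
        }
    }

    /** A custom directory owns its buffer-size policy; callers cannot override it. */
    static class ToyDirectory {
        ToyInput openInput(String name, IOContext context) {
            // Arbitrary example sizes: larger buffers for merging than for searching.
            int bufferSize = (context.usage == Usage.MERGE) ? 8192 : 4096;
            return new ToyInput(bufferSize, context.direct);
        }
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        System.out.println(dir.openInput("some-file", IOContext.SEARCH).bufferSize); // prints 4096
        System.out.println(dir.openInput("some-file", IOContext.MERGE).bufferSize);  // prints 8192
    }
}
```

With a signature like this, a caller such as SegmentReader.get would have no static buffer-size constant to pass, so the directory's own default could not be bypassed.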
> Directory createOutput and openInput should take an IOContext
> -------------------------------------------------------------
>
> Key: LUCENE-2793
> URL: https://issues.apache.org/jira/browse/LUCENE-2793
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Store
> Reporter: Michael McCandless
>
> Today for merging we pass down a larger readBufferSize than for searching
> because we get better performance.
> I think we should generalize this to a class (IOContext), which would hold
> the buffer size, but then could hold other flags like DIRECT (bypass OS's
> buffer cache), SEQUENTIAL, etc.
> Then, we can make the DirectIOLinuxDirectory fully usable because we would
> only use DIRECT/SEQUENTIAL during merging.
> This will require fixing how IW pools readers, so that a reader opened for
> merging is not then used for searching, and vice versa. Really, it's only
> all the open file handles that need to be different -- we could in theory
> share del docs, norms, etc, if that were somehow possible.