[jira] Issue Comment Edited: (LUCENE-2126) Split up IndexInput and IndexOutput into DataInput and DataOutput

Michael Busch (JIRA) Sat, 12 Dec 2009 17:22:43 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789834#action_12789834 ]


Michael Busch edited comment on LUCENE-2126 at 12/13/09 1:22 AM:
-----------------------------------------------------------------

I disagree with you here: IMO, introducing DataInput/Output actually makes the 
API easier for the "normal" user to understand.

I would think that most users don't implement IndexInput/Output extensions, but 
simply use the out-of-the-box Directory implementations, which provide 
IndexInput/Output impls. Also, most users probably don't even call the 
IndexInput/Output APIs directly. 

{quote}
Do nothing and assume that the sort of advanced user who writes a posting
codec won't do something incredibly stupid like call indexInput.close().
{quote}

Writing a posting codec is much more advanced than using 2125's features. 
Ideally, a user who simply wants to store some specific information in the 
posting list, such as a boost, a part-of-speech identifier, another VInt, etc. 
should with 2125 only have to implement a new attribute including the 
serialize()/deserialize() methods. People who want to do that don't need to 
know anything about Lucene's API layer. They only need to know the APIs that 
DataInput/Output provide and will not get confused with methods like seek() or 
close(). For the standard user who only wants to write such an attribute, it 
should not matter what Lucene's IO structure looks like. So even if we make 
changes that move in Lucy's direction in the future (IndexInput/Output owning a 
filehandle vs. the need to extend them), the serialize()/deserialize() methods 
of the attribute would still work with DataInput/Output.
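
To make that concrete, here is a rough sketch of what such an attribute could 
look like. Everything in it is an assumption for illustration: SketchDataInput 
and SketchDataOutput are hypothetical stand-ins for the proposed classes (not 
the ones in the patch), the in-memory implementations exist only so the 
example runs, and the serialize()/deserialize() signatures are guesses, since 
2125 has not fixed them here.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for the proposed DataOutput: encoding methods only,
// no seek(), getFilePointer() or close().
abstract class SketchDataOutput {
    abstract void writeByte(byte b);

    // Variable-length int: 7 data bits per byte, high bit set while more bytes follow.
    void writeVInt(int i) {
        while ((i & ~0x7F) != 0) {
            writeByte((byte) ((i & 0x7F) | 0x80));
            i >>>= 7;
        }
        writeByte((byte) i);
    }
}

// Hypothetical stand-in for the proposed DataInput: decoding methods only.
abstract class SketchDataInput {
    abstract byte readByte();

    int readVInt() {
        byte b = readByte();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }
}

// In-memory implementations, present only to make the sketch runnable.
class ByteArraySketchOutput extends SketchDataOutput {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

    @Override
    void writeByte(byte b) {
        bytes.write(b);
    }

    byte[] toByteArray() {
        return bytes.toByteArray();
    }
}

class ByteArraySketchInput extends SketchDataInput {
    private final byte[] data;
    private int pos;

    ByteArraySketchInput(byte[] data) {
        this.data = data;
    }

    @Override
    byte readByte() {
        return data[pos++];
    }
}

// The only class an attribute author would write: it touches nothing but
// the data-encoding methods.
class PartOfSpeechAttributeSketch {
    int tag;

    void serialize(SketchDataOutput out) {
        out.writeVInt(tag);
    }

    void deserialize(SketchDataInput in) {
        tag = in.readVInt();
    }
}
```

The attribute never sees where the bytes come from or go to, so it would keep 
working regardless of how the file-level classes are organized.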

I bet that a lot of people who used the payload feature before took a 
ByteArrayOutputStream together with DataOutputStream (which implements Java's 
DataOutput) to populate the payload byte array. With 2125 Lucene will provide 
an API that is similar to use, but more efficient, as it removes the byte[] 
array indirection and overhead.
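
For reference, the java.io pattern I have in mind looks roughly like the 
sketch below. The class name and the choice of fields (a float boost and an 
int part-of-speech tag) are made up for illustration; only the java.io calls 
are real. Note that toByteArray() copies the internal buffer, which is part of 
the indirection mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper showing the pre-2125 way of building a payload.
class PayloadBytesSketch {

    // Encode a boost and a part-of-speech tag into a payload byte array.
    static byte[] encode(float boost, int posTag) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeFloat(boost); // 4 bytes
            out.writeInt(posTag);  // 4 bytes
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // copies the internal buffer
    }

    // Decode the boost stored at the start of the payload.
    static float decodeBoost(byte[] payload) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(payload))) {
            return in.readFloat();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```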

I'm still +1 for this change. Others?

  
> Split up IndexInput and IndexOutput into DataInput and DataOutput
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2126
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2126
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Flex Branch
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: lucene-2126.patch
>
>
> I'd like to introduce the two new classes DataInput and DataOutput
> that contain all methods from IndexInput and IndexOutput that actually
> decode or encode data, such as readByte()/writeByte(),
> readVInt()/writeVInt().
> Methods like getFilePointer(), seek(), close(), etc., which relate not
> to data encoding but to files as the input/output source, stay in
> IndexInput/IndexOutput.
> This patch also changes ByteSliceReader/ByteSliceWriter to extend
> DataInput/DataOutput. Previously ByteSliceReader implemented the
> methods that stay in IndexInput by throwing RuntimeExceptions.
> See also LUCENE-2125.
> All tests pass.
