[jira] Commented: (LUCENE-2126) Split up IndexInput and IndexOutput into DataInput and DataOutput

Marvin Humphrey (JIRA) Wed, 09 Dec 2009 06:09:44 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788098#action_12788098
 ]


Marvin Humphrey commented on LUCENE-2126:
-----------------------------------------

> These methods should only be able to call the read/write methods (which this
> issue moves to DataInput/Output), but not methods like close(), seek() etc..

Ah, so that's what it is.  

In that case, let me vote my (non-binding) -1.  I don't believe that the
enforcement of such a restriction justifies the complexity cost of adding a
new class to the public API.

First, adding yet another class to the hierarchy steepens the learning curve
for users and contributors.  If you aren't in the rarefied echelon of
exceptional brilliance occupied by people named Michael who work for IBM :),
the gradual accumulation of complexity in the Lucene code base matters.  Inch
by inch, things move out of reach.

Second, changing things now for what seems to me like a minor reason makes it
harder to refactor the class hierarchy in the future when other, more
important reasons are inevitably discovered.

For LUCENE-2125, I recommend two possible options. 

  * Do nothing and assume that the sort of advanced user who writes a posting 
    codec won't do something incredibly stupid like call indexInput.close().
  * Add a note to the docs for writing posting codecs indicating which sort of
    of IO methods you ought not to call.

> once we see a need to allow users to extend DataInput/Output outside of
> Lucene we can go ahead and make the additional changes that are mentioned in
> your in my comments here.

In Lucy, there are three tiers of IO usage:

    * For low-level IO, use FileHandle.
    * For most applications, use InStream's encoder/decoder methods.
    * For performance-critical inner-loop material (e.g. posting decoders, 
      SortCollector), access the raw memory-mapped IO buffer using
      InStream_Buf()/InStream_Advance_Buf() and use static inline functions 
      such as NumUtil_decode_c32 (which does no bounds checking) from
      Lucy::Util::NumberUtils.

While you can extend InStream to add a codec, that's not generally the best
way to go about it, because adding a method to InStream requires that all of
your users both use your InStream class and use a subclassed Folder which
overrides the Folder_Open_In() factory method (analogous to 
Directory.openInput()).  Better is to use the extension point provided by
InStream_Buf()/InStream_Advance_Buf() and write a utility function which
accepts an InStream as an argument.

I don't expect and am not advocating that Lucene adopt the same IO hierarchy
as Lucy, but I wanted to provide an example of other reasons why you might
change things.  (What I'd really like to see is for Lucene to come up with
something *better* than the Lucy IO hierarchy.)  

One of the reasons Lucene has so many backwards compatibility headaches is
because the public APIs are so extensive and thus constitute such an elaborate
set of backwards compatibility promises.  IMO, DataInput and DataOutput do 
not offer sufficient benefit to compensate for the increased intricacy they add 
to that backwards compatibility contract.


> Split up IndexInput and IndexOutput into DataInput and DataOutput
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2126
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2126
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Flex Branch
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: lucene-2126.patch
>
>
> I'd like to introduce the two new classes DataInput and DataOutput
> that contain all methods from IndexInput and IndexOutput that actually
> decode or encode data, such as readByte()/writeByte(),
> readVInt()/writeVInt().
> Methods like getFilePointer(), seek(), close(), etc., which are not
> related to data encoding, but to files as input/output source stay in
> IndexInput/IndexOutput.
> This patch also changes ByteSliceReader/ByteSliceWriter to extend
> DataInput/DataOutput. Previously ByteSliceReader implemented the
> methods that stay in IndexInput by throwing RuntimeExceptions.
> See also LUCENE-2125.
> All tests pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2126) Split up IndexInput and IndexOutput into DataInput and DataOutput

Reply via email to