On Sunday 11 March 2007 22:41, Michael Busch wrote:
> Hi Grant,
>
> I certainly agree that it would be great if we could make some progress
> and commit the payloads patch soon. I think it is quite independent from
> FI. FI will introduce different posting formats (see Wiki:
> http://wiki.apache.org/lucene-java/FlexibleIndexing). Payloads will be
> part of some of those formats, but not all (i.e. per-position payloads
> only make sense if positions are stored).
>
> The only concern some people had was about the API the patch introduces.
> It extends Token and TermPositions. Doug's argument was that if we
> introduce new APIs now but want to change them with FI, then it will be
> hard to support those APIs. I think that is a valid point, but at the
> same time it slows down progress to have to plan ahead in too many
> directions. That's why I'd vote for marking the new APIs as experimental
> so that people can try them out at their own risk.
> If we could agree on that approach then I'd go ahead and submit an
> updated payloads patch in the next few days that applies cleanly on the
> current trunk and contains the additional warnings in the javadocs.
>
>
> In regard of FI and 662 however I really believe we should split it up
> and plan ahead (in a way I mentioned already), so that we have more
> isolated patches. It is really great that we have 662 already (Nicolas,
> thank you so much for your hard work, I hope you'll keep working with us
> on FI!!). We'll probably use some of that code, and it will definitely
> be helpful.

Thanks! :)

About the code split you are talking about, I definitely agree. Here is what
the three parts will contain:

1) index format concept:
- there is an interface defining it, for now handling only the filename
  extensions.
- modify the Directory abstract class and its implementations to be the
  container of the index format.
- modify the SegmentInfos class to check the format of the opened index
  against the index format defined in the Directory class.
- modify the writer to make it check for format conflicts while adding raw
  indexes.

2) extensibility of the store reader/writer:
- add some new entry points to the previous interface: a FieldsReader and a
  FieldsWriter.
- split the current FieldsReader and FieldsWriter in two parts: the part that
  will still be handled by Lucene, and the extendable ones, which will be
  instantiated by a DefaultIndexFormat.
- split the implementation of Field in two parts: the Field and a FieldData,
  so the user will be able to define a custom field-data Java object.

3) New: extensibility of the posting reader/writer
This is just a draft for now, but here is what was done:
- move Posting from an inner class to a public class.
- make TermInfo handle a pool of "pointers": the default implementation has
  two, the frq one and the prx one.
- extract the posting writing from DocumentWriter into a DefaultPostingWriter.

I can provide a patch for the first step.

cheers,
Nicolas

>
> Michael
>
> Grant Ingersoll wrote:
> > Hi Michael,
> >
> > This is very good. I know 662 is different, just wasn't sure if
> > Nicolas' patch was meant to be applied after 662, b/c I know we had
> > discussed this before.
> >
> > I do agree with you about planning this out, but I also know that
> > patches seem to motivate people the best and provide a certain
> > concreteness to it all. I mostly started asking questions on these
> > two issues b/c I wanted to spur some more discussion and see if we can
> > get people motivated to move on it.
> >
> > I was hoping that I would be able to apply each patch to two different
> > checkouts so I could start seeing where the overlap is and how they
> > could fit together (I also admit I was procrastinating on my ApacheCon
> > talk...). In the new, flexible world, the payloads implementation
> > could be a separate implementation of the indexing or it could be part
> > of the core/existing file format implementation. Sometimes I just
> > need to get my hands on the code to get a real feel for what I feel is
> > the best way to do it.
> >
> > I agree about the XML storage for Index information. We do that in
> > our in-house wrapper around Lucene, storing info about the language,
> > analyzer used, etc. We may also want a binary index-level storage
> > capability. I know most people just create a single document usually
> > to store binary info about the index, but a binary storage might be
> > good too.
> >
> > Part of me says to apply the Payloads patch now, as it provides a lot
> > of bang for the buck and I think the FI is going to take a lot longer
> > to hash out. However, I know that it may pin us in or force us to
> > change things for FI. Ultimately, I would love to see both these
> > features for the next release, but that isn't a requirement. Also, on
> > FI, I would love to see two different implementations of whatever API
> > we choose before releasing it, as I always find two implementations of
> > an Interface really work out the API details.
> >
> > -Grant
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

-- 
Nicolas LALEVÉE
Solutions & Technologies
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com
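
For illustration only, step 1 of the split described in this message (the
index format concept) could be sketched in Java roughly as follows. This is
not code from the 662 patch: IndexFormat's methods, FormatAwareDirectory,
FormatCheck and requireSameFormat are hypothetical names invented for this
sketch, and the extension list is only a sample of the existing file types.

```java
// Hypothetical sketch: none of these signatures are taken from the 662 patch.

/** Step 1: an index format that, for now, only knows its file name extensions. */
interface IndexFormat {
    String name();
    String[] fileExtensions();
}

/** Stand-in for the existing on-disk format; the extension list is illustrative, not exhaustive. */
final class DefaultIndexFormat implements IndexFormat {
    public String name() { return "default"; }
    public String[] fileExtensions() {
        return new String[] {"fnm", "fdx", "fdt", "tis", "tii", "frq", "prx"};
    }
}

/** Stand-in for a Directory acting as the container of the index format. */
final class FormatAwareDirectory {
    private final IndexFormat format;

    FormatAwareDirectory(IndexFormat format) { this.format = format; }

    IndexFormat format() { return format; }

    /** True if the file name ends with one of the format's extensions. */
    boolean ownsFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return false;
        String ext = fileName.substring(dot + 1);
        for (String known : format.fileExtensions()) {
            if (known.equals(ext)) return true;
        }
        return false;
    }
}

/** Stand-in for the writer-side check made before adding raw indexes. */
final class FormatCheck {
    static void requireSameFormat(FormatAwareDirectory target, FormatAwareDirectory source) {
        String t = target.format().name();
        String s = source.format().name();
        if (!t.equals(s)) {
            throw new IllegalStateException("index format conflict: " + t + " vs " + s);
        }
    }
}
```

In this sketch formats are compared by name only; a real check would
presumably also have to account for format versions, which the message above
does not specify.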