On Thursday 3 August 2006 21:49, Marvin Humphrey wrote:
> On Jul 31, 2006, at 8:25 AM, Nicolas Lalevée wrote:
> > That looks good, but there is one restriction: it has to be per
> > document.
>
> Yes, what I laid out was per-document - for each document, the fdx
> file would keep a file pointer and an integer mapping to a codec.
>
> > In fact I was thinking about a more generic version that would allow
> > format compatibility, keeping .fdx as is:
> >
> > FieldData (.fdt) --> <DocFieldData>SegSize
> > DocFieldData --> FieldCount, <FieldNum, RawData>FieldCount
> >
> > And the default FieldsDataWriter would be the current one; it would
> > read the RawData as Bits, Value, with Value --> String | BinaryValue, ...
> > Then, for my app, I would provide a custom FieldsDataWriter that
> > does exactly what I want.
>
> OK, that's quite similar, but with the info specifying how to
> deserialize the document stored in fdt rather than fdx.

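To make the quoted layout concrete, here is a minimal sketch in Java of one DocFieldData record (FieldCount, then FieldNum and RawData per field) with the RawData part delegated to a pluggable format. All the names here (RawDataFormat, DefaultRawDataFormat, DocFieldDataSketch) are hypothetical and are not Lucene classes. The sketch also assumes plain java.io streams with fixed-width ints instead of Lucene's own I/O classes and encodings, just to stay self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical hook: decides how the RawData of one field is laid out. */
interface RawDataFormat {
    void write(DataOutputStream out, String value) throws IOException;
    String read(DataInputStream in) throws IOException;
}

/** Stand-in for a default format: one flag byte ("Bits"), then the value. */
class DefaultRawDataFormat implements RawDataFormat {
    public void write(DataOutputStream out, String value) throws IOException {
        out.writeByte(0);     // Bits: no flags set in this sketch
        out.writeUTF(value);  // Value: a String
    }

    public String read(DataInputStream in) throws IOException {
        in.readByte();        // skip Bits
        return in.readUTF();
    }
}

/** Writes and reads one record: FieldCount, <FieldNum, RawData>FieldCount. */
class DocFieldDataSketch {
    static byte[] writeDoc(Map<Integer, String> fields, RawDataFormat format) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(fields.size());              // FieldCount
            for (Map.Entry<Integer, String> e : fields.entrySet()) {
                out.writeInt(e.getKey());             // FieldNum
                format.write(out, e.getValue());      // RawData, format-defined
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<Integer, String> readDoc(byte[] data, RawDataFormat format) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int fieldCount = in.readInt();            // FieldCount
            Map<Integer, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < fieldCount; i++) {
                int fieldNum = in.readInt();          // FieldNum
                fields.put(fieldNum, format.read(in)); // RawData, format-defined
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The record-level code never looks inside RawData, so swapping in another RawDataFormat changes what is stored per field without touching the surrounding layout. The same format has to be used for writing and for reading.
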
In fact, you're not obliged to store any "codec" information. If your app's
data always has the same form, then you just store the data and no codec
info. For my use case, I would skip the bits about compressed/binary and
only store what I want: a pointer to a type, a pointer to a lang, and the
value. One important note about this design is that the index would only be
read by my custom reader and written by my custom writer.

> However, I
> don't think what you're describing makes the field storage in Lucene
> arbitrarily extensible, since you're just going to override
> FieldsWriter/FieldsReader rather than modify them so that they can
> use arbitrary codecs.

If you override FieldsWriter/FieldsReader, then you can put in whatever
writing/reading code you want, so you are implementing an arbitrary codec.

> I think what I want to do is turn Lucene into an Object-Oriented
> Database, or at least have Lucene adopt some characteristics of an
> ODBMS. However, I haven't used a real ODBMS and I'm not up on the
> theory, so I can't say for sure. I've been doing a little reading
> here and there on object databases, but I've been extraordinarily
> busy the last few weeks and haven't been able to study it in depth.
>
> The main point is this:
>
> Lucene users have diverse needs for what gets stored in the document/
> field storage. We've been meeting those needs by assigning more and
> more bit flags. That can't continue ad infinitum. However, we
> *can* meet everyone's needs by applying a variant of the "Replace
> Conditionals With Polymorphism" refactoring technique...
>
> http://xrl.us/p3kn (Link to www.eli.sdsu.edu)
>
> Think of those bit flags as an if-else chain. Instead of all those
> conditionals describing all the attributes of the Lucene Document you
> want to store at that file pointer, we allow you to put whatever kind
> of serialized object you desire there. Maybe it's a Lucene
> Document. Maybe it's a FrenchDocument. Maybe it's a
> RussianDocument.
> Maybe it's a wrapped-up jpg. You choose.
>
> Instead of continually adding to the complexity of the
> deserialization algorithm, we make that deserialization algorithm
> user-definable.

In fact, this is exactly my point. :-)

If people think it is interesting, I can try to do a prototype.

cheers,
Nicolas
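For illustration, a minimal sketch in Java of the "integer mapping to a codec" idea discussed in this thread: a per-document integer selects a user-supplied deserializer, so a polymorphic call takes the place of a chain of bit-flag conditionals. DocCodec, TextCodec, BlobCodec and CodecRegistry are hypothetical names invented for this sketch and are not part of Lucene. How the integer and the bytes would actually be read from the fdx/fdt files is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface: turns the stored bytes of one document into an object. */
interface DocCodec {
    Object decode(byte[] stored);
}

/** Example codec: the stored bytes are UTF-8 text. */
class TextCodec implements DocCodec {
    public Object decode(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }
}

/** Example codec: the stored bytes are an opaque blob (say, a jpg), returned as-is. */
class BlobCodec implements DocCodec {
    public Object decode(byte[] stored) {
        return stored;
    }
}

/** Maps the integer kept per document to the codec that understands its bytes. */
class CodecRegistry {
    private final Map<Integer, DocCodec> codecs = new HashMap<>();

    void register(int codecId, DocCodec codec) {
        codecs.put(codecId, codec);
    }

    Object decode(int codecId, byte[] stored) {
        DocCodec codec = codecs.get(codecId);
        if (codec == null) {
            throw new IllegalArgumentException("No codec registered for id " + codecId);
        }
        // One polymorphic call instead of an if/else chain over bit flags.
        return codec.decode(stored);
    }
}
```

Supporting a new kind of stored document then means registering one more DocCodec under a new id. The registry and the existing codecs stay unchanged.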
