Hi,

a discussion in
http://issues.apache.org/jira/browse/LUCENE-196 might be of interest to you.

Did you think about storing the large pieces of documents in a database to reduce the size of the Lucene index?

I think there are good reasons for adding support for storing fields in separate files:

1. One could define a binary field of fixed length and store it in a separate file, then load that file into memory for fast access to the field contents. A use case might be: store a calendar date (YYYY-MM-DD) in three bytes, with 4 bits for the month, 5 bits for the day and up to 15 bits for the year. To retrieve hits sorted by date, you can load the fields file of (3 * number of documents in the index) bytes and sort by date without accessing the hard drive to read the dates.

2. One could store document contents in a separate file, and keep small fields such as the title and some metadata the way they are stored now. It could speed up access to fields. It would be interesting to know whether you gain significant performance by leaving the big chunks out, i.e. not storing them in the index.

In my opinion 1. is the most interesting case: storing some binary fields (dates, prices, lengths, any numeric metrics of documents) would enable *really* fast sorting of hits.

Any thoughts about this?

Regards,
Robert

We have a similar problem

On Tuesday, 15 November 2005 23:23, Karel Tejnora wrote:
> Hi all,
> we are using Lucene 1.4.3 in our testing application. Thank you guys for
> that great job.
> We have an index file of around 12 GiB, one file (merged). Retrieving hits
> takes a nicely small amount of time, but reading the stored fields takes
> 10-100 times longer. I think this is because all the fields are read.
> I would like to try implementing the Lucene index files as tables in a DB
> with some lazy field loading. Searching the web, I have found only an
> implementation of store.Directory (bdb), but it only holds data as binary
> streams. That technique will not be very helpful because BLOB operations
> are not fast.
> On the other hand, I will lose some freedom in the variability of
> document fields, but I can avoid a lot of the skipping and many open
> files. Also, IndexWriter could have document/term locking granularity.
> So I think this approach leads to extending IndexWriter / IndexReader and
> having my own implementation of the index.Segment* classes. Is this the
> best way, or am I missing something about how to achieve this?
> If it is a bad idea, I will be happy to hear about other possibilities.
>
> I would also like to join the development of Lucene. Are there some
> pointers on how to start?
>
> Thanks for reading this,
> sorry if I made some mistakes
>
> Karel
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
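
To make the three-byte date idea from point 1 of the reply a bit more concrete, here is a minimal sketch in Java. Everything in it is an assumption for illustration: the class PackedDateColumn and its methods are hypothetical and not part of Lucene, and the in-memory byte array merely stands in for the separate fields file of (3 * number of documents) bytes. The year goes into the high 15 bits, then 4 bits for the month and 5 bits for the day, so comparing the packed integers orders the dates chronologically.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper (not a Lucene class): one 3-byte packed date per document. */
final class PackedDateColumn {
    // 3 bytes per document, addressed by document number.
    private final byte[] data;

    PackedDateColumn(int docCount) {
        this.data = new byte[3 * docCount];
    }

    /** Packs a date into 24 bits: 15 bits year, 4 bits month, 5 bits day. */
    static int pack(int year, int month, int day) {
        if (year < 0 || year > 0x7FFF || month < 1 || month > 12 || day < 1 || day > 31) {
            throw new IllegalArgumentException("date out of range");
        }
        return (year << 9) | (month << 5) | day;
    }

    static int year(int packed)  { return packed >>> 9; }
    static int month(int packed) { return (packed >>> 5) & 0xF; }
    static int day(int packed)   { return packed & 0x1F; }

    /** Stores the date of one document, most significant byte first. */
    void set(int doc, int year, int month, int day) {
        int p = pack(year, month, day);
        int off = 3 * doc;
        data[off]     = (byte) (p >>> 16);
        data[off + 1] = (byte) (p >>> 8);
        data[off + 2] = (byte) p;
    }

    /** Reads the packed 24-bit date of one document. */
    int get(int doc) {
        int off = 3 * doc;
        return ((data[off] & 0xFF) << 16)
             | ((data[off + 1] & 0xFF) << 8)
             |  (data[off + 2] & 0xFF);
    }

    /** Returns the given hit document numbers ordered by date, oldest first. */
    int[] sortByDate(int[] hits) {
        return Arrays.stream(hits)
                .boxed()
                .sorted(Comparator.comparingInt(this::get))
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

One would fill the column once (or read the bytes from the separate file), then call sortByDate on the document numbers of the hits. The sort only touches the in-memory array, which is the point of the proposal. Whether this is faster in practice than reading stored fields would have to be measured.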