Re: Using Lucene to store document
Hi,

When the index is read into memory for searching, which data in the segment/index is loaded? All of the indexed fields and terms? Are the stored fields loaded as well?

Thanks,
Nhan
Re: Using Lucene to store document
Not all of the data in the index is loaded at once. I believe the .tii file (if you are using the multifile index format) is loaded into RAM, perhaps along with some other small files, but the rest is read from disk as needed, depending on the terms used in the search.

Otis
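To illustrate the idea of keeping only a small term index in RAM while the full term list stays on disk, here is a minimal sketch. It is not Lucene code and does not reproduce its file formats: the class name `SparseTermIndex`, the sampling interval, and the in-memory list standing in for the on-disk dictionary are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: a hypothetical class, not Lucene's implementation.
final class SparseTermIndex {
    // Stands in for the full, sorted term dictionary that would live on disk.
    private final List<String> allTerms;
    // Every interval-th term; this small sample is what would be held in RAM.
    private final List<String> sampled = new ArrayList<>();
    private final int interval;

    SparseTermIndex(List<String> sortedTerms, int interval) {
        this.allTerms = sortedTerms;
        this.interval = interval;
        for (int i = 0; i < sortedTerms.size(); i += interval) {
            sampled.add(sortedTerms.get(i));
        }
    }

    int termsInMemory() {
        return sampled.size();
    }

    /** Returns the position of the term in the full dictionary, or -1 if absent. */
    int lookup(String term) {
        int pos = Collections.binarySearch(sampled, term);
        // Exact hit: that block. Otherwise use the block before the insertion point.
        int block = pos >= 0 ? pos : -(pos + 1) - 1;
        if (block < 0) {
            return -1; // term sorts before the first term in the dictionary
        }
        int start = block * interval;
        int end = Math.min(start + interval, allTerms.size());
        // Simulates a short sequential read of one block from disk.
        for (int i = start; i < end; i++) {
            int cmp = allTerms.get(i).compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed the place where the term would be
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
                "apple", "banana", "cherry", "date", "elder", "fig", "grape");
        SparseTermIndex index = new SparseTermIndex(terms, 3);
        System.out.println("terms held in memory: " + index.termsInMemory());
        System.out.println("position of fig: " + index.lookup("fig"));
        System.out.println("position of kiwi: " + index.lookup("kiwi"));
    }
}
```

With a sample like this, memory use grows with the number of sampled terms rather than with the whole dictionary, and each lookup touches only one small block of the full list.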
Re: Using Lucene to store document
Hi Otis,

Please let me know what the HEAD version of Lucene is.

Actually, I'm considering the advantages of storing documents in Lucene stored fields for my search engine. I've tested with thousands of documents and found that retrieving a document (in this case an XML file) with Lucene is a little faster than using the file system (FS). But I cannot test with a large amount of data to get an accurate comparison. So my question is whether Lucene can support millions of documents while staying balanced and still retrieving them at an appropriate speed.

Nhan
Re: Using Lucene to store document
Hello,

"HEAD version" means that you should check out Lucene straight from CVS. How to work with CVS is another story, probably described somewhere on the jakarta.apache.org site.

Otis
Using Lucene to store document
Hi all,

I'm using Lucene to index XML documents/files (possibly millions of documents in the future, each about 5-10 KB). Besides the index for searching, I want to use Lucene to store the whole document content in an UnIndexed content field (instead of storing each document in an XML file). All the document content would be stored in a separate index. Each time I want to access a document, I would let Lucene retrieve it.

I am weighing this against another approach: using the file system to store the document content as separate XML files. That means 400K documents would be stored as 400K XML files in the file system. The purpose is to be able to access each document rapidly.

Can anybody who has dealt with this problem before advise me on which method is suitable? Is it better to collect all documents into a Lucene index or to store them separately in the file system?

Thanks,
Dang Nhan
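For the file-system alternative described above, a minimal sketch of one-file-per-document storage might look like the following. The class `XmlFileStore`, the 256-bucket layout, and the assumption that document IDs are safe to use as file names are all hypothetical choices for illustration, not something the thread prescribes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene: one XML file per document on disk.
final class XmlFileStore {
    // Assumption: spreading files over 256 subdirectories keeps each one small.
    private static final int BUCKETS = 256;
    private final Path root;

    XmlFileStore(Path root) {
        this.root = root;
    }

    /** Convenience factory for experiments: stores files under a fresh temp directory. */
    static XmlFileStore inTempDir() {
        try {
            return new XmlFileStore(Files.createTempDirectory("xmlstore"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps a document ID to root/<bucket>/<id>.xml; the ID must be a valid file name. */
    Path pathFor(String docId) {
        int bucket = Math.floorMod(docId.hashCode(), BUCKETS);
        return root.resolve(String.format("%02x", bucket)).resolve(docId + ".xml");
    }

    void put(String docId, String xml) {
        Path path = pathFor(docId);
        try {
            Files.createDirectories(path.getParent());
            Files.write(path, xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String get(String docId) {
        try {
            return new String(Files.readAllBytes(pathFor(docId)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        XmlFileStore store = XmlFileStore.inTempDir();
        store.put("doc-42", "<doc><title>Hello</title></doc>");
        System.out.println(store.get("doc-42"));
    }
}
```

With 400K documents and 256 buckets, each directory would hold roughly 1,600 files if the IDs hash evenly. The search index would then only need to store the document ID, which is used to look up the file.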
Re: Using Lucene to store document
It is difficult to give a general answer. You can certainly store the whole XML in the Lucene index; just don't tokenize it. The HEAD version of Lucene even has some compression that you may find handy.

On the other hand, storing XML in the FS would allow you to store the XML files wherever you wanted, even on separate disk(s). If there are lots of parallel searches/reads, this can be handy. If you want to be able to see the XML files without going through the index, this can also be handy. So it depends on what you prefer, but both approaches are doable.

Otis
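To get a rough feel for what compressing stored XML could save, the sketch below compresses and restores a string with the JDK's `java.util.zip` classes. The thread does not say which algorithm the HEAD compression uses, so DEFLATE is an assumption here, and the class `StoredFieldCompression` is a hypothetical name, not a Lucene API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not a Lucene API; DEFLATE is assumed for illustration.
final class StoredFieldCompression {

    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input; stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a repetitive XML document of a few KB as sample input.
        StringBuilder xml = new StringBuilder("<docs>");
        for (int i = 0; i < 200; i++) {
            xml.append("<doc><title>Sample title</title><body>Sample body</body></doc>");
        }
        xml.append("</docs>");

        byte[] original = xml.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(xml.toString());
        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + xml.toString().equals(decompress(packed)));
    }
}
```

Real XML documents are less repetitive than this sample, so the ratio measured on actual data is the number that matters when comparing the two storage options.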