depending on the build of the document, but I guess not, I had to write my own XML parser, you get better results when you customize something like that to your needs.
-----Original Message----- From: Chris Sibert [mailto:[EMAIL PROTECTED]] Sent: Wednesday, June 12, 2002 10:27 AM To: Lucene Users List Subject: Creating indexes I have a big ( 40 MB or so) file to index. The file contains a whole bunch of documents, which are each pretty small, about a few typewritten pages long. There's a title, date, and author for each document, in addition to the documents' actual text. I'm not quite sure how you index this in Lucene. For each document in the original file, I assume that I create a separate Lucene Document object in the index with author, date, title, and text fields. If so, my question is that when I'm reading in the original file for indexing, does Lucene know where each document begins and ends in the original file ? Or do I have to write a parser or filter or something for the InputStream that's reading the file ? Chris Sibert -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]> -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>