OK, I'm finally catching on. You have to change the demo code to get the contents into something besides an input stream, so you can use one of the alternate forms of the Field constructor. For instance, you could read it all into a string and use the form:

    doc.add(new Field("content", <string with all the file contents in it>,
                      Field.Store.YES, Field.Index.TOKENIZED));

Or, you can do something like this, which should give the same search results as the above:

    // reader is assumed to be a BufferedReader over the file
    String line;
    while ((line = reader.readLine()) != null) {
        doc.add(new Field("content", line, Field.Store.YES, Field.Index.TOKENIZED));
    }

You can add to the same field as often as you want; for indexing, it just appends the content of calls 2 to N to the same field.

Best
Erick

On Wed, Jul 23, 2008 at 3:42 AM, starz10de <[EMAIL PROTECTED]> wrote:

>
> Hi Erik,
>
> I don't remove the stop words, as I index parallel corpora which are used
> for learning the translations between pairs of languages, so every word is
> important. I even developed my own analyzer for Arabic, which just removes
> punctuation and special symbols and returns only Arabic text.
>
> I guess in FileDocument.java the whole text is already stored:
>
>     doc.add(Field.Text("contents", IN));
>
> where IN is
>
>     IN = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
>
> If this is not the case, could you please tell me how to store the whole
> text inside the index?
>
> I am new to Lucene and I don't know how to use this "Field.Store.YES" to
> store the whole text.
>
>
>
> Best regards
> Farag
>
>
>
> starz10de wrote:
> >
> > Could anyone tell me please how to print the content of the document
> > after reading the index?
> > For example, if I like to print the index terms then I do:
> >
> >     IndexReader ir = IndexReader.open(index);
> >     TermEnum termEnum = ir.terms();
> >     while (termEnum.next()) {
> >         TermDocs dok = ir.termDocs();
> >         dok.seek(termEnum);
> >         while (dok.next()) {
> >             System.out.println(termEnum.term().text().trim());
> >         }
> >     }
> >
> > I can print the text files before indexing them, but because of encoding
> > issues I like to print them from the index.
> > As far as I know, the content of the document (the whole text) is also
> > stored in the index; my question is how to print this content.
> >
> > So at the end I will print the path of the current document, the index
> > terms and the content of the document.
> >
> >
> > Thanks in advance
> >
>
> --
> View this message in context:
> http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18605547.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
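
P.S. For the "read it all into a string" approach at the top of this message, here is a minimal sketch of the file-reading half in plain Java. The class name FileContents and the method readAll are made up for illustration, and passing "UTF-8" is only an assumption about how the files are encoded; given the encoding issues mentioned in the thread, use whatever charset the corpus was actually written in.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper, not part of the Lucene demo code.
class FileContents {

    // Reads the whole file into one String, decoding bytes with the named charset.
    static String readAll(File f, String charsetName) throws IOException {
        Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), charsetName));
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            // read() returns -1 at end of stream
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            in.close();
        }
    }

    // The result would then go into the String form of the Field constructor
    // quoted earlier in this message, e.g.:
    //   doc.add(new Field("content", FileContents.readAll(f, "UTF-8"),
    //                     Field.Store.YES, Field.Index.TOKENIZED));
}
```

Naming the charset explicitly matters here because InputStreamReader otherwise falls back to the platform default, which is a common source of garbled non-Latin text. Once the field is stored, the text should be retrievable from the index with something along the lines of ir.document(i).get("content"), though I have not run that against your index.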