Hi Erik, I don't remove the stop words, as I index parallel corpora which is used for learning the translations between pair of languages. so every word is important. I even develop my own analyzer for Arabic which is just remove punctuations and special symbols and it return only Arabic text.
I guess in the FileDocument.java the whole text is already stored doc.add(Field.Text("contents", IN)); where IN is IN = new BufferedReader(new InputStreamReader(new FileInputStream(f)) if this is not the case yould you please how to store the whole text inside the index ? I am new to lucene and I don't know how to use this "Field.Store.YES" to store whole text. Best regards Farag starz10de wrote: > > Could any one tell me please how to print the content of the document > after reading the index. > for example if i like to print the index terms then i do : > > IndexReader ir = IndexReader.open(index); > TermEnum termEnum = ir.terms(); > while (termEnum.next()) { > TermDocs dok = ir.termDocs(); > dok.seek(termEnum); > while (dok.next()) { > System.out.println(termEnum.term().text().trim()); > } > > I can print the text files before indexing them, but because of encoding > issues i like to print them from the index. > As i know the content of the document(whole text) is also stored in the > index, my question how to print this content. > > so at the end i will print the path of the current document , index terms > and the content of the document > > > thanks in advance > -- View this message in context: http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18605547.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]