Dear Erick,

Thanks for your answer. I tried another way, where I read the text files before I index them. I will also try your solution here.
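
A minimal sketch of reading a text file into a string with an explicit encoding, so the decoded text can be checked before it is indexed. It assumes the files are UTF-8; the class name `FileText` and the method `readAll` are hypothetical and not part of the Lucene demo code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Lucene demo code.
class FileText {

    // Reads the whole file into one String, decoding it with the given charset.
    // Files.newBufferedReader reports malformed input as an IOException
    // instead of silently substituting replacement characters.
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = Files.newBufferedReader(file, charset)) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input file is UTF-8 encoded.
        String text = readAll(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

The resulting string is the kind of value that could be passed to the string form of the Field constructor described in the quoted reply, with Field.Store.YES so the text is kept in the index.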
best regards

Erick Erickson wrote:
>
> OK, I'm finally catching on. You have to change the demo code to
> get the contents into something besides an input stream, so you
> can use one of the alternate forms of the Field constructor. For
> instance, you could read it all into a string and use the form:
>
> doc.add(new Field("content", <string with all the file contents in it>,
>         Field.Store.YES, Field.Index.TOKENIZED))
>
>
> Or, you can do something like this, which produces identical results
> to the above
>
> while (more text to read) {
>     String line = read a line of text from the file
>     doc.add(new Field("content", line, Field.Store.YES,
>             Field.Index.TOKENIZED))
> }
>
> You can add to the same field as often as you want and it just appends the
> content of calls 2 to N to the same field.
>
>
> Best
> Erick
>
>
> On Wed, Jul 23, 2008 at 3:42 AM, starz10de <[EMAIL PROTECTED]> wrote:
>
>>
>> Hi Erick,
>>
>> I don't remove the stop words, as I index parallel corpora which are used
>> for learning the translations between pairs of languages, so every word is
>> important. I even developed my own analyzer for Arabic which just removes
>> punctuation and special symbols and returns only Arabic text.
>>
>> I guess in FileDocument.java the whole text is already stored:
>>
>> doc.add(Field.Text("contents", IN));
>>
>> where IN is
>>
>> IN = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
>>
>> If this is not the case, could you please tell me how to store the whole
>> text inside the index?
>>
>> I am new to Lucene and I don't know how to use this "Field.Store.YES" to
>> store the whole text.
>>
>>
>>
>> Best regards
>> Farag
>>
>>
>>
>> starz10de wrote:
>> >
>> > Could anyone tell me please how to print the content of the document
>> > after reading the index.
>> > For example, if I want to print the index terms, then I do:
>> >
>> > IndexReader ir = IndexReader.open(index);
>> > TermEnum termEnum = ir.terms();
>> > while (termEnum.next()) {
>> >     TermDocs dok = ir.termDocs();
>> >     dok.seek(termEnum);
>> >     while (dok.next()) {
>> >         System.out.println(termEnum.term().text().trim());
>> >     }
>> > }
>> >
>> > I can print the text files before indexing them, but because of encoding
>> > issues I would like to print them from the index.
>> > As far as I know, the content of the document (the whole text) is also
>> > stored in the index. My question is how to print this content.
>> >
>> > So at the end I will print the path of the current document, the index
>> > terms and the content of the document.
>> >
>> >
>> > thanks in advance
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18605547.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>
>

--
View this message in context: http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18631887.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.