So, to use Lucene-speak, each sentence is a document. I don't know how you are indexing, what code you are using (or what hardware, etc.), but if you are not already doing so, you should consider multi-threading the indexing, which should give you a significant indexing performance boost.
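Something along these lines might work as a starting point. It's only a sketch, not tested, and it assumes the 3.x-style API (IndexWriterConfig etc.); the field names "sentId" and "text", the directory name "index-lang1", and the thread count are just made up for the example. The point is that IndexWriter is thread-safe, so several worker threads can feed the same writer, one Document per sentence:

import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SentenceIndexer {

    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("index-lang1"));
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_33,
                new StandardAnalyzer(Version.LUCENE_33));
        final IndexWriter writer = new IndexWriter(dir, cfg);

        // In real code the sentences would be streamed out of the big file;
        // a tiny hard-coded list keeps the sketch self-contained.
        List<String> sentences = Arrays.asList("First sentence.", "Second sentence.");

        // IndexWriter is thread-safe, so several worker threads can call
        // addDocument() on the same writer concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < sentences.size(); i++) {
            final String sentId = String.valueOf(i);
            final String sentence = sentences.get(i);
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // One sentence per Lucene document, with a stored id
                        // that lines up with the same sentence number in the
                        // second-language corpus.
                        Document doc = new Document();
                        doc.add(new Field("sentId", sentId,
                                Field.Store.YES, Field.Index.NOT_ANALYZED));
                        doc.add(new Field("text", sentence,
                                Field.Store.YES, Field.Index.ANALYZED));
                        writer.addDocument(doc);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        writer.close();
    }
}

If you build the second-language index the same way, with matching sentId values, the id is all you need to jump from a hit on one side to the aligned sentence on the other, and you never have to split the corpus into thousands of little text files.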
-Glen

On Fri, Jul 22, 2011 at 11:04 AM, starz10de <farag_ah...@yahoo.com> wrote:
> I am interested in searching at the sentence level.
> It is a parallel corpus; each sentence in the first language is
> equivalent to a sentence in the second language. I want to index each
> sentence and give each sentence an id, so that when I retrieve it I can
> easily retrieve its equivalent in the second language.
>
> I did this by splitting the file and treating each sentence as a text file.
> However, this takes a really long time for many huge text files.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Index-one-huge-text-file-tp3191605p3191628.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
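For the lookup side described above, roughly this kind of thing should work (again only an untested sketch on the 3.x API, reusing the invented "index-lang1"/"index-lang2" directories and "sentId"/"text" field names from the indexing example): search the first-language index normally, read the stored sentId off each hit, and use it to pull the aligned sentence out of the second-language index.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ParallelLookup {

    public static void main(String[] args) throws Exception {
        IndexSearcher lang1 = new IndexSearcher(
                IndexReader.open(FSDirectory.open(new File("index-lang1"))));
        IndexSearcher lang2 = new IndexSearcher(
                IndexReader.open(FSDirectory.open(new File("index-lang2"))));

        // Ordinary search against the first-language sentences.
        QueryParser parser = new QueryParser(Version.LUCENE_33, "text",
                new StandardAnalyzer(Version.LUCENE_33));
        Query query = parser.parse("some words");
        TopDocs hits = lang1.search(query, 10);

        for (ScoreDoc sd : hits.scoreDocs) {
            Document hit = lang1.doc(sd.doc);
            String sentId = hit.get("sentId");

            // The stored sentence id is the link to the aligned sentence
            // in the second-language index.
            TopDocs aligned = lang2.search(new TermQuery(new Term("sentId", sentId)), 1);
            if (aligned.totalHits > 0) {
                Document match = lang2.doc(aligned.scoreDocs[0].doc);
                System.out.println(hit.get("text") + "  <->  " + match.get("text"));
            }
        }

        lang1.close();
        lang2.close();
    }
}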