I want to search at the sentence level.
The data is a parallel corpus: each sentence in the first language is
equivalent to a sentence in the second language. I want to index each
sentence and give it an ID, so that when I retrieve a sentence I can
easily look up its equivalent in the second language.

I did this by splitting the file and treating each sentence as a separate
text file. However, this takes a very long time for many huge text files.
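
One way to avoid creating a file per sentence might be to stream each
corpus file line by line and use the line number as the shared sentence ID.
The sketch below is only an illustration and rests on assumptions that are
not stated above: each file holds one sentence per line, the two files are
aligned line for line, and the text is UTF-8. The class and method names
(ParallelSentences, forEachSentence, sentenceAt) are made up for this
example. It uses plain Java I/O only; in a Lucene setup the handler would
be the place to build one Document per sentence, with the ID kept in a
stored field, and add it to a single IndexWriter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Lucene.
class ParallelSentences {

    // Streams the file one line at a time and hands (id, sentence) to the handler.
    // The id is the 0-based line number, so the file is never loaded into memory
    // as a whole. Blank lines still consume an id so both files stay aligned.
    static long forEachSentence(Path file, BiConsumer<Long, String> handler)
            throws IOException {
        long id = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(id, line.trim());
                id++;
            }
        }
        return id; // number of lines (sentences) seen
    }

    // Returns the sentence with the given id from the other-language file,
    // or null if that file has fewer lines. This is a linear scan; it assumes
    // the second file is aligned with the first (an assumption of this sketch).
    static String sentenceAt(Path file, long id) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            long current = 0;
            while ((line = reader.readLine()) != null) {
                if (current == id) {
                    return line.trim();
                }
                current++;
            }
        }
        return null;
    }
}
```

With this layout, a hit on sentence ID n in the first language maps to
line n of the second-language file. Indexing the second file the same way,
with the same IDs, would avoid the linear scan in sentenceAt.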


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-one-huge-text-file-tp3191605p3191628.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
