Alternatively, you could keep all the sentences of a file in one document by adding them to a multivalued field, one value per sentence; the values can then be retrieved in the order they were added.
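
To make the idea concrete, here is a rough, library-free sketch in plain Java. It is not the Lucene API: the class `ParallelDocument` and its methods are made up for illustration, and the substring match is only a stand-in for real analysis and search. The point is that when both languages are stored as ordered lists of values under one document, the position of a sentence works as its id, so the translation sits at the same position in the other list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for one document with two multivalued fields.
// Plain Java for illustration only; none of this is Lucene API.
final class ParallelDocument {
    private final List<String> source = new ArrayList<>(); // first-language sentences, in order
    private final List<String> target = new ArrayList<>(); // second-language sentences, in order

    // Adds one aligned sentence pair and returns its position, which acts as the sentence id.
    int addPair(String sourceSentence, String targetSentence) {
        source.add(sourceSentence);
        target.add(targetSentence);
        return source.size() - 1;
    }

    // Returns the ids of source sentences containing the term
    // (naive case-insensitive substring match, standing in for a real query).
    List<Integer> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < source.size(); id++) {
            if (source.get(id).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Looks up the second-language sentence stored at the same position.
    String translationOf(int id) {
        return target.get(id);
    }

    public static void main(String[] args) {
        ParallelDocument doc = new ParallelDocument();
        doc.addPair("The cat drinks.", "Die Katze trinkt.");
        doc.addPair("The dog barks.", "Der Hund bellt.");
        for (int id : doc.search("dog")) {
            System.out.println(id + ": " + doc.translationOf(id));
        }
    }
}
```

In Lucene itself the equivalent would be to add the same field name to a document once per sentence, but whether that suits your search needs (for example, scoring hits per sentence rather than per file) is something you would have to check against your setup.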

On Fri, Jul 22, 2011 at 11:10 AM, Glen Newton <[email protected]> wrote:
> So to use Lucene-speak, each sentence is a document.
>
> I don't know how you are indexing and what code you are using (and
> what hardware, etc.), but if you are not already, you should consider
> multi-threading the indexing, which should give you a significant
> indexing performance boost.
>
> -Glen
>
> On Fri, Jul 22, 2011 at 11:04 AM, starz10de <[email protected]> wrote:
>> I am interested in searching at the sentence level.
>> It is a parallel corpus: each sentence in the first language is
>> equivalent to a sentence in the second language. I want to index each
>> sentence and give each one an id, so that when I retrieve it I can
>> easily retrieve its equivalent in the second language.
>>
>> I did this by splitting the file and treating each sentence as a text file.
>> However, this takes a really long time for many huge text files.
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Index-one-huge-text-file-tp3191605p3191628.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
