My question might be very easy for you Lucene experts, but after going through the Lucene documentation and examples, I haven't been able to figure out how to solve this problem. I'd be really grateful if someone could give me a starting point.
Our application tracks SMSes sent from a particular phone number. We have gigabytes of logs that (let's say) look like this:

    SomeUselessData1#SMSID#SomeData1#PhoneNumber
    SomeUselessData2#SMSID#SomeData2
    SomeUselessData3#SMSID#SomeData3
    SomeUselessData4#SMSID#SomeData4
    ...
    ...

Our searches will obviously be based on the phone number. So we need an index that lets us:

1. List the SMSIDs of all the SMSes that a phone number has sent (each SMS has a globally unique ID).
2. List SomeData1, SomeData2, SomeData3 and SomeData4 for a given SMSID.

How can I do this efficiently?

I wrote a sample piece of code where each row was a Document, and the PhoneNumber, SMSID and SomeData columns were Fields. Indexing a 1 MB log file took many minutes, so I realized I wasn't doing it right (you can guess how uncomfortable I am with Lucene at present). I would expect to be able to index at least a GB of logs within 1 or 2 minutes.

Can someone please point me to the right examples, help me understand what my Documents, Fields and Analyzers should be, or help me design a solution?

Thanks in advance.

P.S. I just got Lucene in Action. Is there an example (or a similar concept) explained in the book? From what I can see, none of the examples really help me much.
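To make the data model in the log layout above concrete, here is a rough plain-Java sketch (no Lucene calls) that groups rows by SMSID, so each SMS becomes one record carrying its phone number and all of its data values, instead of one record per row. It assumes fields are separated by `#`, that field 1 is the SMSID and field 2 the SomeData value, and that only one row per SMS carries a 4th field with the PhoneNumber. The class and method names (`SmsLogGrouper`, `group`, `smsIdsFor`) and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: groups '#'-delimited log rows by SMSID.
// Assumes field 1 is the SMSID, field 2 is the SomeData value, and that
// only one row per SMS carries a 4th field holding the PhoneNumber.
public class SmsLogGrouper {

    /** One SMS with all of its data values (one candidate unit per SMS, not per row). */
    public static final class SmsRecord {
        final String smsId;
        String phoneNumber;               // stays null until the row carrying it is seen
        final List<String> data = new ArrayList<>();

        SmsRecord(String smsId) { this.smsId = smsId; }
    }

    /** Builds SMSID -> record, keeping the order in which SMSIDs first appear. */
    public static Map<String, SmsRecord> group(List<String> lines) {
        Map<String, SmsRecord> bySmsId = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split("#", -1);
            if (f.length < 3) continue;   // skip malformed rows
            SmsRecord rec = bySmsId.computeIfAbsent(f[1], SmsRecord::new);
            rec.data.add(f[2]);
            if (f.length >= 4) rec.phoneNumber = f[3];
        }
        return bySmsId;
    }

    /** Query 1: all SMSIDs sent from the given phone number. */
    public static List<String> smsIdsFor(Map<String, SmsRecord> bySmsId, String phone) {
        List<String> ids = new ArrayList<>();
        for (SmsRecord rec : bySmsId.values()) {
            if (phone.equals(rec.phoneNumber)) ids.add(rec.smsId);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows in the layout shown above.
        List<String> lines = List.of(
            "junk#sms1#d1#5550001", "junk#sms1#d2", "junk#sms1#d3", "junk#sms1#d4",
            "junk#sms2#e1#5550001", "junk#sms2#e2");
        Map<String, SmsRecord> bySmsId = group(lines);
        System.out.println(smsIdsFor(bySmsId, "5550001")); // [sms1, sms2]
        System.out.println(bySmsId.get("sms1").data);      // [d1, d2, d3, d4]  (query 2)
    }
}
```

The in-memory map only illustrates the grouping. For gigabytes of logs, each grouped record would more likely be written out as it is completed (for example as one Lucene Document per SMS, with PhoneNumber and SMSID as untokenized fields) rather than kept in memory; that mapping is a design assumption, not something confirmed here.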