My question might be very easy for you Lucene experts, but after going
through the Lucene documentation and examples, I haven't been able to
figure out how to solve this problem. I'd be really grateful if
someone could help me find a starting point here.

Our application tracks SMSes sent from a particular phone number. We
have gigabytes of logs that (let's say) look like this:

SomeUselessData1#SMSID#SomeData1#PhoneNumber
SomeUselessData2#SMSID#SomeData2
SomeUselessData3#SMSID#SomeData3
SomeUselessData4#SMSID#SomeData4
...
...

Now our search will obviously be done on the basis of the phone
number. So we need indexing so that we can:

1. List the SMSIDs of all the SMSes that a phone number has sent (each
   SMS message has a globally unique ID).
2. List SomeData1, SomeData2, SomeData3 and SomeData4 for a given SMSID.
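
To make the two lookups concrete, here is a minimal sketch in plain Java
of the behaviour I am after. It is not Lucene code and not a proposed
solution: the class name SmsLogIndex, its method names and the sample
phone number are made up for illustration, everything is held in memory
in HashMaps, and it assumes every line splits cleanly on '#' with the
phone number present only as an optional fourth column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the two lookups (no Lucene involved). */
final class SmsLogIndex {
    // Phone number -> SMSIDs, in the order they were first seen.
    private final Map<String, List<String>> smsIdsByPhone = new HashMap<>();
    // SMSID -> SomeData values, in log order.
    private final Map<String, List<String>> dataBySmsId = new HashMap<>();

    /** Parses one '#'-separated log line; lines with fewer than 3 columns are skipped. */
    void addLine(String line) {
        String[] cols = line.split("#", -1);
        if (cols.length < 3) {
            return;
        }
        String smsId = cols[1];
        dataBySmsId.computeIfAbsent(smsId, k -> new ArrayList<>()).add(cols[2]);

        // Assumption: only some rows carry the phone number, as a fourth column.
        if (cols.length >= 4 && !cols[3].isEmpty()) {
            List<String> ids = smsIdsByPhone.computeIfAbsent(cols[3], k -> new ArrayList<>());
            if (!ids.contains(smsId)) { // linear scan, acceptable for a sketch
                ids.add(smsId);
            }
        }
    }

    /** Lookup 1: all SMSIDs sent from a phone number. */
    List<String> smsIdsFor(String phoneNumber) {
        return smsIdsByPhone.getOrDefault(phoneNumber, Collections.emptyList());
    }

    /** Lookup 2: all SomeData values logged for an SMSID. */
    List<String> dataFor(String smsId) {
        return dataBySmsId.getOrDefault(smsId, Collections.emptyList());
    }

    public static void main(String[] args) {
        SmsLogIndex index = new SmsLogIndex();
        index.addLine("SomeUselessData1#SMS42#SomeData1#5551234");
        index.addLine("SomeUselessData2#SMS42#SomeData2");
        System.out.println(index.smsIdsFor("5551234")); // [SMS42]
        System.out.println(index.dataFor("SMS42"));     // [SomeData1, SomeData2]
    }
}
```

This obviously cannot hold gigabytes of logs in memory, which is why I
am looking at an on-disk index that answers the same two queries.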

How can I do this efficiently?

I wrote a sample piece of code where each row was a Document, and the
PhoneNumber, SMSID and SomeData columns were Fields. Indexing took
many minutes for a 1 MB log file, so I realized that I hadn't done it
right (you can guess how uncomfortable I am with Lucene at present).
I would expect to be able to index at least a GB of logs within 1 or
2 minutes.

Can someone please point me to the right examples, help me understand
what my Documents / Fields / Analyzers should be or help me design a
solution?

Thanks in advance

P.S. I just got Lucene in Action. Is there an example of this (or of a
similar concept) explained in the book? From what I have seen so far,
none of the examples really help me much.
