You could always sort by EVENTID, that way at least you'd have all the events for a particular ID together in your results. You'd have to post-filter the results to determine whether all the necessary descriptions were present. But I don't think this works all that well because, as you pointed out, you may have a lot of records to sort through so I don't think this is a very good idea...
How many events are we talking about here and what kind of lag between an event and being able to search it can you tolerate? I guess what I'm really asking is whether it's possible to recreate your index "often enough" to satisfy your users. If so, you can index multiple descriptions in a single document, something like doc.add("EVENTDESCRIPTION", "STARTING EVENT") doc.add("EVENTDESCRIPTION", "XYZ") doc.add("EVENTDESCRIPTION", "ABC") doc.add("EVENTID", "1") IndexWriter.addDocument(doc); You'd have to gather all the descriptions related to each EVENTID before you were able to index the doc..... By manipulating the PositionIncrementGap you could also keep searches from matching across different EVENTDESCRIPTIONs, e.g. if you didn't want to match +STARTING +ABC you could use SpanQueries or the proximity operator, but going into details depends upon whether you can rebuild your index so we'll defer that part.... You could also think about updating the document when new events were added, but since an update is really a delete/add under the covers you'd have to either gather enough information from what I assume is your log or store enough information with the document to recreate it. How big is your index currently and what kind of throughput do you require? Best Erick On Wed, Feb 18, 2009 at 10:20 AM, Christian Brennsteiner < christ...@brennsteiner.at> wrote: > dear lucene community, > > i am playing around with lucene right now. and have come to very bad > problem. > > given environment: > > a signal source gives signals with eventids ans eventdescriptions > > for example EVENTID=1 and EVENTDESCRIPTION="STARTING EVENT" > > those events can be running very long (e.g. one month) during this > period we will receive for example > > EVENTID=1 and EVENTDESCRIPTION="EXECUTING XYZ" > 10 minutes later > EVENTID=1 and EVENTDESCRIPTION="EXECUTING YZA" > 10 minutes later > EVENTID=1 and EVENTDESCRIPTION="PASSED MILESTONE1" > 10 minutes later > EVENTID=1 and EVENTDESCRIPTION="EXECUTING ZAB" > > after e.g. 1 week we receive > EVENTID=1 and EVENTDESCRIPTION="STOPING EVENT" > > what i want: > i want to be able to search e.g. which eventids are connected to "XYZ" > AND "ZAB" AND have already passed "MILESTONE1" > > so my current try is to index all events by full indexing (without > storing) eventdescriptions AND stemming e.g. EXECUTING > > then searching for "+XYZ +ZAB +MILESTONE1" > --> result no document since those are all seperated documents > when i search > "XYZ ZAB MILESTONE1" > i am getting 3 times EVENTID 3 > --> this is bad since when i get 1000000 of such events how do i rank them? > > CONCLUSION: > my biggest problem is that my lucene document given to the index > currently is not in a final state BUT i have to index and search it > also while it is in progress. > as a result of this the ranking as i do it now has no real value since > the ranking is just based on a "line of a whole event" > > QUESTION: > is there a solution within lucene to combine search results? e.g. merge > them OR > is there a better workaround how i would do such updates to the index > without storing the original docmuent inside the index (since this > consumes so many space)? e.g. extracting the keywords that were stored > for the item? > > any hints appreciated. > > regards chris > > > ---------- > Christian Brennsteiner > Salzburg / Austria / Europe > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >