Dmitry,

My middle tier is Compass in this case (it is easily extensible, so catching
all events should be easy). It also allows the index to be stored in a DB.

The main question is *what* is important to catch and *how* to get maximum
information out of it later.

Lukas

On 8/10/07, Dmitry <[EMAIL PROTECTED]> wrote:
>
> Lukas,
> Probably one solution would be to use a database, such as MySQL, and set
> up Lucene against it. In this case you need to think less about an
> implementation based on the content storage. Also, you need to create a
> middle tier to catch all events concerning user search history,
> retrieval history, cache management...
> Thanks,
> dt
> www.ejinz.com
> Search News
>
> ----- Original Message -----
> From: "Lukas Vlcek" <[EMAIL PROTECTED]>
> To: <java-user@lucene.apache.org>
> Sent: Friday, August 10, 2007 2:28 AM
> Subject: How to keep user search history and how to turn it into
> information?
>
>
> > Hi,
> >
> > I would like to keep user search history data and I am looking for some
> > ideas/advice/recommendations. In general I would like to talk about
> > methods of storing such data, its structure, and how to turn it into
> > valuable information.
> >
> > As for the structure:
> > ==============
> > For now I don't have an exact idea of what kind of information I should
> > keep. I know that this is application specific, but I believe there can
> > be some common general patterns. As of now, I think the following can be
> > useful to keep:
> >
> > 1) system time (time of issuing the query) and userid
> > 2) original user query in raw form (untokenized)
> > 3) expanded user query (both tokenized and untokenized can be useful)
> > 4) query execution time
> > 5) # of objects retrieved from index
> > 6) total # of objects in the index (this can change over time)
> > 7) possibly whether the user clicked a result and, if so, which one
> >    (the hit number) and the system time of the click
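
A minimal sketch of the seven items above as one log record, written in
Python for brevity. The class name, field names, units, and sample values
are all assumptions made for illustration; nothing here comes from Lucene
or Compass.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SearchEvent:
    """One logged search. All field names are hypothetical."""
    timestamp_ms: int            # 1) system time the query was issued
    user_id: str                 # 1) who issued it
    raw_query: str               # 2) query exactly as typed (untokenized)
    expanded_query: str          # 3) query after expansion
    exec_time_ms: float          # 4) query execution time
    hits: int                    # 5) objects retrieved from the index
    index_size: int              # 6) total objects in the index right now
    clicked_rank: Optional[int] = None    # 7) 1-based hit number, if clicked
    clicked_at_ms: Optional[int] = None   # 7) system time of the click

    def to_json_line(self) -> str:
        # One JSON object per line is easy to append and to parse later.
        return json.dumps(asdict(self), sort_keys=True)


# Made-up sample values, only to show the shape of a record.
event = SearchEvent(
    timestamp_ms=1186734480000,
    user_id="u42",
    raw_query="search histroy~",
    expanded_query="search (history histories)",
    exec_time_ms=37.5,
    hits=12,
    index_size=150000,
)
line = event.to_json_line()
```

Keeping the click fields optional lets the same record describe searches
that were never followed by a click.
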
> >
> > As for the information I can get from this:
> > =============================
> > Such minimal data collection could show whether the search engine serves
> > users well or not (generally speaking). I should note that for the users
> > in this case the only other option is to not use the search engine at
> > all (so the data should not be biased by users relying on an alternative
> > search method). I should be able to learn if:
> >
> > 1) there are bottleneck queries (Prefix, Fuzzy, Proximity queries...)
> > 2) users are finding what they want (they can find it fast, and results
> >    are ordered by properly defined relevance [my model is well tuned in
> >    terms of term weights], so the result they click is among the first
> >    hits)
> > 3) users can formulate queries well (do they issue queries which return
> >    all index documents, or can they issue queries which return just a
> >    couple of documents?)
> > 4) ...?... etc...
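
The three checks above could be computed from logged records roughly as
follows. This is only a sketch: the dictionary keys, the sample records,
and both thresholds (300 ms, 90% of the index) are assumptions chosen for
illustration and would need tuning per application.

```python
from statistics import mean

# Hypothetical log records; key names are illustrative only.
events = [
    {"raw_query": "lucene*", "exec_time_ms": 950.0,
     "hits": 4000, "index_size": 4000, "clicked_rank": None},
    {"raw_query": "index merge", "exec_time_ms": 20.0,
     "hits": 15, "index_size": 4000, "clicked_rank": 1},
    {"raw_query": "histroy~", "exec_time_ms": 400.0,
     "hits": 60, "index_size": 4000, "clicked_rank": 7},
    {"raw_query": "compass jdbc", "exec_time_ms": 25.0,
     "hits": 3, "index_size": 4000, "clicked_rank": 2},
]


def slow_queries(events, threshold_ms=300.0):
    """1) Candidate bottleneck queries: slower than the threshold."""
    return [e["raw_query"] for e in events
            if e["exec_time_ms"] > threshold_ms]


def mean_click_rank(events):
    """2) Average rank of the clicked hit; lower suggests better ordering."""
    ranks = [e["clicked_rank"] for e in events
             if e["clicked_rank"] is not None]
    return mean(ranks) if ranks else None


def click_through_rate(events):
    """2) Share of queries that were followed by a click."""
    if not events:
        return 0.0
    clicked = sum(1 for e in events if e["clicked_rank"] is not None)
    return clicked / len(events)


def overly_broad(events, fraction=0.9):
    """3) Queries that returned (nearly) the whole index."""
    return [e["raw_query"] for e in events
            if e["index_size"] and e["hits"] / e["index_size"] >= fraction]
```

A query with no click is ambiguous on its own (the user may have given up,
or the answer may have been visible in the result list), so the click rate
is probably best read together with the mean click rank.
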
> >
> > As for the storage method:
> > ===================
> > I was planning to keep such data in a database, but now it seems to me
> > that it will be better to keep it directly in an index (a Lucene index).
> > It seems to me that this approach would allow better fuzzy searches
> > across the history, and more efficient extraction of relevant objects
> > and their counts (with the benefit of relevance-based search on top of
> > the history search corpus).
> >
> > I think that a more scalable solution would be to keep such data in a
> > plain flat file and then periodically recreate the search history index
> > (or several indices) from it (for example with a Map-Reduce-like task).
> > Even better, the flat file could be stored in a distributed file system.
> > However, for now I would like to start with something simple.
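
The flat-file idea above might look like the sketch below: append one JSON
line per search, then periodically rebuild an aggregate from the whole file
in a map/reduce style. A plain query counter stands in for the real search
history index here; the file name, record keys, and normalization rule are
assumptions for illustration.

```python
import json
import os
import tempfile
from collections import Counter
from functools import reduce


def append_event(path, event):
    """Append one search event as a JSON line (append-only flat file)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def map_event(line):
    """Map step: one log line -> partial count keyed by normalized query."""
    event = json.loads(line)
    return Counter({event["raw_query"].strip().lower(): 1})


def rebuild_query_counts(path):
    """Reduce step: merge the partial counts from every line in the file."""
    with open(path, encoding="utf-8") as f:
        return reduce(lambda a, b: a + b, map(map_event, f), Counter())


# Hypothetical usage with a throwaway log file.
log_path = os.path.join(tempfile.mkdtemp(), "search_history.jsonl")
for q in ["Lucene index", "lucene index ", "compass"]:
    append_event(log_path, {"raw_query": q, "hits": 10})

counts = rebuild_query_counts(log_path)
```

Because the log is append-only, the rebuild can be rerun from scratch at
any time, and the map step can be split across files or machines later
without changing the record format.
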
> >
> > I know this is a complex topic...
> >
> > Regards,
> > Lukas
> >
