Re: How to keep user search history and how to turn it into information?

Dmitry Fri, 10 Aug 2007 00:46:42 -0700

Lucas,

Probably one of the solution will be to use database - like my sql andsetup Lucene against MySQL - in thi scase you don't need to think lessconcerning implementaiton based on the content sotrage. ALso you need tocreate middle tier to catch all event concerning Users Search / Hostory/Retrieval History/Cache Management...

Thanks,
dt
www.ejinz.com
Search News

----- Original Message -----From: "Lukas Vlcek" <[EMAIL PROTECTED]>

To: <[email protected]>
Sent: Friday, August 10, 2007 2:28 AM

Subject: How to keep user search history and how to turn it intoinformation?

Hi,

I would like to keep user search history data and I am looking for some

ideas/advices/recommendations. In general I would like to talk aboutmethods

of storing such data, its structure and how to turn it into valuable
information.

As for the structure:
==============
For now I don't have exact idea about what kind of information I should
keep. I know that this is application specific but I believe there can be

some common general patterns. as of now I think can be useful to keep isthe

following:

1) system time (time of issuing the query) and userid
2) original user query in raw form (untokenized)
3) expanded user query (both tokenized and untokenized can be useful)
4) query execution time
5) # of objects retrieved from index
6) # of total object count in index (this can change during time)
7) and possibly if user clicked some result and if so then which one (the
hit number) and system time

As for the information I can get from this:
=============================
Such minimal data collection could show if the search engine serves users

well or not (generally said). I should note that for the users in thiscase

the only other option is to not use the search engine at all (so the data
should not be biased by the fact that users are using alternative search
method). I should be able to learn if:

1) there are bottleneck queries (Prefix,Fuzzy,Proximity queries...)
2) users are finding what they want (they can find it fast and results are
ordered by properly defined relevance [my model is well tuned in terms of
term weights] so the result they click is among first hits)
3) user can formulate queries well (do they issue queries which return all
index documents or they can issue queries which return just a couple of
documents)
4) ...?... etc...

As for the storage method:
===================

I was planning to keep such data in database but now it seems to me thatit

will be better to keep it directly in index (Lucene index). It seems to me
that this approach would allow me for better fuzzy searches across history
and extracting relevant objects and their count more efficiently (with
benefit of the relevance based search on top of history search corpus).

I think that more scalable solution would be to keep such data in pureflat

file and then periodically recreate search history index (or more indices)
from it (for example by Map-Reduce like task). Event better the flat file

could be stored in distributer file system. However, for now I would liketo

start with something simple.

I know this is a complex topic...

Regards,
Lukas



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: How to keep user search history and how to turn it into information?

Reply via email to