Hey Lukas,
You can get a basic demo of this working in Lucene
first, then build a more advanced and efficient version.
First, give each document in your index a score field
encoded with NumberTools so it's sortable. When users
perform a search, log the unique document_id, IP address,
and result position for the next step.
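To see why the score field needs encoding at all: Lucene sorts string fields lexicographically, so plain numbers would order "9" after "40". NumberTools handles this with a radix-36 encoding; the zero-padded decimal sketch below (my own simplification, not Lucene's actual format) illustrates the same idea for non-negative scores.

```java
// Sketch of a lexicographically sortable score encoding.
// Lucene's NumberTools uses radix-36 with negative-number support;
// this simpler zero-padded form shows the idea for scores >= 0.
public class ScoreCodec {
    private static final int WIDTH = 12; // assumed max score width

    public static String encode(long score) {
        if (score < 0) throw new IllegalArgumentException("negative score");
        String s = Long.toString(score);
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < WIDTH; i++) sb.append('0');
        return sb.append(s).toString();
    }

    public static long decode(String encoded) {
        return Long.parseLong(encoded);
    }
}
```

With the padding, string comparison agrees with numeric comparison, so a sort on the field behaves as expected.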
Use Hadoop to simplify your logs by mapping the
document_id and emitting IPs as intermediate values. Have
reduce collect the unique IP addresses for each document_id.
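The map/reduce step above can be sketched in plain Java (the Hadoop job setup and the assumed tab-separated log format are omitted/invented here for illustration): map each log line to a (document_id, IP) pair, then reduce to the set of unique IPs per document.

```java
import java.util.*;

// Plain-Java sketch of the Hadoop job: extract (doc_id, ip) pairs
// from the click log, then collapse them into unique IP sets per
// document_id, as the reduce phase would.
public class ClickReducer {
    // Each log line is assumed to be "doc_id\tip\tposition".
    public static Map<String, Set<String>> uniqueIpsPerDoc(List<String> logLines) {
        Map<String, Set<String>> result = new HashMap<String, Set<String>>();
        for (String line : logLines) {
            String[] parts = line.split("\t");
            String docId = parts[0];
            String ip = parts[1];
            Set<String> ips = result.get(docId);
            if (ips == null) {
                ips = new HashSet<String>();
                result.put(docId, ips);
            }
            ips.add(ip); // repeat clicks from the same IP collapse here
        }
        return result;
    }
}
```

Using a Set per document_id is what makes the counts unique per IP, so one user clicking a result ten times only counts once.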
Read through the final output file, increment the score
value for each unique IP that clicked on the document_id,
re-index in Lucene, and sort results (in reverse order)
by the score field.
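The re-scoring pass could look something like the sketch below: one point added per unique clicking IP. The field names and update mechanism are placeholders; in Lucene you would actually delete and re-add each affected document with the new encoded score field.

```java
import java.util.*;

// Sketch of the re-scoring step: given the number of unique clicking
// IPs per document_id (the reduce output), bump each document's score.
public class ScoreUpdater {
    public static Map<String, Long> bumpScores(Map<String, Long> currentScores,
                                               Map<String, Integer> uniqueClicks) {
        Map<String, Long> updated = new HashMap<String, Long>(currentScores);
        for (Map.Entry<String, Integer> e : uniqueClicks.entrySet()) {
            Long old = updated.get(e.getKey());
            long base = (old == null) ? 0L : old.longValue();
            updated.put(e.getKey(), base + e.getValue()); // one point per unique IP
        }
        return updated;
    }
}
```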
A more advanced version could store previous result
positions as Payloads, but I don't fully understand
this new Lucene concept yet.
Regards,
Peter W.
On Aug 10, 2007, at 5:56 AM, Lukas Vlcek wrote:
Enis,
Thanks for your time.
I gave a quick glance at Pig and it seems good (it appears
to be directly based on Hadoop, which I am starting to play
with :-). It's obvious that a huge amount of data (like user
queries or access logs) should be stored in flat files, which
makes it convenient for further analysis by Pig (or directly
by Hadoop-based tasks) or other tools. And I agree with you
that the size of the index can be tracked journal-style in a
separate log rather than with every single user query. That
covers the easier part of my original question :-)
The true art starts with the mining tasks themselves: how to
efficiently use such data to improve the user experience with
the search engine...
On 8/10/07, Enis Soztutar <[EMAIL PROTECTED]> wrote:
...
Web server log analysis is a very popular topic nowadays, and
you can check the literature, especially on clickthrough data
analysis. All the major search engines have to interpret this
data to improve their algorithms and to learn from the latent
"collective knowledge" hidden in web server logs.
...
...
You do not have to implement this from scratch. You just have
to specify your data mining tasks, then write scripts (in Pig
Latin) or write map-reduce programs (in Hadoop). Neither of
these is that hard. I do not think there is any tool that will
satisfy all your information needs. So at the risk of repeating
myself, I suggest you look at Pig and write some scripts to
mine the data...