On 2 Nov 2005, at 08:10, Richard Jones wrote:
If I've listened to Radiohead (id 1) 10 times, Coldplay (id 2) 5 times
and Beck (id 3) 2 times, the field would look like this:
"1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3"
I use this index for quickly finding "top fans" of an artist or
combination of artists, comparing people's music taste and other
things on the fly.
The issue is that I already have the term vector (radiohead=10,
coldplay=5, beck=2) handy as a hashtable, and I've found myself
building up a string of numbers separated by spaces as shown above,
then feeding this into Lucene (I store the term vector of the field
in Lucene). Is there a way I could pass a term vector directly to
Lucene to cut out the ugly "turn it into a string and let Lucene
parse it" step? Basically, I want to provide the term vector for a
field when inserting a new document, rather than let Lucene build it
by analyzing a string.
This does feel like a rather perverted use of Lucene, I suppose.
It's faster and less hassle than other methods I've tried to date,
though.
last.fm using Lucene, sweet! It has caught on with quite a number of
friends, so I tried it just yesterday, and my first query for music
like "Michael Hedges" turned up nothing, so I was bummed. Still, it
is a very cool service.
Rather than building a string to index in this manner, perhaps add
each integer as an individual Field with the same name, with the term
vector enabled, and use something like the WhitespaceAnalyzer. To be
honest, though, I'm not sure, without digging deeper, whether adding
same-named fields in this manner messes with the term vector
capabilities.
Erik