Re: Personalized search parameters

2018-01-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I'm assuming that you are computing the cosine similarity and that you have two 
vectors containing the pairs <term, tfidf>. The two vectors can have 
different sizes because they only contain the terms that have tfidf != 0.
If you want to compute the cosine similarity between the two lists, for the dot 
product you just have to consider the pairs that appear in **both vectors**: 
if a term doesn't appear in one of the two, its product is going to be 0, so it 
will not contribute to the final score. (The norm of each vector is still 
computed over all of its own terms.)

(Really old) Example: 
https://github.com/diegoceccarelli/dexter/blob/fb4bbcb27a13da2665f3c19d6c75bfc4f5778440/dexter-core/src/main/java/it/cnr/isti/hpc/dexter/lucene/LuceneHelper.java#L386
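
As a rough illustration of the point above (a minimal sketch, not the code from the linked example): the class and method names below are hypothetical, and the two vectors are assumed to be already available as maps from term to tf-idf weight. Only the shared terms are visited for the dot product, while each norm uses all the terms of its own vector.

```java
import java.util.Map;

public class SparseCosine {

    /** Cosine similarity between two sparse <term, tfidf> vectors (hypothetical helper). */
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map and look each term up in the larger one.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;

        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                // Term present in both vectors: it contributes to the dot product.
                dot += e.getValue() * w;
            }
            // Term missing from the other vector: the product would be 0, so skip it.
        }

        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen here for empty vectors
        }
        return dot / (normA * normB);
    }

    /** Euclidean norm over all the terms of one vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }
}
```

The two maps never need to have the same size, and no full-size vector over the whole vocabulary is built.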






Re: Personalized search parameters

2018-01-06 Thread marco
Don't we need vectors of the same size to calculate the cosine similarity? 
Maybe I missed something, but following that example it looks like I have to
manually recreate the sparse vectors, because the term vector of a document
should (I may be wrong) contain only the terms that appear in that document.
Am I wrong?

Given that, I assumed (and that example goes in that direction) that we have
to manually create the sparse vector by first collecting all the terms and
then calculating the tf-idf weight of each term in each document.
That's what I did, and I obtained vectors of the same dimension for each
document. I was just wondering if there was a better-optimized way to obtain
those sparse vectors.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Personalized search parameters

2018-01-06 Thread Diego Ceccarelli
Maybe I misunderstood the question, but why do you need to create the
full-size vectors? Can't you just compute the cosine using the sparse
vectors?

On Fri, Jan 5, 2018 at 10:09 PM, marco  wrote:
> At the moment I have another problem: is there an efficient way to calculate
> the cosine similarity between documents?
> I'm following (with the required modifications) THIS code that calculates
> the cosine similarity between 2 documents, but it doesn't look too efficient
> when it comes to repeating everything between the user profile and every
> document retrieved by the query.
> This is because the term vectors returned by the IndexReader method
> getTermVector(...) only contain the terms present in the associated
> document, forcing you to manually create the full vectors.
> Isn't it possible to obtain full-size vectors? (Or are they way
> too big?)


Re: Personalized search parameters

2018-01-05 Thread marco
At the moment I have another problem: is there an efficient way to calculate
the cosine similarity between documents?
I'm following (with the required modifications) THIS code that calculates
the cosine similarity between 2 documents, but it doesn't look too efficient
when it comes to repeating everything between the user profile and every
document retrieved by the query.
This is because the term vectors returned by the IndexReader method
getTermVector(...) only contain the terms present in the associated
document, forcing you to manually create the full vectors.
Isn't it possible to obtain full-size vectors? (Or are they way
too big?)





Re: Personalized search parameters

2018-01-05 Thread marco
This looks like a very good solution actually.
In the meantime I started working in a different way: I created a custom
query component, and from there I accessed the list of results of the query
and was looking for a way to reorder that list. But I'd better look into
RankQuery; it surely looks like a more standard and elegant solution.

Thank you, I'll let you know how it goes with both methods.





Re: Personalized search parameters

2018-01-05 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
From: solr-user@lucene.apache.org At: 01/05/18 15:35:46 To: 
solr-user@lucene.apache.org
Subject: Re: Personalized search parameters

> In particular we have to retrieve the documents with a normal search
> followed by a result reranking phase where we calculate the cosine
> similarity between the retrieved documents and the user profile.

That's exactly what RankQuery can do for you. You can specify how many 
results you want to retrieve using the 'normal search' and then define your 
own RankQuery / QParserPlugin. See for example ReRankQParserPlugin. 

https://lucene.apache.org/solr/guide/7_2/query-re-ranking.html


You can return your own type of RankQuery, which will have access to the 
TopDocsCollector, so there you can reorder the documents by cosine similarity.

https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/search/RankQuery.html
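
To make the reordering step concrete, here is a minimal plain-Java sketch of the logic only. It deliberately does not use the Solr RankQuery or TopDocsCollector APIs: the ProfileReranker and Hit names are hypothetical, and each hit is assumed to already carry its sparse <term, tfidf> vector. In Solr, the equivalent logic would sit in the collector that a custom RankQuery returns.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ProfileReranker {

    /** Hypothetical holder for one hit: a document id plus its sparse tf-idf vector. */
    public static final class Hit {
        public final String id;
        public final Map<String, Double> vector;

        public Hit(String id, Map<String, Double> vector) {
            this.id = id;
            this.vector = vector;
        }
    }

    /** Sparse cosine: the dot product only sees terms present in both vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    /**
     * Reorders the first topN hits by cosine similarity with the profile
     * (highest first) and leaves the remaining hits in their original order.
     */
    public static List<String> rerank(List<Hit> hits, Map<String, Double> profile, int topN) {
        int n = Math.max(0, Math.min(topN, hits.size()));

        // Score each of the first n hits once, then sort by that score.
        List<Map.Entry<String, Double>> head = new ArrayList<>();
        for (Hit h : hits.subList(0, n)) {
            head.add(new AbstractMap.SimpleEntry<String, Double>(h.id, cosine(h.vector, profile)));
        }
        // List.sort is stable, so ties keep the order of the original search.
        head.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Double> e : head) {
            ids.add(e.getKey());
        }
        for (Hit h : hits.subList(n, hits.size())) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

The topN argument plays the role of "how many results you want to retrieve using the normal search": only that prefix is rescored, so the cost of the cosine computation stays bounded.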






Re: Personalized search parameters

2018-01-05 Thread marco
First of all thank you for the reply.
I understand your idea, and that would make things a lot easier. The
problem is that this system is being created as a university project, and we
were specifically asked to develop a personalized search system based on
result reranking.
In particular we have to retrieve the documents with a normal search
followed by a result reranking phase where we calculate the cosine
similarity between the retrieved documents and the user profile.

I'm still looking around on the web, and it seems like I have to deal with
a search component, is that right?
An alternative would be to work with plain Lucene: having the ability to
directly instantiate and call QueryParser, Similarity and everything else
would simplify everything, but it wouldn't be nearly as cool :)





Re: Personalized search parameters

2018-01-05 Thread Erik Hatcher
IMO you’re making this more complicated than it needs to be.

Forget for a moment where the user profile is stored.  Say user A likes 
turtles.  User B likes puppies.

User A queries, and this gets sent to Solr:  q=something=turtles
User B queries: q=something=puppies

I’d fetch the user preference details _before_ making the call to Solr, and 
augment the call to Solr with the user-specific boosting/parameters.

If you happen to store the user preferences in a Solr document, fetch that 
document in your application tier before making the call to Solr (and I’d 
suggest using a separate collection for user preferences).   Sure, that’s two 
Solr requests, but… no worries!   Fetching a single document from Solr that is 
likely cached anyway won’t be slow.   And if you account for implementation 
time in your effort, it’s a big win. :)

For the record - this is the kind of thing we do in Lucidworks Fusion - sidecar 
collections, looking stuff up (preferences, recommendations, rules, etc etc) 
and augmenting the final/real Solr request.   Sounds kinda simplistic, and it 
is.  But the synergy of these simple things working together is Powerful Magic. 
  I’d hate to see you go down a really complicated and custom route to achieve 
what you’re asking, but I do empathize with the sentiment to roll all this 
together into a single Solr request hiding all the magic.   But simpler and 
straightforward is better than complex and custom if the end result is the same 
:)

Erik
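
A minimal sketch of the application-tier approach described above, under stated assumptions: the class name, the buildQueryString helper and the "topic" field are hypothetical, and the choice of the edismax bq (boost query) parameter is an assumption made for illustration, not something taken from the message. The user's preferences are assumed to have been fetched already, and to be simple single-word terms that need no query-syntax escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

public class PersonalizedRequest {

    /**
     * Builds the query string of a Solr request, adding one boost query (bq)
     * per user preference. The field name is supplied by the caller.
     */
    public static String buildQueryString(String userQuery, List<String> likes, String field) {
        StringBuilder sb = new StringBuilder("defType=edismax&q=").append(enc(userQuery));
        for (String like : likes) {
            // Assumption: each preference is a plain single-word term.
            sb.append("&bq=").append(enc(field + ":" + like));
        }
        return sb.toString();
    }

    /** URL-encodes one parameter value as UTF-8. */
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

With this shape, user A and user B send the same q and differ only in the extra parameters, so nothing inside Solr needs to know where the profile is stored.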


> On Jan 5, 2018, at 6:10 AM, marco  wrote:
> 
> Hi, first of all I want to say that I'm a beginner with the whole Lucene/Solr
> environment.
> I'm trying to create a simple personalized search engine, and to do so I was
> thinking about adding a parameter user= to the URI of the query
> requests, which I would need during the scoring phase to rerank the results
> based on the user profile (stored as a normal document).
> 
> My question is: how can I create a custom Similarity class that is able to
> retrieve a parameter passed during the request phase? I "know" from this 
> https://medium.com/@wkaichan/custom-query-parser-in-apache-solr-4634504bc5da
> that by extending QParserPlugin I can access the request parameters, but how
> can I pass them through the whole chain of search operations so that they are
> accessible during the scoring phase?
> 
> Thank you for your help.



Personalized search parameters

2018-01-05 Thread marco
Hi, first of all I want to say that I'm a beginner with the whole Lucene/Solr
environment.
I'm trying to create a simple personalized search engine, and to do so I was
thinking about adding a parameter user= to the URI of the query
requests, which I would need during the scoring phase to rerank the results
based on the user profile (stored as a normal document).

My question is: how can I create a custom Similarity class that is able to
retrieve a parameter passed during the request phase? I "know" from this 
https://medium.com/@wkaichan/custom-query-parser-in-apache-solr-4634504bc5da
that by extending QParserPlugin I can access the request parameters, but how
can I pass them through the whole chain of search operations so that they are
accessible during the scoring phase?

Thank you for your help.


