If you can build an analyzer that tokenizes the second field so that it
filters out the words you don't want, you can then take advantage of
more intelligent queries as well.
So for the example that pjaol wrote, the query would become something
like this:
Query= body:(game OR redskins) keyword:(redskins)^10
Depending on your corpus, this may or may not be possible. The deciding factor is whether the list of words removed from each document to create the second field varies from document to document.
(More specifically: if you remove the word "game" from some documents but not from others, this technique won't work for you.)
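Here is a minimal sketch of the idea. It is written in plain Java and deliberately does not use Lucene's Analyzer API. It shows the tokens a "good word" filter would keep for the keyword field, and how running the query terms through the same filter produces the keyword clause from the example. The class name `GoodWordFilter` and the good-word list are hypothetical. Because the query side reuses one fixed list, the sketch only makes sense when that list is the same for every document, which is the caveat above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, plain Java rather than a Lucene Analyzer: it mimics
// the tokens a "good word" filtering analyzer would keep for the keyword field.
class GoodWordFilter {
    private final Set<String> goodWords;

    GoodWordFilter(Set<String> goodWords) {
        this.goodWords = goodWords;
    }

    // Lowercase, split on anything that is not a letter or digit,
    // and keep only tokens found in the good-word list.
    List<String> keywordTokens(String text) {
        List<String> kept = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty() && goodWords.contains(tok)) {
                kept.add(tok);
            }
        }
        return kept;
    }

    // Build "body:(t1 OR t2 ...)" and, if any query term survives the
    // filter, append a boosted "keyword:(...)^boost" clause.
    String buildQuery(List<String> userTerms, int boost) {
        Set<String> keywordTerms = new LinkedHashSet<>();
        for (String term : userTerms) {
            keywordTerms.addAll(keywordTokens(term));
        }
        String body = "body:(" + String.join(" OR ", userTerms) + ")";
        if (keywordTerms.isEmpty()) {
            return body;
        }
        return body + " keyword:(" + String.join(" OR ", keywordTerms) + ")^" + boost;
    }

    public static void main(String[] args) {
        GoodWordFilter filter = new GoodWordFilter(Set.of("redskins", "giants"));
        System.out.println(filter.keywordTokens("A good game last night by the redskins"));
        System.out.println(filter.buildQuery(List.of("game", "redskins"), 10));
    }
}
```

With the good-word list {redskins, giants}, `buildQuery(List.of("game", "redskins"), 10)` produces the query above: `body:(game OR redskins) keyword:(redskins)^10`.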
Matt
theDude_2 wrote:
Ah, interesting... I didn't think of that! I will try it and report back.
pjaol wrote:
Why not put the keywords into the same document as another field, and search both fields at once? You can then use Lucene syntax to boost the keyword field.
e.g.
body:A good game last night by the redskins
keyword: redskins
Query= body:(game OR redskins) keyword:(game OR redskins)^10
And adjust the boosting until you're happy.
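Lucene's BooleanQuery roughly adds up the scores of the optional clauses that match, so raising the `^10` on the keyword clause shifts weight toward keyword matches. The toy model below approximates this as `bodyScore + boost * keywordScore`. It ignores Lucene's coord and normalization factors, and its scores are made up. Its only purpose is to show how changing the boost and re-running can reorder results. The class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model, NOT Lucene's real scoring: a disjunction over two fields is
// approximated as bodyScore + boost * keywordScore.
class BoostTuning {
    record Doc(String id, double bodyScore, double keywordScore) {}

    static double combined(Doc d, double boost) {
        return d.bodyScore() + boost * d.keywordScore();
    }

    // Returns document ids ordered by descending combined score.
    static List<String> rank(List<Doc> docs, double boost) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((Doc d) -> combined(d, boost)).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only.
        List<Doc> docs = List.of(
            new Doc("chatty-recap", 0.9, 0.0),
            new Doc("redskins-preview", 0.4, 0.1));
        for (double boost : new double[] {1, 10}) {
            System.out.println("boost " + boost + ": " + rank(docs, boost));
        }
    }
}
```

With a boost of 1 the chatty recap wins on body text alone. At 10 the document with a keyword match moves to the top.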
Check out this FAQ entry on querying multiple fields:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-300f0756fdaa71f522c96a868351f716573f2d77
You might even want to consider Solr and its DisMax request handler to make it easier:
http://wiki.apache.org/solr/DisMaxRequestHandler
On Fri, Apr 17, 2009 at 11:19 AM, theDude_2 <aornst...@webmd.net> wrote:
I appreciate your response, and I read the wiki article on federated search, but I'm not sure that my project falls into the "Federated Search" bucket...
What I've done is create 2 indexes from the same documents.
One index contains the full documents - great for pure relevancy search.
The second index contains all of the same documents, but only a small subset of each document's contents: only words that we deem "good words" are indexed.
For example, if this were a football article database:
Index 1 would index 100% of the article about the Redskins and the New York Giants.
Index 2 would index the same article, but only the "good words" in the document, like Redskins, Giants, Quarterback, Linebacker, etc.
What I'm trying to do, if it's even possible, is run the search on both indexes (which contain references to the same article) and multiply the scores together to get a final score that would represent something like a "relevant AND good word" score...
My thinking is that if a user searches for "Who is the Quarterback for the Giants", this will get the user an article that is both related to the query and deemed "important" to the query...
I will look further into federated search and related topics, but I think Lucene probably won't be able to help me with this. Am I right?
------------
pjaol wrote:
I'd start by doing some research on the question rather than asking for a solution...
What you're asking for can be considered 'Federated Search':
http://en.wikipedia.org/wiki/Federated_search
And it can be conceived in as many ways as you have document types. Any answer will probably end up customized and weighted by your document silo value. Companies usually weight those by business rules rather than heading down the path of federated search, because it's quicker and cheaper, and you can accomplish more.
e.g.
Medication = score * 2 (higher advertising incentives)
Diseases = score
Books = score * 0.75 (thousands of books, which nobody buys, etc.)
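If you apply rules like these after retrieval, they come down to a per-type multiplier on each hit's score, followed by a re-sort. Here is a minimal sketch. It assumes each hit carries a `type` field. The `Hit` record and the class name are hypothetical, the weights mirror the example above, and any type not in the table keeps its original score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch of per-type business-rule weighting applied after retrieval.
class BusinessRuleWeights {
    record Hit(String id, String type, double score) {}

    // Weights taken from the example rules; unknown types default to 1.0.
    static final Map<String, Double> WEIGHTS = Map.of(
        "Medication", 2.0,
        "Diseases", 1.0,
        "Books", 0.75);

    static double weighted(Hit h) {
        return h.score() * WEIGHTS.getOrDefault(h.type(), 1.0);
    }

    // Returns a copy of the hits sorted by descending weighted score.
    static List<Hit> rerank(List<Hit> hits) {
        List<Hit> out = new ArrayList<>(hits);
        out.sort(Comparator.comparingDouble((Hit h) -> weighted(h)).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("book-1", "Books", 1.2),
            new Hit("disease-1", "Diseases", 0.95),
            new Hit("med-1", "Medication", 0.5));
        for (Hit h : rerank(hits)) {
            System.out.println(h.id() + " -> " + weighted(h));
        }
    }
}
```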
You might also want to try consolidating your data into one schema, and consider layering or collapsing results based on type.
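One possible reading of "collapsing results based on type" is to keep only the best-scoring hit of each type, in the spirit of result grouping. The sketch below assumes the hits are already sorted by descending score and that each hit has a `type`. The record and class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One interpretation of collapsing by type: keep the first (best) hit per type.
class CollapseByType {
    record Hit(String id, String type, double score) {}

    // Input must already be sorted by descending score.
    static List<Hit> collapse(List<Hit> sortedHits) {
        Map<String, Hit> bestPerType = new LinkedHashMap<>();
        for (Hit h : sortedHits) {
            // putIfAbsent keeps the first, highest-scoring hit of each type.
            bestPerType.putIfAbsent(h.type(), h);
        }
        // LinkedHashMap preserves the order in which types first appeared.
        return new ArrayList<>(bestPerType.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("med-1", "Medication", 2.0),
            new Hit("med-2", "Medication", 1.5),
            new Hit("disease-1", "Diseases", 1.0),
            new Hit("book-1", "Books", 0.9));
        System.out.println(collapse(hits));
    }
}
```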
P
On Fri, Apr 17, 2009 at 10:39 AM, theDude_2 <aornst...@webmd.net> wrote:
(bump) - any thoughts?
----
theDude_2 wrote:
hi!
I am trying to do something a little unique...
I have 90k text documents that I am trying to search.
Search A: indexes and searches the documents using regular relevancy search.
Search B: indexes and searches the documents using a smaller subset of "key" words that I have chosen.
This gives me 2 separate scores: Score A and Score B...
I am trying to show the top 10 results of the scores combined so....
FinalScoretextDoc = (scoreA_of_td1 * 0.5) * (scoreB_of_td1 * 0.5)
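For reference, the naive approach that the question wants to avoid (combining the scores outside Lucene) looks roughly like the sketch below. It joins the two score maps on a document id, applies the formula, and keeps the top N. The class name and the map-based inputs are assumptions. Documents missing from either map are skipped, on the assumption that a missing score counts as zero and would zero the product anyway. Also note that the two 0.5 factors multiply every product by the same 0.25, so they do not change the ranking. If a weighted blend was intended, a sum such as `0.5 * a + 0.5 * b` would behave differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Naive, outside-Lucene version of the formula: join on document id,
// multiply the halved scores, and keep the top n.
class CombineScores {
    static List<Map.Entry<String, Double>> topN(Map<String, Double> scoreA,
                                                Map<String, Double> scoreB,
                                                int n) {
        Map<String, Double> combined = new HashMap<>();
        for (Map.Entry<String, Double> e : scoreA.entrySet()) {
            Double b = scoreB.get(e.getKey());
            if (b != null) { // a missing B score would make the product 0
                combined.put(e.getKey(), (e.getValue() * 0.5) * (b * 0.5));
            }
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(combined.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        Map<String, Double> a = Map.of("d1", 2.0, "d2", 4.0, "d3", 1.0);
        Map<String, Double> b = Map.of("d1", 3.0, "d2", 1.0, "d4", 5.0);
        System.out.println(topN(a, b, 10));
    }
}
```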
While it seems straightforward, I do not want to calculate the scores of all the documents outside of Lucene. How can I integrate this better into the Lucene search engine? Is this possible to do by any simple means?
Thanks guys + gals!
--
View this message in context:
http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23098961.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--
Matthew Hall
Software Engineer
Mouse Genome Informatics
mh...@informatics.jax.org
(207) 288-6012