Re: MLT Using a Query created in a different index

2013-04-05 Thread Peter Lavin


Thanks for that, Jack.

So is it fair to say that if both the source and target corpora are large
and diverse, then the impact of using a different index to create the
query would be negligible?


P.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: MLT Using a Query created in a different index

2013-04-05 Thread Jack Krupansky
In a statistical sense, for the majority of documents, yes, but you could
probably find quite a few outlier examples where the results from A to B or
from B to A are significantly or even completely different, or even
non-existent.


-- Jack Krupansky




MLT Using a Query created in a different index

2013-04-04 Thread Peter Lavin


Dear Users,

I am doing some research where Lucene is integrated into agent 
technology. Part of this work involves running an MLT query against an
index when the query was not created from a document in that index (i.e.
the query is created, serialised and sent to the remote agent).


Can anyone point me towards any information on what the potential impact 
of doing this would be?


I'm assuming that if both indexes have similar sets of documents, the
impact would be negligible. But what, for example, would be the impact of
creating an MLT query from an index with only one or two documents for
use in an index with many more (say 100+) documents?


with thanks,
Peter

--
with best regards,
Peter Lavin,
PhD Candidate,
CAG - Computer Architecture & Grid Research Group,
Lloyd Institute, 005,
Trinity College Dublin, Ireland.
+353 1 8961536




Re: MLT Using a Query created in a different index

2013-04-04 Thread Jack Krupansky
The heart of MLT is examining the top result of a query (or maybe more than
one), identifying the top terms from the top document(s), and then simply
using those top terms for a subsequent query. The term ranking would of
course depend on term frequency and other relevancy considerations, all
computed for the corpus of the original query. A rich query corpus will give
great results and a weak corpus will give weak results, no matter how rich
or weak the final target corpus is. OTOH, if the target corpus really is
representative of the source corpus, then results should be either good or
terrible - the selected/query document may not have any representation in
the target corpus.
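
To make the dependence on the source corpus concrete, here is a minimal
sketch in plain Java. It is not Lucene's MoreLikeThis implementation: the
class name MltSketch, the helper methods, the toy documents and the smoothed
tf * idf formula are all assumptions made for illustration. It ranks the terms
of one seed document using document frequencies taken from the source corpus
only, which is the step that fixes the query before it is sent anywhere else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a toy tf*idf term picker, not Lucene's MoreLikeThis.
final class MltSketch {
    // Hypothetical seed document from which the MLT-style query is built.
    static final String SEED = "the lucene index and the lucene query and the agent";

    // Lower-case and split on non-word characters.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Rank the seed document's terms by tf * idf, where idf is computed
    // from sourceCorpus only, and return the n best terms.
    static List<String> topTerms(String seedDoc, List<String> sourceCorpus, int n) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(seedDoc)) tf.merge(t, 1, Integer::sum);

        List<Set<String>> docs = new ArrayList<>();
        for (String d : sourceCorpus) docs.add(new HashSet<>(tokenize(d)));

        final Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = 0;
            for (Set<String> d : docs) if (d.contains(e.getKey())) df++;
            // Smoothed idf (an assumed formula, chosen for simplicity).
            double idf = Math.log((docs.size() + 1.0) / (df + 1.0)) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }

        // Highest score first; ties broken alphabetically for determinism.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    // A two-document source index, as in the question.
    static List<String> smallSource() {
        return Arrays.asList(SEED, "the agent and the index");
    }

    // A 101-document source index: 100 filler documents plus the seed.
    static List<String> largeSource() {
        List<String> docs = new ArrayList<>(
                Collections.nCopies(100, "the report and the summary of the meeting"));
        docs.add(SEED);
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(topTerms(SEED, smallSource(), 3));
        System.out.println(topTerms(SEED, largeSource(), 3));
    }
}
```

With these toy inputs the two-document source selects "the", "lucene" and
"and", because nearly every term has the same document frequency and raw
term frequency dominates. The 101-document source selects "lucene", "agent"
and "index". The target index plays no part in either selection. Separately,
Lucene's MoreLikeThis applies a minimum document frequency cut-off
(setMinDocFreq, default 5 per its Javadoc), which a one- or two-document
source index can never satisfy unless the cut-off is lowered.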


-- Jack Krupansky
