Same score for different fields

2009-06-18 Thread Nada Mimouni
Hi,

 

I have created a Lucene index with two fields. 

Let's take this example entry from my index as displayed by Luke:  

 

Field | Norm  | Value
      | 0.375 | average
      | 0.375 | salary
      | 0.375 | professional
      | 0.375 | baseball
      | 0.375 | player
      | 0.375 | of
      | 0.625 | average salary
      | 0.625 | baseball player

 

When I run a search, documents with hits in the field that has the highest
norm (or score: are they the same thing?), in this case the field "seq", are
ranked at the top.

How can I get similar scores for both fields?

 

Thank you.

Nada 
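
A possible explanation for the two norm values, offered as a sketch under stated assumptions: if the index was built with Lucene's default similarity and without boosts, the length norm of a field is 1/sqrt(number of terms in the field), and it is stored in a single byte with very coarse precision. The code below is plain Java, not Lucene code. The class and method names are made up for illustration, and `quantize` is only a simplified stand-in for the one-byte encoding (it truncates the float mantissa to two bits).

```java
// Plain-Java sketch (not Lucene API): how field length could explain the norms above.
final class NormSketch {

    // Length norm under the assumed default formula: 1 / sqrt(number of terms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Simplified stand-in for a one-byte norm encoding: keep only the two
    // highest mantissa bits of the float and truncate the rest.
    static float quantize(float value) {
        int bits = Float.floatToIntBits(value);
        int coarse = bits & ~((1 << 21) - 1); // clear the low 21 mantissa bits
        return Float.intBitsToFloat(coarse);
    }

    public static void main(String[] args) {
        // Field holding six single-word values (average, salary, ..., of).
        System.out.println(quantize(lengthNorm(6))); // prints 0.375
        // Field holding two values (average salary, baseball player).
        System.out.println(quantize(lengthNorm(2))); // prints 0.625
    }
}
```

Under these assumptions the shorter field simply gets the higher norm. The norm is an index-time factor that feeds into the score; it is not the score itself. If field length is indeed the cause, disabling norms on both fields (for example with `Field.setOmitNorms(true)`) or using a Similarity whose `lengthNorm` returns a constant should make the two fields score more alike.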



RE: Lucene indexes

2009-02-24 Thread Nada Mimouni
Thank you Erick.

I am fully aware that Lucene uses an inverted index (class: IndexWriter).
I have read in the literature about new, efficient index structures designed 
for phrase indexing, so I wondered whether Lucene has been updated or has 
gained new classes for that purpose.

The problem I am trying to solve is: how do I index phrases (as opposed to 
querying for phrases)? 
I have a Questions/Answers corpus. The IR architecture I am using creates 
one index for questions and another for answers (both based on single terms) 
and then matches them against each other.
I want to index phrases in addition to single terms (for both questions and 
answers) and then search for all terms and phrases in the questions index. 

If you have any idea how I can solve this problem of indexing phrases, it would 
be of great help. 

Nada Mimouni
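
To make the idea of indexing word pairs concrete, here is a minimal plain-Java sketch of a nextword-style structure as I understand it from the literature: for each word it stores the words that follow it, together with the positions where the pair starts, so a two-word phrase becomes a direct lookup. This is not a Lucene class; all names are hypothetical, and it handles a single text only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of a nextword-style index (hypothetical names, not Lucene API).
final class NextwordIndexSketch {

    // firstWord -> nextWord -> positions of firstWord where the pair occurs
    private final Map<String, Map<String, List<Integer>>> pairs = new TreeMap<>();

    // Records every adjacent word pair of one lower-cased, whitespace-separated text.
    void add(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            pairs.computeIfAbsent(words[i], k -> new TreeMap<>())
                 .computeIfAbsent(words[i + 1], k -> new ArrayList<>())
                 .add(i);
        }
    }

    // Positions at which the two-word phrase "first second" starts (empty if none).
    List<Integer> lookup(String first, String second) {
        Map<String, List<Integer>> next =
                pairs.getOrDefault(first, Collections.emptyMap());
        return next.getOrDefault(second, Collections.emptyList());
    }

    public static void main(String[] args) {
        NextwordIndexSketch index = new NextwordIndexSketch();
        index.add("average salary of a professional baseball player");
        System.out.println(index.lookup("baseball", "player")); // adjacent pair
        System.out.println(index.lookup("salary", "player"));   // not adjacent
    }
}
```

An inverted index that records term positions can answer the same two-word lookup by intersecting the position lists of both words, which is how phrase queries work on a term-based index.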



-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tue 2/24/2009 2:13 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene indexes
 
I have to ask why do you care? Which is another way of asking
what problem you're trying to solve that you think this information
would help with. As far as I know Lucene is an inverted index,
period. You use IndexWriter to create it.

Really the best way to get a sense for which classes to use is to work
through some of the examples in Lucene In Action or on the website.

This may help as far as the structure of the index is concerned:
http://lucene.apache.org/java/2_4_0/fileformats.html

Best
Erick

On Tue, Feb 24, 2009 at 5:36 AM, Nada Mimouni <
mimo...@tk.informatik.tu-darmstadt.de> wrote:

>
> Hello everybody,
>
> 1) What is the difference between :
> - inverted index
> - nextword index
> - common index
>
> 2) Which one(s) is(are) supported by Lucene?
>
> 3) Which class(es) create this(those) index(es)?
>
>
> Thank you in advance for your help.
> Nada Mimouni
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>



Lucene indexes

2009-02-24 Thread Nada Mimouni

Hello everybody,

1) What is the difference between :
- inverted index
- nextword index
- common index

2) Which one(s) is(are) supported by Lucene? 

3) Which class(es) create this(those) index(es)?


Thank you in advance for your help.
Nada Mimouni




RE: searching a sentence or paragraph

2009-02-19 Thread Nada Mimouni



You need to create a TermQuery or a PhraseQuery from the terms in your query, 
depending on exactly what results you need. 

To create a PhraseQuery, try the built-in phrase processing with double quotes, 
e.g.
"this is a phrase".

See the Term section at
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

You can also have a look at 
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Getting-Started-with-Lucene/
 

Best
Nada


-Original Message-
From: Seid Mohammed [mailto:seidy...@gmail.com]
Sent: Thu 2/19/2009 2:29 PM
To: java-user@lucene.apache.org
Subject: searching a sentence or paragraph
 
From a Lucene index, how can we search for a sentence or a paragraph that
satisfies our query?

thanks a lot
seid m
-- 
"RABI ZIDNI ILMA"


RE: Index Structure

2009-02-19 Thread Nada Mimouni

Hello,

When indexing, Lucene generates terms from your original text.

To see the content and the structure of the index, use "Luke", a Lucene 
index toolbox.
You can download it here: http://www.getopt.org/luke/
There is a detailed description of this tool (with screenshots) in the book 
"Lucene in Action", section 8.2.2.

Best
Nada
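
To illustrate what "generating terms" leads to, here is a minimal plain-Java sketch of a positional inverted index built from the example sentence in the question. It is not Lucene's actual file format: the names are hypothetical, the text is only lower-cased and split on whitespace, and no stop words are removed (a real analyzer may drop words such as "is" and "to").

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of a positional inverted index (illustrative, not Lucene's format).
final class InvertedIndexSketch {

    // term -> docId -> positions of the term inside that document
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Splits the text into lower-cased terms and records where each one occurs.
    void addDocument(int docId, String text) {
        String[] terms = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < terms.length; pos++) {
            postings.computeIfAbsent(terms[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Returns docId -> positions for one term; empty if the term was never indexed.
    Map<Integer, List<Integer>> lookup(String term) {
        return postings.getOrDefault(term, Collections.emptyMap());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(0, "lucen is used to index files");
        System.out.println(index.lookup("index")); // prints {0=[4]}
        System.out.println(index.lookup("files")); // prints {0=[5]}
    }
}
```

A search for "index files" consults only this term-to-document mapping; the original text file is not read again at search time.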

-Original Message-
From: Seid Mohammed [mailto:seidy...@gmail.com]
Sent: Thu 2/19/2009 12:09 PM
To: java-user@lucene.apache.org
Subject: Index Structure
 
I am new to Lucene and am reading the Lucene in Action book.
Sometimes I understand better when someone tells me the answer than from a book.
My question is:
when indexing, what is Lucene actually doing?
If I have a file called test.txt with contents " lucen is used to
index files" and I apply Lucene indexing, what is the content of the
index and what is the structure of the index?

And if I apply a Lucene search, for example the query "index files", where
does Lucene search: in the index or in the test.index file?

thanks a lot
seid m


-- 
"RABI ZIDNI ILMA"


Analyse TermQuery and PhraseQuery

2009-02-19 Thread Nada Mimouni


Hello,

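String ws = " ";
// The phrase part must be wrapped in escaped double quotes.
String query =
    "The" + ws + "president" + ws + "of" + ws + "the" + ws + "USA" + ws + "is" + ws + "\"Barak Obama\"";
// Lucene 2.4 style; "contents" is a placeholder field name. parse() can throw ParseException.
Query q1 = new QueryParser("contents", new StandardAnalyzer()).parse(query);
Query q2 = new QueryParser("contents", new WhitespaceAnalyzer()).parse(query);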


In this example:
- can we create a query in such a format (combining terms and phrases)?
- what will be the result of the analysis?
- what should I do to get these hits: {president, USA, Barak Obama}?
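
As a rough illustration of what analysis could produce for this query, the plain-Java sketch below splits a query string into single terms and double-quoted phrases, lower-cases them, and drops a few stop words. It only mimics the general behaviour I would expect from a stop-word-removing, lower-casing analyzer; it is not Lucene's QueryParser, and the stop-word list and all names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch: split a query into terms and quoted phrases (not Lucene API).
final class QuerySplitSketch {

    // Hypothetical, tiny stop-word list; a real analyzer ships a longer one.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "of", "is"));

    // Either a double-quoted phrase (group 1) or a run of non-space characters (group 2).
    private static final Pattern CLAUSE = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> clauses(String query) {
        List<String> result = new ArrayList<>();
        Matcher m = CLAUSE.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                result.add(m.group(1).toLowerCase()); // quoted phrase, kept whole
            } else {
                String term = m.group(2).toLowerCase();
                if (!STOP_WORDS.contains(term)) {
                    result.add(term);                 // single term, stop words dropped
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String query = "The president of the USA is \"Barak Obama\"";
        System.out.println(clauses(query)); // prints [president, usa, barak obama]
    }
}
```

An analyzer that splits only on whitespace would instead keep every word, including "The", "of" and "is", with its original case, so the choice of analyzer decides whether the stop words end up in the query.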


Thank you for your help.

Best 
Nada




RE: Phrase indexing and searching with Lucene

2009-02-19 Thread Nada Mimouni

Hello,

Thank you Erick for this detailed answer, that makes things clearer in my mind.

>I'm still not clear why the built-in phrase query syntax won't work.

I have written a set of Java classes (using Lucene classes) to index a 
collection of documents and search it for a set of queries.
To test my system, I use a corpus that consists of a collection of queries (n 
queries) and documents (m documents). 
I started by creating one index for all queries and another one for all 
documents. Then I run the search to match the query index against the 
document index.
I use a TREC evaluation tool to generate a file that lists all hits (matches) 
between queryID and documentID with their scores. 

In this first step, I index only terms, so the search process (as I have 
it now) looks only for term matches between the query terms and the document 
terms.
Now I want to get better results (better matching) by adding phrases to terms. 

I don't know whether it makes a difference if I index phrases and terms 
(erick, erickson, thinks, small, thoughts, erick erickson, erickson thinks, 
small thoughts, erickson thoughts) and then search for both, or keep the 
indexing process as it is (erick, erickson, thinks, small, thoughts) and then 
search for phrases (PhraseQuery: erick erickson, erickson thinks, small 
thoughts, erickson thoughts) and terms. 
Any idea?
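
For the first option (indexing phrases next to single terms), the extra index entries can be generated as word n-grams, often called shingles. The sketch below is plain Java with hypothetical names, not Lucene code; it only shows which additional terms would be produced for the example tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch: build word n-grams ("shingles") to index next to single terms.
final class ShingleSketch {

    // All runs of n consecutive tokens, each joined with a single space.
    static List<String> wordNGrams(List<String> tokens, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("erick", "erickson", "thinks", "small", "thoughts");

        // Terms to index: the single words plus every adjacent word pair.
        List<String> indexTerms = new ArrayList<>(tokens);
        indexTerms.addAll(wordNGrams(tokens, 2));
        System.out.println(indexTerms);
    }
}
```

Adjacent n-grams yield "erick erickson", "erickson thinks", "thinks small" and "small thoughts", but not a non-adjacent pair such as "erickson thoughts"; matching that kind of pair needs a proximity (slop) search over term positions rather than a pre-built phrase term.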


>Some examples of what you put in your index and what searches
>you expect to return results for your example AND searches you do
>NOT want to hit that document would be a great help.

input: 

*Query*  
898  Why is the sun bright?

*Documents* 
7568  Star, large celestial body composed of gravitationally contained hot 
gases emitting electromagnetic radiation, especially light, as a result of 
nuclear reactions inside the star. The sun is a star.
7567  The sun has a magnitude of -26.7, inasmuch as it is about 10 billion 
times as bright as Sirius in the earth's sky. 

output: 

qID dID score 
898 7568 0,13 (not relevant)
898 7567 1 (relevant)


In this example, Lucene ranks document 7567 as relevant to the query (since 
it contains all the query terms); however, "bright" there is relative to 
Sirius (what we need is a match on "sun bright").
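
One way to tell the two situations apart is to look at how close "sun" and "bright" are to each other. The plain-Java sketch below measures the smallest distance in token positions between two terms; it is a simplified proximity check with hypothetical names and a naive tokenizer, not Lucene's slop computation.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch: smallest position distance between two terms (not Lucene API).
final class ProximitySketch {

    // Naive tokenizer: lower-case, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().split("[^a-z0-9]+"));
    }

    // Smallest distance between any occurrence of a and any occurrence of b,
    // or -1 if either term does not occur.
    static int minDistance(List<String> tokens, String a, String b) {
        int lastA = -1;
        int lastB = -1;
        int best = -1;
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(a)) lastA = i;
            if (tokens.get(i).equals(b)) lastB = i;
            if (lastA >= 0 && lastB >= 0) {
                int distance = Math.abs(lastA - lastB);
                if (best < 0 || distance < best) best = distance;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String query = "Why is the sun bright?";
        String doc7567 = "The sun has a magnitude of -26.7, inasmuch as it is about "
                + "10 billion times as bright as Sirius in the earth's sky.";
        System.out.println(minDistance(tokenize(query), "sun", "bright"));
        System.out.println(minDistance(tokenize(doc7567), "sun", "bright"));
    }
}
```

In the query the two words are adjacent, while in document 7567 they are more than a dozen positions apart, so a phrase query with a small slop (in query parser syntax something like "sun bright"~5) would not match that document on proximity alone.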




Best 
Nada


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wed 2/18/2009 3:24 PM
To: java-user@lucene.apache.org
Subject: Re: Phrase indexing and searching with Lucene
 
I'm still not clear why the built-in phrase query syntax won't work. If I
index the following terms (erick, erickson, thinks, small, thoughts)
in a single field, then searching for "erick erickson" (as a phrase query,
i.e. with double quotes when sent through a query parser or constructing
a PhraseQuery yourself) will generate a hit but "erick thinks" won't
generate a hit (unless you specify slop).
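
The behaviour described above can be sketched in plain Java as follows. `maxGap` is a simplified analogue of slop that only counts extra tokens between in-order phrase terms; Lucene's real slop is defined differently (it also allows reordering), and all names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of phrase matching over token positions (not Lucene API).
final class PhraseMatchSketch {

    // True if the phrase terms occur in order with at most maxGap extra tokens
    // in total between them. maxGap = 0 means an exact phrase match.
    static boolean matches(List<String> doc, List<String> phrase, int maxGap) {
        if (phrase.isEmpty()) return true;
        for (int start = 0; start < doc.size(); start++) {
            if (doc.get(start).equals(phrase.get(0))
                    && matchesRest(doc, phrase, start + 1, 1, maxGap)) {
                return true;
            }
        }
        return false;
    }

    // Tries to place phrase term phrasePos at docPos or up to gapLeft tokens later.
    private static boolean matchesRest(List<String> doc, List<String> phrase,
                                       int docPos, int phrasePos, int gapLeft) {
        if (phrasePos == phrase.size()) return true;
        for (int skip = 0; skip <= gapLeft && docPos + skip < doc.size(); skip++) {
            if (doc.get(docPos + skip).equals(phrase.get(phrasePos))
                    && matchesRest(doc, phrase, docPos + skip + 1,
                                   phrasePos + 1, gapLeft - skip)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc =
                Arrays.asList("erick", "erickson", "thinks", "small", "thoughts");
        System.out.println(matches(doc, Arrays.asList("erick", "erickson"), 0)); // true
        System.out.println(matches(doc, Arrays.asList("erick", "thinks"), 0));   // false
        System.out.println(matches(doc, Arrays.asList("erick", "thinks"), 1));   // true
    }
}
```

The single-word terms are all that needs to be in the index for this to work, as long as their positions are recorded.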

"thinks small thoughts" would also generate a hit

If you're saying that you only want to match on *all* the tokens, i.e.
the only way to get a hit on the above would be to search for
"erick erickson thinks small thoughts", then you can create a
field that's UN_ANALYZED. If you do this, though, beware
that you have to do things like lower-case terms yourself when
indexing.

I have no idea what IndexTermGenerator is or what it does, but I'm
assuming that it just generates single words.

Some examples of what you put in your index and what searches
you expect to return results for your example AND searches you do
NOT want to hit that document would be a great help.

As far as searching for both, constructing a BooleanQuery with regular
TermQuerys and PhraseQuerys would work if you're constructing
your queries programmatically, or just using a Lucene query
like +termfield:word +phrasefield:"erick erickson thinks" would
work. Or, if you just require that the phrase exists you could do
it all in one field like
+field:word +field:"erick erickson thinks"



Best
Erick


On Wed, Feb 18, 2009 at 8:42 AM, Nada Mimouni <
mimo...@tk.informatik.tu-darmstadt.de> wrote:

>
>
> Thank you Erick.
>
> I need first to index phrases, the built-in phrase processing (with double
> quotes) comes in the search step.
> Is there any difference between :
>1) start by indexing phrases and then make a phrase search
>2) index terms and then search for phrases
>
>
> To make things clearer:
>
> What I am doing now:
>  - In the indexing step:  I am using "IndexTermGenerator" to generate term
> based indexes, one index for all queries I have and another one for
> documents (term means single word).
>  - In the search step : Lucene matches terms in queries index with terms in
> documents index.
>
> What I need to do:
>  - Index phrases ("mul

RE: Phrase indexing and searching with Lucene

2009-02-18 Thread Nada Mimouni


Thank you Erick.

First I need to index phrases; the built-in phrase processing (with double 
quotes) only comes in at the search step.  
Is there any difference between: 
1) indexing phrases and then running a phrase search 
2) indexing terms and then searching for phrases


To make things clearer:

What I am doing now: 
 - In the indexing step: I am using "IndexTermGenerator" to generate 
term-based indexes, one index for all the queries I have and another one for 
the documents (a term means a single word). 
 - In the search step: Lucene matches terms in the query index against terms 
in the document index.

What I need to do:
 - Index phrases ("multi" words) in addition to terms (single words)
 - Search for both : phrases and terms


Does anyone have an idea how to proceed?

Regards
Nada


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wed 2/18/2009 2:10 PM
To: java-user@lucene.apache.org
Subject: Re: Phrase indexing and searching with Lucene
 
Have you tried the built-in phrase processing with double quotes? e.g.
"this is a phrase"?

See the Term section at
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

Best
Erick

On Wed, Feb 18, 2009 at 5:57 AM, Nada Mimouni <
mimo...@tk.informatik.tu-darmstadt.de> wrote:

>
>
> Hello everybody,
>
> I use Lucene to index and search into text documents.
> At present, I just index and search for single words. I want to extend this
> to phrases (or nGrams).
>
> Could anyone please give me details on how to index phrases and then make a
> phrase search?
>
> Thank you very much in advance for your help.
>
> Nada Mimouni
>
>



Phrase indexing and searching with Lucene

2009-02-18 Thread Nada Mimouni


Hello everybody,

I use Lucene to index and search text documents.
At present, I index and search only single words. I want to extend this to 
phrases (or n-grams).

Could anyone please give me details on how to index phrases and then make a 
phrase search? 

Thank you very much in advance for your help.

Nada Mimouni


Phrase indexing and searching with Lucene

2009-02-18 Thread Nada Mimouni

Hello everybody,

In my research work, I use Lucene to index and search text documents.
At present, I index and search only single words. I want to extend this to 
phrases (or n-grams).

Could anyone please give me more details on how to do it and also point me to 
some useful references on this?

Thank you very much in advance for your help.

Best regards,
Nada Mimouni


