Re: Optional Terms in a single query

2005-02-22 Thread Andrzej Bialecki
Todd VanderVeen wrote:

> I would be careful using wildcards as proposed. They can be inefficient
> (particularly in a list of disjunctions) but even more importantly you
> are excluding more than the 3 names. Your results won't be consistent
> with your intent.

In the new version of Luke (the tool) you can view how your wildcard 
query is re-written into boolean queries. This should help to catch 
those cases where wildcard queries match unwanted terms.
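
To make the over-matching concrete, here is a minimal sketch in plain Java (no Lucene involved) that mimics how a pattern such as *tim* is matched against a list of terms. The term list and the wildcardToRegex helper are hypothetical, made up for illustration; Lucene enumerates index terms in its own way, so treat this only as a model of which terms a wildcard can pick up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardOvermatch {
    // Hypothetical helper: translate a wildcard pattern (* = any run of
    // characters, ? = exactly one character) into a regular expression.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Return every term that the wildcard pattern matches in full.
    static List<String> expand(String wildcard, List<String> terms) {
        Pattern p = wildcardToRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (p.matcher(t).matches()) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up terms standing in for the contents of the name field.
        List<String> terms = Arrays.asList(
                "tim", "timothy", "optimus", "bill", "billing", "harry");
        System.out.println(expand("*tim*", terms));
    }
}
```

With these sample terms the pattern picks up timothy and optimus as well as tim, so a prohibited -name:*tim* clause would also exclude documents carrying those names.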

--
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Optional Terms in a single query

2005-02-21 Thread Todd VanderVeen
Luke Shannon wrote:
Hi Tod;
Thanks for your help.
I was able to do what you said but in a much uglier way using a Boolean
Query and adding Wildcard Queries.
The end result looks like this:
The query: +(type:138) +((-name:*tim* -name:*bill* -name:*harry*
+olfaithfull:stillhere))
But this one works as expected.
Thanks!
Luke
- Original Message - 
From: "Todd VanderVeen" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Monday, February 21, 2005 6:26 PM
Subject: Re: Optional Terms in a single query

 

Luke Shannon wrote:
   

The API I'm working with combines a series of queries into one larger one
using a boolean query.
Queries on the same field get OR'd into one big query. All remaining queries
are AND'd with this big one.
Working within this system I have:
arg = (mario luigi bobby joe) //i do have control of how this list is
created
I pass this to the QueryParser:
Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
StandardAnalyzer());
BooleanQuery typeNegativeSearch = new BooleanQuery();
typeNegativeSearch.add(query1, false, true);
typeNegativeSearch.add(query2, true, false);
This is half the query.
It gets AND'd with the other half, to create what you see below:
+(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))
What I am having trouble with is getting the QueryParser to create
this: -name:(tim bill harry)
I feel like this is something simple, but for some reason I can't figure it
out.
Thanks,
Luke

 

Is the API something you control?
Let's call the other half of your query query3. To avoid the extra nesting
you need to do the composition in a single boolean query.
Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
StandardAnalyzer());
Query query3 = 
BooleanQuery finalQuery = new BooleanQuery();
finalQuery.add(query1, false, true);
finalQuery.add(query2, true, false);
finalQuery.add(query3, true, false);
Cheers,
Todd VanderVeen

I would be careful using wildcards as proposed. They can be inefficient 
(particularly in a list of disjunctions) but even more importantly you 
are excluding more than the 3 names. Your results won't be consistent 
with your intent.

Cheers,
Todd VanderVeen



Re: Optional Terms in a single query

2005-02-21 Thread Luke Shannon
Hi Tod;

Thanks for your help.

I was able to do what you said but in a much uglier way using a Boolean
Query and adding Wildcard Queries.

The end result looks like this:

The query: +(type:138) +((-name:*tim* -name:*bill* -name:*harry*
+olfaithfull:stillhere))

But this one works as expected.

Thanks!

Luke
- Original Message - 
From: "Todd VanderVeen" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Monday, February 21, 2005 6:26 PM
Subject: Re: Optional Terms in a single query


> Luke Shannon wrote:
>
> >The API I'm working with combines a series of queries into one larger one
> >using a boolean query.
> >
> >Queries on the same field get OR'd into one big query. All remaining queries
> >are AND'd with this big one.
> >
> >Working within this system I have:
> >
> >arg = (mario luigi bobby joe) //i do have control of how this list is
> >created
> >
> >I pass this to the QueryParser:
> >
> >Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
> >Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
> >StandardAnalyzer());
> >BooleanQuery typeNegativeSearch = new BooleanQuery();
> >typeNegativeSearch.add(query1, false, true);
> >typeNegativeSearch.add(query2, true, false);
> >
> >This is half the query.
> >
> >It gets AND'd with the other half, to create what you see below:
> >
> >+(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))
> >
> >What I am having trouble with is getting the QueryParser to create
> >this: -name:(tim bill harry)
> >
> >I feel like this is something simple, but for some reason I can't figure it
> >out.
> >
> >Thanks,
> >
> >Luke
> >
> >
> >
> Is the API something you control?
>
> Let's call the other half of your query query3. To avoid the extra nesting
> you need to do the composition in a single boolean query.
>
> Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
> Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
> StandardAnalyzer());
> Query query3 = 
>
> BooleanQuery finalQuery = new BooleanQuery();
> finalQuery.add(query1, false, true);
> finalQuery.add(query2, true, false);
> finalQuery.add(query3, true, false);
>
> Cheers,
> Todd VanderVeen
>



Re: Optional Terms in a single query

2005-02-21 Thread Todd VanderVeen
Luke Shannon wrote:
The API I'm working with combines a series of queries into one larger one
using a boolean query.
Queries on the same field get OR'd into one big query. All remaining queries
are AND'd with this big one.
Working within this system I have:
arg = (mario luigi bobby joe) //i do have control of how this list is
created
I pass this to the QueryParser:
Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
StandardAnalyzer());
BooleanQuery typeNegativeSearch = new BooleanQuery();
typeNegativeSearch.add(query1, false, true);
typeNegativeSearch.add(query2, true, false);
This is half the query.
It gets AND'd with the other half, to create what you see below:
+(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))
What I am having trouble with is getting the QueryParser to create
this: -name:(tim bill harry)
I feel like this is something simple, but for some reason I can't figure it
out.
Thanks,
Luke
 

Is the API something you control?
Let's call the other half of your query query3. To avoid the extra nesting
you need to do the composition in a single boolean query.

Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
Query query2 = QueryParser.parse("stillhere", "olfaithfull", new 
StandardAnalyzer());
Query query3 = 
BooleanQuery finalQuery = new BooleanQuery();
finalQuery.add(query1, false, true);
finalQuery.add(query2, true, false);
finalQuery.add(query3, true, false);
Cheers,
Todd VanderVeen


Re: Optional Terms in a single query

2005-02-21 Thread Luke Shannon
The API I'm working with combines a series of queries into one larger one
using a boolean query.

Queries on the same field get OR'd into one big query. All remaining queries
are AND'd with this big one.

Working within this system I have:

arg = (mario luigi bobby joe) //i do have control of how this list is
created

I pass this to the QueryParser:

Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
Query query2 = QueryParser.parse("stillhere", "olfaithfull", new
StandardAnalyzer());
BooleanQuery typeNegativeSearch = new BooleanQuery();
typeNegativeSearch.add(query1, false, true);
typeNegativeSearch.add(query2, true, false);

This is half the query.

It gets AND'd with the other half, to create what you see below:

+(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))

What I am having trouble with is getting the QueryParser to create
this: -name:(tim bill harry)

I feel like this is something simple, but for some reason I can't figure it
out.

Thanks,

Luke

- Original Message - 
From: "Todd VanderVeen" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Monday, February 21, 2005 5:33 PM
Subject: Re: Optional Terms in a single query


> Luke Shannon wrote:
>
> >Hi;
> >
> >I'm trying to create a query that looks for a field containing type:181 and
> >name doesn't contain tim, bill or harry.
> >
> >+(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
> >+(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
> >+(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
> >+(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))
> >
> >I would really like to do this all in one Query. Is this even possible?
> >
> >Thanks,
> >
> >Luke
> >
> >
> >
> >
> >
> >
> Are all the queries listed attempts at the same thing?
>
> I'm guessing you want this:
>
> +type:181 -name:(tim bill harry) +oldfaith:stillHere
>
>
>



Re: Optional Terms in a single query

2005-02-21 Thread Luke Shannon
Sorry about the typos.

What I would like is a document with a type field = 181,
olfaithfull=stillHere and a name field not containing tim, bill or harry.

Thanks,

Luke

- Original Message - 
From: "Paul Elschot" <[EMAIL PROTECTED]>
To: 
Sent: Monday, February 21, 2005 5:31 PM
Subject: Re: Optional Terms in a single query


> On Monday 21 February 2005 23:23, Luke Shannon wrote:
> > Hi;
> >
> > I'm trying to create a query that looks for a field containing type:181 and
> > name doesn't contain tim, bill or harry.
>
> type: 181  -(name: tim name:bill name:harry)
>
> > +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
>
> stillHere is normally lowercased before searching. Is that ok?
>
> > +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
> > +(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
>
> typo? olfaithfull
>
> > +(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))
>
> typo? (type:1 81)
>
> > I would really like to do this all in one Query. Is this even possible?
>
> How would you want to combine the results?
>
> Regards,
> Paul Elschot
>
>



Re: Optional Terms in a single query

2005-02-21 Thread Todd VanderVeen
Luke Shannon wrote:
Hi;
I'm trying to create a query that looks for a field containing type:181 and
name doesn't contain tim, bill or harry.
+(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
+(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
+(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
+(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))
I would really like to do this all in one Query. Is this even possible?
Thanks,
Luke

 

Are all the queries listed attempts at the same thing?
I'm guessing you want this:
+type:181 -name:(tim bill harry) +oldfaith:stillHere



Re: Optional Terms in a single query

2005-02-21 Thread Paul Elschot
On Monday 21 February 2005 23:23, Luke Shannon wrote:
> Hi;
> 
> I'm trying to create a query that looks for a field containing type:181 and
> name doesn't contain tim, bill or harry.

type: 181  -(name: tim name:bill name:harry)

> +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))

stillHere is normally lowercased before searching. Is that ok?

> +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
> +(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))

typo? olfaithfull 

> +(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))

typo? (type:1 81)
 
> I would really like to do this all in one Query. Is this even possible?

How would you want to combine the results?

Regards,
Paul Elschot





Optional Terms in a single query

2005-02-21 Thread Luke Shannon
Hi;

I'm trying to create a query that looks for a field containing type:181 and
name doesn't contain tim, bill or harry.

+(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
+(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
+(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
+(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))

I would really like to do this all in one Query. Is this even possible?

Thanks,

Luke






Re: Query Tuning

2005-02-21 Thread Paul Elschot
On Monday 21 February 2005 20:43, Todd VanderVeen wrote:
> Runde, Kevin wrote:
> 
> >Hi All,
> >
> >How does Lucene handle multi term queries? Does it use short circuiting?
> >So if a user entered:
> >(a OR b) AND c
> >But my program knew testing for "c" is cheaper than testing for "(a OR
> >b)" and I rewrote the query as:
> >c AND (a OR b)
> >Would the query run faster?
> >
> >Sorry if this has already been answered, but for some reason the Archive
> >search is not working for me today.
> >
> >Thanks,
> >Kevin
> >
> >
> >  
> >
> Not sure about what is in CVS, but look at BooleanQuery.scorer(). If all 

It's in svn nowadays.

> of the clauses of the BooleanQuery are required and none of the clauses 
> are BooleanQueries a ConjunctionScorer is returned that offers the 
> optimizations you seek. In the example you gave, there is a clause that 
> is boolean ( a or b) that will have to be evaluated independently with a 
> boolean scorer. This will be performed regardless of the ordering. 
> (BooleanScorer doesn't preserve document order when it returns results
> and hence it can't utilize the optimal algorithm provided by
> ConjunctionScorer). Others have been down this path as evidenced by the
> sigh in the javadoc.

In the svn version a ConjunctionScorer is used for all top level AND queries.
 
> If calculating (a or b) is expensive and the docFreq of a is much less 
> than the union of a and b, you might consider rewriting it to (a and c) 
> or (b and c) using the distributive law. Expansion like this isn't always
> beneficial and can't be applied blindly. As far as I can tell there is 

In the svn version the subquery (a or b) is only evaluated for documents
matching c. In the current version the expansion to
(a and c) or (b and c)
might help: the tradeoff is between evaluating c twice and having
less work for the OR operator.

> no query planning/optimization aside from the merging of related clauses 
> and attempts to rewrite to simpler queries.

One optimization in the current version is the use of ConjunctionScorer
for some cases. One such case, which happens a lot in practice, is a
query that has a few required terms.

Another optimization in the current version is that some scoring is done ahead
for each clause into an unordered buffer.
This helps for top level OR queries, but loses for OR queries that are
subqueries of AND.

The svn version does not score ahead. It relies on the buffering done by
TermScorer. Perhaps the buffering for a TermScorer should be made
dependent on its expected use: more buffering for top level OR, less
buffering when used under AND.

Regards,
Paul Elschot





Re: Query Tuning

2005-02-21 Thread Todd VanderVeen
Runde, Kevin wrote:
Hi All,
How does Lucene handle multi term queries? Does it use short circuiting?
So if a user entered:
(a OR b) AND c
But my program knew testing for "c" is cheaper than testing for "(a OR
b)" and I rewrote the query as:
c AND (a OR b)
Would the query run faster?
Sorry if this has already been answered, but for some reason the Archive
search is not working for me today.
Thanks,
Kevin
 

Not sure about what is in CVS, but look at BooleanQuery.scorer(). If all
of the clauses of the BooleanQuery are required and none of the clauses
are BooleanQueries a ConjunctionScorer is returned that offers the
optimizations you seek. In the example you gave, there is a clause that
is boolean (a or b) that will have to be evaluated independently with a
boolean scorer. This will be performed regardless of the ordering.
(BooleanScorer doesn't preserve document order when it returns results
and hence it can't utilize the optimal algorithm provided by
ConjunctionScorer). Others have been down this path as evidenced by the
sigh in the javadoc.

If calculating (a or b) is expensive and the docFreq of a is much less
than the union of a and b, you might consider rewriting it to (a and c)
or (b and c) using the distributive law. Expansion like this isn't always
beneficial and can't be applied blindly. As far as I can tell there is
no query planning/optimization aside from the merging of related clauses
and attempts to rewrite to simpler queries.
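
The expansion of (a OR b) AND c into (a AND c) OR (b AND c) can be checked on toy data. The sketch below is plain Java over sets of made-up document ids; it does not use Lucene and says nothing about which form runs faster, only that both forms select the same documents.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class ExpansionCheck {
    // Documents matching both x and y.
    static Set<Integer> and(Set<Integer> x, Set<Integer> y) {
        Set<Integer> r = new TreeSet<>(x);
        r.retainAll(y);
        return r;
    }

    // Documents matching x or y (or both).
    static Set<Integer> or(Set<Integer> x, Set<Integer> y) {
        Set<Integer> r = new TreeSet<>(x);
        r.addAll(y);
        return r;
    }

    public static void main(String[] args) {
        // Made-up doc ids for the three terms.
        Set<Integer> a = new TreeSet<>(Arrays.asList(1, 4, 9));
        Set<Integer> b = new TreeSet<>(Arrays.asList(2, 4, 7, 12));
        Set<Integer> c = new TreeSet<>(Arrays.asList(2, 9, 30));

        System.out.println(and(or(a, b), c));          // (a OR b) AND c
        System.out.println(or(and(a, c), and(b, c)));  // (a AND c) OR (b AND c)
    }
}
```

Both lines print the same set. The expanded form touches c twice, which is the cost to weigh against the smaller OR.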

Cheers,
Todd VanderVeen



Re: Query Tuning

2005-02-21 Thread Paul Elschot
On Monday 21 February 2005 19:59, Runde, Kevin wrote:
> Hi All,
> 
> How does Lucene handle multi term queries? Does it use short circuiting?
> So if a user entered:
> (a OR b) AND c
> But my program knew testing for "c" is cheaper than testing for "(a OR
> b)" and I rewrote the query as:
> c AND (a OR b)
> Would the query run faster?

Exchanging the operands of AND would not make a noticeable difference
in speed. Queries are evaluated by iterating the inverted term index entries
for all query terms  in parallel, with buffering.
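
As a rough illustration of why operand order does not matter, here is a sketch in plain Java that intersects two sorted lists of document ids by walking them in parallel. The arrays are made-up stand-ins for the index entries of (a OR b) and of c; this is a simplified model, not Lucene's scorer code, and it leaves out the buffering mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class PostingsIntersection {
    // Intersect two sorted doc-id lists by always advancing the cursor
    // that is behind. Both lists are consumed in the same pass.
    static List<Integer> intersect(int[] left, int[] right) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] == right[j]) {
                out.add(left[i]);
                i++;
                j++;
            } else if (left[i] < right[j]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] aOrB = {1, 2, 4, 7, 8, 9, 12}; // docs matching (a OR b)
        int[] c = {2, 9, 30};                // docs matching c

        System.out.println(intersect(aOrB, c)); // (a OR b) AND c
        System.out.println(intersect(c, aOrB)); // c AND (a OR b)
    }
}
```

Swapping the arguments performs the same comparisons in the same sequence, so neither the result nor the amount of work changes.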

Regards,
Paul Elschot





Query Tuning

2005-02-21 Thread Runde, Kevin
Hi All,

How does Lucene handle multi term queries? Does it use short circuiting?
So if a user entered:
(a OR b) AND c
But my program knew testing for "c" is cheaper than testing for "(a OR
b)" and I rewrote the query as:
c AND (a OR b)
Would the query run faster?

Sorry if this has already been answered, but for some reason the Archive
search is not working for me today.

Thanks,
Kevin



RE: Using the highlighter from the sandbox with a prefix query.

2005-02-21 Thread Michael Celona
Thank you this helped a lot...

Michael Celona

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 21, 2005 11:55 AM
To: Lucene Users List
Subject: Re: Using the highlighter from the sandbox with a prefix query.


On Feb 21, 2005, at 10:53 AM, Michael Celona wrote:

> That's the only stack I get.  One thing to mention is that I am using a
> MultiSearcher to rewrite the queries. I tried...
>
> query = searcher_last.rewrite( query );
> query = searcher_cur.rewrite( query );
>
> using IndexSearcher and I don't get an error... However, I am not able to
> highlight wildcard queries.

I use Highlighter for lucenebook.com and have two indexes that I search 
with MultiSearcher.  Here's how I highlight:

 IndexReader reader = readers[indexIndex];
 QueryScorer scorer = new QueryScorer(query.rewrite(reader));
 SimpleHTMLFormatter formatter =
 new SimpleHTMLFormatter("",
 "");
 Highlighter highlighter = new Highlighter(formatter, scorer);

I get the appropriate IndexReader for the document being highlighted.  
You can get the index _index_ this way:
 int indexIndex = searcher.subSearcher(hits.id(position));

Hope this helps.

Erik





RE: Using the highlighter from the sandbox with a prefix query.

2005-02-21 Thread mark harwood
>One thing to mention
> that I am using a
> MultiSearcher to rewrite the queries. I tried...

Ah. I remember this got a little ugly. The highlighter
has a Junit test that demonstrates highlighting fuzzy
queries when using a multisearcher. Take a look at
that.

I can't remember the ins and outs of the issues but I
know the code there still runs clean with the latest
versions.

Cheers
Mark.









Re: Using the highlighter from the sandbox with a prefix query.

2005-02-21 Thread Erik Hatcher
On Feb 21, 2005, at 10:53 AM, Michael Celona wrote:
> That's the only stack I get.  One thing to mention is that I am using a
> MultiSearcher to rewrite the queries. I tried...
>
> query = searcher_last.rewrite( query );
> query = searcher_cur.rewrite( query );
>
> using IndexSearcher and I don't get an error... However, I am not able to
> highlight wildcard queries.

I use Highlighter for lucenebook.com and have two indexes that I search 
with MultiSearcher.  Here's how I highlight:

IndexReader reader = readers[indexIndex];
QueryScorer scorer = new QueryScorer(query.rewrite(reader));
SimpleHTMLFormatter formatter =
new SimpleHTMLFormatter("",
"");
Highlighter highlighter = new Highlighter(formatter, scorer);
I get the appropriate IndexReader for the document being highlighted.  
You can get the index _index_ this way:
int indexIndex = searcher.subSearcher(hits.id(position));

Hope this helps.
Erik


RE: Using the highlighter from the sandbox with a prefix query.

2005-02-21 Thread Michael Celona
That's the only stack I get.  One thing to mention is that I am using a
MultiSearcher to rewrite the queries. I tried...

query = searcher_last.rewrite( query );
query = searcher_cur.rewrite( query );

using IndexSearcher and I don't get an error... However, I am not able to
highlight wildcard queries.

Michael 

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 21, 2005 10:32 AM
To: Lucene Users List
Subject: Re: Using the highlighter from the sandbox with a prefix query.


On Feb 21, 2005, at 10:20 AM, Michael Celona wrote:

> I am using
>   query = searcher.rewrite( query );
>
> and it is throwing java.lang.UnsupportedOperationException .
>
> Am I able to use the searcher rewrite method like this?

What's the full stack trace?

Erik





Re: Using the highlighter from the sandbox with a prefix query.

2005-02-21 Thread Erik Hatcher
On Feb 21, 2005, at 10:20 AM, Michael Celona wrote:

> I am using
>   query = searcher.rewrite( query );
>
> and it is throwing java.lang.UnsupportedOperationException .
>
> Am I able to use the searcher rewrite method like this?

What's the full stack trace?

Erik


RE: Using the highlighter from the sandbox with a prefix query.

2005-02-21 Thread Michael Celona
I am using
query = searcher.rewrite( query );

and it is throwing java.lang.UnsupportedOperationException .

Am I able to use the searcher rewrite method like this?

Thanks,
Michael

-Original Message-
From: Daniel Naber [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 17, 2005 4:09 AM
To: Lucene Users List
Subject: Re: Using the highlighter from the sandbox with a prefix query.

On Thursday 17 February 2005 08:37, lucuser4851 wrote:

>  We have been using the highlighter from the lucene sandbox, which works
> very nicely most of the time. However when we try and use it with a
> prefix query (which is what you get having parsed a wild-card query), it
> doesn't return any highlighted sections. Has anyone else experienced
> this problem, or found a way around it?

You need to call rewrite() on the query before you pass it to the
highlighter.

Regards
 Daniel




Re: Query Question

2005-02-18 Thread Luke Shannon
Thanks Erik. Option 2 sounds like the path of least resistance.

Luke
- Original Message - 
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 17, 2005 9:05 PM
Subject: Re: Query Question


> On Feb 17, 2005, at 5:51 PM, Luke Shannon wrote:
> > My manager is now totally stuck about being able to query data with * 
> > in it.
> 
> He's gonna have to wait a bit longer, you've got a slightly tricky 
> situation on your hands
> 
> > WildcardQuery(new Term("name", "*home\**"));
> 
> The \* is the problem.  WildcardQuery doesn't deal with escaping like 
> you're trying.  Your query is essentially this now:
> 
> home\*
> 
> Where backslash has no special meaning at all... you're literally 
> looking for all terms that start with home followed by a backslash.  
> Two asterisks at the end really collapse into a single one logically.
> 
> > Any theories as to why it would not match:
> >
> > Document (relevant fields):
> > Keyword
> > Keyword
> >
> > Is the \ escaping both * characters?
> 
> So, again, no escaping is being done here.  You're a bit stuck in this 
> situation because * (and ?) are special to WildcardQuery, and it does 
> no escaping.  Two options I can think of:
> 
> - Build your own clone of WildcardQuery that does escaping - or 
> perhaps change the wildcard characters to something you do not index 
> and use those instead.
> 
> - Replace asterisks in the terms indexed with some other non-wildcard 
> character, then replace it on your queries as appropriate.
> 
> Erik
> 
> 



Re: select where from query type in lucene

2005-02-18 Thread Morus Walter
Miles Barr writes:
> On Fri, 2005-02-18 at 03:58 +0100, Miro Max wrote:
> > how can i search for content where type=document or
> > (type=document OR type=view).
> > actually i can do it with: "(type:document OR
> > type:entry) AND queryText" as QueryString.
> > but does exist any other better way to realize this?
>
[...] 
> 
> Another alternative is to put each type in its own index and use a
> MultiSearcher to pull in the types you want.
> 
If the change rate of the index and the number of commonly used
type combinations aren't too large, cached filters might be another
alternative.
Of course the filter would have to be recreated whenever the index changes.
The advantage is that you don't have to search for the types again for each
query that reuses the filter, while you can still keep all documents within
one index.
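
The caching idea can be sketched in plain Java as follows. This is a toy model under stated assumptions, not Lucene's filter API: the "index" is just an array holding the type of each document, and the class name, the builds counter and the invalidate method are all hypothetical names chosen for the illustration.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TypeFilterCache {
    private final String[] docTypes;  // toy index: type of each doc, by doc id
    private final Map<String, BitSet> cache = new HashMap<>();
    int builds = 0;                   // how often a filter was computed from scratch

    TypeFilterCache(String[] docTypes) {
        this.docTypes = docTypes;
    }

    // Return the bit set of documents whose type is in the given set.
    // The result is computed once per type combination and then reused;
    // callers share the cached instance and should not modify it.
    BitSet filterFor(Set<String> types) {
        String key = String.join(",", new TreeSet<>(types));
        BitSet bits = cache.get(key);
        if (bits == null) {
            bits = new BitSet(docTypes.length);
            for (int doc = 0; doc < docTypes.length; doc++) {
                if (types.contains(docTypes[doc])) {
                    bits.set(doc);
                }
            }
            builds++;
            cache.put(key, bits);
        }
        return bits;
    }

    // Must be called whenever the index changes, since doc ids may shift.
    void invalidate() {
        cache.clear();
    }

    public static void main(String[] args) {
        TypeFilterCache filters = new TypeFilterCache(new String[] {
                "document", "document", "view", "view", "dbentry", "dbentry"});
        Set<String> wanted = new HashSet<>(Arrays.asList("document", "view"));
        System.out.println(filters.filterFor(wanted)); // computed
        System.out.println(filters.filterFor(wanted)); // served from the cache
        System.out.println("builds = " + filters.builds);
    }
}
```

The text query is then evaluated on its own and only documents whose bit is set are kept, so the type clauses cost nothing on repeated searches until the index changes.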

Morus




Re: select where from query type in lucene

2005-02-18 Thread Miles Barr
On Fri, 2005-02-18 at 03:58 +0100, Miro Max wrote:
> how can i search for content where type=document or
> (type=document OR type=view).
> actually i can do it with: "(type:document OR
> type:entry) AND queryText" as QueryString.
> but does exist any other better way to realize this?

What's wrong with that method? I don't think you can do it any simpler. 

Are you concerned about writing a string then having to use the query
parser? You could also build it up manually:

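QueryParser parser = ...

// The free-text part still goes through the query parser.
Query text = parser.parse(queryText);

// Optional clauses (required=false, prohibited=false): document OR view.
// Declared as BooleanQuery because add() is not defined on Query.
BooleanQuery type = new BooleanQuery();
type.add(new TermQuery(new Term("type", "document")), false, false);
type.add(new TermQuery(new Term("type", "view")), false, false);

// Both parts are required (required=true, prohibited=false).
BooleanQuery everything = new BooleanQuery();
everything.add(text, true, false);
everything.add(type, true, false);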

That way you could avoid things in queryText overriding the type check.

Another alternative is to put each type in its own index and use a
MultiSearcher to pull in the types you want.



-- 
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.





select where from query type in lucene

2005-02-17 Thread Miro Max
Hi,

I have a problem with my classes using Lucene.
My index looks like:

type       |  content
-----------+---------
document   |  x
document   |  x
view       |  x
view       |  x
dbentry    |  x
dbentry    |  x

my question now:

How can I search for content where type=document or
(type=document OR type=view)?
Currently I do it with "(type:document OR
type:entry) AND queryText" as the query string,
but is there a better way to do this?

thx

miro







Re: Query Question

2005-02-17 Thread Erik Hatcher
On Feb 17, 2005, at 5:51 PM, Luke Shannon wrote:
> My manager is now totally stuck about being able to query data with *
> in it.

He's gonna have to wait a bit longer, you've got a slightly tricky
situation on your hands.

> WildcardQuery(new Term("name", "*home\**"));

The \* is the problem.  WildcardQuery doesn't deal with escaping like
you're trying.  Your query is essentially this now:

home\*

Where backslash has no special meaning at all... you're literally
looking for all terms that start with home followed by a backslash.
Two asterisks at the end really collapse into a single one logically.

> Any theories as to why it would not match:
>
> Document (relevant fields):
> Keyword
> Keyword
>
> Is the \ escaping both * characters?

So, again, no escaping is being done here.  You're a bit stuck in this
situation because * (and ?) are special to WildcardQuery, and it does
no escaping.  Two options I can think of:

	- Build your own clone of WildcardQuery that does escaping - or 
perhaps change the wildcard characters to something you do not index 
and use those instead.

	- Replace asterisks in the terms indexed with some other non-wildcard 
character, then replace it on your queries as appropriate.
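
The second option could look roughly like the sketch below. It is plain Java string handling only; the placeholder character, the class name and both method names are assumptions made up for the illustration, and the sample value "home*page" is hypothetical since the actual field contents are not shown in this thread.

```java
class AsteriskPlaceholder {
    // Assumption: this character never occurs in the real data and is not
    // treated as a wildcard. U+2217 is an arbitrary pick.
    static final char PLACEHOLDER = '\u2217';

    // Apply to a field value before it is indexed.
    static String forIndex(String rawValue) {
        return rawValue.replace('*', PLACEHOLDER);
    }

    // Apply to a user pattern before building the wildcard term.
    // A backslash followed by an asterisk means "literal asterisk";
    // a bare asterisk is left alone so it keeps working as a wildcard.
    static String forQuery(String userPattern) {
        return userPattern.replace("\\*", String.valueOf(PLACEHOLDER));
    }

    public static void main(String[] args) {
        System.out.println(forIndex("home*page"));  // value stored in the index
        System.out.println(forQuery("*home\\**"));  // pattern given to the wildcard query
    }
}
```

The pattern returned by forQuery would then be used as the text of the Term passed to WildcardQuery, and the same replacement has to be undone when stored values are shown to users.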

Erik


Re: Query Question

2005-02-17 Thread Luke Shannon
Hello;

My manager is now totally stuck about being able to query data with * in it.

Here are two queries.

TermQuery(new Term("type", "203"));
WildcardQuery(new Term("name", "*home\**"));

They are joined in a boolean query. That query gives this result when you
call the toString():

+(type:203) +(name:*home\**)

This looks right to me.

Any theories as to why it would not match:

Document (relevant fields):
Keyword
Keyword

Is the \ escaping both * characters?

Thanks,

Luke




- Original Message - 
From: "Luke Shannon" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 17, 2005 2:44 PM
Subject: Query Question


> Hello;
>
> Why won't this query find the document below?
>
> Query:
> +(type:203) +(name:*home\**)
>
> Document (relevant fields):
> Keyword
> Keyword
>
> I was hoping by escaping the * it would be treated as a string. What am I
> doing wrong?
>
> Thanks,
>
> Luke



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Query Question

2005-02-17 Thread Luke Shannon
That is a query toString(). I created the Query using a Wildcard Query
object.

Luke

- Original Message - 
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 17, 2005 3:00 PM
Subject: Re: Query Question


>
> On Feb 17, 2005, at 2:44 PM, Luke Shannon wrote:
>
> > Hello;
> >
> > Why won't this query find the document below?
> >
> > Query:
> > +(type:203) +(name:*home\**)
>
> Is that what the query toString is?  Or is that what you handed to
> QueryParser?
>
> Depending on your analyzer, 203 may go away.  QueryParser doesn't
> support leading asterisks, so "*home" would fail to parse.
>
> > Document (relevant fields):
> > Keyword
> > Keyword
> >
> > I was hoping by escaping the * it would be treated as a string. What
> > am I
> > doing wrong?



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Query Question

2005-02-17 Thread Erik Hatcher
On Feb 17, 2005, at 2:44 PM, Luke Shannon wrote:
> Hello;
>
> Why won't this query find the document below?
>
> Query:
> +(type:203) +(name:*home\**)

Is that what the query toString is?  Or is that what you handed to 
QueryParser?

Depending on your analyzer, 203 may go away.  QueryParser doesn't 
support leading asterisks, so "*home" would fail to parse.

> Document (relevant fields):
> Keyword
> Keyword
>
> I was hoping by escaping the * it would be treated as a string. What 
> am I doing wrong?

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Query Question

2005-02-17 Thread Luke Shannon
Hello;

Why won't this query find the document below?

Query:
+(type:203) +(name:*home\**)

Document (relevant fields):
Keyword
Keyword

I was hoping by escaping the * it would be treated as a string. What am I
doing wrong?

Thanks,

Luke



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Using the highlighter from the sandbox with a prefix query.

2005-02-17 Thread lucuser4851
Thanks very much Mark and Daniel. That solved the problem!!


On Thu, 2005-02-17 at 08:55 +, mark harwood wrote:
> See the highlighter's package.html for a description
> of how query.rewrite should be used to solve this.
> 
> Cheers,
> Mark
> 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Using the highlighter from the sandbox with a prefix query.

2005-02-17 Thread Daniel Naber
On Thursday 17 February 2005 08:37, lucuser4851 wrote:

>  We have been using the highlighter from the lucene sandbox, which works
> very nicely most of the time. However when we try and use it with a
> prefix query (which is what you get having parsed a wild-card query), it
> doesn't return any highlighted sections. Has anyone else experienced
> this problem, or found a way around it?

You need to call rewrite() on the query before you pass it to the highlighter.

Regards
 Daniel

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Using the highlighter from the sandbox with a prefix query.

2005-02-17 Thread mark harwood
See the highlighter's package.html for a description
of how query.rewrite should be used to solve this.

Cheers,
Mark


 --- lucuser4851 <[EMAIL PROTECTED]> wrote: 
> Dear All,
>  We have been using the highlighter from the lucene
> sandbox, which works
> very nicely most of the time. However when we try
> and use it with a
> prefix query (which is what you get having parsed a
> wild-card query), it
> doesn't return any highlighted sections. Has anyone
> else experienced
> this problem, or found a way around it?
> 
> Thanks a lot for your suggestions!!






-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Using the highlighter from the sandbox with a prefix query.

2005-02-16 Thread lucuser4851
Dear All,
 We have been using the highlighter from the lucene sandbox, which works
very nicely most of the time. However when we try and use it with a
prefix query (which is what you get having parsed a wild-card query), it
doesn't return any highlighted sections. Has anyone else experienced
this problem, or found a way around it?

Thanks a lot for your suggestions!!



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: What does [] do to a query and what's up with lucene.apache.org?

2005-02-14 Thread Jim Lynch
Otis and Erik,
Thanks for the info.  That's a great reference.
Jim.
Erik Hatcher wrote:
> Jim,
>
> The Lucene website is transitioning to the new top-level space.  I 
> have checked out the current site to the new lucene.apache.org area 
> and set up redirects from the old Jakarta URL's.  The source code, 
> though, is not an official part of the website.  Thanks to our 
> conversion to Subversion, though, the source is browsable starting here:
>
> http://svn.apache.org/repos/asf/lucene/java/trunk
>
> The HTML of the website will need link adjustments to get everything 
> back in shape.
>
> The brackets are documented here:
> http://lucene.apache.org/queryparsersyntax.html
>
> Erik
>
> On Feb 14, 2005, at 10:31 AM, Jim Lynch wrote:
> > First I'm getting a
> >
> >    The requested URL could not be retrieved
> >
> > While trying to retrieve the URL:
> > http://lucene.apache.org/src/test/org/apache/lucene/queryParser/TestQueryParser.java
> >
> > The following error was encountered:
> >
> >    Unable to determine IP address from host name for lucene.apache.org
> >
> > Guess the system is down.
> >
> > I'm getting this error:
> >
> > org.apache.lucene.queryParser.ParseException: Encountered "is" at 
> > line 1, column 15.
> > Was expecting:
> >    "]" ...
> > when I tried to parse the following string "[this is a test]".
> >
> > I can't find any documentation that tells me what the brackets do to 
> > a query.  I had a user that was used to another search engine that 
> > used [] to do proximity or near searches and tried it on this one. 
> > Actually I'd like to see the documentation for what the parser 
> > does.  All that is mentioned in the javadoc is + - and ().  
> > Obviously there are more special characters.
> >
> > Thanks,
> > Jim.
> >
> > Jim.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: What does [] do to a query and what's up with lucene.apache.org?

2005-02-14 Thread Erik Hatcher
Jim,

The Lucene website is transitioning to the new top-level space.  I have 
checked out the current site to the new lucene.apache.org area and set 
up redirects from the old Jakarta URL's.  The source code, though, is 
not an official part of the website.  Thanks to our conversion to 
Subversion, though, the source is browsable starting here:

http://svn.apache.org/repos/asf/lucene/java/trunk

The HTML of the website will need link adjustments to get everything 
back in shape.

The brackets are documented here:
http://lucene.apache.org/queryparsersyntax.html

Erik

On Feb 14, 2005, at 10:31 AM, Jim Lynch wrote:
> First I'm getting a
>
>    The requested URL could not be retrieved
>
> While trying to retrieve the URL:
> http://lucene.apache.org/src/test/org/apache/lucene/queryParser/TestQueryParser.java
>
> The following error was encountered:
>
>    Unable to determine IP address from host name for lucene.apache.org
>
> Guess the system is down.
>
> I'm getting this error:
>
> org.apache.lucene.queryParser.ParseException: Encountered "is" at line 
> 1, column 15.
> Was expecting:
>    "]" ...
> when I tried to parse the following string "[this is a test]".
>
> I can't find any documentation that tells me what the brackets do to a 
> query.  I had a user that was used to another search engine that used 
> [] to do proximity or near searches and tried it on this one. Actually 
> I'd like to see the documentation for what the parser does.  All that 
> is mentioned in the javadoc is + - and ().  Obviously there are more 
> special characters.
>
> Thanks,
> Jim.
>
> Jim.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: What does [] do to a query and what's up with lucene.apache.org?

2005-02-14 Thread Otis Gospodnetic
Hi,

lucene.apache.org seems to work now.
Here is the query syntax:
  http://lucene.apache.org/queryparsersyntax.html
[] is used as [BEGIN-RANGE-STRING TO END-RANGE-STRING]
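
If user input such as "[this is a test]" is meant literally rather than as a range, the brackets need a backslash in front of them before the string reaches the parser. Below is a minimal sketch of that idea. It is plain string handling and not a Lucene API: the class name is hypothetical, and the list of special characters is taken from the query syntax page linked above, so it is an assumption that your parser version treats exactly these characters as special.

```java
// Hypothetical helper, not a Lucene API: backslash-escapes the characters
// that the query syntax page lists as special.
final class QuerySyntaxEscaper {
    // Assumed list: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    private QuerySyntaxEscaper() {}

    /** Return the input with every special character prefixed by a backslash. */
    static String escape(String userInput) {
        StringBuilder out = new StringBuilder(userInput.length() + 8);
        for (int i = 0; i < userInput.length(); i++) {
            char c = userInput.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // escape marker in front of the special character
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Escaping only stops the parser from treating the brackets as range syntax. Whether they survive into the final query terms still depends on the analyzer in use.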

Otis



--- Jim Lynch <[EMAIL PROTECTED]> wrote:

> First I'm getting a
> 
>    The requested URL could not be retrieved
> 
> While trying to retrieve the URL: 
> http://lucene.apache.org/src/test/org/apache/lucene/queryParser/TestQueryParser.java
> 
> The following error was encountered:
> 
>    Unable to determine IP address from host name for lucene.apache.org
> 
> Guess the system is down.
> 
> I'm getting this error:
> 
> org.apache.lucene.queryParser.ParseException: Encountered "is" at line 
> 1, column 15.
> Was expecting:
>    "]" ...
> when I tried to parse the following string "[this is a test]".
> 
> I can't find any documentation that tells me what the brackets do to a 
> query.  I had a user that was used to another search engine that used [] 
> to do proximity or near searches and tried it on this one. Actually I'd 
> like to see the documentation for what the parser does.  All that is 
> mentioned in the javadoc is + - and ().  Obviously there are more 
> special characters.
> 
> Thanks,
> Jim.
> 
> Jim.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



What does [] do to a query and what's up with lucene.apache.org?

2005-02-14 Thread Jim Lynch
First I'm getting a

   The requested URL could not be retrieved

While trying to retrieve the URL: 
http://lucene.apache.org/src/test/org/apache/lucene/queryParser/TestQueryParser.java 

The following error was encountered:

   Unable to determine IP address from host name for lucene.apache.org

Guess the system is down.

I'm getting this error:

org.apache.lucene.queryParser.ParseException: Encountered "is" at line 
1, column 15.
Was expecting:
   "]" ...
when I tried to parse the following string "[this is a test]".

I can't find any documentation that tells me what the brackets do to a 
query.  I had a user that was used to another search engine that used [] 
to do proximity or near searches and tried it on this one. Actually I'd 
like to see the documentation for what the parser does.  All that is 
mentioned in the javadoc is + - and ().  Obviously there are more 
special characters.

Thanks,
Jim.
Jim.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: Query Analyzer

2005-02-07 Thread Ravi
That worked. Thanks a lot. 

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 07, 2005 11:39 AM
To: Lucene Users List
Subject: Re: Query Analyzer


On Feb 7, 2005, at 11:29 AM, Ravi wrote:

> How do I set the analyzer when I build the query in my code instead of
> using a query parser ?

You don't.  All terms you use for any Query subclasses you instantiate 
must match exactly the terms in the index.  If you need an analyzer to 
do this then you're responsible for doing it yourself, just as 
QueryParser does underneath.  I do this myself in my current 
application like this:

    private Query createPhraseQuery(String fieldName, String string,
                                    boolean lowercase) {
      RossettiAnalyzer analyzer = new RossettiAnalyzer(lowercase);
      TokenStream stream = analyzer.tokenStream(fieldName,
                                                new StringReader(string));

      PhraseQuery pq = new PhraseQuery();
      Token token;
      try {
        while ((token = stream.next()) != null) {
          pq.add(new Term(fieldName, token.termText()));
        }
      } catch (IOException ignored) {
        // ignore - shouldn't get an IOException on a StringReader
      }

      if (pq.getTerms().length == 1) {
        // optimize single term phrase to TermQuery
        return new TermQuery(pq.getTerms()[0]);
      }

      return pq;
    }

Hope that helps.

Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Query Analyzer

2005-02-07 Thread Erik Hatcher
On Feb 7, 2005, at 11:29 AM, Ravi wrote:
> How do I set the analyzer when I build the query in my code instead of
> using a query parser ?

You don't.  All terms you use for any Query subclasses you instantiate 
must match exactly the terms in the index.  If you need an analyzer to 
do this then you're responsible for doing it yourself, just as 
QueryParser does underneath.  I do this myself in my current 
application like this:

    private Query createPhraseQuery(String fieldName, String string,
                                    boolean lowercase) {
      RossettiAnalyzer analyzer = new RossettiAnalyzer(lowercase);
      TokenStream stream = analyzer.tokenStream(fieldName,
                                                new StringReader(string));

      PhraseQuery pq = new PhraseQuery();
      Token token;
      try {
        while ((token = stream.next()) != null) {
          pq.add(new Term(fieldName, token.termText()));
        }
      } catch (IOException ignored) {
        // ignore - shouldn't get an IOException on a StringReader
      }

      if (pq.getTerms().length == 1) {
        // optimize single term phrase to TermQuery
        return new TermQuery(pq.getTerms()[0]);
      }

      return pq;
    }

Hope that helps.

Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Query Analyzer

2005-02-07 Thread Ravi
How do I set the analyzer when I build the query in my code instead of
using a query parser ?

Thanks in advance
Ravi. 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x (but still has the field)

2005-02-04 Thread Luke Shannon
Hello;

I think Chris's approach might be helpful, but I can't seem to get it to
work.

So since I am running out of time and I still need to figure out "starts with"
and "ends with" queries, I have implemented a hacky solution to get all
documents with a kcfileupload field present that does not contain jpg:

query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
query2 = QueryParser.parse("stillhere", "olfaithfull", new
StandardAnalyzer());//each document contains this
BooleanQuery typeNegativeSearch = new BooleanQuery();
typeNegativeSearch.add(query1, false, true);
typeNegativeSearch.add(query2, true, false);

What gets returned are all the documents without a kcfileupload = jpg. This
includes documents that don't even have a kcfileupload.

When I go through the results before displaying I check to make sure there
is a "kcfileupload" field.

This is not a good solution, and I hope to replace it soon. If anyone has
ideas please let me know.

Luke

- Original Message - 
From: "Chris Hostetter" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Friday, February 04, 2005 3:03 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field
containing x



Another approach...

You can make a Filter that is the inverse of the output from another
filter, which means you can make a QueryFilter on the search, then wrap it
in your inverse Filter.

you can't execute a query on a filter without having a Query object, but
you can just apply the Filter directly to an IndexReader yourself, and get
back a BitSet containing the docIds of everydocument that does not contain
your term.

something like this should work...

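   class NotFilter extends Filter {
     private Filter wrapped;
     public NotFilter(Filter w) {
       wrapped = w;
     }
     public BitSet bits(IndexReader r) throws IOException {
       // clone first: QueryFilter caches and returns its own BitSet
       BitSet b = (BitSet) wrapped.bits(r).clone();
       // flip only real doc ids; BitSet.size() can exceed maxDoc()
       b.flip(0, r.maxDoc());
       return b;
     }
   }
   ...
   BitSet results = new NotFilter(
       new QueryFilter(
           new TermQuery(new Term("f", "x")))).bits(reader);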




: Date: Thu, 3 Feb 2005 19:51:36 +0100
: From: Kelvin Tan <[EMAIL PROTECTED]>
: Reply-To: Lucene Users List 
: To: Lucene Users List 
: Subject: Re: Parsing The Query: Every document that doesn't have a field
: containing x
:
: Alternatively, add a dummy field-value to all documents, like
: doc.add(Field.Keyword("foo", "bar"))
:
: Waste of space, but allows you to perform negated queries.
:
: On Thu, 03 Feb 2005 19:19:15 +0100, Maik Schreiber wrote:
: >> Negating a term must be combined with at least one nonnegated
: >> term to return documents; in other words, it isn't possible to
: >> use a query like NOT term to find all documents that don't
: >> contain a term.
: >>
: >> So does that mean the above example wouldn't work?
: >>
: > Exactly. You cannot search for "-kcfileupload:jpg", you need at
: > least one clause that actually _includes_ documents.
: >
: > Do you by chance have a field with known contents? If so, you could
: > misuse that one and include it in your query (perhaps by doing
: > range or wildcard/prefix search). If not, try IndexReader.terms()
: > for building a Query yourself, then use that one for search.



-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-04 Thread Luke Shannon
Hi Chris;

So the result would contain all documents that don't have field f containing
x?

What I need to figure out is how to return all documents that have a
field f whose value does not contain x.
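
One way to get "has field f, but not x" out of the filter idea is to combine two sets of doc ids: one for the documents that have the field at all, and one for the documents that contain x. The sketch below uses only java.util.BitSet and assumes both sets have already been obtained somehow (for example from two filters). The class and method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical helper: combines two doc-id bit sets without touching the inputs.
final class DocSets {
    private DocSets() {}

    /** Doc ids that are set in hasField but not in containsTerm. */
    static BitSet hasFieldButNotTerm(BitSet hasField, BitSet containsTerm) {
        BitSet result = (BitSet) hasField.clone(); // copy so a shared or cached set is not modified
        result.andNot(containsTerm);               // clear every doc that contains the term
        return result;
    }
}
```

Working on a copy matters if either input comes from something that hands out the same BitSet instance more than once.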

Thanks for your post.

Luke


- Original Message - 
From: "Chris Hostetter" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Friday, February 04, 2005 3:03 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field
containing x



Another approach...

You can make a Filter that is the inverse of the output from another
filter, which means you can make a QueryFilter on the search, then wrap it
in your inverse Filter.

you can't execute a query on a filter without having a Query object, but
you can just apply the Filter directly to an IndexReader yourself, and get
back a BitSet containing the docIds of everydocument that does not contain
your term.

something like this should work...

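   class NotFilter extends Filter {
     private Filter wrapped;
     public NotFilter(Filter w) {
       wrapped = w;
     }
     public BitSet bits(IndexReader r) throws IOException {
       // clone first: QueryFilter caches and returns its own BitSet
       BitSet b = (BitSet) wrapped.bits(r).clone();
       // flip only real doc ids; BitSet.size() can exceed maxDoc()
       b.flip(0, r.maxDoc());
       return b;
     }
   }
   ...
   BitSet results = new NotFilter(
       new QueryFilter(
           new TermQuery(new Term("f", "x")))).bits(reader);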




: Date: Thu, 3 Feb 2005 19:51:36 +0100
: From: Kelvin Tan <[EMAIL PROTECTED]>
: Reply-To: Lucene Users List 
: To: Lucene Users List 
: Subject: Re: Parsing The Query: Every document that doesn't have a field
: containing x
:
: Alternatively, add a dummy field-value to all documents, like
: doc.add(Field.Keyword("foo", "bar"))
:
: Waste of space, but allows you to perform negated queries.
:
: On Thu, 03 Feb 2005 19:19:15 +0100, Maik Schreiber wrote:
: >> Negating a term must be combined with at least one nonnegated
: >> term to return documents; in other words, it isn't possible to
: >> use a query like NOT term to find all documents that don't
: >> contain a term.
: >>
: >> So does that mean the above example wouldn't work?
: >>
: > Exactly. You cannot search for "-kcfileupload:jpg", you need at
: > least one clause that actually _includes_ documents.
: >
: > Do you by chance have a field with known contents? If so, you could
: > misuse that one and include it in your query (perhaps by doing
: > range or wildcard/prefix search). If not, try IndexReader.terms()
: > for building a Query yourself, then use that one for search.



-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-04 Thread Luke Shannon
Thanks to everyone who has been posting possible solutions. I am making
great progress and learning a lot.

This works, but the results include files that don't even contain a
"kcfileupload" field (not good):

query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
query2 = QueryParser.parse("stillhere", "olfaithfull", new
StandardAnalyzer());
BooleanQuery typeNegativeSearch = new BooleanQuery();
typeNegativeSearch.add(query1, false, true);
typeNegativeSearch.add(query2, true, false);

Someone mentioned a filter. So I have been playing with the test below.

The problem I have is this line:

Query query2 = QueryParser.parse("*", "kcfileupload", new
StandardAnalyzer());

Results in the following error:

org.apache.lucene.queryParser.ParseException: Lexical error at line 1,
column 2.  Encountered: <EOF> after : ""

I was hoping it would create a wild card search on kcfileupload. I feel like
I am getting close to a good solution. Any tips would help.

Thanks,

Luke

import junit.framework.TestCase;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class IsNotTypeTest extends TestCase {

  private RAMDirectory directory;

  protected void setUp() throws Exception {
    directory = new RAMDirectory();
    IndexWriter writer = new IndexWriter(directory,
        new StandardAnalyzer(), true);

    //jpg should show up in first query
    Document document = new Document();
    document.add(Field.Text("kcfileupload", "picture.jpg"));
    document.add(Field.Text("name", "pic one"));
    writer.addDocument(document);

    //jpg should show up in first query
    document = new Document();
    document.add(Field.Text("kcfileupload", "picture2.jpg"));
    document.add(Field.Text("name", "pic two"));
    writer.addDocument(document);

    //pdf should show up in second query
    document = new Document();
    document.add(Field.Text("kcfileupload", "file.pdf"));
    document.add(Field.Text("name", "pdf one"));
    writer.addDocument(document);

    //ppt should show up in second query
    document = new Document();
    document.add(Field.Text("kcfileupload", "file.ppt"));
    document.add(Field.Text("name", "power point one"));
    writer.addDocument(document);

    //ppt should show up in second query
    document = new Document();
    document.add(Field.Text("kcfileupload", "file2.ppt"));
    document.add(Field.Text("name", "power point two"));
    writer.addDocument(document);

    //other should not show in this test
    document = new Document();
    document.add(Field.Text("name", "link"));
    document.add(Field.Text("address", "www.cbc.ca"));
    writer.addDocument(document);

    writer.close();
  }

  public void testIsNotType() throws Exception {
    IndexSearcher searcher = new IndexSearcher(directory);
    Query query1 = QueryParser.parse("jpg", "kcfileupload",
        new StandardAnalyzer());
    Query query2 = QueryParser.parse("*", "kcfileupload",
        new StandardAnalyzer());
    QueryFilter jpgFilter = new QueryFilter(
        new TermQuery(new Term("kcfileupload", "jpg")));

    Hits hits = searcher.search(query1);
    assertEquals(2, hits.length());
    int totalHits = hits.length();
    int count = 0;
    while (count < totalHits) {
      Document current = (Document)hits.doc(count);
      System.out.println("The upload is " + count + " is " +
          current.getField("kcfileupload"));
      count++;
    }

    hits = searcher.search(query2, jpgFilter);
    assertEquals(3, hits.length());
    totalHits = hits.length();
    count = 0;
    while (count < totalHits) {
      Document current = (Document)hits.doc(count);
      System.out.println("The upload is " + count + " is " +
          current.getField("kcfileupload"));
      count++;
    }
  }

}



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-04 Thread Chris Hostetter

Another approach...

You can make a Filter that is the inverse of the output from another
filter, which means you can make a QueryFilter on the search, then wrap it
in your inverse Filter.

you can't execute a query on a filter without having a Query object, but
you can just apply the Filter directly to an IndexReader yourself, and get
back a BitSet containing the docIds of everydocument that does not contain
your term.

something like this should work...

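   class NotFilter extends Filter {
     private Filter wrapped;
     public NotFilter(Filter w) {
       wrapped = w;
     }
     public BitSet bits(IndexReader r) throws IOException {
       // clone first: QueryFilter caches and returns its own BitSet
       BitSet b = (BitSet) wrapped.bits(r).clone();
       // flip only real doc ids; BitSet.size() can exceed maxDoc()
       b.flip(0, r.maxDoc());
       return b;
     }
   }
   ...
   BitSet results = new NotFilter(
       new QueryFilter(
           new TermQuery(new Term("f", "x")))).bits(reader);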




: Date: Thu, 3 Feb 2005 19:51:36 +0100
: From: Kelvin Tan <[EMAIL PROTECTED]>
: Reply-To: Lucene Users List 
: To: Lucene Users List 
: Subject: Re: Parsing The Query: Every document that doesn't have a field
: containing x
:
: Alternatively, add a dummy field-value to all documents, like
: doc.add(Field.Keyword("foo", "bar"))
:
: Waste of space, but allows you to perform negated queries.
:
: On Thu, 03 Feb 2005 19:19:15 +0100, Maik Schreiber wrote:
: >> Negating a term must be combined with at least one nonnegated
: >> term to return documents; in other words, it isn't possible to
: >> use a query like NOT term to find all documents that don't
: >> contain a term.
: >>
: >> So does that mean the above example wouldn't work?
: >>
: > Exactly. You cannot search for "-kcfileupload:jpg", you need at
: > least one clause that actually _includes_ documents.
: >
: > Do you by chance have a field with known contents? If so, you could
: > misuse that one and include it in your query (perhaps by doing
: > range or wildcard/prefix search). If not, try IndexReader.terms()
: > for building a Query yourself, then use that one for search.



-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-04 Thread Luke Shannon
Very Nice. Thanks!

Luke

- Original Message - 
From: "åç" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Friday, February 04, 2005 2:12 AM
Subject: Re: Parsing The Query: Every document that doesn't have a field
containing x


I think you can use a filter to get the right result!
See the example below:

package lia.advsearching;

import junit.framework.TestCase;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.QueryFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class SecurityFilterTest extends TestCase {
  private RAMDirectory directory;

  protected void setUp() throws Exception {
directory = new RAMDirectory();
IndexWriter writer = new IndexWriter(directory,
new WhitespaceAnalyzer(), true);

// Elwood
Document document = new Document();
document.add(Field.Keyword("owner", "elwood"));
document.add(Field.Text("keywords", "elwoods sensitive info"));
writer.addDocument(document);

// Jake
document = new Document();
document.add(Field.Keyword("owner", "jake"));
document.add(Field.Text("keywords", "jakes sensitive info"));
writer.addDocument(document);

writer.close();
  }

  public void testSecurityFilter() throws Exception {
TermQuery query = new TermQuery(new Term("keywords", "info"));

IndexSearcher searcher = new IndexSearcher(directory);
Hits hits = searcher.search(query);
assertEquals("Both documents match", 2, hits.length());

QueryFilter jakeFilter = new QueryFilter(
new TermQuery(new Term("owner", "jake")));

hits = searcher.search(query, jakeFilter);
assertEquals(1, hits.length());
assertEquals("elwood is safe",
    "jakes sensitive info", hits.doc(0).get("keywords"));
  }

}


On Thu, 3 Feb 2005 13:04:50 -0500, Luke Shannon
<[EMAIL PROTECTED]> wrote:
> Hello;
>
> I have a query that finds documents that contain fields with a specific
> value.
>
> query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
>
> This works well.
>
> I would like a query that finds documents containing all kcfileupload
> fields that don't contain jpg.
>
> The example I found in the book that seems to relate shows me how to find
> documents without a specific term:
>
> QueryParser parser = new QueryParser("contents", analyzer);
> parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
>
> But then it says:
>
> Negating a term must be combined with at least one nonnegated term to
> return documents; in other words, it isn't possible to use a query like
> NOT term to find all documents that don't contain a term.
>
> So does that mean the above example wouldn't work?
>
> The API says:
>
>  a plus (+) or a minus (-) sign, indicating that the clause is required or
> prohibited respectively;
>
> I have been playing around with using the minus character without much
> luck.
>
> Can someone point me in the right direction to figure this out?
>
> Thanks,
>
> Luke
>
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Numbers in the Query String

2005-02-03 Thread åç
I agree with their viewpoint!


On Thu, 3 Feb 2005 14:29:13 -0800 (PST), Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Using different analyzers for indexing and searching is not
> recommended.
> Your numbers are not even in the index because you are using
> StandardAnalyzer.  Use Luke to look at your index.
> 
> Otis
> 
> 
> --- Hetan Shah <[EMAIL PROTECTED]> wrote:
> 
> > Hello,
> >
> > How can one search for a document based on a query which has
> > numbers in the query string?
> >
> > e.g. query = Java 2 Platform J2EE
> >
> > What do I need to do so that the numbers do not get neglected.
> >
> > I am using StandardAnalyzer to index the pages and using StopAnalyzer
> > to
> > search the documents. Would the use of two different analyzers cause
> > any
> > trouble for the results?
> >
> > Thanks.
> > -H
> >
> >
> > -
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread åç
I think you can use a filter to get the right result.
See the example below:
package lia.advsearching;

import junit.framework.TestCase;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.QueryFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class SecurityFilterTest extends TestCase {
  private RAMDirectory directory;

  protected void setUp() throws Exception {
    directory = new RAMDirectory();
    IndexWriter writer = new IndexWriter(directory,
        new WhitespaceAnalyzer(), true);

    // Elwood
    Document document = new Document();
    document.add(Field.Keyword("owner", "elwood"));
    document.add(Field.Text("keywords", "elwoods sensitive info"));
    writer.addDocument(document);

    // Jake
    document = new Document();
    document.add(Field.Keyword("owner", "jake"));
    document.add(Field.Text("keywords", "jakes sensitive info"));
    writer.addDocument(document);

    writer.close();
  }

  public void testSecurityFilter() throws Exception {
    TermQuery query = new TermQuery(new Term("keywords", "info"));

    IndexSearcher searcher = new IndexSearcher(directory);
    Hits hits = searcher.search(query);
    assertEquals("Both documents match", 2, hits.length());

    QueryFilter jakeFilter = new QueryFilter(
        new TermQuery(new Term("owner", "jake")));

    hits = searcher.search(query, jakeFilter);
    assertEquals(1, hits.length());
    assertEquals("elwood is safe",
        "jakes sensitive info", hits.doc(0).get("keywords"));
  }

}


On Thu, 3 Feb 2005 13:04:50 -0500, Luke Shannon
<[EMAIL PROTECTED]> wrote:
> Hello;
> 
> I have a query that finds document that contain fields with a specific
> value.
> 
> query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
> 
> This works well.
> 
> I would like a query that find documents containing all kcfileupload fields
> that don't contain jpg.
> 
> The example I found in the book that seems to relate shows me how to find
> documents without a specific term:
> 
> QueryParser parser = new QueryParser("contents", analyzer);
> parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
> 
> But than it says:
> 
> Negating a term must be combined with at least one nonnegated term to return
> documents; in other words, it isn't possible to use a query like NOT term to
> find all documents that don't contain a term.
> 
> So does that mean the above example wouldn't work?
> 
> The API says:
> 
>  a plus (+) or a minus (-) sign, indicating that the clause is required or
> prohibited respectively;
> 
> I have been playing around with using the minus character without much luck.
> 
> Can someone give point me in the right direction to figure this out?
> 
> Thanks,
> 
> Luke
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Bingo! Nice catch. That was it. Made everything lower case when I set the
field. Works great now.

Thanks!

Luke

- Original Message - 
From: "Kauler, Leto S" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 03, 2005 6:48 PM
Subject: RE: Parsing The Query: Every document that doesn't have a field
containing x


Because you are build from QueryParser rather than a TermQuery, all
search terms in the query are being lowercased by StandardAnalyzer.

So your query of "olFaithFull:stillhere" requires that there is an exact
index term of "stillhere" in that field.  It depends on how you built
the index (index and stored fields are different), but I would check on
that.  Also maybe try out TermQuery and see if that does anything for
you.



> -Original Message-
> From: Luke Shannon [mailto:[EMAIL PROTECTED]
> Sent: Friday, 4 February 2005 10:47 AM
> To: Lucene Users List
> Subject: Re: Parsing The Query: Every document that doesn't
> have a field containing x
>
>
> "stillHere"
>
> Capital H.
>
> - Original Message - 
> From: "Kauler, Leto S" <[EMAIL PROTECTED]>
> To: "Lucene Users List" 
> Sent: Thursday, February 03, 2005 6:40 PM
> Subject: RE: Parsing The Query: Every document that doesn't
> have a field containing x
>
>
> First thing that jumps out is case-sensitivity.  Does your
> olFaithFull field contain "stillHere" or "stillhere"?
>
> --Leto
>
>
> > -Original Message-
> > From: Luke Shannon [mailto:[EMAIL PROTECTED]
> > This works:
> >
> > query1 = QueryParser.parse("jpg", "kcfileupload", new
> > StandardAnalyzer()); query2 = QueryParser.parse("stillHere",
> > "olFaithFull", new StandardAnalyzer()); BooleanQuery
> > typeNegativeSearch = new BooleanQuery();
> > typeNegativeSearch.add(query1, false, false);
> > typeNegativeSearch.add(query2, false, false);
> >
> > It returns 9 results. And in string form is: kcfileupload:jpg
> > olFaithFull:stillhere
> >
> > But this:
> >
> > query1 = QueryParser.parse("jpg", "kcfileupload", new
> > StandardAnalyzer());
> > query2 = QueryParser.parse("stillHere",
> "olFaithFull", new
> > StandardAnalyzer());
> > BooleanQuery typeNegativeSearch = new BooleanQuery();
> > typeNegativeSearch.add(query1, true, false);
> > typeNegativeSearch.add(query2, true, false);
> >
> > Reutrns 0 results and is in string form : +kcfileupload:jpg
> > +olFaithFull:stillhere
> >
> > If I do the query kcfileupload:jpg in Luke I get 9 docs, each doc
> > containing a olFaithFull:stillHere. Why would
> > +kcfileupload:jpg +olFaithFull:stillhere return no results?
> >
> > Thanks,
> >
> > Luke

CONFIDENTIALITY NOTICE AND DISCLAIMER

Information in this transmission is intended only for the person(s) to whom
it is addressed and may contain privileged and/or confidential information.
If you are not the intended recipient, any disclosure, copying or
dissemination of the information is unauthorised and you should
delete/destroy all copies and notify the sender. No liability is accepted
for any unauthorised use of the information contained in this transmission.

This disclaimer has been automatically added.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







RE: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Kauler, Leto S
Because you are building the query with QueryParser rather than a
TermQuery, all search terms in the query are being lowercased by
StandardAnalyzer.

So your query of "olFaithFull:stillhere" requires that there is an exact
index term of "stillhere" in that field.  It depends on how you built
the index (indexed and stored fields are different), but I would check on
that.  Also maybe try out TermQuery and see if that does anything for
you.
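
To make the mismatch concrete, here is a minimal sketch in plain Java. It is
not Lucene code: CaseMismatchSketch, analyze() and matches() are made-up
names, and analyze() only stands in for the lowercasing step that the
analyzer applies to query text. The assumption is that the field value was
indexed verbatim while the query term went through the analyzer.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of exact term matching; hypothetical names, not the Lucene API.
class CaseMismatchSketch {
    // Stand-in for an analyzer that lowercases the text it is given.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Term lookup is an exact, case-sensitive string comparison.
    static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> verbatim = new HashSet<>();
        verbatim.add("stillHere");              // value indexed as-is

        Set<String> lowercased = new HashSet<>();
        lowercased.add(analyze("stillHere"));   // value lowercased before indexing

        String queryTerm = analyze("stillHere"); // what the parsed query asks for
        System.out.println(matches(verbatim, queryTerm));
        System.out.println(matches(lowercased, queryTerm));
    }
}
```

In this model the lookup only succeeds when both sides are normalised the
same way: either lowercase the value when indexing, or build the query term
without lowercasing it.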



> -Original Message-
> From: Luke Shannon [mailto:[EMAIL PROTECTED] 
> Sent: Friday, 4 February 2005 10:47 AM
> To: Lucene Users List
> Subject: Re: Parsing The Query: Every document that doesn't 
> have a field containing x
> 
> 
> "stillHere"
> 
> Capital H.
> 
> - Original Message - 
> From: "Kauler, Leto S" <[EMAIL PROTECTED]>
> To: "Lucene Users List" 
> Sent: Thursday, February 03, 2005 6:40 PM
> Subject: RE: Parsing The Query: Every document that doesn't 
> have a field containing x
> 
> 
> First thing that jumps out is case-sensitivity.  Does your 
> olFaithFull field contain "stillHere" or "stillhere"?
> 
> --Leto
> 
> 
> > -Original Message-
> > From: Luke Shannon [mailto:[EMAIL PROTECTED]
> > This works:
> >
> > query1 = QueryParser.parse("jpg", "kcfileupload", new 
> > StandardAnalyzer()); query2 = QueryParser.parse("stillHere", 
> > "olFaithFull", new StandardAnalyzer()); BooleanQuery 
> > typeNegativeSearch = new BooleanQuery(); 
> > typeNegativeSearch.add(query1, false, false); 
> > typeNegativeSearch.add(query2, false, false);
> >
> > It returns 9 results. And in string form is: kcfileupload:jpg 
> > olFaithFull:stillhere
> >
> > But this:
> >
> > query1 = QueryParser.parse("jpg", "kcfileupload", new 
> > StandardAnalyzer());
> > query2 = QueryParser.parse("stillHere", 
> "olFaithFull", new 
> > StandardAnalyzer());
> > BooleanQuery typeNegativeSearch = new BooleanQuery();
> > typeNegativeSearch.add(query1, true, false);
> > typeNegativeSearch.add(query2, true, false);
> >
> > Reutrns 0 results and is in string form : +kcfileupload:jpg
> > +olFaithFull:stillhere
> >
> > If I do the query kcfileupload:jpg in Luke I get 9 docs, each doc 
> > containing a olFaithFull:stillHere. Why would
> > +kcfileupload:jpg +olFaithFull:stillhere return no results?
> >
> > Thanks,
> >
> > Luke


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
"stillHere"

Capital H.

- Original Message - 
From: "Kauler, Leto S" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 03, 2005 6:40 PM
Subject: RE: Parsing The Query: Every document that doesn't have a field
containing x


First thing that jumps out is case-sensitivity.  Does your olFaithFull
field contain "stillHere" or "stillhere"?

--Leto


> -Original Message-
> From: Luke Shannon [mailto:[EMAIL PROTECTED]
> This works:
>
> query1 = QueryParser.parse("jpg", "kcfileupload", new
> StandardAnalyzer()); query2 = QueryParser.parse("stillHere",
> "olFaithFull", new StandardAnalyzer()); BooleanQuery
> typeNegativeSearch = new BooleanQuery();
> typeNegativeSearch.add(query1, false, false);
> typeNegativeSearch.add(query2, false, false);
>
> It returns 9 results. And in string form is: kcfileupload:jpg
> olFaithFull:stillhere
>
> But this:
>
> query1 = QueryParser.parse("jpg", "kcfileupload", new
> StandardAnalyzer());
> query2 = QueryParser.parse("stillHere",
> "olFaithFull", new StandardAnalyzer());
> BooleanQuery typeNegativeSearch = new BooleanQuery();
> typeNegativeSearch.add(query1, true, false);
> typeNegativeSearch.add(query2, true, false);
>
> Reutrns 0 results and is in string form : +kcfileupload:jpg
> +olFaithFull:stillhere
>
> If I do the query kcfileupload:jpg in Luke I get 9 docs, each
> doc containing a olFaithFull:stillHere. Why would
> +kcfileupload:jpg +olFaithFull:stillhere return no results?
>
> Thanks,
>
> Luke


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







RE: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Kauler, Leto S
First thing that jumps out is case-sensitivity.  Does your olFaithFull
field contain "stillHere" or "stillhere"?

--Leto


> -Original Message-
> From: Luke Shannon [mailto:[EMAIL PROTECTED] 
> This works:
> 
> query1 = QueryParser.parse("jpg", "kcfileupload", new 
> StandardAnalyzer()); query2 = QueryParser.parse("stillHere", 
> "olFaithFull", new StandardAnalyzer()); BooleanQuery 
> typeNegativeSearch = new BooleanQuery(); 
> typeNegativeSearch.add(query1, false, false); 
> typeNegativeSearch.add(query2, false, false);
> 
> It returns 9 results. And in string form is: kcfileupload:jpg 
> olFaithFull:stillhere
> 
> But this:
> 
> query1 = QueryParser.parse("jpg", "kcfileupload", new 
> StandardAnalyzer());
> query2 = QueryParser.parse("stillHere", 
> "olFaithFull", new StandardAnalyzer());
> BooleanQuery typeNegativeSearch = new BooleanQuery();
> typeNegativeSearch.add(query1, true, false);
> typeNegativeSearch.add(query2, true, false);
> 
> Reutrns 0 results and is in string form : +kcfileupload:jpg
> +olFaithFull:stillhere
> 
> If I do the query kcfileupload:jpg in Luke I get 9 docs, each 
> doc containing a olFaithFull:stillHere. Why would 
> +kcfileupload:jpg +olFaithFull:stillhere return no results?
> 
> Thanks,
> 
> Luke


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
This works:

query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
query2 = QueryParser.parse("stillHere", "olFaithFull", new
StandardAnalyzer());
BooleanQuery typeNegativeSearch = new BooleanQuery();
typeNegativeSearch.add(query1, false, false);
typeNegativeSearch.add(query2, false, false);

It returns 9 results. And in string form is: kcfileupload:jpg
olFaithFull:stillhere

But this:

query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
query2 = QueryParser.parse("stillHere", "olFaithFull", new
StandardAnalyzer());
BooleanQuery typeNegativeSearch = new BooleanQuery();
typeNegativeSearch.add(query1, true, false);
typeNegativeSearch.add(query2, true, false);

Returns 0 results, and in string form is: +kcfileupload:jpg
+olFaithFull:stillhere

If I do the query kcfileupload:jpg in Luke I get 9 docs, each doc containing
an olFaithFull:stillHere. Why would +kcfileupload:jpg +olFaithFull:stillhere
return no results?

Thanks,

Luke

- Original Message - 
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 03, 2005 4:55 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field
containing x


> > Yes. There should be 119 with stillHere,
>
> You have double-checked that, haven't you? :)
>
> > and if I run a query in Luke on
> > kcfileupload = ppt, it returns one result. I am thinking I should at
least
> > get this result back with: -kcfileupload:jpg +olFaithFull:stillhere?
>
> You really should.
>
> -- 
> Maik Schreiber   *   http://www.blizzy.de <-- Get GMail invites here!
>
> GPG public key:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
> Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Numbers in the Query String

2005-02-03 Thread Otis Gospodnetic
Using different analyzers for indexing and searching is not
recommended.
Your numbers are not even in the index because you are using
StandardAnalyzer.  Use Luke to look at your index.

Otis


--- Hetan Shah <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> How can one search for a document based on the query which has
> numbers 
> in the query srting.
> 
> e.g. query = Java 2 Platform J2EE
> 
> What do I need to do so that the numbers do not get neglected.
> 
> I am using StandardAnalyzer to index the pages and using StopAnalyzer
> to 
> search the documents. Would the use of two different analyzers cause
> any 
> trouble for the results?
> 
> Thanks.
> -H
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Numbers in the Query String

2005-02-03 Thread Andrzej Bialecki
Hetan Shah wrote:
> Hello,
>
> How can one search for a document based on the query which has numbers
> in the query string.
>
> e.g. query = Java 2 Platform J2EE
>
> What do I need to do so that the numbers do not get neglected.
>
> I am using StandardAnalyzer to index the pages and using StopAnalyzer to
> search the documents. Would the use of two different analyzers cause any
> trouble for the results?

Yes. StopAnalyzer eats all numbers for breakfast. ;-) You need to use 
another analyzer, one that doesn't discard numbers.
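
A rough illustration of the effect, in plain Java rather than the Lucene
API. TokenizerSketch and its methods are made-up names. The sketch assumes
the search-side analyzer splits text on anything that is not a letter and
lowercases the rest (stop-word removal is left out), and compares that with
a tokenizer that also keeps digits:

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizers; hypothetical names, not the Lucene analyzers themselves.
class TokenizerSketch {
    // Keeps runs of letters only, lowercased; digits act as separators.
    static List<String> lettersOnly(String text) {
        return split(text, false);
    }

    // Keeps runs of letters and digits, lowercased.
    static List<String> lettersAndDigits(String text) {
        return split(text, true);
    }

    private static List<String> split(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean keep = Character.isLetter(c)
                    || (keepDigits && Character.isDigit(c));
            if (keep) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "Java 2 Platform J2EE";
        System.out.println(lettersOnly(query));
        System.out.println(lettersAndDigits(query));
    }
}
```

With the letters-only variant the "2" disappears and "J2EE" falls apart into
"j" and "ee", so the query terms no longer line up with terms that were
indexed with their digits intact.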

--
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Numbers in the Query String

2005-02-03 Thread Hetan Shah
Hello,
How can one search for a document based on a query that has numbers 
in the query string?

e.g. query = Java 2 Platform J2EE
What do I need to do so that the numbers do not get neglected.
I am using StandardAnalyzer to index the pages and using StopAnalyzer to 
search the documents. Would the use of two different analyzers cause any 
trouble for the results?

Thanks.
-H
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
I did; I have run both queries in Luke.

kcfileupload:ppt

returns 1

olFaithfull:stillhere

returns 119

Luke

- Original Message - 
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 03, 2005 4:55 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field
containing x


> > Yes. There should be 119 with stillHere,
>
> You have double-checked that, haven't you? :)
>
> > and if I run a query in Luke on
> > kcfileupload = ppt, it returns one result. I am thinking I should at
least
> > get this result back with: -kcfileupload:jpg +olFaithFull:stillhere?
>
> You really should.
>
> -- 
> Maik Schreiber   *   http://www.blizzy.de <-- Get GMail invites here!
>
> GPG public key:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
> Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Maik Schreiber
> Yes. There should be 119 with stillHere,

You have double-checked that, haven't you? :)

> and if I run a query in Luke on
> kcfileupload = ppt, it returns one result. I am thinking I should at least
> get this result back with: -kcfileupload:jpg +olFaithFull:stillhere?

You really should.

--
Maik Schreiber   *   http://www.blizzy.de <-- Get GMail invites here!
GPG public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Yes. There should be 119 with stillHere, and if I run a query in Luke on
kcfileupload = ppt, it returns one result. I am thinking I should at least
get this result back with: -kcfileupload:jpg +olFaithFull:stillhere?

Luke

- Original Message - 
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 03, 2005 4:27 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field
containing x


> > -kcfileupload:jpg +olFaithFull:stillhere
> >
> > This looks right to me. Why the 0 results?
>
> Looks good to me, too. You sure all your documents have
> olFaithFull:stillhere and there is at least a document with kcfileupload
not
> being "jpg"?
>
> -- 
> Maik Schreiber   *   http://www.blizzy.de <-- Get GMail invites here!
>
> GPG public key:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
> Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Maik Schreiber
> -kcfileupload:jpg +olFaithFull:stillhere
>
> This looks right to me. Why the 0 results?

Looks good to me, too. You sure all your documents have 
olFaithFull:stillhere and there is at least a document with kcfileupload not 
being "jpg"?

--
Maik Schreiber   *   http://www.blizzy.de <-- Get GMail invites here!
GPG public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Hello,

Still working on the same query, here is the code I am currently working
with.

I am thinking this should bring up all the documents that have
olFaithFull=stillHere and kcfileupload!=jpg (so anything else)

query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
query2 = QueryParser.parse("stillHere", "olFaithFull", new
StandardAnalyzer());
BooleanQuery typeNegativeSearch = new BooleanQuery();
typeNegativeSearch.add(query1, false, true);
typeNegativeSearch.add(query2, true, false);

The toString() on the query is:

-kcfileupload:jpg +olFaithFull:stillhere

This looks right to me. Why the 0 results?

Thanks,

Luke

- Original Message - 
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 03, 2005 1:19 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field
containing x


> > Negating a term must be combined with at least one nonnegated term to
return
> > documents; in other words, it isn't possible to use a query like NOT
term to
> > find all documents that don't contain a term.
> >
> > So does that mean the above example wouldn't work?
>
> Exactly. You cannot search for "-kcfileupload:jpg", you need at least one
> clause that actually _includes_ documents.
>
> Do you by chance have a field with known contents? If so, you could misuse
> that one and include it in your query (perhaps by doing range or
> wildcard/prefix search). If not, try IndexReader.terms() for building a
> Query yourself, then use that one for search.
>
> -- 
> Maik Schreiber   *   http://www.blizzy.de
>
> GPG public key:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
> Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Ok.

I have added the following to every document:

doc.add(Field.UnIndexed("olFaithfull", "stillHere"));

The plan is a query that says: olFaithfull = stillHere and kcfileupload != jpg.

I have been experimenting with the MultiFieldQueryParser, but this is not
working out for me. Syntactically, how is this done? Does someone have an
example of a query similar to the one I am trying?

Thanks,

Luke

- Original Message - 
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 03, 2005 1:19 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field
containing x


> > Negating a term must be combined with at least one nonnegated term to
return
> > documents; in other words, it isn't possible to use a query like NOT
term to
> > find all documents that don't contain a term.
> >
> > So does that mean the above example wouldn't work?
>
> Exactly. You cannot search for "-kcfileupload:jpg", you need at least one
> clause that actually _includes_ documents.
>
> Do you by chance have a field with known contents? If so, you could misuse
> that one and include it in your query (perhaps by doing range or
> wildcard/prefix search). If not, try IndexReader.terms() for building a
> Query yourself, then use that one for search.
>
> -- 
> Maik Schreiber   *   http://www.blizzy.de
>
> GPG public key:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
> Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Kelvin Tan
Alternatively, add a dummy field-value to all documents, like 
doc.add(Field.Keyword("foo", "bar"))

Waste of space, but allows you to perform negated queries.
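
A minimal sketch of why the dummy field helps, modelled in plain Java over a
toy inverted index. This is not Lucene code and not how Lucene implements
boolean queries; NegationSketch, add(), search() and sample() are made-up
names, and the field values are only examples. The one idea it models is
that a prohibited clause can only remove documents that some other clause
has already selected:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of required (+) and prohibited (-) clauses; hypothetical names.
class NegationSketch {
    // Inverted index: "field:term" -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String... fieldTerms) {
        for (String fieldTerm : fieldTerms) {
            postings.computeIfAbsent(fieldTerm, k -> new HashSet<>()).add(docId);
        }
    }

    // Required terms select documents; prohibited terms only remove
    // documents from that selection.
    Set<Integer> search(String[] required, String[] prohibited) {
        Set<Integer> hits = new TreeSet<>();
        boolean first = true;
        for (String term : required) {
            Set<Integer> docs =
                    postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (first) {
                hits.addAll(docs);
                first = false;
            } else {
                hits.retainAll(docs);
            }
        }
        for (String term : prohibited) {
            hits.removeAll(
                    postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }

    // Three documents that all carry the same marker term.
    static NegationSketch sample() {
        NegationSketch index = new NegationSketch();
        index.add(1, "kcfileupload:jpg", "foo:bar");
        index.add(2, "kcfileupload:ppt", "foo:bar");
        index.add(3, "kcfileupload:jpg", "foo:bar");
        return index;
    }

    public static void main(String[] args) {
        NegationSketch index = sample();
        // Only a prohibited clause: nothing selects documents, so no hits.
        System.out.println(
                index.search(new String[0], new String[] {"kcfileupload:jpg"}));
        // The marker term selects everything, then the prohibited clause filters.
        System.out.println(
                index.search(new String[] {"foo:bar"},
                             new String[] {"kcfileupload:jpg"}));
    }
}
```

In query syntax the second search corresponds to something like
+foo:bar -kcfileupload:jpg.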

On Thu, 03 Feb 2005 19:19:15 +0100, Maik Schreiber wrote:
>> Negating a term must be combined with at least one nonnegated
>> term to return documents; in other words, it isn't possible to
>> use a query like NOT term to find all documents that don't
>> contain a term.
>>
>> So does that mean the above example wouldn't work?
>>
> Exactly. You cannot search for "-kcfileupload:jpg", you need at
> least one clause that actually _includes_ documents.
>
> Do you by chance have a field with known contents? If so, you could
> misuse that one and include it in your query (perhaps by doing
> range or wildcard/prefix search). If not, try IndexReader.terms()
> for building a Query yourself, then use that one for search.



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Maik Schreiber
> Negating a term must be combined with at least one nonnegated term to return
> documents; in other words, it isn't possible to use a query like NOT term to
> find all documents that don't contain a term.
>
> So does that mean the above example wouldn't work?

Exactly. You cannot search for "-kcfileupload:jpg", you need at least one 
clause that actually _includes_ documents.

Do you by chance have a field with known contents? If so, you could misuse 
that one and include it in your query (perhaps by doing range or 
wildcard/prefix search). If not, try IndexReader.terms() for building a 
Query yourself, then use that one for search.

--
Maik Schreiber   *   http://www.blizzy.de
GPG public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Hello;

I have a query that finds documents that contain fields with a specific
value.

query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());

This works well.

I would like a query that finds all documents whose kcfileupload field
doesn't contain jpg.

The example I found in the book that seems to relate shows me how to find
documents without a specific term:

QueryParser parser = new QueryParser("contents", analyzer);
parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);

But then it says:

Negating a term must be combined with at least one nonnegated term to return
documents; in other words, it isn't possible to use a query like NOT term to
find all documents that don't contain a term.

So does that mean the above example wouldn't work?

The API says:

 a plus (+) or a minus (-) sign, indicating that the clause is required or
prohibited respectively;

I have been playing around with using the minus character without much luck.

Can someone point me in the right direction to figure this out?

Thanks,

Luke




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Query Format

2005-02-01 Thread Erik Hatcher
How are you indexing your document?
If you're using QueryParser with the default operator set to OR (which 
is the default), then you've already provided the expression you need 
:)
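
As a rough sketch of what OR versus AND means for a list of keywords, here
is a plain-Java model. It is not the QueryParser; DefaultOperatorSketch and
its methods are made-up names, and splitting on whitespace stands in for
real analysis:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of OR versus AND between query terms; hypothetical names.
class DefaultOperatorSketch {
    // Lowercase the text and split it on whitespace.
    static Set<String> terms(String text) {
        return new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // OR: at least one query term occurs in the document.
    static boolean matchesAny(String document, String query) {
        Set<String> docTerms = terms(document);
        for (String term : terms(query)) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // AND: every query term occurs in the document.
    static boolean matchesAll(String document, String query) {
        return terms(document).containsAll(terms(query));
    }

    public static void main(String[] args) {
        String query = "Sun Linux Red Hat Advance Server";
        String doc = "Red Hat Linux release notes";
        System.out.println(matchesAny(doc, query));
        System.out.println(matchesAll(doc, query));
    }
}
```

With OR between the terms, a document that mentions only some of the
keywords still matches, which is the "all or any" behaviour asked for.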

Erik
On Feb 1, 2005, at 6:29 PM, Hetan Shah wrote:
> Hello All,
>
> What should my query look like if I want to search all or any of the 
> following key words.
>
> Sun Linux Red Hat Advance Server
>
> replies are much appreciated.
> -H
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Query Format

2005-02-01 Thread Hetan Shah
Hello All,
What should my query look like if I want to search all or any of the 
following key words.

Sun Linux Red Hat Advance Server
replies are much appreciated.
-H
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: query term frequency

2005-01-28 Thread markharw00d
This from the highlighter package will give you the IDF :
WeightedTerm[]  QueryTermExtractor.getIdfWeightedTerms(Query query, 
IndexReader reader, String fieldName)

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: query term frequency

2005-01-28 Thread Grant Ingersoll
I implemented a Query version of the TermVector

org.apache.lucene.search.QueryTermVector

Works off of an array of Strings or a String and an Analyzer.  Is this
what you are looking for?
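
Conceptually, the frequency of a term within a query is just a count over
the query's terms. A minimal plain-Java sketch of that idea follows; it is
not the implementation of QueryTermVector, and QueryTermCountSketch and
termCounts() are made-up names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Counts how often each term occurs in a list of query terms.
// Hypothetical helper, not part of Lucene.
class QueryTermCountSketch {
    static Map<String, Integer> termCounts(String[] queryTerms) {
        // LinkedHashMap keeps the terms in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : queryTerms) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] terms = {"lucene", "query", "lucene", "term"};
        System.out.println(termCounts(terms));
    }
}
```

The terms would still have to be extracted from the query (or produced by
running the query string through an analyzer) before counting.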


>>> [EMAIL PROTECTED] 1/28/2005 6:33:18 AM >>>
On Jan 27, 2005, at 10:24 PM, Jonathan Lasko wrote:
> No, the number of occurrences of a term in a Query.

Nothing built-in gives you this.  You'd have to dissect the Query  
clause-by-clause and cast each clause to the proper type to pull the  
terms from them.  The Highlighter code does this.

If there is a better way, I'd like to know.

Erik


>
> Jonathan
>
> Quoting David Spencer <[EMAIL PROTECTED]>:
>
>> Jonathan Lasko wrote:
>>
>>> What do I call to get the term frequencies for terms in the Query? 
I
>>> can't seem to find it in the Javadoc...
>>
>> Do you mean the # of docs that have a term?
>>
>>
>> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
>>> Thanks.
>>>
>>> Jonathan



Re: lucene query (sql kind)

2005-01-28 Thread Erik Hatcher
Ross - I'm really perplexed by your message.  You create HTML from a 
database so that you can index it with Lucene, yet wish you could 
simply index the data in your database tied to a primary key directly, 
right?

Well, you're in luck - you already can do this!
What are you using for indexing?  It sounds like you borrowed the 
Lucene demo and have just run with that directly.

Erik
On Jan 28, 2005, at 11:02 AM, Ross Rankin wrote:
> I agree.  My site is all dynamic pages created from the database.  Right
> now, I have to have a process create dummy pages, index them with Lucene,
> then translate the Lucene results into meaningful links.  It actually works
> better than it sounds, however it could be easier.
>
> If I could just give Lucene a query result (i.e. a list of rows) and then
> have Lucene send me back say the primary key of the rows that match and the
> other Lucene goodness: ranking, number of hits, etc.
>
> Could be pretty powerful and simplify the deployment for database driven
> applications.
>
> [Note: this opinion and $3.00 will get you a coffee at Starbucks]
>
> Ross
>
> -Original Message-
> From: PA [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 28, 2005 6:44 AM
> To: Lucene Users List
> Subject: Re: lucene query (sql kind)
>
> On Jan 28, 2005, at 12:40, sunil goyal wrote:
>
> > I want to run dynamic queries against the lucene index. Is there any
> > native syntax available for Lucene so that I can query, by first
> > generating the query in say an XML or SQL like format (cache this
> > query) and then  use this query over lucene index.
>
> Talking of which, did anyone contemplated the possibility of a
> JDBC adaptor of sort for Lucene?
>
> Cheers
>
> --
> PA, Onnay Equitursay
> http://alt.textdrive.com/


RE: lucene query (sql kind)

2005-01-28 Thread Ross Rankin
I agree.  My site is all dynamic pages created from the database.  Right
now, I have to have a process create dummy pages, index them with Lucene,
then translate the Lucene results into meaningful links.  It actually works
better than it sounds, however it could be easier.

If I could just give Lucene a query result (i.e. a list of rows) and then
have Lucene send me back say the primary key of the rows that match and the
other Lucene goodness: ranking, number of hits, etc. 

Could be pretty powerful and simplify the deployment for database driven
applications.   

[Note: this opinion and $3.00 will get you a coffee at Starbucks]

Ross

-Original Message-
From: PA [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 28, 2005 6:44 AM
To: Lucene Users List
Subject: Re: lucene query (sql kind)


On Jan 28, 2005, at 12:40, sunil goyal wrote:

> I want to run dynamic queries against the lucene index. Is there any
> native syntax available for Lucene so that I can query, by first
> generating the query in say an XML or SQL like format (cache this
> query) and then  use this query over lucene index.

Talking of which, did anyone contemplated the possibility of a 
JDBC adaptor of sort for Lucene?

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/





Re: lucene query (sql kind)

2005-01-28 Thread jian chen
I like your idea and think you are quite right. I see quite a few
people pushing Lucene to the extreme, to the point where it replaces
relational database functionality.

However, storing everything in Lucene and using it as a relational type
of database is kind of reinventing the wheel, for example when sorting
on a date field or running any other range query.

I think the better way is to look at ways to integrate Lucene tightly
into a Java relational database, such as HSQL, McKoi or Derby.

In particular, that integration would make queries like "contains(...)"
possible, similar to the full-text search predicates that MySQL and
other major relational database vendors offer.

I would like to contribute whatever help I can to make that happen.

Thanks,

Jian

On Fri, 28 Jan 2005 13:01:40 + (GMT), mark harwood
<[EMAIL PROTECTED]> wrote:
> I've added some user-defined lucene functions to
> HSQLDB and I've been able to run queries like the
> following one:
> 
> select top 10 lucene_highlight(adText) from ads where
> pricePounds <200  and lucene_query('bass guitar
> drums',id)>0 order by lucene_score(id) DESC
> 
> I've had similar success with Derby (Cloudscape).
> This approach has some appeal and I've been able to
> use the same class as a UDF in both databases but it
> does have issues: it looks like this UDF based
> integration won't scale. The above query took 80
> milliseconds using 10,000 records. Another
> index/database with 50,000 records was taking a matter
> of seconds. I think a scalable integration is likely
> to require modification of the core RDBMS code.
> 
> I think it is worth considering developing such a
> tight RDBMS integration if you consider the issues
> commonly associated with using Lucene:
> 1) Sorting on float/date fields and associated memory
> consumption
> 2) Representing numbers/dates in Lucene (eg having to
> pad with sufficient leading zeros and add to index's
> list of terms)
> 3) Retrieving only certain stored fields from a
> document (all storage can be done in db)
> 4) Issues to do with updating volatile data eg price
> data used in sorts
> 5) Manually coding joins with RDBMS content as custom
> filters
> 6) Too-many terms exceptions produced by range queries
> 7) Grouping results eg by website
> 8) Boosting docs based on stored content eg date
> 
> I'm not saying there aren't answers to the above using
> Lucene. However,I do wonder if these can be addressed
> more effectively in a project which seeks tighter
> integration with an RDBMS and leveraging its
> capabilities.
> 
> Any one else been down this route?
> 
> 



Re: lucene query (sql kind)

2005-01-28 Thread mark harwood
I've added some user-defined lucene functions to
HSQLDB and I've been able to run queries like the
following one:

select top 10 lucene_highlight(adText) from ads where
pricePounds <200  and lucene_query('bass guitar
drums',id)>0 order by lucene_score(id) DESC

I've had similar success with Derby (Cloudscape).
This approach has some appeal and I've been able to
use the same class as a UDF in both databases but it
does have issues: it looks like this UDF based
integration won't scale. The above query took 80
milliseconds using 10,000 records. Another
index/database with 50,000 records was taking a matter
of seconds. I think a scalable integration is likely
to require modification of the core RDBMS code.

I think it is worth considering developing such a
tight RDBMS integration if you consider the issues
commonly associated with using Lucene:
1) Sorting on float/date fields and associated memory
consumption
2) Representing numbers/dates in Lucene (eg having to
pad with sufficient leading zeros and add to index's
list of terms)
3) Retrieving only certain stored fields from a
document (all storage can be done in db)
4) Issues to do with updating volatile data eg price
data used in sorts
5) Manually coding joins with RDBMS content as custom
filters
6) Too-many terms exceptions produced by range queries
7) Grouping results eg by website
8) Boosting docs based on stored content eg date
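
Item 2 in the list above is easy to show in isolation. When numbers are compared as strings, "9" sorts after "10" unless both are padded to the same width. The sketch below assumes non-negative values and a fixed width of 10 digits; both the width and the class name are choices made for this illustration, not something prescribed by Lucene.

```java
import java.util.Locale;

// Hypothetical helper showing why numeric terms get zero-padded before indexing.
final class PaddedNumberSketch {

    // Assumed fixed width; it must be large enough for the biggest expected value.
    static final int WIDTH = 10;

    // Left-pads a non-negative number with zeros so string order matches numeric order.
    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    public static void main(String[] args) {
        // Unpadded: "9" compares greater than "10" as a string.
        System.out.println("9".compareTo("10") > 0);
        // Padded: 0000000009 compares less than 0000000010.
        System.out.println(pad(9).compareTo(pad(10)) < 0);
    }
}
```

The same padded form has to be used when building range queries, otherwise the query terms will not line up with the indexed terms.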

I'm not saying there aren't answers to the above using
Lucene. However, I do wonder if these can be addressed
more effectively in a project which seeks tighter
integration with an RDBMS and leveraging its
capabilities.

Any one else been down this route?











Re: lucene query (sql kind)

2005-01-28 Thread sunil goyal
Hello,

Thanks, It works fine.

> The field parameter simply defines the default field for all queries
> without an explicit field specification (:).
> Using 'field AND field' as default field does not make sense but does
> not hurt as long as the default field is not used.
> I'm not sure why you choose that.
I just thought that QueryParser needed to be told beforehand what it
should expect, so I used "field AND field". But I was wrong.

> Further name:\"john\" and name:john should be the same.
Just in case it's not "john" but "hello john" or some phrase.

Regards
Sunil



On Fri, 28 Jan 2005 13:26:26 +0100, Morus Walter <[EMAIL PROTECTED]> wrote:
> sunil goyal writes:
> >
> > I was just trying that...
> >
> > QueryParser qp = new QueryParser("field AND field", new StandardAnalyzer());
> > Query query = qp.parse("name:\"john\" AND age:[10 TO 16]");
> >
> > It works fine with this. Do I need to specify that QueryParser should
> > expect things in order
> > "field AND field". Or can I do without it?
> > 
> The field parameter simply defines the default field for all queries
> without an explicit field specification (:).
> Using 'field AND field' as default field does not make sense but does
> not hurt as long as the default field is not used.
> I'm not sure why you choose that.
> 
> Further name:\"john\" and name:john should be the same.
> 
> HTH
>   Morus
>




Re: lucene query (sql kind)

2005-01-28 Thread Morus Walter
sunil goyal writes:
> 
> I was just trying that...
> 
> QueryParser qp = new QueryParser("field AND field", new StandardAnalyzer());
> Query query = qp.parse("name:\"john\" AND age:[10 TO 16]");
> 
> It works fine with this. Do I need to specify that QueryParser should
> expect things in order
> "field AND field". Or can I do without it?
> 
The field parameter simply defines the default field for all queries
without an explicit field specification (field:).
Using 'field AND field' as the default field does not make sense, but it
does not hurt as long as the default field is not used.
I'm not sure why you chose that.

Further name:\"john\" and name:john should be the same.

HTH
  Morus




Re: lucene query (sql kind)

2005-01-28 Thread David Escuer
I've combined several different fields in one query, passing the name of
one of those fields as the second parameter to the static method, and it
worked fine.
You can also write a small query parser of your own and build the queries
with BooleanQuery.

David

sunil goyal wrote:

> Hello,
>
> I was just trying that...
>
> QueryParser qp = new QueryParser("field AND field", new StandardAnalyzer());
> Query query = qp.parse("name:\"john\" AND age:[10 TO 16]");
>
> It works fine with this. Do I need to specify that QueryParser should
> expect things in order
> "field AND field". Or can I do without it?
>
> The static method of QueryParser.parse(String , String, Analyzer) -
> expects the first string to be the query and second to be the field.
>
> Thanks
>
> Regards
> Sunil
 




Re: lucene query (sql kind)

2005-01-28 Thread sunil goyal
Hello,

I was just trying that...

QueryParser qp = new QueryParser("field AND field", new StandardAnalyzer());
Query query = qp.parse("name:\"john\" AND age:[10 TO 16]");

It works fine with this. Do I need to specify that QueryParser should
expect things in order
"field AND field". Or can I do without it?

The static method of QueryParser.parse(String , String, Analyzer) -
expects the first string to be the query and second to be the field.

Thanks

Regards
Sunil

On Fri, 28 Jan 2005 12:54:27 +0100, David Escuer
<[EMAIL PROTECTED]> wrote:
> 
> Hello,
>To build queries, you can generate a query like "(text:house OR
> text:car) AND (keywords:building)", and then
>parse it with the QueryParser.parse method to get the Lucene query.
> Is not 100% sql-like syntax, but it's more clear
>than the lucene syntax.
> 
> Hope it helps
> 
> David
> 
> sunil goyal wrote:
> 
> >Hello all,
> >
> >I want to run dynamic queries against the lucene index. Is there any
> >native syntax available for Lucene so that I can query, by first
> >generating the query in say an XML or SQL like format (cache this
> >query) and then  use this query over lucene index.
> >
> >
> >e.g. So a lucene query syntax in which I can define a query
> >(name="john" AND age <10)  and then I can just use this query to
> >execute over Lucene index.
> >
> >
> >Regards
> >Sunil
> >



Re: lucene query (sql kind)

2005-01-28 Thread David Escuer
Hello,
  To build queries, you can generate a query string like "(text:house OR
  text:car) AND (keywords:building)", and then parse it with the
  QueryParser.parse method to get the Lucene query. It's not 100%
  SQL-like syntax, but it's clearer than building the query by hand
  through the Lucene API.

Hope it helps
David
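
A minimal sketch of the "generate the query string, then parse it" approach described above. It only assembles the text that would be passed to QueryParser.parse. The class and method names are hypothetical, and always quoting the values is a simplification chosen for this example.

```java
// Hypothetical builder for query strings in Lucene's query syntax.
final class FieldQuerySketch {

    // field:"value", with backslashes and double quotes in the value escaped.
    static String term(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    // field:[lower TO upper], an inclusive range.
    static String range(String field, String lower, String upper) {
        return field + ":[" + lower + " TO " + upper + "]";
    }

    // Parenthesised conjunction of clauses.
    static String allOf(String... clauses) {
        return "(" + String.join(" AND ", clauses) + ")";
    }

    // Parenthesised disjunction of clauses.
    static String anyOf(String... clauses) {
        return "(" + String.join(" OR ", clauses) + ")";
    }

    public static void main(String[] args) {
        System.out.println(allOf(
                anyOf(term("text", "house"), term("text", "car")),
                term("keywords", "building")));
        System.out.println(allOf(term("name", "john"), range("age", "10", "16")));
    }
}
```

Because the result is an ordinary string, it can be cached and reused, which is what the original question asked for.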
sunil goyal wrote:
> Hello all,
>
> I want to run dynamic queries against the lucene index. Is there any
> native syntax available for Lucene so that I can query, by first
> generating the query in say an XML or SQL like format (cache this
> query) and then  use this query over lucene index.
>
> e.g. So a lucene query syntax in which I can define a query
> (name="john" AND age <10)  and then I can just use this query to
> execute over Lucene index.
>
> Regards
> Sunil


Re: lucene query (sql kind)

2005-01-28 Thread PA
On Jan 28, 2005, at 12:40, sunil goyal wrote:
> I want to run dynamic queries against the lucene index. Is there any
> native syntax available for Lucene so that I can query, by first
> generating the query in say an XML or SQL like format (cache this
> query) and then  use this query over lucene index.

Talking of which, has anyone contemplated the possibility of a
JDBC adaptor of sorts for Lucene?

Cheers
--
PA, Onnay Equitursay
http://alt.textdrive.com/


lucene query (sql kind)

2005-01-28 Thread sunil goyal
Hello all,

I want to run dynamic queries against the lucene index. Is there any
native syntax available for Lucene so that I can query, by first
generating the query in say an XML or SQL like format (cache this
query) and then  use this query over lucene index.


e.g. So a lucene query syntax in which I can define a query
(name="john" AND age <10)  and then I can just use this query to
execute over Lucene index.


Regards
Sunil




Re: query term frequency

2005-01-28 Thread Erik Hatcher
On Jan 27, 2005, at 10:24 PM, Jonathan Lasko wrote:
> No, the number of occurrences of a term in a Query.

Nothing built-in gives you this.  You'd have to dissect the Query
clause-by-clause and cast each clause to the proper type to pull the
terms from them.  The Highlighter code does this.

If there is a better way, I'd like to know.

Erik

> Jonathan
>
> Quoting David Spencer <[EMAIL PROTECTED]>:
>
>> Jonathan Lasko wrote:
>>
>>> What do I call to get the term frequencies for terms in the Query?  I
>>> can't seem to find it in the Javadoc...
>>
>> Do you mean the # of docs that have a term?
>>
>> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
>>
>>> Thanks.
>>>
>>> Jonathan


Re: query term frequency

2005-01-27 Thread Jonathan Lasko
No, the number of occurrences of a term in a Query.

Jonathan

Quoting David Spencer <[EMAIL PROTECTED]>:

> Jonathan Lasko wrote:
> 
> > What do I call to get the term frequencies for terms in the Query?  I 
> > can't seem to find it in the Javadoc...
> 
> Do you mean the # of docs that have a term?
> 
>
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
> > Thanks.
> > 
> > Jonathan
> > 



Re: query term frequency

2005-01-27 Thread David Spencer
Jonathan Lasko wrote:
> What do I call to get the term frequencies for terms in the Query?  I
> can't seem to find it in the Javadoc...

Do you mean the # of docs that have a term?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)

> Thanks.
>
> Jonathan


query term frequency

2005-01-27 Thread Jonathan Lasko
What do I call to get the term frequencies for terms in the Query?  I 
can't seem to find it in the Javadoc...
Thanks.

Jonathan


Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-25 Thread Pierrick Brihaye
Hi,
David Spencer wrote:

>> Do you plan to add expansion on other Wordnet relationships ?
>> Hypernyms and hyponyms would be a good start point for thesaurus-like
>> search, wouldn't it ?
>
> Good point, I hadn't considered this - but how would it work -just
> consider these 2 relationships "synonyms" (thus easier to use) or make
> it separate (too academic?)

Well... the ideal case would be (easy) customization :-), from an
external text (XML?) file. Depending on the kind of relationship, the
boost factor could be adjusted when the query is expanded. The same goes
for the relationships' depths.

For example a "father" hypernym could have a boost factor of 0.8, a
"grand-father" a boost factor of 0.4, a "grand-grand-father" a boost
factor of 0.2. Well, I wonder whether a logarithmic scale makes better
sense than a linear scale, but this should/would be customizable...
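
The numbers in that example (0.8, 0.4, 0.2) halve at every level, so one way to read the proposal is "first-level boost times 0.5 per extra step of depth". The sketch below computes boosts that way and formats them as term^boost. The halving rule, the class name and the sample hypernym terms are assumptions made for illustration, and nothing here looks anything up in WordNet.

```java
import java.util.Locale;

// Hypothetical depth-based boost: halves for every further step away from the term.
final class DepthBoostSketch {

    // depth 1 = direct hypernym ("father"), 2 = "grand-father", and so on.
    static double boost(double firstLevelBoost, int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth starts at 1");
        }
        return firstLevelBoost * Math.pow(0.5, depth - 1);
    }

    // Formats a term with its boost in query syntax, e.g. canine^0.80.
    static String boosted(String term, double boost) {
        return String.format(Locale.ROOT, "%s^%.2f", term, boost);
    }

    public static void main(String[] args) {
        // Illustrative hypernym chain, typed in by hand rather than read from WordNet.
        String[] chain = {"canine", "carnivore", "mammal"};
        for (int depth = 1; depth <= chain.length; depth++) {
            System.out.println(boosted(chain[depth - 1], boost(0.8, depth)));
        }
    }
}
```

Swapping the 0.5 factor for a configurable value would cover the "customizable" part of the suggestion.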

>> However, I'm afraid that this kind of feature would require
>> refactoring, probably based on WordNet-dedicated libraries. JWNL
>> (http://jwordnet.sourceforge.net/) may be a good candidate for this.
>
> Good point, should leverage existing code.

One thing you can also easily get from this library is WordNet's
"exceptions", often irregular plurals (mouse/mice, addendum/addenda...).
This is a very basic yet efficient kind of stemming, and the expanded
forms should get the same boost factor as the original term.

Well, there are many other relationships in WordNet. Take a look at :
http://jws-champo.ac-toulouse.fr:8080/treebolic-wordnet/
legends are here :
http://treebolic.sourceforge.net/en/browserwn.htm
Cheers,
--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:[EMAIL PROTECTED]
+33 (0)2 99 29 67 78


Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-24 Thread David Spencer
Pierrick Brihaye wrote:

> Hi,
>
> David Spencer wrote:
>
>> One example of expansion with the synonym boost set to 0.9 is the
>> query "big dog" expands to:
>
> Interesting.
>
> Do you plan to add expansion on other Wordnet relationships ? Hypernyms
> and hyponyms would be a good start point for thesaurus-like search,
> wouldn't it ?

Good point, I hadn't considered this - but how would it work -just
consider these 2 relationships "synonyms" (thus easier to use) or make
it separate (too academic?)

> However, I'm afraid that this kind of feature would require refactoring,
> probably based on WordNet-dedicated libraries. JWNL
> (http://jwordnet.sourceforge.net/) may be a good candidate for this.

Good point, should leverage existing code.

> Thank you for your work.

thx,
 Dave

> Cheers,



Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-19 Thread Pierrick Brihaye
Hi,
David Spencer wrote:

> One example of expansion with the synonym boost set to 0.9 is the query
> "big dog" expands to:

Interesting.
Do you plan to add expansion on other Wordnet relationships ? Hypernyms 
and hyponyms would be a good start point for thesaurus-like search, 
wouldn't it ?

However, I'm afraid that this kind of feature would require refactoring, 
probably based on WordNet-dedicated libraries. JWNL 
(http://jwordnet.sourceforge.net/) may be a good candidate for this.

Thank you for your work.
Cheers,
--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:[EMAIL PROTECTED]
+33 (0)2 99 29 67 78


MoreLikeThis and other similarity query generators checked in + online demo

2005-01-17 Thread David Spencer
Based on mail from Doug I wrote a "more like this" query generator, 
named, well, MoreLikeThis. Bruce Ritchie and Mark Harwood made changes 
to it (esp term vector support) and bug fixes. Thanks to everyone.

I've checked in the code to the sandbox under contributions/similarity.
The package it ends up at is org.apache.lucene.search.similar -- hope 
that makes sense.

I also created a class, SimilarityQueries, to hold other methods of
similarity query generation. The 2 methods in there are "dumber"
variations that use the entire source of the target doc to form a large
query.

Javadoc is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/similarity/build/docs/api/org/apache/lucene/search/similar/package-summary.html
Online demo here - this page below compares the 3 variations on 
detecting similar docs. The timing info (3 numbers w/ "(ms)") may be 
suspect. Also note if you scroll to the bottom you can see the queries 
that were generated.

Here's a page showing docs similar to the entry for Iraq:
http://www.searchmorph.com/kat/wikipedia-compare.jsp?s=Iraq
And here's one for docs similar to the one on Garry Kasparov (he knows 
how to play chess :) ):

http://www.searchmorph.com/kat/wikipedia-compare.jsp?s=Garry_Kasparov
To get to it you start here:
http://www.searchmorph.com/kat/wikipedia.jsp
And search for something - on the search results page follow a "cmp" link
http://www.searchmorph.com/kat/wikipedia.jsp?s=iraq
Make sense? Useful? Has anyone done any other variations (e.g. cosine 
measure)?

- Dave


RE: Boolean Query

2005-01-14 Thread Ryan Aslett
Um.. Nevermind.. I figured it out.. I was using the StandardAnalyzer
when I built my index and thus didn't have N or St in the index itself.
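
The mismatch can be reproduced without Lucene. In the toy model below the index-time analysis lowercases every token while the query-time analysis only splits on whitespace, so "N" and "St" never match the indexed "n" and "st". Reducing StandardAnalyzer to plain lowercasing is a simplification assumed for this sketch, and the class name is made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of an index-time / query-time analyzer mismatch.
final class AnalyzerMismatchSketch {

    // Stand-in for index-time analysis: split on whitespace and lowercase.
    static Set<String> indexTokens(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.trim().split("\\s+")) {
            tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Stand-in for a whitespace-only query analyzer: case is preserved.
    static List<String> queryTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Every query token is required, like +a +b +c.
    static boolean matchesAll(Set<String> indexed, List<String> query) {
        return indexed.containsAll(query);
    }

    // Any single query token is enough, like +(a b c).
    static boolean matchesAny(Set<String> indexed, List<String> query) {
        for (String token : query) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = indexTokens("122 N 30th St");
        List<String> query = queryTokens("122 N 30th St");
        System.out.println(matchesAll(indexed, query)); // "N" and "St" are missing from the index
        System.out.println(matchesAny(indexed, query)); // "122" and "30th" still match
    }
}
```

Using the same analyzer on both sides is the usual way to avoid this.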

R 

-Original Message-
From: Ryan Aslett 
Sent: Friday, January 14, 2005 11:18 AM
To: Lucene Users List
Subject: Boolean Query

 
Okay, Im not grokking something here. Im trying to run a query that
returns only the results that have *all* of the terms in my query
string.

When I run this query, which I construct myself and do a


Analyzer analyzer = new WhitespaceAnalyzer();
QueryParser qp = new QueryParser("address", analyzer);
Query addyQ = qp.parse(queryString);

I get the following in addyQ.toString: 

Query: +address:122 +address:N +address:30th +address:St
0 total matching documents in 19 ms
So the querystring is unchanged from what I created, and I get zilch.

However, when I do this:

Analyzer analyzer = new WhitespaceAnalyzer();
BooleanQuery baQ = new BooleanQuery();
Query parsedAddyQ = qp.parse("122 N 30th St");
baQ.add(parsedAddyQ, true, false);

I get this:
Query: +(address:122 address:N address:30th address:St)
62 total matching documents in 8 ms
0. 122 N 30th St
1. PO Box 122
2. PO Box 122
3. 122 N 9th St
4. 122 S Clay Ave
5. 122 E 3rd St
...

So.. I want that first result, I know its in there, its matching as the
highest match, but not when I require all 4 tokens? What gives? What am
I doing wrong?

Also, if it matters Im running the query on a parallelMultiSearcher with
280 indexes of 1 million records each.

Ryan




Boolean Query

2005-01-14 Thread Ryan Aslett
 
Okay, Im not grokking something here. Im trying to run a query that
returns only the results that have *all* of the terms in my query
string.

When I run this query, which I construct myself and do a


Analyzer analyzer = new WhitespaceAnalyzer();
QueryParser qp = new QueryParser("address", analyzer);
Query addyQ = qp.parse(queryString);

I get the following in addyQ.toString: 

Query: +address:122 +address:N +address:30th +address:St
0 total matching documents in 19 ms
So the querystring is unchanged from what I created, and I get zilch.

However, when I do this:

Analyzer analyzer = new WhitespaceAnalyzer();
BooleanQuery baQ = new BooleanQuery();
Query parsedAddyQ = qp.parse("122 N 30th St");
baQ.add(parsedAddyQ, true, false);

I get this:
Query: +(address:122 address:N address:30th address:St)
62 total matching documents in 8 ms
0. 122 N 30th St
1. PO Box 122
2. PO Box 122
3. 122 N 9th St
4. 122 S Clay Ave
5. 122 E 3rd St
...

So.. I want that first result, I know its in there, its matching as the
highest match, but not when I require all 4 tokens? What gives? What am
I doing wrong?

Also, if it matters Im running the query on a parallelMultiSearcher with
280 indexes of 1 million records each.

Ryan




Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-14 Thread Ian Soboroff
Daniel Naber <[EMAIL PROTECTED]> writes:

> On Wednesday 12 January 2005 01:47, David Spencer wrote:
>
>> Amusingly then, documents with the terms "liberal wienerwurst" match
>> "big dog"! :)
>
> There's something like frequency information in WordNet, it could probably 
> be used to ignore the uncommon meanings.

If you just go search CiteSeer for "WordNet", you will find the output
of every failed MS thesis experiment to improve retrieval performance
by naive application of WordNet synsets.

But I like the query expansion code.

Ian






Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-12 Thread Daniel Naber
On Wednesday 12 January 2005 01:47, David Spencer wrote:

> Amusingly then, documents with the terms "liberal wienerwurst" match
> "big dog"! :)

There's something like frequency information in WordNet, it could probably 
be used to ignore the uncommon meanings.

Regards
 Daniel

-- 
http://www.danielnaber.de




WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-11 Thread David Spencer
Erik Hatcher wrote:

> On Jan 10, 2005, at 6:54 PM, David Spencer wrote:
>
>> Hi...I wrote the WordNet sandbox code - but I'm not sure if I
>> understand this thread. Are we saying that it does not work w/ the new
>> WordNet data, or that code in Erik's book is better/more up to date etc?
>
> I have not tried the sandbox with any versions past WordNet 1.6.
> Karthik shows a Java API to it, which I have not used - only your code
> that parses the prolog files.  So the book code explains exactly what is
> in the sandbox and describes WordNet 1.6 integration.  Though WordNet
> has evolved.
>
>> If needed I can update the sandbox code..
>
> It'd be awesome to have current WordNet support - I haven't looked at
> what is involved in making it so.

I verified that the code works w/ the latest WordNet (2.0), and it does 
so, no problem. The relevant data from WordNet has not changed so 
there's no need to upgrade WordNet for this package at least.

I added "query expansion", which takes in a simple query string and, for
every term, adds its synonyms. There's an optional boost parameter that
can be used to "penalize" synonyms if you want to apply the heuristic
that the user probably knows the right word.
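
As a toy illustration of that idea (not the SynExpand code itself), the sketch below expands each query term with synonyms from an in-memory map and appends the boost to every synonym. The map contents, the class name and the method signature are all invented for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical expander: original terms stay unboosted, synonyms get term^boost.
final class SynonymExpansionSketch {

    static String expand(String query, Map<String, List<String>> synonyms, float boost) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // nothing to expand for an empty query
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(term);
            for (String synonym : synonyms.getOrDefault(term, Collections.emptyList())) {
                out.append(' ').append(synonym).append('^').append(boost);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Tiny hand-written synonym table standing in for the WordNet index.
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("big", Arrays.asList("large", "adult"));
        synonyms.put("dog", Arrays.asList("hound"));
        System.out.println(expand("big dog", synonyms, 0.9f));
    }
}
```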

One example of expansion with the synonym boost set to 0.9 is the query 
"big dog" expands to:

big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9 
bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9 
giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9 
magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9 
vainglorious^0.9 vauntingly^0.9
 dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9 
detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9 
heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9 
trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9

Amusingly then, documents with the terms "liberal wienerwurst" match 
"big dog"! :)

Javadoc is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/package-summary.html
The new query expansion is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/SynExpand.html
Want to try it out? This page *expands* a query and prints out the 
result (but doesn't execute it yet).
http://www.searchmorph.com/kat/synonym.jsp?syn=big

CVS tree here:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/
If you just want to use a prebuilt index it's here (1MB):
http://searchmorph.com/pub/syn_index.zip
The prebuilt jar file is here:
http://www.searchmorph.com/pub/lucene-wordnet-dev.jar
Redundant weblog entry here:
http://www.searchmorph.com/weblog/index.php?id=34
Hope y'all like it and someone finds it useful,
  Dave
PS
 Oh - it may need the 1.5 dev branch of Lucene to work - I'm not
positive, but I tried to remove deprecation warnings and doing so may
have tied it to the latest code...

Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Query based stemming

2005-01-07 Thread David Spencer
Jim Lynch wrote:
 From what I've read, if you want to have a choice, the easiest way is 
to index the documents twice. Once with stemming on and once with it off 
placing the results in two different indexes.  Then at query time, 
select which index you want to use based on whether you want stemming on 
or off.
IMHO keeping the data in the same index is easiest.
PerFieldAnalyzerWrapper is part of the magic...approx usage follows from 
my code below. Second magic is to call doc.add(...) multiple times, 
"redundantly".

Don't use code below exactly however - things like MySnowballAnalyzer 
should become SnowballAnalyzer in your code...

Analyzer fa;
Analyzer getAnalyzer()
{
	Analyzer snowball = new MySnowballStopAnalyzer();
	// prob StandardAnalyzer for most people..
	Analyzer def = new AlphaNumStopAnalyzer();
	PerFieldAnalyzerWrapper fa = new PerFieldAnalyzerWrapper( def);
	// "s" in "scontents" is for stemming
	fa.addAnalyzer( "scontents", snowball);
	fa.addAnalyzer( "stitle", snowball);
	return fa;
}

...
later:
Document doc = new Document();
doc.add( Field.Text( "title", title));
// don't need recall
doc.add( Field.Text( "stitle", new StringReader( title)));
String body = ...;
// term vector
doc.add( Field.Text( "contents", new StringReader( body), true));
doc.add( Field.Text( "scontents", new StringReader( body)));
writer.addDocument( doc);


Jim.
Peter Kim wrote:
Hi,
I'm new to Lucene, so I apologize if this issue has been discussed
before (I'm sure it has), but I had a hard time finding an answer using
google. (Maybe this would be a good candidate for the FAQ!) :)
Is it possible to enable stem queries on a per-query basis? It doesn't
seem to be possible since the stem tokenizing is done during the
indexing process. Are people basically stuck with having all their
queries stemmed or none at all?
Thanks!
Peter


Re: Query based stemming

2005-01-07 Thread Chris Hostetter

: >Is it possible to enable stem queries on a per-query basis? It doesn't
: >seem to be possible since the stem tokenizing is done during the
: >indexing process. Are people basically stuck with having all their
: >queries stemmed or none at all?

:  From what I've read, if you want to have a choice, the easiest way is
: to index the documents twice. Once with stemming on and once with it off
: placing the results in two different indexes.  Then at query time,
: select which index you want to use based on whether you want stemming on
: or off.

As I understand it, the intended place to implement stemming is in an
Analyzer Filter (not to be confused with a search Filter).  Since you
can specify an Analyzer when you call addDocument, you don't have to
actually have two separate indexes, you could just have all the docs in
one index - and use a search Filter to indicate which docs to look at.

Alternately: the Analyzer's tokenStream method is given the fieldName
being analyzed, so you could write an Analyzer with a set of rules
telling it to only apply your stemming filter to certain fields, and
then instead of having twice as many documents, you can just index your
text in two separate fields (which should be a little easier than
separate docs because you are only duplicating the fields where stemming
is relevant).  Then at search time you don't have to filter anything, just
search the field that's applicable to your current desire (stemmed or
unstemmed)
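
The per-field rule described above can be sketched in plain Java, without Lucene. Everything here is a hypothetical stand-in: the field names "scontents" and "stitle" are borrowed from the earlier reply in this thread, toyStem is a crude suffix stripper and not a real stemmer, and analyze only mimics what an Analyzer keyed on fieldName would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PerFieldStemmingSketch {
    // Assumption: only these fields receive the stemming step.
    static final Set<String> STEMMED_FIELDS =
            new HashSet<>(Arrays.asList("scontents", "stitle"));

    // Toy suffix stripper standing in for a real stemmer such as Snowball.
    static String toyStem(String term) {
        if (term.length() > 5 && term.endsWith("ing")) {
            return term.substring(0, term.length() - 3);
        }
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Lowercases, splits on non-alphanumerics, and stems only in stemmed fields.
    static List<String> analyze(String fieldName, String text) {
        boolean stem = STEMMED_FIELDS.contains(fieldName);
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty()) continue;
            terms.add(stem ? toyStem(raw) : raw);
        }
        return terms;
    }

    public static void main(String[] args) {
        String body = "Running dogs";
        // Same text, indexed once per field; the query picks the field it wants.
        System.out.println("contents  -> " + analyze("contents", body));
        System.out.println("scontents -> " + analyze("scontents", body));
    }
}
```

At search time the same analyze call would be applied to the query text with whichever field name the user's stemming choice selects, so query terms and indexed terms go through the same rule.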

Lastly: Although it's tricky to get correct, there's no law saying you
have to use the same Analyzer when you query as when you index.  You could
index your documents using an Analyzer that does no stemming, and then at
search time (if you want stemming) use an Analyzer that does "reverse
stemming" to expand your query terms out to all the possible variants.


(NOTE: I've never actually tried this, but I think the theory is sound).


-Hoss





Re: Query based stemming

2005-01-07 Thread Jim Lynch
From what I've read, if you want to have a choice, the easiest way is 
to index the documents twice. Once with stemming on and once with it off 
placing the results in two different indexes.  Then at query time, 
select which index you want to use based on whether you want stemming on 
or off.

Jim.
Peter Kim wrote:
Hi,
I'm new to Lucene, so I apologize if this issue has been discussed
before (I'm sure it has), but I had a hard time finding an answer using
google. (Maybe this would be a good candidate for the FAQ!) :)
Is it possible to enable stem queries on a per-query basis? It doesn't
seem to be possible since the stem tokenizing is done during the
indexing process. Are people basically stuck with having all their
queries stemmed or none at all?
Thanks!
Peter


Query based stemming

2005-01-07 Thread Peter Kim
Hi,

I'm new to Lucene, so I apologize if this issue has been discussed
before (I'm sure it has), but I had a hard time finding an answer using
google. (Maybe this would be a good candidate for the FAQ!) :)

Is it possible to enable stem queries on a per-query basis? It doesn't
seem to be possible since the stem tokenizing is done during the
indexing process. Are people basically stuck with having all their
queries stemmed or none at all?

Thanks!
Peter




Re: Span Query Performance

2005-01-06 Thread Paul Elschot
Sorry for the duplicate on lucene-dev, it should have gone to lucene-user 
directly:

A bit more:

On Thursday 06 January 2005 10:22, Paul Elschot wrote:
> On Thursday 06 January 2005 02:17, Andrew Cunningham wrote:
> > Hi all,
> > 
> > I'm currently doing a query similar to the following:
> > 
> > for w in wordset:
> >     query = w near (word1 V word2 V word3 ... V word1422);
> >     perform query
> > 
> > and I am doing this through SpanQuery.getSpans(), iterating through the 
> > spans and counting
> > the matches, which can result in 4782282 matches (essentially I am only 
> > after the match count).
> > The query works but the performance can be somewhat slow; so I am wondering:
> > 
...
> > c) Is there a faster method to what I am doing I should consider?
> 
> Preindexing all word combinations that you're interested in.
> 

In case you know all the words in advance, you could also index a
helper word at the same position as each of those words.
This requires a custom analyzer that inserts the helper word in the
token stream with a zero position increment.
The query then simplifies to:
query = w near helperword
which would probably speed things up significantly.
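
A rough plain-Java sketch of the helper-word idea, with no Lucene dependency. The Token class, the injectHelper method and the "_helper_" term are all hypothetical names chosen for illustration; in Lucene the equivalent step would live in a custom token filter that sets a position increment of 0 on the injected token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HelperTokenSketch {
    // Minimal stand-in for an analyzer token: a term plus its position increment.
    static final class Token {
        final String term;
        final int positionIncrement;
        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emits every term normally and, after each term in wordSet, an extra
    // helper token with increment 0 so it shares that term's position.
    static List<Token> injectHelper(List<String> terms, Set<String> wordSet, String helper) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, 1));
            if (wordSet.contains(term)) {
                out.add(new Token(helper, 0));
            }
        }
        return out;
    }

    // Resolves increments into absolute positions per term.
    static Map<String, List<Integer>> positions(List<Token> tokens) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int position = -1;
        for (Token token : tokens) {
            position += token.positionIncrement;
            result.computeIfAbsent(token.term, k -> new ArrayList<>()).add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> wordSet = new HashSet<>(Arrays.asList("fox", "dog"));
        List<Token> tokens =
                injectHelper(Arrays.asList("red", "fox", "jumps", "dog"), wordSet, "_helper_");
        System.out.println(positions(tokens));
    }
}
```

Because the helper term occupies the same positions as every word in the set, a single "w near helperword" query covers what previously needed a disjunction over all 1422 words.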

Regards,
Paul Elschot





Re: Span Query Performance

2005-01-06 Thread Paul Elschot
On Thursday 06 January 2005 02:17, Andrew Cunningham wrote:
> Hi all,
> 
> I'm currently doing a query similar to the following:
> 
> for w in wordset:
> query = w near (word1 V word2 V word3 ... V word1422);
> perform query
> 
> and I am doing this through SpanQuery.getSpans(), iterating through the 
> spans and counting
> the matches, which can result in 4782282 matches (essentially I am only 
> after the match count).
> The query works but the performance can be somewhat slow; so I am wondering:
> 
> a) Would the query potentially run faster if I used 
> Searcher.search(query) with a custom similarity,
> or do both methods essentially use the same mechanics

It would be somewhat slower, because it loops over the getSpans()
and computes document scores and constructs a Hits from the scores.

> b) Does using a RAMDirectory improve query performance any significant 
> amount.

That depends on your operating system, the size of the index, the amount
of RAM you can use, the file buffering efficiency, other loads on the 
computer ...
 
> c) Is there a faster method to what I am doing I should consider?

Preindexing all word combinations that you're interested in.

Regards,
Paul Elschot
 





Span Query Performance

2005-01-05 Thread Andrew Cunningham
Hi all,
I'm currently doing a query similar to the following:
for w in wordset:
   query = w near (word1 V word2 V word3 ... V word1422);
   perform query
and I am doing this through SpanQuery.getSpans(), iterating through the 
spans and counting
the matches, which can result in 4782282 matches (essentially I am only 
after the match count).
The query works but the performance can be somewhat slow; so I am wondering:

a) Would the query potentially run faster if I used 
Searcher.search(query) with a custom similarity,
or do both methods essentially use the same mechanics

b) Does using a RAMDirectory improve query performance any significant 
amount.

c) Is there a faster method to what I am doing I should consider?
Thanks,
Andrew


Re: Correct query

2004-12-27 Thread Erik Hatcher
On Dec 27, 2004, at 6:28 AM, Alex Kiselevski wrote:
Thanks Erik,
I use StandardAnalyzer to index RPG/4.
I use StandardAnalyzer and IndexSearcher with TermQuery without
QueryParser. So, I thought that as a result of query
Text:RPG I still have to get some hit, but it didn't happen.

   StandardAnalyzer:
 [rpg/4]
As you can see, StandardAnalyzer tokenized RPG/4 as "rpg/4".  A 
TermQuery must be *exactly* that to match.  You could, alternatively, 
bypass QueryParser and use the analyzer directly making a TermQuery (or 
PhraseQuery) out of the results.  I do this in quite a few queries in 
the system I'm building for my primary work to allow queries against 
some library archives.
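
To make the exact-match point concrete, here is a small self-contained sketch. The two tokenizers are toy approximations written for this example (whitespaceLike and simpleLike are hypothetical names, not Lucene classes), and the "index" is just a set of terms; the only claim is that a term lookup succeeds when the query text is tokenized the same way as the indexed text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TermMatchSketch {
    // Toy tokenizer: splits on whitespace only, keeps case and punctuation.
    static List<String> whitespaceLike(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Toy tokenizer: lowercases and keeps only runs of letters.
    static List<String> simpleLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // A term query is an exact lookup: no analysis happens here.
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> wsIndex = new HashSet<>(whitespaceLike("RPG/4"));
        Set<String> simpleIndex = new HashSet<>(simpleLike("RPG/4"));

        // Raw user input against an index that kept "RPG/4" whole: no hit.
        System.out.println(termMatches(wsIndex, "RPG"));
        // Query text run through the same tokenizer used at index time: hit.
        System.out.println(termMatches(simpleIndex, simpleLike("RPG").get(0)));
    }
}
```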

This is one of the most critical, but seemingly misunderstood, aspects 
of using Lucene effectively - how to manage the analysis process and 
match it to the searching side.

Erik


RE: Correct query

2004-12-27 Thread Alex Kiselevski

Thanks Erik,
I use StandardAnalyzer to index RPG/4.
I use StandardAnalyzer and IndexSearcher with TermQuery without
QueryParser. So, I thought that as a result of query
Text:RPG I still have to get some hit, but it didn't happen.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, December 27, 2004 11:51 AM
To: Lucene Users List
Subject: Re: Correct query



On Dec 27, 2004, at 3:21 AM, Alex Kiselevski wrote:
> Hello,
> I indexed a document that included the word RPG/4.
> So, when I searched, I built the query
>
> Text:RPG but it didn't find anything; only Text:RPG/4 gave me the
> correct result. Please tell me what I have to do to build a dynamic
> (not hardcoded like in this example) query to get the right results.

What Analyzer did you use?   Are you using QueryParser and using the
same analyzer with it?  Please read the AnalysisParalysis page on the
wiki.  Also, running the AnalyzerDemo from Lucene in Action's source
code yields this, which should help illuminate the situation:

$ ant -emacs AnalyzerDemo
Buildfile: build.xml

AnalyzerDemo:

   Demonstrates analysis of sample text.

   Refer to the "Analysis" chapter for much more on this
   extremely crucial topic.

Press return to continue...

String to analyze: [This string will be analyzed.]
RPG/4
Running lia.analysis.AnalyzerDemo...
Analyzing "RPG/4"
   WhitespaceAnalyzer:
 [RPG/4]

   SimpleAnalyzer:
 [rpg]

   StopAnalyzer:
 [rpg]

   StandardAnalyzer:
 [rpg/4]




