Re: Filter for a search refinement

2004-11-21 Thread Erik Hatcher
Nicolas - how does your filter differ from the capabilities available 
from the built-in QueryFilter?  It seems at first glance to be nearly 
the same thing.

Erik

On Nov 21, 2004, at 4:52 AM, Nicolas Maisonneuve wrote:
> I developed a filter that restricts a search to the hits of a previous
> search (search refinement).
>
> See the patch: http://issues.apache.org/bugzilla/show_bug.cgi?id=32334
>
> Nicolas Maisonneuve
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: cache check?

2004-11-21 Thread Erik Hatcher
On Nov 20, 2004, at 5:49 PM, Vic wrote:
> Erik Hatcher wrote:
> > The Hits object already does some most-recently-used caching.
>
> Are there any docs on this, or should I look in the source?

The caching is there to avoid disk access of the Lucene index for the
documents most likely to be accessed next.

> I plan on searching terabytes.

That's quite a lot of data.  You'll have to do more than just use plain
Lucene to handle this much data, of course.

> I have no idea how fast Lucene will be until I am done and loaded and
> have queries coming in, but I know I will need to manage the cache.

My advice would be to not worry about caching unless and until you need
it.  You're searching terabytes, you say, but that does not mean you
are accessing every single document that comes back from searches.  One
big issue is how you access the documents you get back from hits -
accessing a document is when Lucene goes to the index and retrieves
(currently) the entire document including all the stored fields.
Minimizing the documents you access in this way (say displaying 10 or
20 at a time, which is typical) is wise.

I really don't see a need for any custom caching on top of Lucene.
Remember the rule of optimization: don't.  And for experts only: don't
do it yet.  :)

> It depends on how good and tunable the "some LRU caching" in Hits is.
> Is it soft? Can it take up 2 gigs of RAM?

Hits is not tunable.  It caches up to 200 documents.  Though you can
use Lucene's lower-level search() API methods to do some of your own
magic if you like - look to see how Hits does its thing with the basic
search(Query) method.
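
To give a feel for what a small, fixed-size most-recently-used cache
looks like, here is a minimal sketch in plain Java.  It is not the code
inside Hits: the class name DocCache, the loader function standing in
for the index read, and the loads() counter are all assumptions made up
for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical bounded cache (not Lucene's Hits): keeps at most maxDocs
// loaded documents and evicts the least recently used one when full.
class DocCache<D> {
    private final int maxDocs;
    private final IntFunction<D> loader;  // stands in for the expensive index read
    private final Map<Integer, D> cache;
    private int loads = 0;                // how many times the loader was called

    DocCache(int maxDocs, IntFunction<D> loader) {
        this.maxDocs = maxDocs;
        this.loader = loader;
        // accessOrder = true: the eldest entry is the least recently used one
        this.cache = new LinkedHashMap<Integer, D>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, D> eldest) {
                return size() > DocCache.this.maxDocs;
            }
        };
    }

    // Return the cached document, loading (and caching) it on a miss.
    D doc(int id) {
        D d = cache.get(id);
        if (d == null) {
            d = loader.apply(id);
            loads++;
            cache.put(id, d);
        }
        return d;
    }

    int loads() { return loads; }
    int size() { return cache.size(); }
}
```

Something like new DocCache<String>(200, id -> loadFromIndex(id)), with
loadFromIndex being whatever does the real retrieval, would mirror the
200-document cap mentioned above.  Only documents that are actually
requested ever reach the loader, which is the reason to display 10 or
20 at a time.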

Erik


Re: Filter for a search refinement

2004-11-21 Thread Nicolas Maisonneuve
Yes, it's the same kind of feature (I hadn't seen this filter, shame
on me).
But my method may be faster, because QueryFilter launches an internal
search and my method does not.

nicolas






Re: Filter for a search refinement

2004-11-21 Thread Nicolas Maisonneuve
Hmm, just a question.

- In the normal IndexSearcher search method there is a check of the form
  if (score > 0.0f && filter.get(doc)) { add doc to the hits }

- But in QueryFilter there is no minimum-score condition.

Is that intended or not?

nicolas






Re: Filter for a search refinement

2004-11-21 Thread Erik Hatcher
QueryFilter keys off the hits from a previous search to light up the  
bits for documents to pass the filter.  The previous search hits all  
have a score > 0 already, so no need to be concerned with score there.
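
As a rough picture of what "lighting up the bits" means, here is a
sketch in plain Java rather than Lucene's own classes.  The class and
method names (BitFilterSketch, bitsFromHits, refine) and the arrays
standing in for search results are made up for illustration.

```java
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical stand-in for the filtering idea: doc ids are plain ints and
// a "search result" is a pair of parallel arrays (doc ids and scores).
class BitFilterSketch {
    // Turn the hits of a previous search into a bit set, one bit per matching doc.
    static BitSet bitsFromHits(int[] hitDocIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : hitDocIds) {
            bits.set(doc);
        }
        return bits;
    }

    // Keep a doc from the second search only if it scored above zero
    // and its bit was set by the previous search.
    static int[] refine(int[] docIds, float[] scores, BitSet bits) {
        return IntStream.range(0, docIds.length)
                .filter(i -> scores[i] > 0.0f && bits.get(docIds[i]))
                .map(i -> docIds[i])
                .toArray();
    }
}
```

The score check sits only in refine(): bitsFromHits() receives nothing
but documents that already matched, so it has no score to test.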

Erik


Re: Filter for a search refinement

2004-11-21 Thread Erik Hatcher
On Nov 21, 2004, at 8:34 AM, Nicolas Maisonneuve wrote:
> Yes, it's the same kind of feature (I hadn't seen this filter, shame
> on me).
> But my method may be faster, because QueryFilter launches an internal
> search and my method does not.

It'd be interesting for you to compare the speed and see.  In your
approach you are still doing two searches, with the first one used to
get the documents available for the second search.

QueryFilter also caches.  Filters are really designed for long-standing
use, not a one-time query refinement.  For query refinement (search
within search), simply combine the two queries in a BooleanQuery as
required clauses.
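
In query-parser syntax, "required" is the + prefix, so the combined
query is just both originals wrapped in parentheses with a + in front
of each.  A tiny sketch, assuming both queries are at hand as query
strings; the RefineSketch class and its refine() helper are
hypothetical, and no escaping of the input is attempted.

```java
// Hypothetical helper, not part of Lucene: AND two query strings together
// by making each one a required (+) clause of the combined query.
class RefineSketch {
    static String refine(String previousQuery, String refinement) {
        return "+(" + previousQuery + ") +(" + refinement + ")";
    }
}
```

For example, refine("title:lucene", "body:filter") gives a string that
can be handed to the query parser like any other query, with no filter
and no second pass over earlier hits.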

Erik



Re: cache check?

2004-11-21 Thread Vic
Erik Hatcher wrote:
> Remember the rule of optimization: don't.  And for experts only: don't
> do it yet.  :)

:-)
.V


disadvantages

2004-11-21 Thread Miguel Angel
What are the disadvantages of Lucene?
-- 
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277




Re: disadvantages

2004-11-21 Thread Terry Steichen
Compared to what?



Re: disadvantages

2004-11-21 Thread Erik Hatcher
On Nov 21, 2004, at 12:00 PM, Miguel Angel wrote:
> What are the disadvantages of Lucene?

The users of your system won't have time to get coffee when running 
searches.

Erik


Using multiple analysers within a query

2004-11-21 Thread Kauler, Leto S
Hi Lucene list,

We have the need for analysed and 'not analysed/not tokenised' clauses
within one query.  Imagine an unparsed query like:

+title:"Hello World" +path:Resources\Live\1

In the above example we would want the first clause to use
StandardAnalyser and the second to use an analyser which returns the
term as a single token.  So a parsed result might look like:

+(title:hello title:world) +path:Resources\Live\1

Would anyone have any suggestions on how this could be done?  I was
thinking maybe the QueryParser would have to be changed/extended to
accept a separator other than colon ":", something like "=" for example
to indicate this clause is not to be tokenised.  Or perhaps this can all
be done using a single analyser?
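
To make the "single analyser" idea concrete, here is a rough sketch in
plain Java that does not use Lucene's classes: one object that picks a
tokenising strategy by field name.  Everything in it (PerFieldSketch,
words, keyword, use, tokens) is made up for illustration, and words()
is only a crude stand-in for what StandardAnalyser does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-field dispatcher: each field name maps to its own
// tokenising function, with a fallback for fields that are not listed.
class PerFieldSketch {
    // Crude tokeniser: lowercase, then split on anything that is not a letter or digit.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // "Not tokenised": the whole value comes back unchanged as a single token.
    static List<String> keyword(String text) {
        List<String> out = new ArrayList<>();
        out.add(text);
        return out;
    }

    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldSketch(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    void use(String field, Function<String, List<String>> analyser) {
        byField.put(field, analyser);
    }

    List<String> tokens(String field, String text) {
        return byField.getOrDefault(field, fallback).apply(text);
    }
}
```

With words() as the fallback and keyword() registered for "path", the
title clause would be split into hello and world while the path value
would stay in one piece.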

Regards (and excuse the disclaimer),
--Leto

CONFIDENTIALITY NOTICE AND DISCLAIMER

Information in this transmission is intended only for the person(s) to whom it 
is addressed and may contain privileged and/or confidential information. If you 
are not the intended recipient, any disclosure, copying or dissemination of the 
information is unauthorised and you should delete/destroy all copies and notify 
the sender. No liability is accepted for any unauthorised use of the 
information contained in this transmission.

This disclaimer has been automatically added.




Re: disadvantages

2004-11-21 Thread Nader Henein
You may singe your fingers if you touch the keyboard during indexing.

Nader

Miguel Angel wrote:
> What are the disadvantages of Lucene?



LIMO 0.5 released

2004-11-21 Thread Luke Francl
I am pleased to announce that version 0.5 of LIMO, the Lucene Index Monitor, 
has been released.

LIMO is a web application that allows you to browse your Lucene indexes 
remotely. It is an ideal companion for Lucene applications that run in a 
servlet container.

The 0.5 release adds some cool new features such as:

* More index summary statistics, including index version number, deletion 
status, number of documents, number of fields, number of indexed fields, and 
number of unindexed fields.
* Querying the index.
* Display of expanded wildcard and range queries (using Query.rewrite) with term 
count so you can see how many terms a complex query is expanded to. This is 
particularly helpful if you are trying to track down an annoying TooManyClauses 
exception.
* Query timing to show how expensive queries are.
* Estimated query memory consumption (as given by the formula in this message: 
http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1757461).
* Query result count.
* Query result explanation.
* Stored field reconstruction as in Luke.
* Highlighting of matching terms in search results and reconstructed documents 
using Mark Harwood's library.

LIMO requires Java 1.4 or later and a servlet container.

Download it from SourceForge: http://sourceforge.net/projects/limo/ 

LIMO is still ready to go out of the box (er, war file). Just edit the web.xml 
to point LIMO to your indexes.

Thanks to Julien Nioche for starting a great and very useful project and 
letting me join it; and to Andrzej Bialecki for Luke from which I appropriated 
several ideas and his GrowableStringArray class. If you are interested in 
getting involved, LIMO is now available in SourceForge CVS.

Regards,
Luke Francl


RE: disadvantages

2004-11-21 Thread Luke Francl
Well that really depends on how big your index is and what they search for, now 
doesn't it? ;)

