RemoteSearcher

2005-01-05 Thread Yura Smolsky
Hello.

Does anyone know of an application based on RemoteSearcher that
distributes an index across many servers?

Yura Smolsky,




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-05 Thread Bill Janssen
> On Jan 5, 2005, at 3:46 PM, Bill Janssen wrote:
> > Maybe I just misunderstand your release numbering policy.  Typically,
> > in a library project that has major, minor, and micro release numbers,
> > I'd expect no API changes between micro releases of a single minor
> > release; only backward-compatible API extensions between different
> > minor releases of a single major release; possible wholesale API
> > changes (not backward compatible) between different major releases.
> > Is this the kind of thinking that you also have?
> 
> Yes, absolutely.  The flaw you have stumbled on was completely an 
> oversight and a mistake that should not have occurred.  I, for one, 
> apologize for not catching it.  Only because I have custom QueryParser 
> subclasses and lots of unit tests did I catch the signature changes 
> that I did, and I'm not sure how I missed this one.  I have not gone 
> back, yet, to review the change history and whether my code is broken 
> in one of those versions of Lucene, or whether I've not overridden that 
> method.

OK, then it's just a bug, and we all make bugs (me probably more than
you, at that).  Thanks for all your help with this, Erik.

Bill




Span Query Performance

2005-01-05 Thread Andrew Cunningham
Hi all,
I'm currently doing a query similar to the following:
for w in wordset:
   query = w near (word1 V word2 V word3 ... V word1422);
   perform query
and I am doing this through SpanQuery.getSpans(), iterating through the 
spans and counting the matches, which can produce 4782282 matches 
(essentially I am only after the match count).
The query works but performance can be somewhat slow, so I am wondering:

a) Would the query potentially run faster if I used 
Searcher.search(query) with a custom Similarity, or do both methods 
essentially use the same mechanics?

b) Does using a RAMDirectory improve query performance by any 
significant amount?

c) Is there a faster method than what I am doing that I should consider?
Thanks,
Andrew


Re: Indexing flat files without .txt extension

2005-01-05 Thread Erik Hatcher
On Jan 5, 2005, at 6:31 PM, Hetan Shah wrote:
How can one index simple text files without the .txt extension? I am 
trying to use IndexFiles and IndexHTML, but not to my satisfaction: 
with IndexFiles I do not get any control over the content of the 
file, and with IndexHTML, files without any extension do not get 
indexed at all. Any pointers are really appreciated.
Try out the Indexer code from Lucene in Action.  You can download it 
from the link here: 
http://www.lucenebook.com/blog/announcements/sourcecode.html

It'll be cleaner to follow and borrow from.  The code that ships with 
Lucene is for demonstration purposes.  It surprises me how often folks 
use that code to build real indexes.  It's quite straightforward to 
create your own Java code to do the indexing in whatever manner you 
like, borrowing from examples.

When you get the download unpacked, simply run "ant Indexer" to see it 
in action.  And then "ant Searcher" to search the index just built.

Erik
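For the extension problem specifically, nothing in Lucene itself cares about file extensions; only the demo drivers filter on them. As a rough sketch (plain JDK code, no Lucene classes; the class name is made up for illustration), a directory walk that collects every regular file, whatever its name, which you would then feed to your own Document-building code:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FileCollector {
    // Recursively collect every regular file, applying no extension
    // filter -- unlike the IndexHTML demo, which skips extensionless files.
    static List<File> collect(File dir) {
        List<File> out = new ArrayList<File>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return out; // not a directory, or unreadable
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                out.addAll(collect(f));
            } else {
                out.add(f);
            }
        }
        return out;
    }
}
```

From there, each collected File would be read and turned into a Lucene Document in whatever manner suits your content.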


Indexing flat files without .txt extension

2005-01-05 Thread Hetan Shah
Hello,
How can one index simple text files without the .txt extension? I am 
trying to use IndexFiles and IndexHTML, but not to my satisfaction: 
with IndexFiles I do not get any control over the content of the file, 
and with IndexHTML, files without any extension do not get indexed at 
all. Any pointers are really appreciated.

Thanks.
-H


Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-05 Thread Otis Gospodnetic
Hello Bill,

"I feel your pain" ;)
But seriously, there was a QueryParser mess-up in the recent minor
releases.  I believe this is the first time we've broken backward
compatibility in the last ~4 years.  The Lucene public API is
very 'narrow', and typically very stable.  What we did with QueryParser
was the result of overeagerness, and is really out of character for
Lucene.

Otis


--- Bill Janssen <[EMAIL PROTECTED]> wrote:

> Doug,
> 
> My application (see http://www.parc.com/janssen/pubs/TR-03-16.pdf for
> details) is not just a Java app (you're probably not surprised :-).
> It requires about a dozen other packages to be installed on a
> machine,
> before building from source.  The Python Imaging Library, ReportLab,
> libtiff, libpng, xpdf, htmldoc, etc.  Lucene is one of these
> prerequisites.  I don't include any other outside code with my tar
> file; not sure why Lucene should be the only one to require this.
> 
> Besides, I'd like to keep up with the continuous improvements in
> Lucene.  I don't want to be stuck with 1.4.1 forever.
> 
> Please understand that I'm not trying to push your project in any
> particular direction.  I'm just trying to understand whether Lucene
> is
> usable for my project.  If every micro-release of Lucene means that I
> will potentially have to re-write my code, I may have to look for a
> library with a more stable API.
> 
> Maybe I just misunderstand your release numbering policy.  Typically,
> in a library project that has major, minor, and micro release
> numbers,
> I'd expect no API changes between micro releases of a single minor
> release; only backward-compatible API extensions between different
> minor releases of a single major release; possible wholesale API
> changes (not backward compatible) between different major releases.
> Is this the kind of thinking that you also have?
> 
> I can certainly understand that when you find improvements you'd like
> to make in the API, you'd want to put them in.  I just think it's
> important not to break existing code without bumping the release
> number, so that a user can say, "This works with Lucene 1.4".  Right
> now, that can't be said.
> 
> Bill
> 
> Doug Cutting wrote:
> > Bill, most folks bundle appropriate versions of required jars with
> their 
> > applications to avoid this sort of problem.  How are you deploying 
> > things?  Are you not bundling a compatible version of the lucene
> jar 
> > with each release of your application?  If not, why not?
> 
> 





Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-05 Thread Erik Hatcher
On Jan 5, 2005, at 3:46 PM, Bill Janssen wrote:
Maybe I just misunderstand your release numbering policy.  Typically,
in a library project that has major, minor, and micro release numbers,
I'd expect no API changes between micro releases of a single minor
release; only backward-compatible API extensions between different
minor releases of a single major release; possible wholesale API
changes (not backward compatible) between different major releases.
Is this the kind of thinking that you also have?
Yes, absolutely.  The flaw you have stumbled on was completely an 
oversight and a mistake that should not have occurred.  I, for one, 
apologize for not catching it.  Only because I have custom QueryParser 
subclasses and lots of unit tests did I catch the signature changes 
that I did, and I'm not sure how I missed this one.  I have not gone 
back, yet, to review the change history and whether my code is broken 
in one of those versions of Lucene, or whether I've not overridden that 
method.

In short - we screwed up, and we should fix it since it's obviously 
important to you.

Erik


Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-05 Thread Erik Hatcher
On Jan 5, 2005, at 3:48 PM, Bill Janssen wrote:
In 1.4.1 or 1.4.3?
Both - my suggestion was an attempt to get you something that would 
work in both versions.

Erik

On Jan 4, 2005, at 9:43 PM, Bill Janssen wrote:
Let me be a bit more explicit.  My method (essentially an
after-method, for those Lisp'rs out there) begins thusly:
protected Query getFieldQuery (String field,
   Analyzer a,
   String queryText)
throws ParseException {
  Query x = super.getFieldQuery(field, a, queryText);
  ...
}
If I remove the "Analyzer a" from both the signature and the super
call, the super call won't compile because that method isn't in the
QueryParser in 1.4.1.  But my getFieldQuery() method won't even be
called in 1.4.1, because it doesn't exist in that version of the
QueryParser.
Will it work if you override this method also?
protected Query getFieldQuery(String field,
   Analyzer analyzer,
   String queryText,
   int slop)
	Erik
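For what it's worth, the shape of Erik's suggestion -- override every overload so the customization runs no matter which one the parser core invokes in a given release -- can be illustrated with a self-contained sketch. BaseParser here is a hypothetical stand-in for QueryParser, not the real Lucene class:

```java
// Stand-in for the parser: depending on the release, the parsing core may
// call either the 3-arg or the 4-arg hook.
class BaseParser {
    protected String getFieldQuery(String field, String analyzer, String text) {
        return field + ":" + text;
    }
    protected String getFieldQuery(String field, String analyzer, String text, int slop) {
        return field + ":\"" + text + "\"~" + slop;
    }
}

class MyParser extends BaseParser {
    // Override BOTH signatures so the custom behavior applies regardless
    // of which overload the superclass dispatches to.
    protected String getFieldQuery(String field, String analyzer, String text) {
        return customize(super.getFieldQuery(field, analyzer, text));
    }
    protected String getFieldQuery(String field, String analyzer, String text, int slop) {
        return customize(super.getFieldQuery(field, analyzer, text, slop));
    }
    private String customize(String q) {
        return "[custom " + q + "]"; // the "after-method" behavior
    }
}
```

Note this only helps where both overloads exist in the superclass; it does not make one source file compile against 1.4.1, which lacks the slop variant entirely -- that is exactly the compatibility break under discussion.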


Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-05 Thread Bill Janssen
In 1.4.1 or 1.4.3?

> On Jan 4, 2005, at 9:43 PM, Bill Janssen wrote:
> > Let me be a bit more explicit.  My method (essentially an
> > after-method, for those Lisp'rs out there) begins thusly:
> >
> > protected Query getFieldQuery (String field,
> >Analyzer a,
> >String queryText)
> > throws ParseException {
> >
> >   Query x = super.getFieldQuery(field, a, queryText);
> >
> >   ...
> > }
> >
> > If I remove the "Analyzer a" from both the signature and the super
> > call, the super call won't compile because that method isn't in the
> > QueryParser in 1.4.1.  But my getFieldQuery() method won't even be
> > called in 1.4.1, because it doesn't exist in that version of the
> > QueryParser.
> 
> Will it work if you override this method also?
> 
> protected Query getFieldQuery(String field,
>Analyzer analyzer,
>String queryText,
>int slop)
> 
>   Erik




Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-05 Thread Bill Janssen
Doug,

My application (see http://www.parc.com/janssen/pubs/TR-03-16.pdf for
details) is not just a Java app (you're probably not surprised :-).
It requires about a dozen other packages to be installed on a machine,
before building from source.  The Python Imaging Library, ReportLab,
libtiff, libpng, xpdf, htmldoc, etc.  Lucene is one of these
prerequisites.  I don't include any other outside code with my tar
file; not sure why Lucene should be the only one to require this.

Besides, I'd like to keep up with the continuous improvements in
Lucene.  I don't want to be stuck with 1.4.1 forever.

Please understand that I'm not trying to push your project in any
particular direction.  I'm just trying to understand whether Lucene is
usable for my project.  If every micro-release of Lucene means that I
will potentially have to re-write my code, I may have to look for a
library with a more stable API.

Maybe I just misunderstand your release numbering policy.  Typically,
in a library project that has major, minor, and micro release numbers,
I'd expect no API changes between micro releases of a single minor
release; only backward-compatible API extensions between different
minor releases of a single major release; possible wholesale API
changes (not backward compatible) between different major releases.
Is this the kind of thinking that you also have?

I can certainly understand that when you find improvements you'd like
to make in the API, you'd want to put them in.  I just think it's
important not to break existing code without bumping the release
number, so that a user can say, "This works with Lucene 1.4".  Right
now, that can't be said.

Bill

Doug Cutting wrote:
> Bill, most folks bundle appropriate versions of required jars with their 
> applications to avoid this sort of problem.  How are you deploying 
> things?  Are you not bundling a compatible version of the lucene jar 
> with each release of your application?  If not, why not?




multi-threaded throughput in lucene

2005-01-05 Thread John Wang
Hi folks:

We are trying to measure the throughput of Lucene in a multi-threaded environment. 

This is what we found:

 1 thread:  search takes 20 ms.

 2 threads: search takes 40 ms.

 5 threads: search takes 100 ms.

Seems like under a multi-threaded scenario, throughput isn't good;
performance is no better than with 1 thread.

I tried sharing an IndexSearcher among all threads as well as
having an IndexSearcher per thread.  Both yield the same numbers.

 Is this consistent with what you'd expect?

Thanks

-John
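For reference, the figures above imply a flat aggregate throughput of 50 queries/second at every thread count, which is why per-query latency grows linearly with threads. A quick arithmetic check:

```java
public class Throughput {
    // Aggregate queries per second when `threads` concurrent searches
    // each complete in `latencyMs` milliseconds.
    static double qps(int threads, double latencyMs) {
        return threads * (1000.0 / latencyMs);
    }

    public static void main(String[] args) {
        System.out.println(qps(1, 20.0));   // 1 thread,  20 ms each -> 50.0
        System.out.println(qps(2, 40.0));   // 2 threads, 40 ms each -> 50.0
        System.out.println(qps(5, 100.0));  // 5 threads, 100 ms each -> 50.0
    }
}
```

Flat throughput with rising latency is often the signature of contention for a single shared resource (one CPU, one disk) rather than a locking problem in the searcher, though the numbers alone can't distinguish the two.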




Question about Analyzer and words spelled in different languages

2005-01-05 Thread Mariella Di Giacomo
Hi ALL,
We are trying to index scientific articles written in English, but whose 
authors' names can be spelled in any language (depending on the author's 
nationality).

E.g.
Schäffer

In the XML document that we provide to Lucene the author name is written 
in the following way (using HTML entities):

Sch&auml;ffer

So in practice that is the name that would be given to a Lucene 
analyzer/filter.  Is there any already-written analyzer that would take 
that name (Sch&auml;ffer, or any other name containing entities) so that 
the Lucene index could be searched (once the field has been indexed) for 
both the real version of the name, which is

Schäffer

and the English-spelled version of the name, which is

Schaffer
Thanks a lot in advance for your help,
Mariella
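One way to make both spellings searchable, sketched here with plain JDK classes rather than a ready-made Lucene analyzer (the entity decoding handles only `&auml;` for brevity; a real pipeline would decode all entities, e.g. with an XML parser, before analysis):

```java
import java.text.Normalizer;

public class NameFolding {
    // Decode the one entity used in the example; in practice an XML
    // parser would decode all entities before the text reaches Lucene.
    static String decodeAuml(String s) {
        return s.replace("&auml;", "\u00e4"); // &auml; -> ä
    }

    // Strip diacritics: NFD decomposition separates base letters from
    // combining marks, and the regex then removes the marks.
    static String asciiFold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String decoded = decodeAuml("Sch&auml;ffer");
        System.out.println(decoded);            // Schäffer
        System.out.println(asciiFold(decoded)); // Schaffer
    }
}
```

A custom analyzer could emit both the original token and its folded form for the same field, so a query for either "Schäffer" or "Schaffer" would match the document.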


Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-05 Thread Doug Cutting
Bill Janssen wrote:
Sure, if I wanted to ship different code for each micro-release of
Lucene (which, you might guess, I don't).  That signature doesn't
compile with 1.4.1.
Bill, most folks bundle appropriate versions of required jars with their 
applications to avoid this sort of problem.  How are you deploying 
things?  Are you not bundling a compatible version of the lucene jar 
with each release of your application?  If not, why not?

I'm not trying to be difficult, just trying to understand.
Thanks,
Doug


Re: simultaneous index/search/delete

2005-01-05 Thread Otis Gospodnetic
Any index-modifying operations need to be serialized.  Searching is
read-only and can be done in parallel with anything else.  See
http://www.lucenebook.com/search?query=concurrent for some hints.

Otis


--- Alex Kiselevski <[EMAIL PROTECTED]> wrote:

> 
> Concerning the question about simultaneous index/search/delete :
> Do i have to put synchronized on methods that call  to API functions
> of
> index/search/delete
> 
> 
> The information contained in this message is proprietary of Amdocs,
> protected from disclosure, and may be privileged.
> The information is intended to be conveyed only to the designated
> recipient(s)
> of the message. If the reader of this message is not the intended
> recipient,
> you are hereby notified that any dissemination, use, distribution or
> copying of
> this communication is strictly prohibited and may be unlawful.
> If you have received this communication in error, please notify us
> immediately
> by replying to the message and deleting it from your computer.
> Thank you.





Re: PDFBox deprecated methods

2005-01-05 Thread ben
Daniel,

Yes, that getText( PDDocument ) is the method you should be using.

You no longer need to use a COSDocument object.  Please note the
following methods that go along with the deprecation of
getText( COSDocument ):

PDFParser.getPDDocument() - gets a PDDocument instead of a COSDocument
after parsing
PDDocument.load() - a convenience method that does all the PDFParser
work and returns a PDDocument
LucenePDFDocument.getDocument() - goes straight from a File/URL to a
Lucene document object


Ben


Quoting Daniel Cortes <[EMAIL PROTECTED]>:

> Ok I reply myself
> the method deprecated is .getText(Cos Document))
> if you do stripper.getText(new PDDocument(cosDoc)) there isn't any problem.
> 
> 
> Excuse me, for the question
> 
> 
> Daniel Cortes wrote:
> 
> > I've been use PDFBox in my indexation of a directory . I've download  
> > the last version of  PDFBox (0.6.7.a) and I've seen that the method 
> > that I use to extract
> > was a deprecated method. PDFTextStripper.getText().
> > stripper.getText(new PDDocument(cosDoc));
> > I know a lot of person use same me this method. What  are alternative 
> > options ?
> >
> >
> >
> >
> 
> 
> 
> 







Re: searching while indexing.

2005-01-05 Thread Paul Elschot
On Wednesday 05 January 2005 12:14, Morus Walter wrote:
> Peter Veentjer - Anchor Men writes:
> > >>Is your IndexReader doing deletes?  
> > Yes.. I have to remove the documents I`m going to update from the
> > Reader. 
> > 
> > >>That is the only time it locks the index (because that is essentially 
> > >>a write operation).  If you're purely searching with the reader it 
> > >>should work fine with a writer concurrently.
> > 
> > Ok, I understand why there are problems. But how can I fix this problem?
> > I have to update documents, so how can I do this without deleting
> > documents from the Reader? I don`t want to add the same document twice. 
> > 
> You have to bundle all writes at one point and serialize deletions and
> imports.
> That is:
> open a reader for deleting
> delete the documents to be deleted
> close that reader
> open a writer for adding content
> add documents
> close that writer
> begin at start.
> 
> It's up to you, whether you open a reader to delete single documents and
> a writer for adding a single document or use batches of several documents,
> but you cannot escape the need to serialize the writes.

And while this updating is going on, you can keep another reader open
for searching; it will not be affected by the updates.  After all
updates are done, close that reader and open a new one to see the
updates.

Regards,
Paul Elschot





Re: PDFBox deprecated methods

2005-01-05 Thread Daniel Cortes
OK, I'll reply to myself: the deprecated method is getText(COSDocument).
If you do stripper.getText(new PDDocument(cosDoc)) there isn't any
problem.  Excuse me for the question.
Daniel Cortes wrote:
I've been using PDFBox to index a directory.  I've downloaded the
latest version of PDFBox (0.6.7a) and seen that the method I use to
extract text, PDFTextStripper.getText(), is deprecated:
stripper.getText(new PDDocument(cosDoc));
I know a lot of people use this method as I do.  What are the
alternative options?




PDFBox deprecated methods

2005-01-05 Thread Daniel Cortes
I've been using PDFBox to index a directory.  I've downloaded the
latest version of PDFBox (0.6.7a) and seen that the method I use to
extract text, PDFTextStripper.getText(), is deprecated:
stripper.getText(new PDDocument(cosDoc));
I know a lot of people use this method as I do.  What are the
alternative options?




simultaneous index/search/delete

2005-01-05 Thread Alex Kiselevski

Concerning the question about simultaneous index/search/delete:
do I have to synchronize the methods that call the index/search/delete
API functions?



RE: searching while indexing.

2005-01-05 Thread Morus Walter
Peter Veentjer - Anchor Men writes:
> >>Is your IndexReader doing deletes?  
> Yes.. I have to remove the documents I`m going to update from the
> Reader. 
> 
> >>That is the only time it locks the index (because that is essentially 
> >>a write operation).  If you're purely searching with the reader it 
> >>should work fine with a writer concurrently.
> 
> Ok, I understand why there are problems. But how can I fix this problem?
> I have to update documents, so how can I do this without deleting
> documents from the Reader? I don`t want to add the same document twice. 
> 
You have to bundle all writes at one point and serialize deletions and
imports.
That is:
open a reader for deleting
delete the documents to be deleted
close that reader
open a writer for adding content
add documents
close that writer
begin at start.

It's up to you, whether you open a reader to delete single documents and
a writer for adding a single document or use batches of several documents,
but you cannot escape the need to serialize the writes.
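The delete-then-add cycle above can be serialized by construction if every mutation is funnelled through a single dedicated writer thread. A Lucene-free sketch (SerializedUpdater is a made-up name, and the log list stands in for the actual IndexReader delete / IndexWriter add calls):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerializedUpdater {
    // All index mutations run on this one thread, so a delete and an
    // add can never overlap -- the serialization described above.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    final List<String> log = new ArrayList<String>();

    void update(final String docId) {
        writer.submit(new Runnable() {
            public void run() {
                // In Lucene terms: delete the old copy, then add the new one.
                log.add("delete " + docId);
                log.add("add " + docId);
            }
        });
    }

    void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Web request threads can call update() concurrently without synchronizing on the index itself; only the writer thread ever touches it, while search threads keep their own read-only reader open.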

HTH
Morus






Re: Advice on indexing content from a database

2005-01-05 Thread Nader Henein
Hibernate + Lucene,
Use Hibernate to read from your DB; this will pull out the data you
need in nice, clean objects.  Then loop through your object collection
and create Lucene documents.  You can add Quartz to the equation and
have this process run on a schedule over chunks of your data until it
has all been indexed, then continue with incremental updates/deletes.


Nader Henein
[EMAIL PROTECTED] wrote:
Hi
I'm working on integrating Lucene with a CMS. All the data is stored in a
database. I'm looking at about 2 million records. Any advice on an
effective technique to index this (incrementally or using threads) that
would not overload my server?
Thanks
Aneesha

 



Advice on indexing content from a database

2005-01-05 Thread aneesha
Hi

I'm working on integrating Lucene with a CMS. All the data is stored in a
database. I'm looking at about 2 million records. Any advice on an
effective technique to index this (incrementally or using threads) that
would not overload my server?

Thanks

Aneesha





RE: searching while indexing.

2005-01-05 Thread Peter Veentjer - Anchor Men
>>Is your IndexReader doing deletes?  
Yes.. I have to remove the documents I`m going to update from the
Reader. 

>>That is the only time it locks the index (because that is essentially 
>>a write operation).  If you're purely searching with the reader it 
>>should work fine with a writer concurrently.

Ok, I understand why there are problems. But how can I fix this problem?
I have to update documents, so how can I do this without deleting
documents from the Reader? I don`t want to add the same document twice. 

This is a problem many users of Lucene will face.. Could you please add
a good explanation to the FAQ?





Re: searching while indexing.

2005-01-05 Thread Erik Hatcher
On Jan 5, 2005, at 5:12 AM, Peter Veentjer - Anchor Men wrote:
-Oorspronkelijk bericht-
Van: Erik Hatcher [mailto:[EMAIL PROTECTED]
Verzonden: woensdag 5 januari 2005 10:58
Aan: Lucene Users List
Onderwerp: Re: searching while indexing.
There are no problems searching while indexing.  How are you
experiencing otherwise?  What error do you get?
I have experienced (lock) problems if I use a Reader and Writer (on the
same index-directory) at the same time. My application is multithreaded
(a pool of worker threads for the webrequests) and a scheduledworker
thread for signaling changes (new (normal) files, changed files and
removed files) and updating the index.
And I`m not the only one experiencing this problem... It ( Reader and
Writer open at the same time) has been mentioned on mailinglist quite a
few times.
Is your IndexReader doing deletes?  That is the only time it locks the 
index (because that is essentially a write operation).  If you're 
purely searching with the reader it should work fine with a writer 
concurrently.

Erik


Re: Réf. : Re: do a simple search

2005-01-05 Thread Morus Walter
[EMAIL PROTECTED] writes:
> I must change the request to make a search like this:
> 
> type=value AND (shortDesc=value OR longDesc=value)
> 
> but I don't know how to do this.
> 
create a boolean query for (shortDesc=value OR longDesc=value)
(as you do so far) and create another boolean query adding that boolean
query and the query for type:product.
For the latter use add(query, true, false) to make both subqueries
required.

HTH
Morus




Réf. : Re: Réf. : Re: do a simple search

2005-01-05 Thread Stephane . Giner
OK, thanks




On Wed, 2005-01-05 at 10:56 +0100, [EMAIL PROTECTED] wrote:
> I always have another field, "type", which is the type of the searched 
> document.
> I must change the request to make a search like this:
> 
> type=value AND (shortDesc=value OR longDesc=value)
> 
> but I don't know how to do this.
> 
> here is the query with fields values
> 
> Field name: type
> Field value: product
> Field name: shortDesc
> Field value: toto
> Field name: longDesc
> Field value: toto
> IndexManager query = type:product shortDesc:toto longDesc:toto

For the type field I suggest using a TermQuery. Is the document type
from a list of defined types? i.e. is it stored as a keyword and hence
doesn't need parsing?

For the other fields I recommend trying out the
DistributingMultiFieldQueryParser class, which isn't in the main distro
yet but can be found here:

http://issues.apache.org/bugzilla/show_bug.cgi?id=32674

It handles all the awkward bits of making sure all fields are searched
correctly. 

Then combine the two query objects in a BooleanQuery.

-- 
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.





Re: Réf. : Re: do a simple search

2005-01-05 Thread Miles Barr
On Wed, 2005-01-05 at 10:56 +0100, [EMAIL PROTECTED] wrote:
> I always have another field, "type", which is the type of the searched 
> document.
> I must change the request to make a search like this:
> 
> type=value AND (shortDesc=value OR longDesc=value)
> 
> but I don't know how to do this.
> 
> here is the query with fields values
> 
> Field name: type
> Field value: product
> Field name: shortDesc
> Field value: toto
> Field name: longDesc
> Field value: toto
> IndexManager query = type:product shortDesc:toto longDesc:toto

For the type field I suggest using a TermQuery. Is the document type
from a list of defined types? i.e. is it stored as a keyword and hence
doesn't need parsing?

For the other fields I recommend trying out the
DistributingMultiFieldQueryParser class, which isn't in the main distro
yet but can be found here:

http://issues.apache.org/bugzilla/show_bug.cgi?id=32674

It handles all the awkward bits of making sure all fields are searched
correctly. 

Then combine the two query objects in a BooleanQuery.

-- 
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.




RE: searching while indexing.

2005-01-05 Thread Peter Veentjer - Anchor Men
 

-Oorspronkelijk bericht-
Van: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Verzonden: woensdag 5 januari 2005 10:58
Aan: Lucene Users List
Onderwerp: Re: searching while indexing.

There are no problems searching while indexing.  How are you
experiencing otherwise?  What error do you get?

I have experienced (lock) problems if I use a Reader and Writer (on the
same index-directory) at the same time. My application is multithreaded
(a pool of worker threads for the webrequests) and a scheduledworker
thread for signaling changes (new (normal) files, changed files and
removed files) and updating the index.

And I`m not the only one experiencing this problem... It ( Reader and
Writer open at the same time) has been mentioned on mailinglist quite a
few times.

A possible solution I have seen is creating a shadow index and switching
the reader to that index once the writer is finished.  But I don't
understand why a Reader and a Writer cannot be opened on the same
directory at the same time.




Re: 1.4.3 breaks 1.4.1 QueryParser functionality

2005-01-05 Thread Erik Hatcher
On Jan 4, 2005, at 9:43 PM, Bill Janssen wrote:
Let me be a bit more explicit.  My method (essentially an
after-method, for those Lisp'rs out there) begins thusly:
protected Query getFieldQuery (String field,
   Analyzer a,
   String queryText)
throws ParseException {
  Query x = super.getFieldQuery(field, a, queryText);
  ...
}
If I remove the "Analyzer a" from both the signature and the super
call, the super call won't compile because that method isn't in the
QueryParser in 1.4.1.  But my getFieldQuery() method won't even be
called in 1.4.1, because it doesn't exist in that version of the
QueryParser.
Will it work if you override this method also?
protected Query getFieldQuery(String field,
  Analyzer analyzer,
  String queryText,
  int slop)
My head is spinning looking at all the various signatures of this 
method we have and trying to backtrack where things went awry.

Erik


Re: searching while indexing.

2005-01-05 Thread Erik Hatcher
There are no problems searching while indexing.  How are you 
experiencing otherwise?  What error do you get?

Erik
On Jan 5, 2005, at 4:47 AM, Peter Veentjer - Anchor Men wrote:
What is the best way to implement: searching while indexing.
I have read the mailinglist for a while but haven`t got a good answer 
to
my question.

It is not allowed to index while searching, but I don't understand why.
All the segments are immutable, so after I have created a Reader it
could use all the segments that are available at the moment.  The reader
maintains references to those segments, and when the reader is no longer
needed (or the writer says: I'm finished creating new segments; you can
search through a newer set) the reader could delete all the old
segments.  The writer can create new segments based on the immutable old
ones and on the new documents.  After it has created a new set, it can
signal the reader to use the newer segments.

So why is the above scenario not possible?  Why are segments immutable?
And what is the best way to add documents to a big index (>20 GB)
without copying the index, and without blocking search?

Met vriendelijke groet,
Peter Veentjer
Anchor Men Interactive Solutions - duidelijk in zakelijke
internetoplossingen
Praediniussingel 41
9711 AE Groningen
T: 050-3115222
F: 050-5891696
E: [EMAIL PROTECTED]
I : www.anchormen.nl http://www.anchormen.nl/>




Réf. : Re: do a simple search

2005-01-05 Thread Stephane . Giner
>On Jan 5, 2005, at 3:41 AM, [EMAIL PROTECTED] wrote:
>> I would like to search a word in different fields of a document with
>> an OR operator.
>>
>> My fields are "id", "shortDesc" and "longDesc".
>> In Java I want to search a word simultaneously in "shortDesc" and
>> "longDesc" field.
>>
>> for example:
>>
>> doc1:   id:1
>> shortDesc: a foo desc
>> longDesc: a doc long desc
>>
>> doc2:   id:2
>> shortDesc:a doc short desc
>> longDesc:a foo long desc
>>
>> doc3:   id:3
>> shortDesc:another short desc
>> longDesc:another long desc
>>
>> if the search word is "foo" I want to retrieve doc1 and doc3.

>You meant doc1 and doc2.
yes (sorry)

>> in my program, fields are stored in fieldName list.
>> associated values are stored in fieldValue.


>What's the question?  The code you show below, at first glance, looks 
>reasonable, or at least close.  What is the value of query.toString()

>Erik

Sorry, but I found the problem.

I always have another field, "type", which is the type of the searched
document. I must change the request to make a search like this:

type=value AND (shortDesc=value OR longDesc=value)

but I don't know how to do this?

here is the query with fields values

Field name: type
Field value: product
Field name: shortDesc
Field value: toto
Field name: longDesc
Field value: toto
IndexManager query = type:product shortDesc:toto longDesc:toto
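In Lucene 1.4's BooleanQuery terms (the same deprecated add(Query, required, prohibited) form the posted code already uses), the shape would be: build an inner BooleanQuery whose two description TermQuerys are added as (false, false) optional clauses, then add the type TermQuery and the inner query to an outer BooleanQuery, each as a (true, false) required clause. A self-contained sketch of just the matching semantics (plain-Java field maps, not Lucene's API):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of: type:product AND (shortDesc:toto OR longDesc:toto).
// The required/optional split mirrors nested BooleanQuery clauses; the
// document is a toy field map, not a Lucene Document.
class NestedQueryDemo {
    static boolean matches(Map<String, String> doc, String word) {
        // outer required clause: type must be "product"
        boolean typeOk = "product".equals(doc.get("type"));
        // inner query: OR over the two description fields (optional clauses)
        boolean descOk = doc.getOrDefault("shortDesc", "").contains(word)
                || doc.getOrDefault("longDesc", "").contains(word);
        // both outer clauses are required
        return typeOk && descOk;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("type", "product");
        doc.put("shortDesc", "toto");
        System.out.println(matches(doc, "toto")); // true
        doc.put("type", "page");
        System.out.println(matches(doc, "toto")); // false
    }
}
```

The key point is the nesting: adding all three TermQuerys flat into one BooleanQuery (as the IndexManager query dump above shows) makes them all optional siblings, which is why the type constraint was not being enforced.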


> private static Hits search(List fieldName, List fieldValue) {
> Hits hits = null;
>
> int fieldNameSize  = fieldName.size();
> int fieldValueSize = fieldValue.size();
> if (fieldNameSize != fieldValueSize) {
> return null;
> }
>
> IndexSearcher searcher = getSearcher();
> if (searcher != null) {
> BooleanQuery query = new BooleanQuery();
> //populate the query with all terms
> for (int i=0; i < fieldNameSize; i++) {
> String currentFieldName  = (String) fieldName.get(i);
> String currentFieldValue = (String) fieldValue.get(i);
>
> StringTokenizer tokenizer = new
> StringTokenizer(currentFieldValue);
> while (tokenizer.hasMoreTokens()) {
> String currentToken =
> tokenizer.nextToken();
> Term currentTerm = new
> Term(currentFieldName,currentToken);
> TermQuery termQuery = new
> TermQuery(currentTerm);
>
> 
> query.add(termQuery,false,false);
> }
> }
>
> //do the search
> try {
> //System.out.println("IndexManager query = " + query.toString());
> hits = searcher.search(query);
> }
> catch (IOException ioe) {
>  LogManager.log(LogManager.LOG_ERROR,"Cannot search in index.",ioe);
> }
> finally {
> try {
> searcher.close();
> }
> catch (IOException ioe) {
>  LogManager.log(LogManager.LOG_WARNING,"Cannot close searcher in search
> method.",ioe);
> }
> }
> }
>
> return hits;
> }






searching while indexing.

2005-01-05 Thread Peter Veentjer - Anchor Men
What is the best way to implement: searching while indexing. 
 
I have read the mailing list for a while but haven't got a good answer to
my question.

It is not allowed to index while searching, but I don't understand why.
All the segments are immutable, so after I have created a Reader it
could use all the segments that are available at the moment. The reader
maintains references to those segments, and if the reader is not needed
anymore (or the writer says: I'm finished creating new indices... you
should now search through a newer set of segments) the reader could
delete all the old segments. The writer can create new segments based on
the immutable old ones and on the new documents. After it has created a
new set, it can signal the reader to use the newer segments.

So why is the above scenario not possible? Why are segments immutable?
And what is the best way to add documents to a big index (>20 GB)
without copying the index and without blocking the search?
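The reasoning in this question is essentially the point-in-time snapshot model. A toy sketch of the idea (plain Java, not Lucene's actual segment machinery): the reader pins an immutable segment list, the writer publishes a new list built from the old one, and neither blocks the other:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of immutable-segment snapshots: a reader that pinned the old
// list keeps searching it unchanged while a writer publishes a new list.
class SnapshotDemo {
    // writer: build a new immutable segment list from the old one plus a new segment
    static List<String> publish(List<String> current, String newSegment) {
        List<String> next = new ArrayList<>(current);
        next.add(newSegment);
        return Collections.unmodifiableList(next);
    }

    public static void main(String[] args) {
        AtomicReference<List<String>> segments =
                new AtomicReference<>(List.of("seg1", "seg2"));
        List<String> readerView = segments.get();        // reader opens: pins a snapshot
        segments.set(publish(segments.get(), "seg3"));   // writer publishes a new set
        System.out.println(readerView);                  // [seg1, seg2] -- unchanged
        System.out.println(segments.get());              // [seg1, seg2, seg3]
    }
}
```

This is why, as Erik says in his reply, searching while indexing works: an already-open reader keeps its snapshot, and picking up new documents is a matter of opening a fresh reader after the writer commits.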
 
 
 

Met vriendelijke groet,

Peter Veentjer
Anchor Men Interactive Solutions - duidelijk in zakelijke
internetoplossingen

Praediniussingel 41
9711 AE Groningen

T: 050-3115222
F: 050-5891696
E: [EMAIL PROTECTED]
I : www.anchormen.nl http://www.anchormen.nl/> 

 


Re: do a simple search

2005-01-05 Thread Erik Hatcher
On Jan 5, 2005, at 3:41 AM, [EMAIL PROTECTED] wrote:
I would like to search a word in different fields of a document with an
OR operator.

My fields are "id", "shortDesc" and "longDesc".
In Java I want to search a word simultaneously in "shortDesc" and
"longDesc" field.

for example:
doc1:   id:1
shortDesc: a foo desc
longDesc: a doc long desc
doc2:   id:2
shortDesc:a doc short desc
longDesc:a foo long desc
doc3:   id:3
shortDesc:another short desc
longDesc:another long desc
if the search word is "foo" I want to retrieve doc1 and doc3.
You meant doc1 and doc2.
in my program, fields are stored in fieldName list.
associated values are stored in fieldValue.

What's the question?  The code you show below, at first glance, looks 
reasonable, or at least close.  What is the value of query.toString()
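Erik's correction is easy to verify with a quick in-memory check over the three example documents from the mail (toy code, not Lucene):

```java
import java.util.*;

// OR-matching "foo" across shortDesc and longDesc over the example docs:
// it hits doc1 and doc2, not doc3. Toy data taken from the thread.
class FooMatchDemo {
    static List<Integer> search(String word, Map<Integer, String[]> docs) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String[]> e : docs.entrySet()) {
            // fields[0] = shortDesc, fields[1] = longDesc; OR across both
            if (e.getValue()[0].contains(word) || e.getValue()[1].contains(word)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String[]> docs = new TreeMap<>();
        docs.put(1, new String[]{"a foo desc", "a doc long desc"});
        docs.put(2, new String[]{"a doc short desc", "a foo long desc"});
        docs.put(3, new String[]{"another short desc", "another long desc"});
        System.out.println(search("foo", docs)); // [1, 2]
    }
}
```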

Erik

private static Hits search(List fieldName, List fieldValue) {
Hits hits = null;
int fieldNameSize  = fieldName.size();
int fieldValueSize = fieldValue.size();
if (fieldNameSize != fieldValueSize) {
return null;
}
IndexSearcher searcher = getSearcher();
if (searcher != null) {
BooleanQuery query = new BooleanQuery();
//populate the query with all terms
for (int i=0; i < fieldNameSize; i++) {
String currentFieldName = (String) fieldName.get(i);
String currentFieldValue = (String) fieldValue.get(i);
StringTokenizer tokenizer = new
StringTokenizer(currentFieldValue);
while (tokenizer.hasMoreTokens()) {
String currentToken =
tokenizer.nextToken();
Term currentTerm = new
Term(currentFieldName,currentToken);
TermQuery termQuery = new
TermQuery(currentTerm);

query.add(termQuery,false,false);
}
}

//do the search
try {
//System.out.println("IndexManager query = " + query.toString());
hits = searcher.search(query);
}
catch (IOException ioe) {
 LogManager.log(LogManager.LOG_ERROR,"Cannot search in index.",ioe);
}
finally {
try {
searcher.close();
}
catch (IOException ioe) {
 LogManager.log(LogManager.LOG_WARNING,"Cannot close searcher in search
method.",ioe);
}
}
}

return hits;
}



do a simple search

2005-01-05 Thread Stephane . Giner
hello

I would like to search a word in different fields of a document with an
OR operator.

My fields are "id", "shortDesc" and "longDesc".
In Java I want to search a word simultaneously in "shortDesc" and
"longDesc" field.

for example:

doc1:   id:1
shortDesc: a foo desc
longDesc: a doc long desc

doc2:   id:2
shortDesc:a doc short desc
longDesc:a foo long desc

doc3:   id:3
shortDesc:another short desc
longDesc:another long desc

if the search word is "foo" I want to retrieve doc1 and doc3.

in my program, fields are stored in fieldName list. 
associated values are stored in fieldValue.

thanks
private static Hits search(List fieldName, List fieldValue) {
Hits hits = null;
 
int fieldNameSize  = fieldName.size();
int fieldValueSize = fieldValue.size();
if (fieldNameSize != fieldValueSize) {
return null;
}
 
IndexSearcher searcher = getSearcher();
if (searcher != null) {
BooleanQuery query = new BooleanQuery();
//populate the query with all terms
for (int i=0; i < fieldNameSize; i++) {