Re: Using the highlighter from the sandbox with a prefix query.

2005-02-17 Thread mark harwood
See the highlighter's package.html for a description
of how query.rewrite should be used to solve this.

Cheers,
Mark


 --- lucuser4851 <[EMAIL PROTECTED]> wrote: 
> Dear All,
>  We have been using the highlighter from the lucene
> sandbox, which works
> very nicely most of the time. However when we try
> and use it with a
> prefix query (which is what you get having parsed a
> wild-card query), it
> doesn't return any highlighted sections. Has anyone
> else experienced
> this problem, or found a way around it?
> 
> Thanks a lot for your suggestions!!
> 
> 
> 
>
-
> To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> For additional commands, e-mail:
> [EMAIL PROTECTED]
> 
>  









Re: Using the highlighter from the sandbox with a prefix query.

2005-02-17 Thread Daniel Naber
On Thursday 17 February 2005 08:37, lucuser4851 wrote:

>  We have been using the highlighter from the lucene sandbox, which works
> very nicely most of the time. However when we try and use it with a
> prefix query (which is what you get having parsed a wild-card query), it
> doesn't return any highlighted sections. Has anyone else experienced
> this problem, or found a way around it?

You need to call rewrite() on the query before you pass it to the highlighter.
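A minimal sketch of that, assuming the 1.4-era sandbox highlighter API (`QueryScorer`, `getBestFragment`); the `reader`, `analyzer`, field name, and `text` variables are illustrative:

```java
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

// query is the parsed PrefixQuery/WildcardQuery; reader is an open IndexReader.
// rewrite() expands the prefix into the concrete TermQuerys that the
// highlighter can match against terms in the text.
Query rewritten = query.rewrite(reader);
Highlighter highlighter = new Highlighter(new QueryScorer(rewritten));
String fragment = highlighter.getBestFragment(analyzer, "contents", text);
```

Without the `rewrite()` call the scorer only sees the unexpanded `PrefixQuery`, which matches no literal term, hence the empty highlights.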

Regards
 Daniel




ParallelMultiSearcher Question

2005-02-17 Thread Youngho Cho
Hello,

I would like to use ParallelMultiSearcher with a few RemoteSearchables.

If one of the remote servers is down,
can I call close() on the ParallelMultiSearcher and
make a new ParallelMultiSearcher with the other live RemoteSearchables?

Thanks.

Youngho

RE: Concurrent searching & re-indexing

2005-02-17 Thread Paul Mellor
Otis,

Looking at your reply again, I have a couple of questions -

"IndexSearcher (IndexReader, really) does take a snapshot of the index state
when it is opened, so at that time the index segments listed in segments
should be in a complete state.  It also reads index files when searching, of
course."

1. If IndexReader takes a snapshot of the index state when opened and then
reads the files when searching, what would happen if the files it takes a
snapshot of are deleted before the search is performed (as would happen with
a reindexing in the period between opening an IndexSearcher and using it to
search)?

2. Does a similar potential problem exist when optimising an index, if this
combines all the segments into a single file?

Many thanks

Paul

-Original Message-
From: Paul Mellor [mailto:[EMAIL PROTECTED]
Sent: 16 February 2005 17:37
To: 'Lucene Users List'
Subject: RE: Concurrent searching & re-indexing


But all write access to the index is synchronized, so that although multiple
threads are creating an IndexWriter for the same directory and using it to
totally recreate that index, only one thread is doing this at once.

I was concerned about the safety of using an IndexSearcher to perform
queries on an index that is in the process of being recreated from scratch,
but I guess that if the IndexSearcher takes a snapshot of the index when it
is created (and in my code this creation is synchronized with the write
operations as well so that the threads wait for the write operations to
finish before instantiating an IndexSearcher, and vice versa) this can't be
a problem.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: 16 February 2005 17:30
To: Lucene Users List
Subject: Re: Concurrent searching & re-indexing


Hi Paul,

If I understand your setup correctly, it looks like you are running
multiple threads that create an IndexWriter for the same directory.  That's
a "no-no".

This section (first hit) describes the various concurrency issues with
regard to adds, updates, optimization, and searches:
  http://www.lucenebook.com/search?query=concurrent

IndexSearcher (IndexReader, really) does take a snapshot of the index
state when it is opened, so at that time the index segments listed in
segments should be in a complete state.  It also reads index files when
searching, of course.

Otis


--- Paul Mellor <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I've read from various sources on the Internet that it is perfectly
> safe to
> simultaneously search a Lucene index that is being updated from
> another
> Thread, as long as all write access to the index is synchronized. 
> But does
> this apply only to updating the index (i.e. deleting and adding
> documents),
> or to a complete re-indexing (i.e. create a new IndexWriter with the
> 'create' argument true and then re-add all the documents)?
> 
> I have a class which encapsulates all access to my index, so that
> writes can
> be synchronized.  This class also exposes a method to obtain an
> IndexSearcher for the index.  I'm running unit tests to test this
> which
> create many threads - each thread does a complete re-indexing and
> then
> obtains an IndexSearcher and does a query.
> 
> I'm finding that with sufficiently high numbers of threads, I'm
> getting the
> occasional failure, with the following exception thrown when
> attempting to
> construct a new IndexWriter (during the reindexing) -
> 
> java.io.IOException: couldn't delete _a.f1
>         at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
>         at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:135)
>         at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:113)
>         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
> ...
> 
> The exception occurs quite infrequently (usually for somewhere
> between 1-5%
> of the Threads).
> 
> Does the IndexSearcher take a 'snapshot' of the index at creation? 
> Or does
> it access the filesystem whilst searching?  I am also synchronizing
> creation
> of the IndexSearcher with the write lock, so that the IndexSearcher
> is not
> created whilst the index is being recreated (and vice versa).  But do
> I need
> to ensure that the IndexSearcher cannot search whilst the index is
> being
> recreated as well?
> 
> Note that a similar unit test where the threads update the index
> (rather
> than recreate it from scratch) works fine, as expected.
> 
> This is running on Windows 2000.
> 
> Any help would be much appreciated!
> 
> Paul
> 

Re: Strange Index problem

2005-02-17 Thread Geir Ove Grønmo
On Tue, 25 Jan 2005 13:54:00 +0100, Nestel, Frank  IZ/HZA-IOL  
<[EMAIL PROTECTED]> wrote:

> In one project we've a system which incrementally updates
> an index every night. This has been working fine. We
> upgraded to Lucene 1.4.2 when it came out, without noticing a
> difference at first. But now we regularly run into trouble.
> It seems like our index has "captured" a very defunct document,
> and as long as you work around this document the index
> is still working, but as soon as you touch that particular
> document, you run into trouble:
>
> java.lang.IndexOutOfBoundsException: Index: 114, Size: 19
>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>         at java.util.ArrayList.get(ArrayList.java:322)
>         at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
>         at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:66)
>         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
>         at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
>         at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
>         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
I've just found a similar traceback in one of our deployed systems (using
version 1.4.3):

java.lang.IndexOutOfBoundsException: Index: 104, Size: 11
        at java.util.ArrayList.RangeCheck(ArrayList.java:507)
        at java.util.ArrayList.get(ArrayList.java:324)
        at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:66)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)

After getting this error from the optimize() call, it is no longer possible
to search:

java.io.IOException: read past EOF
        at org.apache.lucene.store.InputStream.refill(InputStream.java:154)
        at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
        at org.apache.lucene.store.InputStream.readBytes(InputStream.java:57)
        at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:356)
        at org.apache.lucene.index.MultiReader.norms(MultiReader.java:159)
        at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:64)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:165)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:165)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
        at org.apache.lucene.search.Hits.<init>(Hits.java:43)
        at org.apache.lucene.search.Searcher.search(Searcher.java:33)
        at org.apache.lucene.search.Searcher.search(Searcher.java:27)

Any ideas?
--
Geir O.


RE: Concurrent searching & re-indexing

2005-02-17 Thread Morus Walter
Paul Mellor writes:
> 
> 1. If IndexReader takes a snapshot of the index state when opened and then
> reads the files when searching, what would happen if the files it takes a
> snapshot of are deleted before the search is performed (as would happen with
> a reindexing in the period between opening an IndexSearcher and using it to
> search)?
> 
On unix, open files are still there even if they are deleted (that is,
there is no link (filename) to the file anymore, but the file's contents
still exist). On Windows you cannot delete open files, so Lucene
AFAIK (I don't use Windows) postpones the deletion until the
file is closed.
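The platform difference is easy to demonstrate in plain Java (a small self-contained demo, not Lucene code): deleting a file that another stream still has open succeeds on unix and the open stream keeps working, while on Windows `File.delete()` returns false.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class DeleteOpenFile {
    public static boolean deleteWhileOpen() throws IOException {
        File f = File.createTempFile("lucene-demo", ".tmp");
        FileOutputStream out = new FileOutputStream(f);
        out.write('x');
        out.flush();
        FileInputStream in = new FileInputStream(f); // hold the file open
        boolean deleted = f.delete();                // true on unix, false on Windows
        in.read();                                   // on unix the data is still readable
        in.close();
        out.close();
        if (!deleted) {
            f.delete();                              // clean up on Windows
        }
        return deleted;
    }
}
```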
 
> 2. Does a similar potential problem exist when optimising an index, if this
> combines all the segments into a single file?
> 
AFAIK optimising creates new files.

The only problem that might occur is opening a reader during an index change,
but that's handled by a lock.

HTH
Morus




RE: Concurrent searching & re-indexing

2005-02-17 Thread Paul Mellor
"on windows you cannot delete open files, so Lucene AFAIK (I don't use
windows) postpones the deletion to a time, when the file is closed"

If Lucene does not in fact postpone the deletion, that would explain the
exception I'm seeing ("java.io.IOException: couldn't delete _a.f1") - the
IndexWriter is attempting to delete the files but the IndexReader has them
open.

Does this then mean that re-indexing whilst searching is inherently unsafe,
but only on Windows?

-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: 17 February 2005 10:38
To: Lucene Users List
Subject: RE: Concurrent searching & re-indexing


Paul Mellor writes:
> 
> 1. If IndexReader takes a snapshot of the index state when opened and then
> reads the files when searching, what would happen if the files it takes a
> snapshot of are deleted before the search is performed (as would happen with
> a reindexing in the period between opening an IndexSearcher and using it to
> search)?
> 
On unix, open files are still there even if they are deleted (that is,
there is no link (filename) to the file anymore, but the file's contents
still exist). On Windows you cannot delete open files, so Lucene
AFAIK (I don't use Windows) postpones the deletion until the
file is closed.

> 2. Does a similar potential problem exist when optimising an index, if this
> combines all the segments into a single file?
> 
AFAIK optimising creates new files.

The only problem that might occur is opening a reader during an index change,
but that's handled by a lock.

HTH
Morus





reuse of TokenStream

2005-02-17 Thread Harald Kirsch
Hi,

is it thread safe to reuse the same TokenStream object for several
fields of a document or does the IndexWriter try to parallelise
tokenization of the fields of a single document?

Similar question: Is it safe to reuse the same TokenStream object for
several documents if I use IndexWriter.addDocument() in a loop?  Or
does addDocument only put the work into a queue where tasks are taken
out for parallel indexing by several threads?

  Thanks,
  Harald.

-- 

Harald Kirsch | [EMAIL PROTECTED] | +44 (0) 1223/49-2593
BioMed Information Extraction: http://www.ebi.ac.uk/Rebholz-srv/whatizit




Storing info about the index in the index

2005-02-17 Thread Sanyi
Hi!

Is there any way to store info about the index in the index?
(You know, like in .doc files on Windows. You can store title, author, etc...)
I need to store the last indexed database UID in the index, and maybe some
other useful info too.
I don't want to store it separately in the database or in another file, for
administrative reasons.

Regards,
Sanyi







Re: Storing info about the index in the index

2005-02-17 Thread Erik Hatcher
On Feb 17, 2005, at 8:43 AM, Sanyi wrote:
> Hi!
> Is there any way to store info about the index in the index?
> (You know, like in .doc files on Windows. You can store title, author,
> etc...)
> I need to store the last indexed database UID in the index and maybe
> some other useful infos too.
> I don't want to store them separately in the database or in another
> file because of administrative reasons.

There is currently no feature to store additional information in the
index like this, though you could use a special document in the index
to do this.

You could also keep a .properties or .xml file alongside the index.
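One way to implement the "special document" idea, sketched against the Lucene 1.4 API; the field names (`meta-type`, `last-uid`) are made up for illustration:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.TermQuery;

// Remove any previous bookkeeping document, then close the reader
// before opening the writer (reader and writer cannot both hold the lock).
reader.delete(new Term("meta-type", "index-metadata"));
reader.close();

Document meta = new Document();
meta.add(Field.Keyword("meta-type", "index-metadata")); // marker field
meta.add(Field.Keyword("last-uid", String.valueOf(lastUid)));
writer.addDocument(meta);

// Read it back later by searching for the marker term.
Hits hits = searcher.search(new TermQuery(new Term("meta-type", "index-metadata")));
if (hits.length() > 0) {
    String lastIndexedUid = hits.doc(0).get("last-uid");
}
```

Giving the bookkeeping document its own marker field keeps it out of ordinary searches, as long as normal queries never touch that field.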

Erik


Re: Concurrent searching & re-indexing

2005-02-17 Thread Jim Lynch
Hi, Paul,
I brought this point up a while back and didn't get a response.  I've 
found that I frequently get a "file not found" exception when searching 
at the same time an indexing and/or optimize operation is running.  I 
fixed it by trapping the exception and looping until it didn't fail.
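A generic form of that trap-and-retry loop might look like this; the operation interface, attempt limit, and back-off are illustrative, not anything from Lucene:

```java
import java.io.IOException;

public class Retry {
    interface IoOperation<T> {
        T run() throws IOException;
    }

    // Run op, retrying on IOException (e.g. the "file not found" raised
    // when a search races with an indexing/optimize pass), until it
    // succeeds or maxAttempts is exhausted.
    static <T> T withRetries(IoOperation<T> op, int maxAttempts, long sleepMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.run();       // e.g. open an IndexSearcher and search
            } catch (IOException e) {
                last = e;              // remember the failure
                Thread.sleep(sleepMs); // back off before the next attempt
            }
        }
        throw last;                    // give up after maxAttempts
    }
}
```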

Jim.
Paul Mellor wrote:
> Otis,
> 1. If IndexReader takes a snapshot of the index state when opened and then
> reads the files when searching, what would happen if the files it takes a
> snapshot of are deleted before the search is performed (as would happen with
> a reindexing in the period between opening an IndexSearcher and using it to
> search)?
 

 



Re: Concurrent searching & re-indexing

2005-02-17 Thread Jim Lynch
It failed for me on Linux.
Paul Mellor wrote:
> "on windows you cannot delete open files, so Lucene AFAIK (I don't use
> windows) postpones the deletion to a time, when the file is closed"
> If Lucene does not in fact postpone the deletion, that would explain the
> exception I'm seeing ("java.io.IOException: couldn't delete _a.f1") - the
> IndexWriter is attempting to delete the files but the IndexReader has them
> open.
> Does this then mean that re-indexing whilst searching is inherently unsafe,
> but only on Windows?
 

 



Searches Contain Special Characters

2005-02-17 Thread Luke Shannon
Hi All;

How could I handle doing a wildcard search on the input *mario?

Basically I would be interested in finding all the Documents containing
*mario

Here is an example of such a Query generated:

+(type:138) +(name:**mario*)

How can I let Lucene know that the star closest to Mario on the left is to
be treated as a string, not a matching character?

Thanks,

Luke






Re: Searches Contain Special Characters

2005-02-17 Thread Volodymyr Bychkoviak
Use \ to escape special symbols.
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
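A small sketch of backslash-escaping the query parser's special characters; the character list follows the syntax page above, and later Lucene versions ship a `QueryParser.escape(String)` that does essentially this:

```java
public class QueryEscaper {
    // Characters the query parser treats specially.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\&|";

    public static String escape(String s) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // prefix each special char with a backslash
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

So `escape("*mario")` produces `\*mario`, which the parser then treats as a literal asterisk followed by "mario".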
Regards,
Volodymyr Bychkoviak
Luke Shannon wrote:
> Hi All;
> How could I handle doing a wildcard search on the input *mario?
> Basically I would be interested in finding all the Documents containing
> *mario
> Here is an example of such a Query generated:
> +(type:138) +(name:**mario*)
> How can I let Lucene know that the star closest to Mario on the left is to
> be treated as a string, not a matching character?
> Thanks,
> Luke



RE: Concurrent searching & re-indexing

2005-02-17 Thread Luke Francl
On Thu, 2005-02-17 at 04:44, Paul Mellor wrote:
> "on windows you cannot delete open files, so Lucene AFAIK (I don't use
> windows) postpones the deletion to a time, when the file is closed"
> 
> If Lucene does not in fact postpone the deletion, that would explain the
> exception I'm seeing ("java.io.IOException: couldn't delete _a.f1") - the
> IndexWriter is attempting to delete the files but the IndexReader has them
> open.
> 
> Does this then mean that re-indexing whilst searching is inherently unsafe,
> but only on Windows?

Using Lucene 1.3 final, I ran across what I believe to be this problem.

Under heavy load on Windows, deleting the segments file would fail
sometimes. 

I tried to duplicate the problem with an attached debugger, but I was
unable to do so.

There's more details about my problem in this message:
http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=11986

Any advice would still be appreciated. Currently, I'm catching the error
and doing a retry in the finally block, but I am not confident in this
solution due to the difficulty of reproducing the problem.

Regards,
Luke Francl





Re: Using the highlighter from the sandbox with a prefix query.

2005-02-17 Thread lucuser4851
Thanks very much Mark and Daniel. That solved the problem!!


On Thu, 2005-02-17 at 08:55 +, mark harwood wrote:
> See the highlighter's package.html for a description
> of how query.rewrite should be used to solve this.
> 
> Cheers,
> Mark
> 






Re: bookkeeping documents cause problem in Sort

2005-02-17 Thread aurora
I found the answer. FieldCacheImpl tries to look at a sample of the
terms to be sorted to determine the sort type. It runs into a problem in a
special case where there are only a few bookkeeping documents but no actual
document with a date term.

I don't seem to remember a problem when the index is completely empty.
There is probably code to check for an empty index, but it fails when there
are actually some documents but they don't have the field to be sorted.



Query Question

2005-02-17 Thread Luke Shannon
Hello;

Why won't this query find the document below?

Query:
+(type:203) +(name:*home\**)

Document (relevant fields):
Keyword
Keyword

I was hoping by escaping the * it would be treated as a string. What am I
doing wrong?

Thanks,

Luke






Re: Query Question

2005-02-17 Thread Erik Hatcher
On Feb 17, 2005, at 2:44 PM, Luke Shannon wrote:
> Hello;
> Why won't this query find the document below?
> Query:
> +(type:203) +(name:*home\**)

Is that what the query toString is?  Or is that what you handed to
QueryParser?

Depending on your analyzer, 203 may go away.  QueryParser doesn't
support leading asterisks, so "*home" would fail to parse.

> Document (relevant fields):
> Keyword
> Keyword
> I was hoping by escaping the * it would be treated as a string. What
> am I
> doing wrong?



Re: Query Question

2005-02-17 Thread Luke Shannon
That is a query toString(). I created the Query using a Wildcard Query
object.

Luke

- Original Message - 
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 17, 2005 3:00 PM
Subject: Re: Query Question


>
> On Feb 17, 2005, at 2:44 PM, Luke Shannon wrote:
>
> > Hello;
> >
> > Why won't this query find the document below?
> >
> > Query:
> > +(type:203) +(name:*home\**)
>
> Is that what the query toString is?  Or is that what you handed to
> QueryParser?
>
> Depending on your analyzer, 203 may go away.  QueryParser doesn't
> support leading asterisks, so "*home" would fail to parse.
>
> > Document (relevant fields):
> > Keyword
> > Keyword
> >
> > I was hoping by escaping the * it would be treated as a string. What
> > am I
> > doing wrong?
>
>
>






java.io.IOException: Stale NFS file handle

2005-02-17 Thread Michael Celona
Has anyone seen this?

 

java.io.IOException: Stale NFS file handle
        at java.io.RandomAccessFile.readBytes(Native Method)
        at java.io.RandomAccessFile.read(RandomAccessFile.java:307)
        at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:420)
        at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
        at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
        at org.apache.lucene.store.InputStream.readBytes(InputStream.java:57)
        at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:220)
        at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
        at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
        at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
        at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:102)
        at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:361)
        at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:366)
        at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:268)
        at org.apache.lucene.search.FieldCacheImpl.getAuto(FieldCacheImpl.java:343)
        at org.apache.lucene.search.FieldSortedHitQueue.comparatorAuto(FieldSortedHitQueue.java:327)
        at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:170)
        at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:122)
        at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:141)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
        at org.apache.lucene.search.Hits.<init>(Hits.java:51)
        at org.apache.lucene.search.Searcher.search(Searcher.java:49)

 

 

I get this during a load test with 5 simultaneous users.  I have the index NFS
mounted from an "indexer box", which holds the index, to an application server
(Tomcat).  My index is constantly being added to.  Search performance is in
the 4-second range (query string of "the") on an index of about 2G (as of
now).  Does anyone know how I can speed this up?

 

Any insight would be greatly appreciated.

 

Michael

 

 



Re: Concurrent searching & re-indexing

2005-02-17 Thread Doug Cutting
Paul Mellor wrote:
> I've read from various sources on the Internet that it is perfectly safe to
> simultaneously search a Lucene index that is being updated from another
> Thread, as long as all write access to the index is synchronized.  But does
> this apply only to updating the index (i.e. deleting and adding documents),
> or to a complete re-indexing (i.e. create a new IndexWriter with the
> 'create' argument true and then re-add all the documents)?
> [...]
> java.io.IOException: couldn't delete _a.f1
>         at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
> [...]
> This is running on Windows 2000.
On Windows one cannot delete a file while it is still open.  So, no, on 
Windows one cannot remove an index entirely while an IndexReader or 
Searcher is still open on it, since it is simply impossible to remove 
all the files in the index.

We might attempt to patch this by keeping a list of such files and 
attempt to delete them later (as is done when updating an index).  But 
this could cause problems, as a new index will eventually try to use 
these same file names again, and it would then conflict with the open 
IndexReader.  This is not a problem when updating an existing index, 
since filenames (except for a few which are not kept open, like 
"segments") are never reused in the lifetime of an index.  So, in order 
for such a fix to work we would need to switch to globally unique 
segment names, e.g., long random strings, rather than increasing integers.

In the meantime, the safe way to rebuild an index from scratch while 
other processes are reading it is simply to delete all of its documents, 
then start adding new ones.
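The delete-then-re-add approach might look like this with the 1.4-era API (`IndexReader.delete(int)`, `create=false`); `indexDir` and `analyzer` are illustrative names:

```java
// Instead of recreating the index with new IndexWriter(dir, analyzer, true),
// mark every existing document deleted: the index files keep their names,
// so IndexReaders open in other threads/processes stay valid.
IndexReader reader = IndexReader.open(indexDir);
for (int i = 0; i < reader.maxDoc(); i++) {
    if (!reader.isDeleted(i)) {
        reader.delete(i);        // marks the document deleted
    }
}
reader.close();                  // flushes the deletions

// Now re-add the fresh documents to the same index.
IndexWriter writer = new IndexWriter(indexDir, analyzer, false);
// for each rebuilt document: writer.addDocument(doc);
writer.optimize();
writer.close();
```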

Doug


Re: Storing info about the index in the index

2005-02-17 Thread Sanyi
> you could use a special document in the index to do this.

I was thinking about doing it that way, but I feel this solution is very ugly :)

> You could also keep a .properties or .xml file alongside the index.

Can I store such a file inside the index directory?
Will Lucene delete my file at some event?
(at optimize, or whatever)

Regards,
Sanyi







Re: Query Question

2005-02-17 Thread Luke Shannon
Hello;

My manager is now totally stuck on being able to query data with * in it.

Here are two queries.

TermQuery(new Term("type", "203"));
WildcardQuery(new Term("name", "*home\**"));

They are joined in a boolean query. That query gives this result when you
call the toString():

+(type:203) +(name:*home\**)

This looks right to me.

Any theories as to why it would not match:

Document (relevant fields):
Keyword
Keyword

Is the \ escaping both * characters?

Thanks,

Luke




- Original Message - 
From: "Luke Shannon" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 17, 2005 2:44 PM
Subject: Query Question


> Hello;
>
> Why won't this query find the document below?
>
> Query:
> +(type:203) +(name:*home\**)
>
> Document (relevant fields):
> Keyword
> Keyword
>
> I was hoping by escaping the * it would be treated as a string. What am I
> doing wrong?
>
> Thanks,
>
> Luke
>
>
>
>






Re: Storing info about the index in the index

2005-02-17 Thread aurora
On Thu, 17 Feb 2005 08:53:41 -0500, Erik Hatcher  
<[EMAIL PROTECTED]> wrote:

> On Feb 17, 2005, at 8:43 AM, Sanyi wrote:
>> Hi!
>> Is there any way to store info about the index in the index?
>> (You know, like in .doc files on Windows. You can store title, author,
>> etc...)
>> I need to store the last indexed database UID in the index and maybe
>> some other useful infos too.
>> I don't want to store them separately in the database or in another
>> file because of administrative reasons.
>
> There is currently no feature to store additional information in the
> index like this, though you could use a special document in the index to
> do this.
> You could also keep a .properties or .xml file alongside the index.
>
> 	Erik

I stored the info in some special documents. They have a separate field name
from the main document set, so that they would not be fetched by a regular
search. I ran into a small problem ("bookkeeping documents cause problem in
Sort") that I posted half a day ago. But right now everything seems to work
fine.



Adding documents in batch, how often?

2005-02-17 Thread aurora
I believe the common wisdom for adding documents is to add them in a large
batch rather than individually. I'm wondering to what extent this is
significant. What would be the difference between adding 100 documents once
a day and adding 10 documents 10 times a day? Is there a lot of
housekeeping going on in IndexWriter.close()?



Lius

2005-02-17 Thread Rida Benjelloun
Hi,
I've just released an indexing framework based on Lucene, which is named LIUS.
LIUS is written in Java and it adds to Lucene indexing functionality for many
file formats, such as: MS Word, MS Excel, MS PowerPoint, RTF, PDF, XML, HTML,
TXT, the OpenOffice suite and JavaBeans. The whole indexing process is driven
by a configuration file.
You can visit these links for more information about LIUS; documentation is
available in English and French:
www.bibl.ulaval.ca/lius/index.en.html
www.sourceforge.net/projects/lius



Re: ParallelMultiSearcher Question

2005-02-17 Thread Youngho Cho
Hello,

Is there any pointer on
how to close an index, and on how the server deals with index updates,
when using ParallelMultiSearcher with the built-in RemoteSearchable?

Need your help.

Thanks,

Youngho

- Original Message - 
From: "Youngho Cho" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 17, 2005 6:29 PM
Subject: ParallelMultiSearcher Question


> Hello,
> 
> I would like to use ParrellelMultiSearcher with few RemoteSearchables.
> 
> If one of the remote server is down, 
> Can I parrellelMultiSearcher set close() and 
> make new ParrellelMultiSearcher with other live RemoteSearchables ?
> 
> Thanks.
> 
> Youngho

Re: Query Question

2005-02-17 Thread Erik Hatcher
On Feb 17, 2005, at 5:51 PM, Luke Shannon wrote:
My manager is now totally stuck on being able to query data with * 
in it.
He's gonna have to wait a bit longer; you've got a slightly tricky 
situation on your hands.

WildcardQuery(new Term("name", "*home\**"));
The \* is the problem.  WildcardQuery doesn't deal with escaping like 
you're trying.  Your query is essentially this now:

home\*
Where backslash has no special meaning at all... you're literally 
looking for all terms that start with home followed by a backslash.  
Two asterisks at the end really collapse into a single one logically.

Any theories as to why it would not match:
Document (relevant fields):
Keyword
Keyword
Is the \ escaping both * characters?
So, again, no escaping is being done here.  You're a bit stuck in this 
situation because * (and ?) are special to WildcardQuery, and it does 
no escaping.  Two options I can think of:

	- Build your own clone of WildcardQuery that does escaping - or 
perhaps change the wildcard characters to something you do not index 
and use those instead.

	- Replace asterisks in the terms indexed with some other non-wildcard 
character, then replace it on your queries as appropriate.
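
Erik's second option — swap literal asterisks for a character that is never treated as a wildcard, both at index time and at query time — might look roughly like this; the sentinel character and class name are invented for illustration:

```java
// Hypothetical helper: map literal '*' in field values to a sentinel
// character that WildcardQuery never treats as a wildcard. Apply escape()
// when indexing values and when building the literal parts of query terms;
// apply unescape() when displaying stored values back to the user.
// '\u0001' is an arbitrary choice of a character you never index otherwise.
public class AsteriskEscaper {
    static final char SENTINEL = '\u0001';

    static String escape(String s) {
        return s.replace('*', SENTINEL);
    }

    static String unescape(String s) {
        return s.replace(SENTINEL, '*');
    }
}
```

A query for terms containing a literal asterisk then becomes something like new WildcardQuery(new Term("name", "*home" + AsteriskEscaper.SENTINEL + "*")), where only the outer asterisks act as wildcards.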

Erik


Re: ParrellelMultiSearcher Question

2005-02-17 Thread Erik Hatcher
If you close a Searcher that goes through a RemoteSearchable, you'll 
close the remote index.  I learned this by experimentation for Lucene 
in Action and added a warning there:

http://www.lucenebook.com/search?query=RemoteSearchable+close
On Feb 17, 2005, at 8:27 PM, Youngho Cho wrote:
Hello,
Is there any pointer
how closing an index and how the server deals with index updates
for using ParrellelMultiSearcher with built in RemoteSearcher ??
Need your help.
Thanks,
Youngho
- Original Message -
From: "Youngho Cho" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, February 17, 2005 6:29 PM
Subject: ParrellelMultiSearcher Question

Hello,
I would like to use ParrellelMultiSearcher with few RemoteSearchables.
If one of the remote server is down,
Can I parrellelMultiSearcher set close() and
make new ParrellelMultiSearcher with other live RemoteSearchables ?
Thanks.
Youngho



select where from query type in lucene

2005-02-17 Thread Miro Max
Hi,

I have a problem with my classes using Lucene.
My index looks like:

type   |   content
-----------+-----------
document   |  x
document   |  x
view   |  x
view   |  x
dbentry|  x
dbentry|  x

My question now:

How can I search for content where type=document, or where
(type=document OR type=view)?
Currently I can do it with "(type:document OR type:entry) AND queryText"
as the query string, but is there a better way to do this?
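
For what it's worth, the query-string approach can be kept tidy with a small helper that assembles the type clause; the field and type names are taken from the example above, and the class itself is hypothetical:

```java
// Builds "(type:a OR type:b) AND (<userQuery>)" for the schema in the
// question. This is pure string assembly; running the result through
// Lucene's QueryParser is left to the caller.
public class TypedQueryBuilder {
    static String build(String userQuery, String... types) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < types.length; i++) {
            if (i > 0) sb.append(" OR ");
            sb.append("type:").append(types[i]);
        }
        sb.append(") AND (").append(userQuery).append(")");
        return sb.toString();
    }
}
```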

thx

miro








how to get stored fields

2005-02-17 Thread Miro Max
Hello again,

I'm indexing my content as unstored fields. Now I want
to get the fields matching the query and copy them
to a new index.

Do I have to reconstruct this content, or can I copy
the content over as a field to a new index, like this:

Field f = hits.doc(i).getField("content");
d.add(f);

miro











Re: ParrellelMultiSearcher Question

2005-02-17 Thread Youngho Cho
Hello Erik,

Yes, I read it. I tried closing the remote index from both the remote
server and the client, but when I search again I receive an
IOException: Bad file descriptor.

Maybe I am doing something wrong.

Is there a demo sample anywhere?

Thanks.

Youngho

- Original Message - 
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Friday, February 18, 2005 11:47 AM
Subject: Re: ParrellelMultiSearcher Question


> If you close a Searcher that goes through a RemoteSearchable, you'll 
> close the remote index.  I learned this by experimentation for Lucene 
> in Action and added a warning there:
> 
> http://www.lucenebook.com/search?query=RemoteSearchable+close
> 
> 
> On Feb 17, 2005, at 8:27 PM, Youngho Cho wrote:
> 
> > Hello,
> >
> > Is there any pointer
> > how closing an index and how the server deals with index updates
> > for using ParrellelMultiSearcher with built in RemoteSearcher ??
> >
> > Need your help.
> >
> > Thanks,
> >
> > Youngho
> >
> > - Original Message -
> > From: "Youngho Cho" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" 
> > Sent: Thursday, February 17, 2005 6:29 PM
> > Subject: ParrellelMultiSearcher Question
> >
> >
> >> Hello,
> >>
> >> I would like to use ParrellelMultiSearcher with few RemoteSearchables.
> >>
> >> If one of the remote server is down,
> >> Can I parrellelMultiSearcher set close() and
> >> make new ParrellelMultiSearcher with other live RemoteSearchables ?
> >>
> >> Thanks.
> >>
> >> Youngho
> 
> 

The problem of using Cyber Neko HTML Parser parse HTML files

2005-02-17 Thread Jingkang Zhang
When I use the CyberNeko HTML Parser to parse HTML
files (created by Microsoft Word), if the file
contains HTML built-in entity references (for example:
&nbsp;), node values may contain unknown characters.

Like this:
source html:

-rw-r--r--    1 root
root  
50 Jan 21 16:12
_1e.f6


after parsing html:
-rw-r--r--??1 root?? root? 50 Jan 21 16:12
_1e.f6

How can I avoid it?

_
Do You Yahoo!?
150万曲MP3疯狂搜,带您闯入音乐殿堂
http://music.yisou.com/
美女明星应有尽有,搜遍美图、艳图和酷图
http://image.yisou.com
1G就是1000兆,雅虎电邮自助扩容!
http://cn.rd.yahoo.com/mail_cn/tag/1g/*http://cn.mail.yahoo.com/event/mail_1g/




Re: The problem of using Cyber Neko HTML Parser parse HTML files

2005-02-17 Thread Jason Polites
This is not an unknown character; it is a non-breaking space (Unicode value 
0x00A0).

- Original Message - 
From: "Jingkang Zhang" <[EMAIL PROTECTED]>
To: 
Sent: Friday, February 18, 2005 5:12 PM
Subject: The problem of using Cyber Neko HTML Parser parse HTML files


When I was using Cyber Neko HTML Parser parse HTML
files( created by Microsoft word ), if the file
contains HTML built-in entity references(for example:
 ) , node value may contain unknown character.
Like this:
source html:

-rw-r--r--    1 root
root  
50 Jan 21 16:12
_1e.f6

after parsing html:
-rw-r--r--??1 root?? root? 50 Jan 21 16:12
_1e.f6
How can I avoid it?
_
Do You Yahoo!?
150äæMP3ççæïåæéåéäæå
http://music.yisou.com/
çåææåæåæïæéçåãèååéå
http://image.yisou.com
1Gåæ1000åïéèçéèåæåï
http://cn.rd.yahoo.com/mail_cn/tag/1g/*http://cn.mail.yahoo.com/event/mail_1g/
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: ParrellelMultiSearcher Question

2005-02-17 Thread Youngho Cho
Hi,

I found my problem.

The remote server's index wasn't being closed as expected.
Also, after reopening the remote server's searcher,
the client-side searcher has to reconnect as well.

Thanks.

Youngho

- Original Message - 
From: "Youngho Cho" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Friday, February 18, 2005 1:38 PM
Subject: Re: ParrellelMultiSearcher Question


> Hello Erik,
> 
> Yes.  I read it.
> And tried to close the remote index from remote server and client both.
> But when I search again,
> I received the
> IOException: Bad file descriptor
> 
> Maybe I am wrong.
> 
> Is there any demo sample ?
> 
> Thanks.
> 
> Youngho
> 
> - Original Message - 
> From: "Erik Hatcher" <[EMAIL PROTECTED]>
> To: "Lucene Users List" 
> Sent: Friday, February 18, 2005 11:47 AM
> Subject: Re: ParrellelMultiSearcher Question
> 
> 
> > If you close a Searcher that goes through a RemoteSearchable, you'll 
> > close the remote index.  I learned this by experimentation for Lucene 
> > in Action and added a warning there:
> > 
> > http://www.lucenebook.com/search?query=RemoteSearchable+close
> > 
> > 
> > On Feb 17, 2005, at 8:27 PM, Youngho Cho wrote:
> > 
> > > Hello,
> > >
> > > Is there any pointer
> > > how closing an index and how the server deals with index updates
> > > for using ParrellelMultiSearcher with built in RemoteSearcher ??
> > >
> > > Need your help.
> > >
> > > Thanks,
> > >
> > > Youngho
> > >
> > > - Original Message -
> > > From: "Youngho Cho" <[EMAIL PROTECTED]>
> > > To: "Lucene Users List" 
> > > Sent: Thursday, February 17, 2005 6:29 PM
> > > Subject: ParrellelMultiSearcher Question
> > >
> > >
> > >> Hello,
> > >>
> > >> I would like to use ParrellelMultiSearcher with few RemoteSearchables.
> > >>
> > >> If one of the remote server is down,
> > >> Can I parrellelMultiSearcher set close() and
> > >> make new ParrellelMultiSearcher with other live RemoteSearchables ?
> > >>
> > >> Thanks.
> > >>
> > >> Youngho
> > 
> > 

Re: Re: The problem of using Cyber Neko HTML Parser parse HTML files

2005-02-17 Thread Jingkang Zhang
Thank you. But how can I get correct output? If my
HTML files use different encodings (like UTF-8,
ISO-8859-1, GBK, JIS, etc.), how should I handle
them?



 --- Jason Polites <[EMAIL PROTECTED]> wrote: 
> This is not an unknown character.. it is a non
> breaking space (unicode value 
> 0x00A0)
> 
> 
> - Original Message - 
> From: "Jingkang Zhang" <[EMAIL PROTECTED]>
> To: 
> Sent: Friday, February 18, 2005 5:12 PM
> Subject: The problem of using Cyber Neko HTML Parser
> parse HTML files
> 
> 
> > When I was using Cyber Neko HTML Parser parse HTML
> > files( created by Microsoft word ), if the file
> > contains HTML built-in entity references(for
> example:
> >  ) , node value may contain unknown
> character.
> >
> > Like this:
> > source html:
> > 
> > -rw-r--r--    1 root
> > root
> > 50 Jan 21 16:12
> > _1e.f6
> >
> > after parsing html:
> > -rw-r--r--??1 root?? root? 50 Jan 21 16:12
> > _1e.f6
> >
> > How can I avoid it?
> >
> >
>
_
> > Do You Yahoo!?
> > 150涓??MP3??甯??充?娈垮?
> > http://music.yisou.com/
> >
>
缇?コ???搴??灏芥?锛??俱€???惧??峰?
> > http://image.yisou.com
> > 1G灏辨?1000???甸?╁?锛?> >
>
http://cn.rd.yahoo.com/mail_cn/tag/1g/*http://cn.mail.yahoo.com/event/mail_1g/
> >
> >
>
-
> > To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> > For additional commands, e-mail:
> [EMAIL PROTECTED]
> >
> > 
> 
> 
>
-
> To unsubscribe, e-mail:
> [EMAIL PROTECTED]
> For additional commands, e-mail:
> [EMAIL PROTECTED]
> 
>  

_
Do You Yahoo!?
150万曲MP3疯狂搜,带您闯入音乐殿堂
http://music.yisou.com/
美女明星应有尽有,搜遍美图、艳图和酷图
http://image.yisou.com
1G就是1000兆,雅虎电邮自助扩容!
http://cn.rd.yahoo.com/mail_cn/tag/1g/*http://cn.mail.yahoo.com/event/mail_1g/
