RE: Reopen IndexWriter after delete?

2003-11-12 Thread Wilton, Reece
I agree it's a bit of a strange design.

It seems that there should be one class that handles all modifications
of the index.  Usually you'd only have one instance of this so you
wouldn't need to open and close it all the time (I'm basically writing
one of these classes myself to simplify my code.  I'm sure other people
have written a similar class).  There should be another class that is
responsible for searching.  You may have multiple instances of this so
you can have multiple headends searching the index.

The IndexWriter and IndexReader almost achieve this separation.  It seems
that if the IndexWriter had the delete functionality, instead of the
IndexReader, things would be a lot simpler (from a synchronization
standpoint).  Maybe Otis, Erik or Doug could suggest why this may or may
not be a good idea.
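A rough sketch of the kind of wrapper class described above. This is only an illustration of the idea, not an existing Lucene class: the class name, method names and index path are invented, and it uses the Lucene 1.3-era IndexWriter/IndexReader calls.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

// Hypothetical single-modifier wrapper: all index modifications go through
// one synchronized instance, so the open/close ordering lives in one place.
public class IndexModifier {
    private final String path;
    private final Analyzer analyzer = new StandardAnalyzer();

    public IndexModifier(String path) { this.path = path; }

    public synchronized void add(Document doc) throws Exception {
        IndexWriter writer = new IndexWriter(path, analyzer, false);
        writer.addDocument(doc);
        writer.close();
    }

    public synchronized void delete(Term term) throws Exception {
        IndexReader reader = IndexReader.open(path);
        reader.delete(term);
        reader.close();
    }
}
```

Because both add() and delete() are synchronized on the same instance, only one reader or writer is ever open at a time, which sidesteps the locking problems discussed in this thread.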

-Reece

-Original Message-
From: Dror Matalon [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 12, 2003 12:06 PM
To: Lucene Users List
Subject: Re: Reopen IndexWriter after delete?

Which begs the question: why do you need to use an IndexReader rather
than an IndexWriter to delete an item?

On Tue, Nov 11, 2003 at 02:46:37PM -0800, Otis Gospodnetic wrote:
> > 1).  If I delete a term using an IndexReader, can I use an existing
> > IndexWriter to write to the index?  Or do I need to close and reopen
> > the IndexWriter?
> 
> No.  You should close IndexWriter first, then open IndexReader, then
> call delete, then close IndexReader, and then open a new IndexWriter.
> 
> > 2).  Is it safe to call IndexReader.delete(term) while an IndexWriter
> > is writing?  Or should I be synchronizing these two tasks so only one
> > occurs at a time?
> 
> No, it is not safe.  You should close the IndexWriter, then delete the
> document and close IndexReader, and then get a new IndexWriter and
> continue writing.
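The sequence described above might look roughly like this in code. This is a sketch against the Lucene 1.3-era API; the index path, analyzer and field name are illustrative.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteThenReopen {
    public static void main(String[] args) throws Exception {
        // 1. Close any open IndexWriter first.
        IndexWriter writer = new IndexWriter("C:/Index", new StandardAnalyzer(), false);
        // ... add documents ...
        writer.close();

        // 2. Open an IndexReader, delete, and close it again.
        IndexReader reader = IndexReader.open("C:/Index");
        reader.delete(new Term("id", "42"));   // "id"/"42" are illustrative
        reader.close();

        // 3. Open a fresh IndexWriter and continue adding documents.
        writer = new IndexWriter("C:/Index", new StandardAnalyzer(), false);
        // writer.addDocument(...);
        writer.close();
    }
}
```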
> 
> Incidentally, I just wrote a section about concurrency issues and about
> locking in Lucene for the upcoming Lucene book.
> 
> Otis
> 
> 
> __
> Do you Yahoo!?
> Protect your identity with Yahoo! Mail AddressGuard
> http://antispam.yahoo.com/whatsnewfree
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com







RE: Index pdf files with your content in lucene.

2003-11-11 Thread Wilton, Reece
Some of us have corporate firewalls that are stripping out attachments.  If possible, 
put these on a web site somewhere so we can download them.  Thanks!

-Original Message-
From: Ernesto De Santis [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 11, 2003 11:07 AM
To: Lucene Users List
Subject: Index pdf files with your content in lucene.

Classes for indexing PDF and Word files in Lucene.
Ernesto.

- Original Message -
From: "Ernesto De Santis" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, October 29, 2003 12:04 PM
Subject: Re: [opencms-dev] Index pdf files with your content in lucene.


Hello all,

Thanks very much, Stephan, for your valuable help.
Attached you will find the PDFDocument and WordDocument class source code.

Ernesto.
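Combining Stephan's earlier advice (reread the file to get its content) with the ByteArrayInputStream approach from this thread, the extraction step might be sketched as below. The method name is invented; CmsObject, CmsFile and PDFExtractor are the OpenCms and org.textmining classes referenced in the quoted messages.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
// Assumed from the thread:
// com.opencms.file.CmsObject, com.opencms.file.CmsFile,
// org.textmining.text.extraction.PDFExtractor

public class PdfBodyExtractor {
    // Hypothetical helper combining the two fixes discussed in this thread.
    public static String extractPdfBody(CmsObject cms, CmsFile f) throws Exception {
        // getFilesInFolder returns files with empty content, so reread the file:
        f = cms.readFile(f.getAbsolutePath());
        // Wrap the byte content for the extractor:
        InputStream in = new ByteArrayInputStream(f.getContents());
        return new PDFExtractor().extractText(in);
    }
}
```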


- Original Message -
From: "Hartmann, Waehrisch & Feykes GmbH" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 28, 2003 11:10 AM
Subject: Re: [opencms-dev] Index pdf files with your content in lucene.


> Hi Ernesto,
>
> the IndexManager retrieves a list of files in a folder by calling the
> method getFilesInFolder of CmsObject.  This method returns only empty
> files, i.e. with empty content.  To get the content of a PDF file you
> have to reread the file:
> f = cms.readFile(f.getAbsolutePath());
>
> Bye,
> Stephan
>
> Am Montag, 27. Oktober 2003 19:18 schrieben Sie:
>
> > Hello,
> >
> > Thanks for the previous reply.
> >
> > Now I use:
> > - version 1.4 of the Lucene search module (the version attached in this list)
> > - the new version of the registry.xml format for the module (as you told me)
> > - the PDF files are stored with the binary type.
> >
> > But I have the following problem:
> > I can't create an InputStream for the CmsFile content.
> > For this I wrote this code in the Document method of my class PDFDocument:
> >
> > -
> >
> > InputStream in = new ByteArrayInputStream(f.getContents()); // f is the
> > CmsFile parameter of the Document method
> >
> > PDFExtractor extractor = new PDFExtractor(); // PDFExtractor is the
> > library I use; it works fine on the file system.
> >
> > bodyText = extractor.extractText(in);
> >
> > -
> >
> > Is it correct to use a ByteArrayInputStream to create an InputStream
> > for a CmsFile?
> >
> > The error occurs in the third line, in the PDFParser.
> > The error message in Tomcat is:
> >
> > java.io.IOException: Error: Header is corrupt ''
> > at PDFParser.parse
> > at PDFExtractor.extractText
> > at PDFDocument.Document (my class)
> > at ...
> >
> > Bye, and thanks.
> > Ernesto.
> >
> >
> > - Original Message -
> >   From: Hartmann, Waehrisch & Feykes GmbH
> >   To: [EMAIL PROTECTED]
> >   Sent: Friday, October 24, 2003 4:45 AM
> >   Subject: Re: [opencms-dev] Index pdf files with your content in
lucene.
> >
> >
> >   Hello Ernesto,
> >
> >   I assume you are using the unpatched version 1.3 of the search module.
> >   As I mentioned yesterday, the plainDocFactory only indexes cmsFiles of
> > type "plain" but not of type "binary".  PDF files are stored as binary.  I
> > suggest you use the version I posted yesterday.  Then your registry.xml
> > would have to look like this: ...
> >   
> >   ...
> >  
> >   ...
> >  
> >  
> > 
> >.pdf
> >
net.grcomputing.opencms.search.lucene.PDFDocument
> > 
> >  
> >   ...
> >   
> >
> >   Important: The type attribute must match the file types of OpenCms
> >   (also defined in the registry.xml).
> >
> >   Bye,
> >   Stephan
> >
> > - Original Message -
> > From: Ernesto De Santis
> > To: Lucene Users List
> > Cc: [EMAIL PROTECTED]
> > Sent: Thursday, October 23, 2003 4:16 PM
> > Subject: [opencms-dev] Index pdf files with your content in lucene.
> >
> >
> > Hello,
> >
> > I am new to OpenCms and Lucene technology.
> >
> > I want to index PDF files, and to index the content of these files.
> >
> > I work in this way:
> >
> > I make a PDFDocument class like the JspDocument class, and
> > use the org.textmining.text.extraction.PDFExtractor class; this class
> > works fine outside the VFS.
> >
> > Then I write my registry.xml entry for PDF documents, in the plainDocFactory tag:
> >
> > 
> > .pdf
> > 
> >
> > net.grcomputing.opencms.search.lucene.PDFDocument
> > 
> >
> > My PDFDocument contains this code.
> > I think the problem is how to take the content from the CmsFile: what
> > InputStream should I use?  PDFExtractor works with the extractText(InputStream) method.
> >
> > public class PDFDocument implements I_DocumentConstants, I_DocumentFactory {
> >
> > public PDFDocument() {
> > }
> >
> > public Document Document(CmsObject cmsobject, CmsFile cmsfile)
> > throws CmsException
> > {
> > return Document(cmsobject, cmsfile, null);
> > }
> >
> > public Document Document(CmsObject cmsobject, CmsFile cmsfile, HashMap hashmap)
> > throws CmsException
>

Reopen IndexWriter after delete?

2003-11-11 Thread Wilton, Reece
Hi,

A couple questions...

1).  If I delete a term using an IndexReader, can I use an existing
IndexWriter to write to the index?  Or do I need to close and reopen the
IndexWriter?

2).  Is it safe to call IndexReader.delete(term) while an IndexWriter is
writing?  Or should I be synchronizing these two tasks so only one
occurs at a time?

Any help is appreciated!
-Reece




RE: Exact Match

2003-10-22 Thread Wilton, Reece
Good idea!  Thanks! 

-Original Message-
From: Tate Avery [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2003 9:54 AM
To: Lucene Users List
Subject: RE: Exact Match


To ensure I understand...

If you have:

1)  A B C
2)  B C
3)  B C D
4)  C

You want "B C" to match #2 only
But, "C" to match #1, #2, #3, and #4

If so, you can have a tokenized field and an untokenized one...

Use the untokenized for matching 'exact' strings
Use the tokenized for finding a single word in the string

I.e.  check "B C" against untokenized
  check "C" against tokenized

That is, if you don't mind indexing the same data into 2 different
fields.
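A sketch of the two-field approach using the Lucene 1.x Field helpers; the field names are illustrative.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;

public class TwoFieldExample {
    // Index the same string twice: tokenized for single-word matches,
    // untokenized (Keyword) for exact matches.
    public static Document makeDoc(String phrase) {
        Document doc = new Document();
        doc.add(Field.Text("phrase", phrase));         // tokenized + stored
        doc.add(Field.Keyword("phraseExact", phrase)); // untokenized
        return doc;
    }

    // An exact-match query then goes against the untokenized field:
    public static TermQuery exactQuery(String phrase) {
        return new TermQuery(new Term("phraseExact", phrase));
    }
}
```

The cost is indexing each string twice, but each query type then hits the field that was built for it.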


-Original Message-
From: Wilton, Reece [mailto:[EMAIL PROTECTED]
Sent: October 22, 2003 12:49 PM
To: Lucene Users List
Subject: RE: Exact Match


If I use an untokenized field, would "fox" match this as well?  I need
to support both exact match searches and searches where one word exists
in the field.

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2003 9:44 AM
To: Lucene Users List
Subject: Re: Exact Match

Wilton, Reece wrote:
> Does Lucene support exact matching on a tokenized field?
> 
> So for example... if I add these three phrases to the index:
> - "The quick brown fox"
> - "The quick brown fox jumped"
> - "brown fox"
> 
> I want to be able to do an exact field match so when I search for "brown
> fox" I only get the last one returned.  I can do this in my own code by
> storing the data and then comparing it to the search phrase.  Is that
> the best way of doing this?

Why not just use an untokenized field?  Then just use a TermQuery, 
searching for the term "brown fox".

Doug





RE: Exact Match

2003-10-22 Thread Wilton, Reece
Yes, that's what I'm doing.  Just wanted to see what other ideas were
out there.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2003 9:12 AM
To: Lucene Users List
Subject: Re: Exact Match

There is no direct support for that.  However, if one of your documents
contains _only_: "brown fox", won't a search for "brown fox" give that
document the highest score, as it is the closest match, allowing you to
just pop the first hit?  It's no guarantee that the first hit is the
exact match (what if there are no exact matches in the index), but
that's a simple check to perform in your application.

Otis
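Otis's check (run the search, then compare a stored copy of the field against the query string) might be sketched like this; the field name and class are illustrative.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ExactMatchCheck {
    // Returns the top hit only if its stored "phrase" field equals the
    // query string exactly; otherwise there was no exact match.
    public static Document exactTopHit(IndexSearcher searcher, Query query,
                                       String queryString) throws Exception {
        Hits hits = searcher.search(query);
        if (hits.length() > 0) {
            Document top = hits.doc(0);
            if (queryString.equals(top.get("phrase"))) {
                return top;
            }
        }
        return null;
    }
}
```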


--- "Wilton, Reece" <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> Does Lucene support exact matching on a tokenized field?
> 
> So for example... if I add these three phrases to the index:
> - "The quick brown fox"
> - "The quick brown fox jumped"
> - "brown fox"
> 
> I want to be able to do an exact field match so when I search for
> "brown
> fox" I only get the last one returned.  I can do this in my own code
> by
> storing the data and then comparing it to the search phrase.  Is that
> the best way of doing this?
> 
> Thanks,
> Reece
> 
> 
> 





RE: Exact Match

2003-10-22 Thread Wilton, Reece
If I use an untokenized field, would "fox" match this as well?  I need
to support both exact match searches and searches where one word exists
in the field.

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2003 9:44 AM
To: Lucene Users List
Subject: Re: Exact Match

Wilton, Reece wrote:
> Does Lucene support exact matching on a tokenized field?
> 
> So for example... if I add these three phrases to the index:
> - "The quick brown fox"
> - "The quick brown fox jumped"
> - "brown fox"
> 
> I want to be able to do an exact field match so when I search for "brown
> fox" I only get the last one returned.  I can do this in my own code by
> storing the data and then comparing it to the search phrase.  Is that
> the best way of doing this?

Why not just use an untokenized field?  Then just use a TermQuery, 
searching for the term "brown fox".

Doug





Exact Match

2003-10-22 Thread Wilton, Reece
Hi,

Does Lucene support exact matching on a tokenized field?

So for example... if I add these three phrases to the index:
- "The quick brown fox"
- "The quick brown fox jumped"
- "brown fox"

I want to be able to do an exact field match so when I search for "brown
fox" I only get the last one returned.  I can do this in my own code by
storing the data and then comparing it to the search phrase.  Is that
the best way of doing this?

Thanks,
Reece






RE: Too Many Open Files

2003-10-09 Thread Wilton, Reece
Thanks Doug!  Reducing the MergeFactor to 10 reduced the number of files
in the index dramatically.
-Reece 

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 07, 2003 8:20 PM
To: Lucene Users List
Subject: Re: Too Many Open Files

Wilton, Reece wrote:
> The index directory that Lucene created has 2,322 files in it.  When I
> try to open it I get the dreaded "Too Many Open Files" problem:
> java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
> files)
> 
> The index has about 50,000 docs in it.  It was created with a merge
> factor of 5,000.  Is there a way that I can reduce the number of files
> or increase the number of files that windows can open?

5000 is way too large for the merge factor.  Please read the FAQ and
other messages on this list for guidelines.  I've personally never found
use for a merge factor larger than 50.

Doug
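In the Lucene 1.x API the merge factor is a public field on IndexWriter, so the fix described in this thread amounts to something like the following sketch (path and analyzer are illustrative).

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class MergeFactorExample {
    public static void buildIndex() throws Exception {
        IndexWriter writer = new IndexWriter("C:/Index", new StandardAnalyzer(), true);
        writer.mergeFactor = 10;   // the default; 5000 leaves thousands of files
        // ... add documents ...
        writer.optimize();         // merge segments down before closing
        writer.close();
    }
}
```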





RE: Too Many Open Files

2003-10-08 Thread Wilton, Reece
Ok, I lowered my MergeFactor down to 10 and am re-indexing (takes
several hours).  Will lowering the MergeFactor reduce the total number
of files in the index directory?
-Reece 

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 07, 2003 8:20 PM
To: Lucene Users List
Subject: Re: Too Many Open Files

Wilton, Reece wrote:
> The index directory that Lucene created has 2,322 files in it.  When I
> try to open it I get the dreaded "Too Many Open Files" problem:
> java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
> files)
> 
> The index has about 50,000 docs in it.  It was created with a merge
> factor of 5,000.  Is there a way that I can reduce the number of files
> or increase the number of files that windows can open?

5000 is way too large for the merge factor.  Please read the FAQ and
other messages on this list for guidelines.  I've personally never found
use for a merge factor larger than 50.

Doug





Too Many Open Files

2003-10-07 Thread Wilton, Reece
Hi,

The index directory that Lucene created has 2,322 files in it.  When I
try to open it I get the dreaded "Too Many Open Files" problem:
java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
files)

The index has about 50,000 docs in it.  It was created with a merge
factor of 5,000.  Is there a way that I can reduce the number of files
or increase the number of files that windows can open?

Any help is appreciated!
Reece




RE: lucene complains about invalid class file format in rt.jar

2003-10-06 Thread Wilton, Reece
Code compiled with JDK 1.4 or above may produce class files that are
incompatible with JVM versions below 1.4.  Compile the code with 1.3 and
you should be OK.

-Original Message-
From: Rob Tanner [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 03, 2003 2:27 PM
To: [EMAIL PROTECTED]
Subject: lucene complains about invalid class file format in rt.jar


Hi,

I'm having quite a bit of success with Lucene designing a new search
tool for our website -- the only problem is that I've had to drop down
to Java 1.3.6 (all our production systems are Java 1.4.x).

The typical error I get on the jsp is:

error: Invalid class file format in 
/usr/java/j2sdk1.4.0/jre/lib/rt.jar(java/io/ObjectInputStream.class). 
The major.minor version '48.0' is too recent for this tool to 
understand.
/usr/local/tomcat/work/localhost_8080%2Fluceneweb/_0002fresults_0002ejs
presults_jsp_73.java:8: Class java.io.ObjectInputStream not found in 
import.
import java.io.ObjectInputStream;

I used javacc to process the .jj files and then rebuilt all the source
using java 1.4.0, and I can build the index just fine, but the copy of
the demo results.jsp, which I've been modifying for our needs, generates
a number of the above example errors.

What's going on here?

Thanks,
Rob

   _ _ _ _   __ _ _ _ _
  /\_\_\_\_\/\_\ /\_\_\_\_\_\
 /\/_/_/_/_/   /\/_/ \/_/_/_/_/_/  QUIDQUID LATINE DICTUM SIT,
/\/_/__\/_/ __/\/_//\/_/  PROFUNDUM VIDITUR
   /\/_/_/_/_/ /\_\  /\/_//\/_/
  /\/_/ \/_/  /\/_/_/\/_//\/_/ (Whatever is said in Latin
  \/_/  \/_/  \/_/_/_/_/ \/_/  appears profound)

  Rob Tanner
  UNIX Services Manager
  Linfield College, McMinnville OR
  (503) 883-2558 <[EMAIL PROTECTED]>






RE: cant rename segments.new to segment

2003-09-19 Thread Wilton, Reece
Are people having this same issue on Linux or is this just a Windows
issue?

-Original Message-
From: Rociel Buico [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 18, 2003 11:51 PM
To: [EMAIL PROTECTED]
Subject: cant rename segments.new to segment


all,
when I'm trying to run my index writer program in IDEA (the IDE) I get
this error ("can't rename segments.new to segments"; sometimes the
deletable file gets an error), but when I run the program from the
command prompt it looks fine, and no error is returned.

I'm just making one index, with no threads.
Is this a bug?





RE: cant rename segments.new to segment

2003-09-19 Thread Wilton, Reece
This occurs for me on Windows XP when the mergefactor is low or when
there is more than one reader or writer open.  Try increasing the
mergefactor.  Also, don't open and close the writer after each document.
Open the writer once, index all your docs and then close the writer.
You'll have much better success!

-Original Message-
From: Rociel Buico [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 18, 2003 11:51 PM
To: [EMAIL PROTECTED]
Subject: cant rename segments.new to segment


all,
when I'm trying to run my index writer program in IDEA (the IDE) I get
this error ("can't rename segments.new to segments"; sometimes the
deletable file gets an error), but when I run the program from the
command prompt it looks fine, and no error is returned.

I'm just making one index, with no threads.
Is this a bug?





RE: too many files within one index

2003-09-16 Thread Wilton, Reece
The lower the number, the more often a new file is created.  Since
creating/opening files with Lucene seems to cause issues on Windows, I
increased this to 100.  This solved my file locking problems and still
gave me good performance (I am able to perform over 200 queries/second
from a servlet being hit from a remote machine).

-Original Message-
From: hui [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 15, 2003 6:46 AM
To: Lucene Users List
Subject: Re: too many files within one index


Hi Reece,
Thank you for the clue.
Based on the document from Otis, if I understand well, it seems a lower
mergeFactor should be used:
http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html

Regards,
Hui

- Original Message - 
From: "Wilton, Reece" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, September 12, 2003 1:37 PM
Subject: RE: too many files within one index


Set the mergefactor to a higher number.

-Original Message-
From: hui [mailto:[EMAIL PROTECTED]
Sent: Friday, September 12, 2003 10:25 AM
To: Lucene Users List
Subject: too many files within one index


Hi,
I am using Lucene 1.3 rc released on March.20, 2003.
For some reason, the files in the index directory increase very
quickly for some documents, reaching over 3000 files within half an hour,
and I cannot even optimize the index.  Then it grows to 33000 files
within one index.  In most other test cases, it normally never
reaches 1000 files, even without optimization.

Any idea about this?

regards,
Hui





RE: too many files within one index

2003-09-12 Thread Wilton, Reece
Set the mergefactor to a higher number.

-Original Message-
From: hui [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 12, 2003 10:25 AM
To: Lucene Users List
Subject: too many files within one index


Hi,
I am using Lucene 1.3 rc released on March.20, 2003.
For some reason, the files in the index directory increase very
quickly for some documents, reaching over 3000 files within half an hour,
and I cannot even optimize the index.  Then it grows to 33000 files
within one index.  In most other test cases, it normally never
reaches 1000 files, even without optimization.

Any idea about this?

regards,
Hui





RE: Index commit.lock

2003-09-12 Thread Wilton, Reece
Are you running on Windows?  Make sure you only have one IndexWriter or
IndexReader open at a time.  When I had both open I would get this
message.  Also, increase the mergefactor to 100 or so.

-Original Message-
From: Rociel Buico [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 11, 2003 11:42 PM
To: Lucene Users List
Subject: Index commit.lock



hello,

my index is being accessed by multiple threads,

and I'm getting an IOException complaining about commit.lock.

is there any solution to avoid the commit.lock?

tia,

buics





RE: Multiple reads from an IndexSearcher?

2003-09-10 Thread Wilton, Reece
Doh!  I was creating a new IndexSearcher on every request.  Doh!

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 10, 2003 2:28 AM
To: Lucene Users List
Subject: Re: Multiple reads from an IndexSearcher?


Yes, you can use a single IndexSearcher with multiple threads.  I have
done so in servlet environments. This is a common question, so I suggest
you check the archives. Make sure your index is optimized.

Otis
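The shared-searcher pattern Otis describes might be set up like this; the class name and index path are invented for the sketch.

```java
import org.apache.lucene.search.IndexSearcher;

// One long-lived IndexSearcher shared by all request threads, instead of
// opening a new one per request (which leaks file handles).
public class SearchService {
    private static IndexSearcher searcher;

    public static synchronized IndexSearcher getSearcher() throws Exception {
        if (searcher == null) {
            searcher = new IndexSearcher("C:/Index");
        }
        return searcher;
    }
}
```

IndexSearcher.search() is safe to call from multiple threads, so the single instance can serve every request until the index is rebuilt.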

--- "Wilton, Reece" <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I have a single IndexSearcher pointing at an index.  Multiple threads 
> are calling the search method.  Is this safe to do?  I presume it is 
> the right way to do it instead of creating a new IndexSearcher per
> thread.
> 
> But I'm having the dreaded "Too many open files" exception.  I'm 
> running on Windows XP (of course, otherwise it would work!).  Any 
> ideas on how
> to fix this (other than upgrading to Linux)?  Is having one
> IndexSearcher the right thing to do?  No other processes are
> accessing
> this index so I'm surprised that I'm getting this error.
> 
> Here's the exception:
> 
> *E,17:25:19.649,/HyphenTest 5> java.io.IOException: Too many open 
> files
> *E,17:25:19.664,/HyphenTest 5>at
> java.io.WinNTFileSystem.createFileExclusively(Native Method)
> *E,17:25:19.664,/HyphenTest 5>at
> java.io.File.createNewFile(File.java:828)
> *E,17:25:19.664,/HyphenTest 5>at
> org.apache.lucene.store.FSDirectory$1.obtain(FSDirectory.java:324)
> *E,17:25:19.664,/HyphenTest 5>at
> org.apache.lucene.store.Lock.obtain(Lock.java:91)
> *E,17:25:19.664,/HyphenTest 5>at
> org.apache.lucene.store.Lock$With.run(Lock.java:146)
> *E,17:25:19.664,/HyphenTest 5>at
> org.apache.lucene.index.IndexReader.open(IndexReader.java:103)
> *E,17:25:19.664,/HyphenTest 5>at
> org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
> *E,17:25:19.664,/HyphenTest 5>at
> org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
> 
> Any help/advice is appreciated,
> Reece
> 





Multiple reads from an IndexSearcher?

2003-09-09 Thread Wilton, Reece
Hi,

I have a single IndexSearcher pointing at an index.  Multiple threads
are calling the search method.  Is this safe to do?  I presume it is the
right way to do it instead of creating a new IndexSearcher per thread.

But I'm having the dreaded "Too many open files" exception.  I'm running
on Windows XP (of course, otherwise it would work!).  Any ideas on how
to fix this (other than upgrading to Linux)?  Is having one
IndexSearcher the right thing to do?  No other processes are accessing
this index so I'm surprised that I'm getting this error.

Here's the exception:

*E,17:25:19.649,/HyphenTest 5> java.io.IOException: Too many open files
*E,17:25:19.664,/HyphenTest 5>  at
java.io.WinNTFileSystem.createFileExclusively(Native Method)
*E,17:25:19.664,/HyphenTest 5>  at
java.io.File.createNewFile(File.java:828)
*E,17:25:19.664,/HyphenTest 5>  at
org.apache.lucene.store.FSDirectory$1.obtain(FSDirectory.java:324)
*E,17:25:19.664,/HyphenTest 5>  at
org.apache.lucene.store.Lock.obtain(Lock.java:91)
*E,17:25:19.664,/HyphenTest 5>  at
org.apache.lucene.store.Lock$With.run(Lock.java:146)
*E,17:25:19.664,/HyphenTest 5>  at
org.apache.lucene.index.IndexReader.open(IndexReader.java:103)
*E,17:25:19.664,/HyphenTest 5>  at
org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
*E,17:25:19.664,/HyphenTest 5>  at
org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)

Any help/advice is appreciated,
Reece




Exceptions while Updating an Index

2003-08-28 Thread Wilton, Reece
Hi,

I am getting exceptions because Lucene can't rename files.  Here are a
couple of the exceptions that I'm getting:
 - java.io.IOException: couldn't rename _6lr.tmp to _6lr.del
 - java.io.IOException: couldn't rename segments.new to segments

I am able to index many documents successfully on my Windows machine.
The problem occurs for me during the updating process.  My updating
process goes like this:

  for (each xml file i want to index) {
// create new document
parse the xml file
populate a new Lucene document with the fields from my XML file

// remove old document from index
open an index reader
delete the term from the index   // this successfully deletes the
one document
close the index reader

// add new document to index
open an index writer
add the document to the index writer
close the index writer
  }
   
Any ideas on how to stop these exceptions from occurring?  No other
process is reading or writing to the index while this process is
running.
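A rough Java rendering of the loop above (Lucene 1.3-era API); parseXmlIntoDocument is a hypothetical helper standing in for the XML parsing step, and only one reader or writer is open at any moment.

```java
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class UpdateLoop {
    public static void update(String indexPath, File[] xmlFiles) throws Exception {
        for (int i = 0; i < xmlFiles.length; i++) {
            // create new document
            Document doc = parseXmlIntoDocument(xmlFiles[i]); // hypothetical helper

            // remove old document from index
            IndexReader reader = IndexReader.open(indexPath);
            reader.delete(new Term("ID", doc.get("ID")));
            reader.close();

            // add new document to index
            IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
            writer.addDocument(doc);
            writer.close();
        }
    }

    private static Document parseXmlIntoDocument(File f) {
        // stand-in for the XML parsing and field population described above
        return new Document();
    }
}
```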

Thanks,
Reece





RE: Safe to write while optimizing?

2003-07-29 Thread Wilton, Reece
Thanks for the reply.  

Just to clarify:
You are saying that I can optimize and add a document at the same time
as long as both threads use the same IndexWriter.  Is that correct?

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 29, 2003 2:27 PM
To: Lucene Users List
Subject: Re: Safe to write while optimizing?


Wilton, Reece wrote:
> Three questions:
> - Is it safe to have two IndexWriters open on the same index?

No.  It is not safe, and the code makes every attempt to prohibit it.

> - Is it safe to have two IndexWriters adding a document concurrently?

No, but you can have two threads adding documents to a single 
IndexWriter concurrently.

> - Is it safe to add a document while another IndexWriter is optimizing

> the index?

No, but, so long as you use a single IndexWriter object, synchronization
should handle things correctly so that one thread can add documents
while another optimizes.

Doug
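Doug's point (one IndexWriter object, two threads) might be sketched like this; the path, analyzer and empty document are illustrative.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class SharedWriterExample {
    public static void main(String[] args) throws Exception {
        // One IndexWriter shared by both threads; IndexWriter's own
        // synchronization coordinates adding and optimizing.
        final IndexWriter writer = new IndexWriter("C:/Index", new StandardAnalyzer(), true);

        Thread adder = new Thread(new Runnable() {
            public void run() {
                try { writer.addDocument(new Document()); }
                catch (Exception e) { e.printStackTrace(); }
            }
        });
        Thread optimizer = new Thread(new Runnable() {
            public void run() {
                try { writer.optimize(); }
                catch (Exception e) { e.printStackTrace(); }
            }
        });
        adder.start(); optimizer.start();
        adder.join();  optimizer.join();
        writer.close();
    }
}
```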





Safe to write while optimizing?

2003-07-29 Thread Wilton, Reece
Three questions:
- Is it safe to have two IndexWriters open on the same index?
- Is it safe to have two IndexWriters adding a document concurrently?
- Is it safe to add a document while another IndexWriter is optimizing
the index?




RE: Advice on updating an index?

2003-07-21 Thread Wilton, Reece
Thanks Ype.  What do you mean by multiple documents?  Do you mean
multiple indexes?

Or do you mean: delete several documents from the index, add the new
copies of them, and then move on to the next batch of new documents?

-Original Message-
From: Ype Kingma [mailto:[EMAIL PROTECTED] 
Sent: Saturday, July 12, 2003 10:19 AM
To: Lucene Users List; Wilton, Reece
Subject: Re: Advice on updating an index?


Reece,

On Friday 11 July 2003 16:05, Wilton, Reece wrote:
> Hi,
>
> I'm having a bit of trouble figuring out the logic for deleting 
> documents from an index.  Any advice is appreciated!



> 4) I created an index with an IndexWriter and then optimized it and 
> closed it. For each document:
> - I create a new IndexReader, delete the document and close the
> IndexReader
> - I create a new IndexWriter, add the document and close the IndexWriter
> At the end I open the index with an IndexWriter and then optimize it
> and close it.
>
> This works!  But it is pretty slow (compared to the other three 
> tests). Is this the best way of doing this?

AFAIK, yes.
You can speed this up by using multiple documents, i.e. work with a
batch of documents at a time.
Also, you don't need to close the IndexWriter before optimizing.

One variation: you might leave the IndexReader open in case you need it
for searching, but I wouldn't recommend that under Windows, where an
open file cannot be deleted from a directory.  Lucene deletes such
files during later optimizations.

Kind regards,
Ype




Advice on updating an index?

2003-07-11 Thread Wilton, Reece
Hi,

I'm having a bit of trouble figuring out the logic for deleting
documents from an index.  Any advice is appreciated!

1) I created an index with an IndexWriter and then optimized it and
closed it.  Then I opened an IndexReader and deleted each document using
indexReader.delete(new Term("ID", id)).  Then I opened an IndexWriter
again and added the docs.

This worked wonderfully and was fast!  But when I'm updating an index,
I'd prefer to delete the old document and then add the new one.  I don't
want to remove all the docs and then re-add them because people who are
currently searching will get no results.

2) I created an index with an IndexWriter and then optimized it and
closed it.  Then I opened an IndexReader and IndexWriter.  
For each document:
- I delete the document using the IndexReader
- I add the document using the IndexWriter
At the end I close the reader and use the writer to optimize.

This doesn't work. :-( The IndexReader never finds anything to delete.
I presume it's because the IndexWriter has the index open.

3) I created an index with an IndexWriter and then optimized it and
closed it.  Then I opened the index with an IndexWriter.
For each document:
- I create a new IndexReader, delete the document and close the
IndexReader
- I add the document using the IndexWriter
At the end I use the writer to optimize.

This doesn't work. :-( The IndexReader never finds anything to delete.
I presume it's because the IndexWriter has the index open.

4) I created an index with an IndexWriter and then optimized it and
closed it.
For each document:
- I create a new IndexReader, delete the document and close the
IndexReader
- I create a new IndexWriter, add the document and close the IndexWriter
At the end I open the index with an IndexWriter and then optimize it and
close it.

This works!  But it is pretty slow (compared to the other three tests).
Is this the best way of doing this?
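The working approach (4) can be sped up by batching, per Ype's reply:
delete a whole batch of stale documents with one IndexReader, then
re-add the batch with one IndexWriter.  A rough sketch against the
Lucene 1.3-era API; the IndexedDoc class and its toLuceneDocument()
helper are hypothetical stand-ins for however you hold your updated
documents:

```java
import java.util.Iterator;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class BatchUpdater {
    // Delete the old copies of a whole batch, then add the new copies,
    // so the reader and writer are each opened only once per batch.
    public static void updateBatch(String indexDir, List updatedDocs)
            throws Exception {
        // Phase 1: one reader deletes every stale document by ID keyword.
        IndexReader reader = IndexReader.open(indexDir);
        for (Iterator it = updatedDocs.iterator(); it.hasNext();) {
            IndexedDoc d = (IndexedDoc) it.next();
            reader.delete(new Term("ID", d.id));
        }
        reader.close();

        // Phase 2: one writer adds the fresh copies
        // (false = open existing index, don't recreate it).
        IndexWriter writer =
            new IndexWriter(indexDir, new StandardAnalyzer(), false);
        for (Iterator it = updatedDocs.iterator(); it.hasNext();) {
            IndexedDoc d = (IndexedDoc) it.next();
            writer.addDocument(d.toLuceneDocument()); // hypothetical helper
        }
        writer.optimize();
        writer.close();
    }
}
```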

BTW, I'm using Lucene 1.3 rc1 on Windows XP with JDK 1.4.2.

Thanks,
Reece




Getting back a Date object?

2003-07-10 Thread Wilton, Reece
Hi,

There is a Field.Keyword factory method that takes a date.  I'm using it
like this:

  searchDoc.add(Field.Keyword("TIMESTAMP", new Date()));

The JavaDoc for this method says it is stored in the index, for return
with hits.  So how do I get it out of the Hits?

The only thing I can see is getting a String or a Field from Hits.  And
Field only allows you to get a String from it.  Should I be using the
readerValue() for this?
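If I recall the 1.3-era API correctly, Field.Keyword(String, Date)
stores the date via DateField.dateToString(), so it comes back from
Hits as that encoded string rather than a Date; DateField.stringToDate()
should decode it again.  A sketch under that assumption (the field name
is from the snippet above):

```java
import java.io.IOException;
import java.util.Date;
import org.apache.lucene.document.DateField;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;

class TimestampHelper {
    // Pull the stored timestamp of hit n back out as a java.util.Date.
    static Date timestampOf(Hits hits, int n) throws IOException {
        Document doc = hits.doc(n);            // stored fields of hit n
        String encoded = doc.get("TIMESTAMP"); // DateField-encoded string
        return DateField.stringToDate(encoded);
    }
}
```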

Any help is appreciated!
Reece

-Original Message-
From: Francesco Bellomi [mailto:[EMAIL PROTECTED] 
Sent: Sunday, July 06, 2003 1:50 PM
To: Lucene Users List
Subject: Directory implementation using NIO


Hi,

I developed a Directory implementation that accesses an index stored on
the filesystem using memory-mapped files (as provided by the NIO API,
introduced in Java 1.4).

You can download the compiled jar and the source from here:
www.fran.it/lucene-NIO.zip

Basically there are 3 new classes: NIODirectory, NIOInputStream and
NIOOutputStream. They are heavily based on FSDirectory, FSInputStream
and FSOutputStream.

NIOInputStream provides memory-mapped access to files.  It does not rely
on Lucene InputStream's caching feature, since direct access to the
memory-mapped file should be faster.  Also, cloned streams with
independent positions are implemented using NIO buffer duplication (a
buffer duplicate holds the same content but has its own position), so
the implementation logic is much simpler than FSInputStream's.
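The buffer-duplication trick Francesco describes is plain java.nio: a
duplicate() of a mapped buffer shares the underlying bytes but keeps an
independent position, which is exactly what a cloned input stream needs.
A minimal standalone sketch (file name and contents are made up):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class DuplicateDemo {
    public static void main(String[] args) throws Exception {
        // Create a small file to map.
        File f = File.createTempFile("niodemo", ".bin");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        out.write("hello lucene".getBytes("US-ASCII"));
        out.close();

        RandomAccessFile raf = new RandomAccessFile(f, "r");
        FileChannel ch = raf.getChannel();
        MappedByteBuffer base =
            ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

        // A duplicate shares the mapped bytes but has its own position.
        ByteBuffer clone = base.duplicate();
        base.position(6);                       // move the original...
        System.out.println(clone.position());   // ...the clone stays at 0
        System.out.println((char) clone.get()); // byte 0 of "hello lucene"
        System.out.println((char) base.get());  // byte 6 of "hello lucene"

        ch.close();
        raf.close();
    }
}
```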

Some methods of Directory have been overridden to replace the caching
feature.  Some of them were final in Directory, so I have used a
slightly modified version of Directory.java (BTW, I wonder why so many
methods in Lucene are made final...)

These classes only work with the recently released Java 1.4.2.  This is
because in previous releases, buffers backed by memory-mapped files
could not be programmatically unmapped (they were unmapped only through
finalization), and actively mapped files cannot be deleted.  These
issues are partially resolved in 1.4.2.

NIOOutputStream is the same as FSOutputStream; I don't know any way to
take advantage of NIO for writing indexes (memory-mapped buffers have a
fixed size, so they are not useful if your file is growing).

I don't have a benchmarking suite for Lucene, so I can't accurately
evaluate the speed of this implementation.  I tested it on a small
application I am developing and it seems to work well, but I don't
think my tests are significant.  Of course only searching is expected
to be faster, since index writing is unchanged.

Francesco

-
Francesco Bellomi
"Use truth to show illusion,
and illusion to show truth."






RE: lucene handling different document formats

2003-07-07 Thread Wilton, Reece
The Lucene FAQ on Java Guru gives some hints on this:
http://www.jguru.com/faq/Lucene

-Original Message-
From: Maurice Coyle [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 07, 2003 9:07 AM
To: [EMAIL PROTECTED]
Subject: lucene handling different document formats


Could anyone tell me if there's some sort of repository somewhere that
contains parsers for document types such as .doc, .pdf and .xls?  Or
how I'd begin to go about writing one (tutorials etc. much
appreciated)?
 
thanks,
maurice


 



Results sorted by date instead of score?

2003-07-02 Thread Wilton, Reece
Search hits come back ordered by score.  How do I get my results sorted
by the date of the article?  I have added the article date as a keyword
field.
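As far as I know, Lucene 1.3 has no built-in sort-by-field at search
time, so a common workaround is to store the date as a lexicographically
sortable keyword string (e.g. yyyyMMdd) and re-sort the hits yourself.
A standalone sketch of just the re-sort step; the Hit class here is a
stand-in for pulling hits.doc(i).get("DATE") out of a real result set:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class DateSortDemo {
    // Stand-in for (score, stored DATE field) pairs pulled out of Hits.
    static class Hit {
        final float score;
        final String date; // sortable form, e.g. "20030702"
        Hit(float score, String date) { this.score = score; this.date = date; }
    }

    public static void main(String[] args) {
        List hits = new ArrayList();
        hits.add(new Hit(0.9f, "20030615"));
        hits.add(new Hit(0.4f, "20030702"));
        hits.add(new Hit(0.7f, "20030101"));

        // Newest first: plain string comparison works because the
        // format is zero-padded, most-significant-field first.
        Collections.sort(hits, new Comparator() {
            public int compare(Object a, Object b) {
                return ((Hit) b).date.compareTo(((Hit) a).date);
            }
        });

        for (int i = 0; i < hits.size(); i++) {
            System.out.println(((Hit) hits.get(i)).date);
        }
    }
}
```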




Getting an exact field match

2003-07-02 Thread Wilton, Reece
Hi,

I am indexing XML files.  The XML files have a Location element.  For
example, the Location is /Foo/Bar.html in one of the files.

When I update the index, I want to remove the existing document.  I
search for the Location and delete the existing document like this:

Query query = QueryParser.parse(location, "LOCATION",
    new StandardAnalyzer());
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    indexReader.delete(hits.id(i));
}

But I never get anything returned from the searcher.  I'm passing in the
exact value that is in the field.  How do I get an exact match of the
field?  Should I be adding Location as Text or Keyword?  I've tried both
but can't get it to return what I want.

Is the problem because I have slashes ("/") in the field?  Does the
StandardAnalyzer filter those out or something?
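Yes, that is the likely culprit: a Keyword field is stored as a single
untokenized term, but QueryParser runs the query text through the
analyzer, so StandardAnalyzer splits /Foo/Bar.html into tokens that no
longer match the stored term.  Indexing Location as Field.Keyword and
bypassing the parser with a raw Term should work; a sketch against the
1.3-era API (the field name matches the snippet above):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

public class DeleteByLocation {
    // Works only if LOCATION was indexed with Field.Keyword(), i.e. as
    // one untokenized term; then no analyzer mangles the slashes.
    public static int deleteByLocation(String indexDir, String location)
            throws Exception {
        IndexReader reader = IndexReader.open(indexDir);
        int deleted = reader.delete(new Term("LOCATION", location));
        reader.close();
        return deleted; // number of documents removed
    }
}
```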

Any help is appreciated!
Reece
