Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread Julian_Mitchell



Hi Petite,

 SZFinder.findObjectsWithSpecificationInStore:
 java.io.FileNotFoundException: _2.f14 (Too many open files)

I don't know what environment you're using Lucene in. However, we had this
"too many open files" problem on our Solaris box, and raising the number of
file descriptors with the ulimit -n command fixed it.

regards, Julian



--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread petite_abeille

 I don't know what environment you're using Lucene in. However, we had
 this too many open files problem on our Solaris box, and increasing the
 number of file descriptors through the ulimit -n command fixed it.

Thanks. That should help. However, I have a little desktop app and it 
will be very cumbersome to require users to change some system 
parameters just to run it... :-(

Thanks in any case.

PA






Re: too many open files in system

2002-04-29 Thread petite_abeille

 On Tuesday, 9. April 2002 14:08, you wrote:
 root wrote:
 Doesn't Lucene releases the filehandles??

 because I get too many open files in system after running lucene a
 while!

 Are you closing the readers and writers after you've finished using 
 them?

 cheers,

 Chris


 Yes I close the readers and writers!


By the way, did you ever solve this problem? I went through that thread
and everybody seems to be passing the buck to somebody else... :-(

PA.






Re: Italian web sites

2002-04-29 Thread [EMAIL PROTECTED]

The first one.

Bye Laura


 What does it mean? An Italian web site can be:
   - a site that uses the Italian language
   - a site owned by an Italian organization
   - a site hosted at an Italian geographical location
 Every definition has a different solution.
 
 Date sent:      Wed, 24 Apr 2002 11:02:32 +0200
 From:           [EMAIL PROTECTED] [EMAIL PROTECTED]
 Subject:        Italian web sites
 To:             [EMAIL PROTECTED]
 Send reply to:  Lucene Users List lucene-[EMAIL PROTECTED]
 
  Hi all,
 
  I'm using Jobo for spidering web sites and Lucene for indexing. The
  problem is that I'd like to spider only Italian web sites.
  How can I discover the country of a web site?

  Do you know of a method that you can suggest to me?
 
  Thanks
 
 
  Laura
 
 
 
 --
 Marco Ferrante ([EMAIL PROTECTED])
 CSITA (Centro Servizi Informatici e Telematici d'Ateneo)
 Università degli Studi di Genova - Italy
 Via Brigata Salerno, ponte - 16147 Genova
 tel (+39) 0103532621 (interno tel. 2621)
 --
 
 
 
 


Re: FileNotFoundException: code example

2002-04-29 Thread petite_abeille

  I would add some logging to the code

You lost me here... Where should I add some logging?

  to get more idea of which Lucene methods are
 actually being called, when, in what sequence.

A typical sequence looks like this:

- search()
- deleteIndexWithID()
- indexValuesWithID()

PA






[Off-List] Too Many Open Files

2002-04-29 Thread Pier Fumagalli

Heya Folks...

Julian (sitting in front of me and looking bad... hi Jules) told me that
one of you guys had a problem with Lucene and a Too Many Files Open
exception... Reading back from the archives, I found this:

http://nagoya.apache.org:8080/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgNo=1348

Petite, I just double-checked on my OS/X box (well, the one I'm writing to you
from). Definitely a ulimit problem (the number of file descriptors accessible
to a single process on the system).

If you run ulimit -n, you'll see that the maximum number of file
descriptors usable by a single process is set to 256. You can increase it by
issuing ulimit -n 512 (for example). You can set it up easily into a shell
script launching your application. If (for instance) you're building an
application run with java -jar ... or using the Cocoa Java framework,
there might be a couple of OS/X specific tricks that might be worth
exploiting.
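
The launcher script described above could also be started from a small Java
bootstrap class. This is only a sketch, assuming a POSIX sh whose ulimit
builtin accepts -n (bash and most sh variants do, but POSIX does not require
it). The class name RaisedLimitLauncher, the limit of 512 and the jar name
myapp.jar are hypothetical. The limit applies only to the shell and the JVM it
execs, so no system-wide setting is changed; if the requested value exceeds
the hard limit, ulimit fails and the application is not started.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RaisedLimitLauncher {

    // Builds: sh -c 'ulimit -n <limit> && exec java -jar "$0"' <jarPath>
    // The argument following the script becomes $0 inside sh -c, which avoids
    // having to quote the jar path by hand.
    static List<String> command(int limit, String jarPath) {
        String script = "ulimit -n " + limit + " && exec java -jar \"$0\"";
        return Arrays.asList("sh", "-c", script, jarPath);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "myapp.jar" is a placeholder for the real application jar.
        Process p = new ProcessBuilder(command(512, "myapp.jar")).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

This keeps the change local to one process tree, which may be acceptable for a
desktop application where asking users to edit system parameters is not.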

Anyway my best recommendation is to use lsof when you get one of those
IOExceptions: first of all be sure to catch it so that the JVM won't crash
when you get it, then under MacOS/X you can use the lsof command to see
what files you have opened: find out your Java VM process number (use ps)
and do an lsof -p PID where PID is the process number of your VM...

This will tell you WHAT files you actually have open, and it'll help you
keep your resource usage low (are you sure you closed all the files you
don't need?)...
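
A minimal sketch of the "catch it, then run lsof" advice above, in plain Java.
It assumes a current JVM (ProcessHandle needs Java 9 or later, so this would
not have compiled on the JRE 1.3.1 discussed in this thread); the class name
LsofHint and its methods are hypothetical. It does not run lsof itself, it only
prints the command to type in a terminal.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class LsofHint {

    // The command to run in a terminal to list this JVM's open files.
    static String lsofCommand() {
        return "lsof -p " + ProcessHandle.current().pid();
    }

    // Tries to open a file for reading. On failure it reports the error and
    // the lsof command instead of letting the exception propagate.
    static boolean tryOpen(String path) {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            return true;
        } catch (IOException e) {
            System.err.println("Could not open " + path + ": " + e.getMessage());
            System.err.println("To see what this process has open, run: " + lsofCommand());
            return false;
        }
    }

    public static void main(String[] args) {
        // File name taken from the error message quoted in this thread.
        tryOpen("_2.f14");
    }
}
```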

Well, that's my .2c... Sorry, I'm not subbed to this list, so, keep me
posted at [EMAIL PROTECTED] if you need some more help...

Pier






Re: too many open files in system

2002-04-29 Thread petite_abeille

 how many open files you think can be used at your process??

Not sure. It varies with usage pattern. I will check it out in any case.

 cat /proc/sys/fs/file-max

cat: /proc/sys/fs/file-max: No such file or directory

 echo 5  /proc/sys/fs/file-max

Unfortunately, I cannot use this kind of quick fix as my app is a 
desktop app and can access the user account only.

PA.






Re: FileNotFoundException: code example

2002-04-29 Thread Jagadesh Nandasamy

Hi petite,

I will try to be brief...

In Lucene the number of files created depends on the number of fields the
documents have. Let's take an example: you want to index 100 files, and each
file contains 10 fields:

    document.add(Field.Text("UNIQUE_ID", "12345678"));
    ...
    ...
    ...
    document.add(new Field("UNIQUE_TYPE", "x", true, true, false));
    document.add(new Field("PATH", "c:\\xxx\\yyy\\zzz.doc", true, true, false));

If, in all 100 documents, the 10 fields have the same keys or names
(i.e. UNIQUE_ID, UNIQUE_TYPE, PATH), then the number of files created by
Lucene remains under control. (MIND YOU, the values of the fields can be
different.) Say, for the above scenario, that the number of index files
created is about 80 (for 100 documents with 10 fields each). If you add
another million documents with the same 10 fields, the number of index files
would remain the same; it would not create any more _f12, _xxx files.

In contrast, say that for the same number of documents you create 10 fields
that differ from document to document. For the first document you create a
field like

    document.add(new Field("Doc_PATH_1", "c:\\xxx\\yyy\\zzz.doc", true, true, false));

and for the second document

    document.add(new Field("Doc_PATH_2", "c:\\xxx\\yyy\\zzz.doc", true, true, false));

I think about 3 files are created in the index directory for each new field,
so you would end up with thousands of files in the index directory, which
would cause the "Too many open files" problem.
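
To make the arithmetic above concrete, here is a small sketch in plain Java
(no Lucene involved). It only counts distinct field names and multiplies by the
rough figure of 3 files per field guessed in this message. The class
FieldNameCount, its methods and the generated FIELD_n names are hypothetical,
and the real file count depends on the Lucene version and on how many segments
the index has.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FieldNameCount {

    // Builds the field names used by docCount documents with fieldsPerDoc
    // fields each. If shared is true every document reuses the same names;
    // otherwise each document gets its own names (like Doc_PATH_1, Doc_PATH_2).
    static List<List<String>> fieldNames(int docCount, int fieldsPerDoc, boolean shared) {
        List<List<String>> docs = new ArrayList<>();
        for (int d = 0; d < docCount; d++) {
            List<String> names = new ArrayList<>();
            for (int f = 0; f < fieldsPerDoc; f++) {
                names.add(shared ? "FIELD_" + f : "FIELD_" + f + "_" + d);
            }
            docs.add(names);
        }
        return docs;
    }

    // Counts how many different field names appear across all documents.
    static int distinctFieldNames(List<List<String>> docs) {
        Set<String> seen = new HashSet<>();
        for (List<String> names : docs) {
            seen.addAll(names);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int filesPerField = 3; // rough guess from the message, not a measured value
        int shared = distinctFieldNames(fieldNames(100, 10, true));
        int perDocument = distinctFieldNames(fieldNames(100, 10, false));
        System.out.println("shared names: " + shared + " fields, ~"
                + shared * filesPerField + " files");
        System.out.println("per-document names: " + perDocument + " fields, ~"
                + perDocument * filesPerField + " files");
    }
}
```

With 100 documents of 10 fields each, the shared case yields 10 distinct names
and the per-document case 1000, which is the difference between a handful of
per-field files and thousands.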
   
And I think you don't have to worry about which OS you are using.
   
   
Hope this helps...
   
-Jaggi
   


petite_abeille wrote:

 Hello again,

 attached is the source code of the only class interacting directly 
 with Lucene in my app. Sorry for not providing a complete test case as 
 it's hard for me to come up with something self contained. Maybe there 
 is something that's obviously wrong in what I'm doing.

 Thanks for any help.

 PA




//
// ===
//
// Title:  SZIndex.java
// Description:[Description]
// Author: Raphael Szwarc [EMAIL PROTECTED]
// Creation Date:  Wed Sep 12 2001
// Legal:  Copyright (C) 2001 Raphael Szwarc. All Rights Reserved.
//
// ---
//

package alt.dev.szobject;

import com.lucene.store.Directory;
import com.lucene.store.FSDirectory;
import com.lucene.store.RAMDirectory;
import com.lucene.document.Field;
import com.lucene.document.DateField;
import com.lucene.document.Document;
import com.lucene.analysis.Analyzer;
import com.lucene.analysis.standard.StandardAnalyzer;
import com.lucene.index.IndexWriter;
import com.lucene.index.IndexReader;
import com.lucene.index.Term;
import com.lucene.search.IndexSearcher;
import com.lucene.search.MultiSearcher;
import com.lucene.search.Searcher;
import com.lucene.search.Query;
import com.lucene.search.Hits;

import java.io.FilenameFilter;
import java.io.File;
import java.io.IOException;

import java.util.Map;
import java.util.Collection;
import java.util.Date;
import java.util.Iterator;

import alt.dev.szfoundation.SZHexCoder;
import alt.dev.szfoundation.SZDate;
import alt.dev.szfoundation.SZSystem;
import alt.dev.szfoundation.SZLog;

final class SZIndex extends Object
{

// ===
// Constant(s)
// ---

   private static final String Extension = ".index";

// ===
// Class variable(s)
// ---

   private static final Filter _filter = new Filter();

// ===
// Instance variable(s)
// ---

   private String  _path = null;
   private transient File  _directory = null;
   private transient Directory _indexDirectory = null;
   private transient IndexWriter   _writer = null;
   
   private transient IndexReader   _reader = null;
   private transient Searcher  _searcher = null;

   private transient Directory _ramDirectory = null;
   private transient IndexWriter   _ramWriter = null;
   private transient int   _counter = 0;

// 

Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread Otis Gospodnetic


--- petite_abeille [EMAIL PROTECTED] wrote:
  I don't know what environment you're using Lucene in.
 
 The problem seems to be specially bad on osx (10.1.4 + JRE 1.3.1 + 
 latest updates).

Does this mean you tried it on other OSs and it worked?
Which ones?
What JDK did those have and what was their ulimit and what is the
ulimit on your OSX machine?
Just curious.

Otis


__
Do You Yahoo!?
Yahoo! Health - your guide to health and wellness
http://health.yahoo.com





Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread petite_abeille

 Does this mean you tried it on other OSs and it worked?

Yes.

 Which ones?

Win2k SP2

 What JDK did those have

jre 1.4.0

  and what was their ulimit and what is the
 ulimit on your OSX machine?
 Just curious.

I don't know. Does it matter?

PA






Re: FileNotFoundException: code example

2002-04-29 Thread Otis Gospodnetic

Hello,

I'll put my comments inline...

--- petite_abeille [EMAIL PROTECTED] wrote:
 Hello again,
 
 attached is the source code of the only class interacting directly
 with 
 Lucene in my app. Sorry for not providing a complete test case as
 it's 
 hard for me to come up with something self contained. Maybe there is 
 something that's obviously wrong in what I'm doing.
 
 Thanks for any help.
 
 PA
 
 //
 // ===
 //
 // Title:          SZIndex.java
 // Description:    [Description]
 // Author:         Raphael Szwarc [EMAIL PROTECTED]
 // Creation Date:  Wed Sep 12 2001
 // Legal:          Copyright (C) 2001 Raphael Szwarc. All Rights Reserved.
 //
 // ---
 //
 
 package alt.dev.szobject;
 
 import com.lucene.store.Directory;
 import com.lucene.store.FSDirectory;
 import com.lucene.store.RAMDirectory;
 import com.lucene.document.Field;
 import com.lucene.document.DateField;
 import com.lucene.document.Document;
 import com.lucene.analysis.Analyzer;
 import com.lucene.analysis.standard.StandardAnalyzer;
 import com.lucene.index.IndexWriter;
 import com.lucene.index.IndexReader;
 import com.lucene.index.Term;
 import com.lucene.search.IndexSearcher;
 import com.lucene.search.MultiSearcher;
 import com.lucene.search.Searcher;
 import com.lucene.search.Query;
 import com.lucene.search.Hits;
 
 import java.io.FilenameFilter;
 import java.io.File;
 import java.io.IOException;
 
 import java.util.Map;
 import java.util.Collection;
 import java.util.Date;
 import java.util.Iterator;
 
 import alt.dev.szfoundation.SZHexCoder;
 import alt.dev.szfoundation.SZDate;
 import alt.dev.szfoundation.SZSystem;
 import alt.dev.szfoundation.SZLog;
 
 final class SZIndex extends Object
 {
 
 // ===
 // Constant(s)
 // ---

   private static final String Extension = ".index";
 
 // ===
 // Class variable(s)
 // ---
 
   private static final Filter _filter = new Filter();
 
 // ===
 // Instance variable(s)
 // ---
 
   private String  _path = null;
   private transient File  _directory = null;
   private transient Directory _indexDirectory = null;
   private transient IndexWriter   _writer = null;
   
   private transient IndexReader   _reader = null;
   private transient Searcher  _searcher = null;
 
   private transient Directory _ramDirectory = null;
   private transient IndexWriter   _ramWriter = null;
   private transient int   _counter = 0;
 
 // ===
 // Constructor method(s)
 // ---

   private SZIndex()
   {
       super();
   }

 // ===
 // Class method(s)
 // ---
 
   static FilenameFilter filter()
   {
       return _filter;
   }

   static String stringByDeletingPathExtension(String aPath)
   {
       if ( aPath != null )
       {
           int anIndex = aPath.lastIndexOf( SZIndex.Extension );

           if ( anIndex > 0 )
           {
               aPath = aPath.substring( 0, anIndex );
           }

           return aPath;
       }

       throw new IllegalArgumentException( "SZIndex.stringByDeletingPathExtension: null path." );
   }

   static SZIndex indexWithNameInDirectory(String aName, File aDirectory)
   {
       if ( aName != null )
       {
           if ( aDirectory != null )
           {
               String  anEncodedName = SZHexCoder.encode( aName.getBytes() );
               //String aPath = aDirectory.getPath() + File.separator + anEncodedName + SZIndex.Extension + File.separator;
               String  aPath = aDirectory.getPath() + File.separator + aName + SZIndex.Extension + File.separator;
               SZIndex anIndex = new SZIndex();

               anIndex.setPath( aPath );
   
   

Re: rc4 and FileNotFoundException: an update

2002-04-29 Thread Otis Gospodnetic

Hello,

   and what was their ulimit and what is the
  ulimit on your OSX machine?
  Just curious.
 
 I don't know. Does it matter?

Of course it does - a low (u)limit is a part of your problem, perhaps.

Otis
P.S.
I don't know how Winblows deals with file descriptors.  Try your
application on some other flavour of Unix, if possible.






Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

2002-04-29 Thread petite_abeille

First off, thanks to Jagadesh Nandasamy, who pointed me in the right
direction.

It seems that, in my situation, more homogeneous indexes work better
than fewer heterogeneous indexes:

I have a dozen classes that I'm indexing. They vary from two fields to
more than a dozen fields per document (aka object). I went through
different indexing strategies with them (per class, per date, per root
class, ... ) to see how it goes. In any case, while trying to use my
stuff with rc4 I consolidated all my different class indexes into one
root class index to see if I could reduce my resource consumption. Fewer
indexes, fewer RandomAccessFiles was the rationale. Well, I was wrong. In
fact the exact opposite seems to hold true: more -homogeneous- indexes
use fewer RandomAccessFiles overall than fewer -heterogeneous- indexes...

One of those -not so obvious- things you have to learn the hard way, I
guess... ;-)

In any case, I would like to thank Jagadesh again for his insight. Also
thanks to Pier Fumagalli for pointing out lsof. A very handy tool
indeed.

As a final note, several people suggested increasing the number of file
descriptors per process with something like ulimit... From what I
learned today, I think it's a *bad* idea to have to change system
parameters just because your/my app is written in such a way that it may
run out of some system resource. Your/my app has to fit in the system.
Hacking ulimit and/or other system parameters is just a quick patch
that will -at best- delay dealing with the real problem, which is usually
one of design.

Just my two cents.

PA.







Lucene's scalability

2002-04-29 Thread Joel Bernstein

Is there a known limit to the number of documents that Lucene can handle
efficiently?  I'm looking to index around 15 million, 2K docs which contain
7-10 searchable fields. Should I be attempting this with Lucene?

Thanks,

Joel




Re: Lucene's scalability

2002-04-29 Thread Joel Bernstein

Great,

Thanks for the quick response. I am very interested in hearing
how Lucene handles itself in the 15-20 million doc range.

I will be doing some testing this week with Lucene and will report my
findings as well.  I am also testing FAST and AltaVista and I will post
some comparison details.  I would be very happy to find that we did not
need to buy a commercial engine because Lucene could do the job.

Joel

- Original Message -
From: Armbrust, Daniel C. [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Monday, April 29, 2002 2:37 PM
Subject: RE: Lucene's scalability


 I currently have an index of ~ 12 million documents, which are each about
 that size (but in xml form).

 When they are transformed for lucene to index, there are upwards of 50
 searchable fields.

 The index is about 10 GB right now.

 I have not yet had any problems with pushing the limits of lucene.

 In the next few weeks, I will be pushing my number of indexed documents up
 into the 15-20 million range.  I can let you know if any problems arise.

 Dan



 -Original Message-
 From: Joel Bernstein [mailto:[EMAIL PROTECTED]]
 Sent: Monday, April 29, 2002 1:32 PM
 To: [EMAIL PROTECTED]
 Subject: Lucene's scalability


 Is there a known limit to the number of documents that Lucene can handle
 efficiently?  I'm looking to index around 15 million, 2K docs which contain
 7-10 searchable fields. Should I be attempting this with Lucene?

 Thanks,

 Joel









Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)

2002-04-29 Thread Steven J. Owens

petite,

On Mon, Apr 29, 2002 at 07:54:43PM +0200, petite_abeille wrote:
 As a final note, several people suggested to increase the number of file 
 descriptors per process with something like ulimit...

 Just be glad you aren't doing this on Solaris with JDK 1.1.6,
where I first ran into ulimit issues - back when I encountered this
problem, the Solaris default ulimit setting was 24 files, and JDK
1.1.6 reported the problem as an OutOfMemory error!  Looks like
things are improving :-).

 From what I learned today, I think it's a *bad* idea to have to
 change some system parameters just because your/my app is written in
 such a way that it may run out of some system resources. Your/my app
 has to fit in the system.  Hacking ulimit and/or other system
 parameters is just a quick patch that will -at best- delay dealing
 with the real problem that's usually one of design.

 Yes and no.  Setting ulimit to a reasonable number of open files
is not only not a patch, it's the right way to do it.  I understand
where you're coming from, really, and in a certain way, it makes
sense, BUT... sometimes the impulse for clean, good design takes you
too far down a blind alley.  Sometimes there is no elegant solution.
Sometimes there is no best way, only one of a limited set of options
with different tradeoffs.

 By definition, Lucene is an application that trades off up front
CPU (for indexing) and file resources (for storage) for request-time
speed.  The OS's job is to manage resources, and open files are one of
those resources.  That's the tradeoff here, and it's reasonable and
expected.  Most serious applications have to have some sort of OS
variable tweaking, you're just used to having it done invisibly and
painlessly.

 That said, since you're working on a client/desktop application,
not a server application, you need to think about ways to handle this:

 You could figure out the right way to set the system
configuration on install or launch.

 You could look at the alternative techniques for indexing in
Lucene, and see if any approaches there can help - for example, maybe
doing a lot of the more intense indexing work in a RAMDirectory, then
merging it into a normal file-based Directory.

 You could look more closely at what your application is doing,
and see if there's anything you're doing wrong (perhaps opening files
and not closing them, and leaving them for the garbage collector to
eventually get around to closing?) or if you have a pessimal usage
pattern that exacerbates the situation.

 You could take a closer look at the lucene indexing and file
management stuff, and see if you can come up with a scheme to run
Lucene indexing with modified code for keeping track of file
resources. 
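
 As an illustration of the "close it yourself, don't wait for the
garbage collector" option above, here is a minimal sketch in plain
Java rather than Lucene code. The class name CloseExplicitly and its
methods are hypothetical, and try-with-resources needs Java 7 or
later; on the JDKs discussed in this thread the same effect would
come from a try/finally block that calls close().

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

public class CloseExplicitly {

    // Opens the file, reads its length and closes the descriptor before
    // returning, even if length() throws.
    static long lengthOf(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            return raf.length();
        }
    }

    // Opens and closes the same temporary file many times. Because every
    // descriptor is released immediately, at most one is in use at a time.
    static long openRepeatedly(int times) {
        try {
            File tmp = File.createTempFile("close-demo", ".bin");
            tmp.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                raf.write(new byte[] {1, 2, 3, 4, 5});
            }
            long length = 0;
            for (int i = 0; i < times; i++) {
                length = lengthOf(tmp);
            }
            return length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(openRepeatedly(5000));
    }
}
```

 The loop opens the file far more often than a low per-process limit
would allow if the descriptors were left for the garbage collector,
yet it never holds more than one of them at once.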

 I'll bet Doug and the other developers would rather not add
open-file management as a main, permanent part of Lucene, since it
would add overhead to all uses of Lucene just to deal with an
anomalous situation (use on a client/desktop machine).  But they might
be interested in a way to offer it as an optional feature, where
people using Lucene in a constrained environment could configure
Lucene to be careful about how many files it keeps open at any given
time.

Steven J. Owens
[EMAIL PROTECTED]
