Re: rc4 and FileNotFoundException: an update
Hi Petite,

> SZFinder.findObjectsWithSpecificationInStore: java.io.FileNotFoundException: _2.f14 (Too many open files)

I don't know what environment you're using Lucene in. However, we had this too many open files problem on our Solaris box, and increasing the number of file descriptors through the ulimit -n command fixed it.

regards,
Julian

--
To unsubscribe, e-mail: mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]
Re: rc4 and FileNotFoundException: an update
> I don't know what environment you're using Lucene in. However, we had this too many open files problem on our Solaris box, and increasing the number of file descriptors through the ulimit -n command fixed it.

Thanks. That should help. However, I have a little desktop app and it will be very cumbersome to require users to change some system parameters just to run it... :-(

Thanks in any case.

PA
Re: too many open files in system
> On Tuesday, 9. April 2002 14:08, you wrote:
>> root wrote:
>>> Doesn't Lucene release the file handles? Because I get too many open files in system after running Lucene a while!
>>
>> Are you closing the readers and writers after you've finished using them?
>>
>> cheers,
>> Chris
>
> Yes I close the readers and writers!

By the way, did you ever solve this problem? I went through that thread and everybody seems to be passing the buck to somebody else... :-(

PA.
Re: Italian web sites
The first one.

Bye
Laura

> What does it mean? An Italian web site can be:
> - a site that uses the Italian language
> - a site owned by an Italian organization
> - a site hosted at an Italian geographical location
>
> Every definition has a different solution.
>
> Date sent: Wed, 24 Apr 2002 11:02:32 +0200
> From: [EMAIL PROTECTED] [EMAIL PROTECTED]
> Subject: Italian web sites
> To: [EMAIL PROTECTED]
> Send reply to: Lucene Users List lucene-[EMAIL PROTECTED]
>
>> Hi all,
>> I'm using Jobo for spidering web sites and Lucene for indexing. The problem is that I'd like to spider only Italian web sites. How can I discover the country of a web site? Do you know of a method that you can suggest?
>>
>> Thanks
>> Laura
>
> --
> Marco Ferrante ([EMAIL PROTECTED])
> CSITA (Centro Servizi Informatici e Telematici d'Ateneo)
> Università degli Studi di Genova - Italy
> Via Brigata Salerno, ponte - 16147 Genova
> tel (+39) 0103532621 (interno tel. 2621)
Re: FileNotFoundException: code example
> I would add some logging to the code

You lost me here... Where should I add some logging?

> to get more idea of which Lucene methods are actually being called, when, in what sequence.

A typical sequence looks like this:

- search()
- deleteIndexWithID()
- indexValuesWithID()

PA
[Off-List] Too Many Open Files
Heya Folks...

Julian (sitting in front of me and looking bad... hi Jules) told me that one of you guys had a problem with Lucene and a Too Many Files Open exception... Reading back through the archives, I found this:

http://nagoya.apache.org:8080/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgNo=1348

Petite, I just double-checked on my OS/X box (well, the one I'm writing to you from). It's definitely a ulimit problem (the number of file descriptors accessible by a single process on the system). If you run ulimit -n, you'll see that the maximum number of file descriptors usable by a single process is set to 256. You can increase it by issuing ulimit -n 512 (for example). You can easily set that up in a shell script that launches your application. If (for instance) you're building an application run with java -jar ... or using the Cocoa Java framework, there might be a couple of OS/X-specific tricks worth exploiting.

Anyway, my best recommendation is to use lsof when you get one of those IOExceptions. First of all, be sure to catch it so that the JVM won't crash when you get it. Then, under MacOS/X, you can use the lsof command to see what files you have open: find out your Java VM process number (use ps) and do an lsof -p PID, where PID is the process number of your VM... This will tell you WHAT files you actually have open, and it'll help you keep your operating resources low (are you sure you closed all the files you don't need?)...

Well, that's my .2c... Sorry, I'm not subbed to this list, so keep me posted at [EMAIL PROTECTED] if you need some more help...

Pier
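A minimal sketch of the "catch it, then run lsof" advice above, in plain Java with no Lucene calls. The class and method names (OpenFileDiagnostics, lsofCommand, looksLikeDescriptorExhaustion) are made up for illustration, the file name is simply the one from the exception quoted earlier in the thread, and ProcessHandle needs Java 9 or later, which is much newer than the JREs discussed here. Matching on the "Too many open files" text is an assumption about how the JVM words the message, not something any API guarantees.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class OpenFileDiagnostics {

    /** Builds the lsof command line for the JVM this code is running in. */
    static String lsofCommand() {
        long pid = ProcessHandle.current().pid(); // Java 9+; replaces looking the PID up with ps
        return "lsof -p " + pid;
    }

    /** Heuristic (assumption): the OS error text appears in the exception message. */
    static boolean looksLikeDescriptorExhaustion(IOException e) {
        String message = e.getMessage();
        return message != null && message.contains("Too many open files");
    }

    public static void main(String[] args) {
        // "_2.f14" is only the file name from the quoted exception; any path would do.
        try (RandomAccessFile file = new RandomAccessFile("_2.f14", "r")) {
            System.out.println("opened, length=" + file.length());
        } catch (IOException e) {
            // Catch the failure instead of letting it take the application down.
            if (looksLikeDescriptorExhaustion(e)) {
                System.err.println("Out of file descriptors; inspect with: " + lsofCommand());
            } else {
                System.err.println("I/O failure: " + e.getMessage());
            }
        }
    }
}
```

Printing the command rather than executing it keeps the sketch free of assumptions about lsof being installed on the user's machine.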
Re: too many open files in system
> how many open files you think can be used at your process??

Not sure. It varies with usage pattern. I will check it out in any case.

> cat /proc/sys/fs/file-max

cat: /proc/sys/fs/file-max: No such file or directory

> echo 5 > /proc/sys/fs/file-max

Unfortunately, I cannot use this kind of quick fix as my app is a desktop app and can access the user account only.

PA.
Re: FileNotFoundException: code example
Hi petite,

I will try to be brief...

In Lucene the number of files created depends on the number of fields the documents have. So let's take an example: you want to index 100 files, and each file contains 10 fields:

    document.add(Field.Text("UNIQUE_ID", "12345678"));
    ...
    document.add(new Field("UNIQUE_TYPE", "x", true, true, false));
    document.add(new Field("PATH", "c:\\xxx\\yyy\\zzz.doc", true, true, false));

If, in all 100 documents, the 10 fields have the same keys or names (i.e. UNIQUE_ID, UNIQUE_TYPE, PATH), then the number of files created by Lucene remains under control. (MIND YOU, the values of the fields can be different.) Say, for the above scenario, that about 80 index files are created (for 100 documents with 10 fields each). If you add another million documents with the same 10 fields, the number of index files would remain the same; it would not create any more _f12, _xxx files.

In contrast, say that for the same number of documents you create 10 fields whose names differ from document to document. For the first document you create a field like

    document.add(new Field("Doc_PATH_1", "c:\\xxx\\yyy\\zzz.doc", true, true, false));

and for the second document

    document.add(new Field("Doc_PATH_2", "c:\\xxx\\yyy\\zzz.doc", true, true, false));

I think about 3 files are created in the index directory for each new field, so you would end up with thousands of files in the index directory, which would cause the "Too many files opened" problem. And I think you don't have to be bothered about which OS you are using.

Hope this helps...
-Jaggi

petite_abeille wrote:

Hello again,

attached is the source code of the only class interacting directly with Lucene in my app. Sorry for not providing a complete test case as it's hard for me to come up with something self contained. Maybe there is something that's obviously wrong in what I'm doing.

Thanks for any help.
PA

//
// ===
//
// Title: SZIndex.java
// Description: [Description]
// Author: Raphael Szwarc [EMAIL PROTECTED]
// Creation Date: Wed Sep 12 2001
// Legal: Copyright (C) 2001 Raphael Szwarc. All Rights Reserved.
//
// ---
//

package alt.dev.szobject;

import com.lucene.store.Directory;
import com.lucene.store.FSDirectory;
import com.lucene.store.RAMDirectory;
import com.lucene.document.Field;
import com.lucene.document.DateField;
import com.lucene.document.Document;
import com.lucene.analysis.Analyzer;
import com.lucene.analysis.standard.StandardAnalyzer;
import com.lucene.index.IndexWriter;
import com.lucene.index.IndexReader;
import com.lucene.index.Term;
import com.lucene.search.IndexSearcher;
import com.lucene.search.MultiSearcher;
import com.lucene.search.Searcher;
import com.lucene.search.Query;
import com.lucene.search.Hits;

import java.io.FilenameFilter;
import java.io.File;
import java.io.IOException;

import java.util.Map;
import java.util.Collection;
import java.util.Date;
import java.util.Iterator;

import alt.dev.szfoundation.SZHexCoder;
import alt.dev.szfoundation.SZDate;
import alt.dev.szfoundation.SZSystem;
import alt.dev.szfoundation.SZLog;

final class SZIndex extends Object {

    // ===
    // Constant(s)
    // ---

    private static final String Extension = ".index";

    // ===
    // Class variable(s)
    // ---

    private static final Filter _filter = new Filter();

    // ===
    // Instance variable(s)
    // ---

    private String _path = null;
    private transient File _directory = null;
    private transient Directory _indexDirectory = null;
    private transient IndexWriter _writer = null;
    private transient IndexReader _reader = null;
    private transient Searcher _searcher = null;
    private transient Directory _ramDirectory = null;
    private transient IndexWriter _ramWriter = null;
    private transient int _counter = 0;

    //
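Jaggi's point, that what matters is how many distinct field names an index sees and not how many documents it holds, can be illustrated without Lucene at all. The sketch below only counts distinct names across two hypothetical sets of 100 documents; FieldNameCount and its methods are invented for illustration, and it deliberately does not model how many files Lucene creates per field, since the "about 3" figure above is a guess.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FieldNameCount {

    /** Counts the distinct field names used across all documents. */
    static int distinctFieldNames(List<List<String>> documents) {
        Set<String> names = new HashSet<>();
        for (List<String> fieldNames : documents) {
            names.addAll(fieldNames);
        }
        return names.size();
    }

    /** Every document reuses the same three field names. */
    static List<List<String>> sharedNames(int docCount) {
        List<List<String>> docs = new ArrayList<>();
        for (int i = 0; i < docCount; i++) {
            docs.add(List.of("UNIQUE_ID", "UNIQUE_TYPE", "PATH"));
        }
        return docs;
    }

    /** Every document invents its own path field: Doc_PATH_1, Doc_PATH_2, ... */
    static List<List<String>> perDocumentNames(int docCount) {
        List<List<String>> docs = new ArrayList<>();
        for (int i = 1; i <= docCount; i++) {
            docs.add(List.of("UNIQUE_ID", "UNIQUE_TYPE", "Doc_PATH_" + i));
        }
        return docs;
    }

    public static void main(String[] args) {
        // prints "shared names: 3"
        System.out.println("shared names: " + distinctFieldNames(sharedNames(100)));
        // prints "per-document names: 102"
        System.out.println("per-document names: " + distinctFieldNames(perDocumentNames(100)));
    }
}
```

With shared names the count stays at 3 however many documents are added; with per-document names it grows by one for every document, which is the growth pattern the message warns about.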
Re: rc4 and FileNotFoundException: an update
--- petite_abeille [EMAIL PROTECTED] wrote:
>> I don't know what environment you're using Lucene in.
>
> The problem seems to be especially bad on osx (10.1.4 + JRE 1.3.1 + latest updates).

Does this mean you tried it on other OSs and it worked? Which ones? What JDK did those have and what was their ulimit and what is the ulimit on your OSX machine? Just curious.

Otis
Re: rc4 and FileNotFoundException: an update
> Does this mean you tried it on other OSs and it worked?

Yes.

> Which ones?

Win2k SP2

> What JDK did those have

jre 1.4.0

> and what was their ulimit and what is the ulimit on your OSX machine? Just curious.

I don't know. Does it matter?

PA
Re: FileNotFoundException: code example
Hello,

I'll put my comments inline...

--- petite_abeille [EMAIL PROTECTED] wrote:

Hello again,

attached is the source code of the only class interacting directly with Lucene in my app. Sorry for not providing a complete test case as it's hard for me to come up with something self contained. Maybe there is something that's obviously wrong in what I'm doing.

Thanks for any help.

PA

//
// ===
//
// Title: SZIndex.java
// Description: [Description]
// Author: Raphael Szwarc [EMAIL PROTECTED]
// Creation Date: Wed Sep 12 2001
// Legal: Copyright (C) 2001 Raphael Szwarc. All Rights Reserved.
//
// ---
//

package alt.dev.szobject;

import com.lucene.store.Directory;
import com.lucene.store.FSDirectory;
import com.lucene.store.RAMDirectory;
import com.lucene.document.Field;
import com.lucene.document.DateField;
import com.lucene.document.Document;
import com.lucene.analysis.Analyzer;
import com.lucene.analysis.standard.StandardAnalyzer;
import com.lucene.index.IndexWriter;
import com.lucene.index.IndexReader;
import com.lucene.index.Term;
import com.lucene.search.IndexSearcher;
import com.lucene.search.MultiSearcher;
import com.lucene.search.Searcher;
import com.lucene.search.Query;
import com.lucene.search.Hits;

import java.io.FilenameFilter;
import java.io.File;
import java.io.IOException;

import java.util.Map;
import java.util.Collection;
import java.util.Date;
import java.util.Iterator;

import alt.dev.szfoundation.SZHexCoder;
import alt.dev.szfoundation.SZDate;
import alt.dev.szfoundation.SZSystem;
import alt.dev.szfoundation.SZLog;

final class SZIndex extends Object {

    // ===
    // Constant(s)
    // ---

    private static final String Extension = ".index";

    // ===
    // Class variable(s)
    // ---

    private static final Filter _filter = new Filter();

    // ===
    // Instance variable(s)
    // ---

    private String _path = null;
    private transient File _directory = null;
    private transient Directory _indexDirectory = null;
    private transient IndexWriter _writer = null;
    private transient IndexReader _reader = null;
    private transient Searcher _searcher = null;
    private transient Directory _ramDirectory = null;
    private transient IndexWriter _ramWriter = null;
    private transient int _counter = 0;

    // ===
    // Constructor method(s)
    // ---

    private SZIndex() {
        super();
    }

    // ===
    // Class method(s)
    // ---

    static FilenameFilter filter() {
        return _filter;
    }

    static String stringByDeletingPathExtension(String aPath) {
        if ( aPath != null ) {
            int anIndex = aPath.lastIndexOf( SZIndex.Extension );

            if ( anIndex > 0 ) {
                aPath = aPath.substring( 0, anIndex );
            }

            return aPath;
        }

        throw new IllegalArgumentException( "SZIndex.stringByDeletingPathExtension: null path." );
    }

    static SZIndex indexWithNameInDirectory(String aName, File aDirectory) {
        if ( aName != null ) {
            if ( aDirectory != null ) {
                String anEncodedName = SZHexCoder.encode( aName.getBytes() );
                //String aPath = aDirectory.getPath() + File.separator + anEncodedName + SZIndex.Extension + File.separator;
                String aPath = aDirectory.getPath() + File.separator + aName + SZIndex.Extension + File.separator;
                SZIndex anIndex = new SZIndex();

                anIndex.setPath( aPath );
Re: rc4 and FileNotFoundException: an update
Hello,

>> and what was their ulimit and what is the ulimit on your OSX machine? Just curious.
>
> I don't know. Does it matter?

Of course it does - a low (u)limit is a part of your problem, perhaps.

Otis

P.S. I don't know how Winblows deals with file descriptors. Try your application on some other flavour of Unix, if possible.
Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)
First off, thanks to Jagadesh Nandasamy, who pointed me in the right direction. It seems that, in my situation, more homogeneous indexes work better than fewer heterogeneous indexes.

I have a dozen classes that I'm indexing. They vary from two fields to more than a dozen fields per document (aka object). I went through different indexing strategies with them (per class, per date, per root class, ...) to see how it goes. In any case, while trying to use my stuff with rc4, I consolidated all my different class indexes into one root class index to see if I could reduce my resource consumption. Fewer indexes, fewer RandomAccessFiles was the rationale. Well, I was wrong. In fact the exact opposite seems to hold true: more -homogeneous- indexes use fewer RandomAccessFiles overall than fewer -heterogeneous- indexes... One of those -not so obvious- things you have to learn the hard way, I guess... ;-)

In any case, I would like to thank Jagadesh again for his insight. Also thanks to Pier Fumagalli for pointing out lsof. A very handy tool indeed.

As a final note, several people suggested increasing the number of file descriptors per process with something like ulimit... From what I learned today, I think it's a *bad* idea to have to change some system parameters just because your/my app is written in such a way that it may run out of some system resources. Your/my app has to fit in the system. Hacking ulimit and/or other system parameters is just a quick patch that will -at best- delay dealing with the real problem, which is usually one of design.

Just my two cents.

PA.
Lucene's scalability
Is there a known limit to the number of documents that Lucene can handle efficiently? I'm looking to index around 15 million, 2K docs which contain 7-10 searchable fields. Should I be attempting this with Lucene? Thanks, Joel
Re: Lucene's scalability
Great, thanks for the quick response.

I am very interested in hearing how Lucene handles itself in the 15-20 million doc range. I will be doing some testing this week with Lucene and will report my findings as well. I am also testing FAST and AltaVista and I will post some comparison details. I would be very happy to find that we did not need to buy a commercial engine because Lucene could do the job.

Joel

----- Original Message -----
From: Armbrust, Daniel C. [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Monday, April 29, 2002 2:37 PM
Subject: RE: Lucene's scalability

I currently have an index of ~12 million documents, which are each about that size (but in XML form). When they are transformed for Lucene to index, there are upwards of 50 searchable fields. The index is about 10 GB right now. I have not yet had any problems with pushing the limits of Lucene.

In the next few weeks, I will be pushing my number of indexed documents up into the 15-20 million range. I can let you know if any problems arise.

Dan

-----Original Message-----
From: Joel Bernstein [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 1:32 PM
To: [EMAIL PROTECTED]
Subject: Lucene's scalability

Is there a known limit to the number of documents that Lucene can handle efficiently? I'm looking to index around 15 million, 2K docs which contain 7-10 searchable fields. Should I be attempting this with Lucene?

Thanks,
Joel
Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)
petite,

On Mon, Apr 29, 2002 at 07:54:43PM +0200, petite_abeille wrote:
> As a final note, several people suggested to increase the number of file descriptors per process with something like ulimit...

Just be glad you aren't doing this on Solaris with JDK 1.1.6, where I first ran into ulimit issues - back when I encountered this problem, the Solaris default ulimit setting was 24 files, and JDK 1.1.6 reported the problem as an OutOfMemory error! Looks like things are improving :-).

> From what I learned today, I think it's a *bad* idea to have to change some system parameters just because your/my app is written in such a way that it may run out of some system resources. Your/my app has to fit in the system. Hacking ulimit and/or other system parameters is just a quick patch that will -at best- delay dealing with the real problem that's usually one of design.

Yes and no. Setting ulimit to a reasonable number of open files is not only not a patch, it's the right way to do it.

I understand where you're coming from, really, and in a certain way, it makes sense, BUT... sometimes the impulse for clean, good design takes you too far down a blind alley. Sometimes there is no elegant solution. Sometimes there is no best way, only one of a limited set of options with different tradeoffs.

By definition, Lucene is an application that trades off up-front CPU (for indexing) and file resources (for storage) for request-time speed. The OS's job is to manage resources, and open files are one of those resources. That's the tradeoff here, and it's reasonable and expected. Most serious applications need some sort of OS variable tweaking; you're just used to having it done invisibly and painlessly.

That said, since you're working on a client/desktop application, not a server application, you need to think about ways to handle this:

You could figure out the right way to set the system configuration on install or launch.

You could look at the alternative techniques for indexing in Lucene, and see if any approaches there can help - for example, maybe doing a lot of the more intense indexing work in a RAMDirectory, then merging it into a normal file-based Directory.

You could look more closely at what your application is doing, and see if there's anything you're doing wrong (perhaps opening files and not closing them, and leaving them for the garbage collector to eventually get around to closing?) or if you have a pessimal usage pattern that exacerbates the situation.

You could take a closer look at the Lucene indexing and file management stuff, and see if you can come up with a scheme to run Lucene indexing with modified code for keeping track of file resources.

I'll bet Doug and the other developers would rather not add open-file management as a main, permanent part of Lucene, since it would add overhead to all uses of Lucene just to deal with an anomalous situation (use on a client/desktop machine). But they might be interested in a way to offer it as an optional feature, where people using Lucene in a constrained environment could configure Lucene to be careful about how many files it keeps open at any given time.

Steven J. Owens
[EMAIL PROTECTED]
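On the third suggestion above (files opened and then left for the garbage collector to close), here is a small sketch of the alternative: releasing each descriptor deterministically. It uses only the JDK, not Lucene; DeterministicClose and its methods are hypothetical names, the temporary file merely stands in for an index file, and try-with-resources needs Java 7 or later, so on the JREs discussed in this thread the same effect would need an explicit close() in a finally block.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class DeterministicClose {

    /** Reads the file length and releases the descriptor before returning. */
    static long lengthOf(File file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            return in.length();
        } // in.close() runs here, even if length() throws
    }

    /** Opens and closes a scratch file repeatedly; returns how many rounds completed. */
    static int openAndCloseRepeatedly(int times) {
        try {
            File temp = File.createTempFile("segment", ".tmp"); // stand-in for an index file
            temp.deleteOnExit();
            int completed = 0;
            for (int i = 0; i < times; i++) {
                lengthOf(temp);
                completed++;
            }
            return completed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Many more opens than a limit of a few hundred descriptors would allow if each one leaked.
        System.out.println("rounds completed: " + openAndCloseRepeatedly(10_000));
    }
}
```

Because each descriptor is released before the next open, the loop never holds more than one extra descriptor at a time, whatever the per-process limit happens to be.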