Re: rc4 and FileNotFoundException: an update

2002-04-26 Thread petite_abeille

> Have you posted code that demonstrates this problem? If so I missed it.

Thanks for your help.

PA.


--
To unsubscribe, e-mail:   
For additional commands, e-mail: 




xml parsing examples

2002-04-26 Thread Aruna Raghavan

Hi,
I have a couple of examples of parsing .xml files using SAX/DOM, taken from
my code that uses lucene for indexing. Can I submit these somewhere? Please
let me know.
Aruna.





Re: rc4 and FileNotFoundException: an update

2002-04-26 Thread Ian Lea

Have you posted code that demonstrates this problem?
If so I missed it.  If you send, to this list, the
shortest program you can come up with that demonstrates
the problem there is a fair chance that someone may
spot something.  I, and many others, use that release
of Lucene to index far more than 16 objects so I think
that at this stage the assumption has to be that the
problem lies with your code.


--
Ian.
[EMAIL PROTECTED]


> [EMAIL PROTECTED] (petite_abeille) wrote 
>
> Hello again,
> 
> I guess it's really not my day...
> 
> Just to make sure I'm not hallucinating too much, I downloaded the latest 
> and greatest: rc4. Changed all the package names to org.apache. Updated 
> a method here and there to reflect the API changes. And ran my little 
> app. I would like to emphasize that except for updating to the latest 
> Lucene release, nothing else has changed.
> 
> Well, it's pretty ugly. Whatever I'm doing with Lucene in the previous 
> package (com.lucene) is magnified manyfold in rc4. After processing a 
> paltry 16 objects I got:
> 
> "SZFinder.findObjectsWithSpecificationInStore: 
> java.io.FileNotFoundException: _2.f14 (Too many open files)"
> 
> At least in the previous version, I would see that only after a couple 
> of thousand objects...
> 
> So, it seems, that there is something really rotten in the kingdom of 
> Denmark...
> 
> Any help much appreciated.
> 
> Thanks.

--
Searchable personal storage and archiving from http://www.digimem.net/





rc4 and FileNotFoundException: an update

2002-04-26 Thread petite_abeille

Hello again,

I guess it's really not my day...

Just to make sure I'm not hallucinating too much, I downloaded the latest 
and greatest: rc4. Changed all the package names to org.apache. Updated 
a method here and there to reflect the API changes. And ran my little 
app. I would like to emphasize that except for updating to the latest 
Lucene release, nothing else has changed.

Well, it's pretty ugly. Whatever I'm doing with Lucene in the previous 
package (com.lucene) is magnified manyfold in rc4. After processing a 
paltry 16 objects I got:

"SZFinder.findObjectsWithSpecificationInStore: 
java.io.FileNotFoundException: _2.f14 (Too many open files)"

At least in the previous version, I would see that only after a couple 
of thousand objects...

So, it seems, that there is something really rotten in the kingdom of 
Denmark...

Any help much appreciated.

Thanks.






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Otis Gospodnetic

Hello,

> > There is no tool for detecting index corruption, fixing an index, or
> > rebuilding one.
> > The last of these everyone has to do on their own.
> 
> :-( Well, that's *very* sad to say the least... How do I know if my 
> indexes are not corrupted even if everything seems to be working fine? 
> Don't tell me I'm the first one to run into this kind of issue?!? How 
> can I "trust" an index if there is *no* way of checking its integrity? 
> And even if you happen to notice that something is fishy, there is no 
> way to rebuild the index short of re-indexing everything from scratch? 
> That does not sound like a very "healthy" situation to me. "Fragile" 
> would be a kind way of describing it...

Yes, that's all unfortunate.  If you come up with anything, please
share it.  Or, you can use Lucene Sandbox and develop stuff there.

> > I've seen people asking about this on the list, but I never
> > encountered this particular exception.
> 
> Lucky you...

:)

> > Maybe it's not a Lucene issue then, although I've seen this
> > mentioned so often, which means that documentation could be
> > improved to prevent people from making the same mistakes that
> > others have already made.
> 
> Maybe, maybe not. And most likely I'm doing something odd. In any
> case, could you point me to the "mistakes that others have already
> made"? Or did I miss something obvious here?

Nah, the only thing I can suggest is to check the list archives; that is
where the mistakes of others would be recorded.

Otis


__
Do You Yahoo!?
Yahoo! Games - play chess, backgammon, pool and more
http://games.yahoo.com/





Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille

Hello again,

> There is no tool for detecting index corruption, fixing an index, or
> rebuilding one.
> The last of these everyone has to do on their own.

:-( Well, that's *very* sad to say the least... How do I know if my 
indexes are not corrupted even if everything seems to be working fine? 
Don't tell me I'm the first one to run into this kind of issue?!? How 
can I "trust" an index if there is *no* way of checking its integrity? 
And even if you happen to notice that something is fishy, there is no 
way to rebuild the index short of re-indexing everything from scratch? 
That does not sound like a very "healthy" situation to me. "Fragile" 
would be a kind way of describing it...

> I've seen people asking about this on the list, but I never encountered
> this particular exception.

Lucky you...

> Maybe it's not a Lucene issue then, although I've seen this mentioned
> so often, which means that documentation could be improved to prevent
> people from making the same mistakes that others have already made.

Maybe, maybe not. And most likely I'm doing something odd. In any case, 
could you point me to the "mistakes that others have already made"? Or 
did I miss something obvious here?

Thanks.

PA






Re: FileNotFoundException: Too many open files

2002-04-26 Thread petite_abeille

Hi Otis,

> I looked only at your application's screenshots and based on that my
> guess is that you have a fairly high number of index fields, and if I
> recall correctly that can cause the above error.

Well, I used to have an index per class. And I have around a dozen 
classes that get indexed. When trying to switch to the latest rc (with 
the exact same code base), I ran into so many problems with the now 
infamous "FileNotFoundException" that I consolidated everything in one 
index per object store. And switched back to the com.lucene package that 
-as far as I can personally tell- is *much* more stable. I do not store 
the content of the objects in the index, just some uuid as Field.Keyword 
and other attributes as Field.UnStored. On average, there seem to be 
fewer than one hundred Lucene files per index.
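
A rough way to check that figure is to list an index directory and count its files by extension. The sketch below is only an illustration: `IndexFileCensus` is a hypothetical helper, not part of Lucene, and it uses nothing beyond `java.io.File`. File names such as `_2.f14` in the error messages suggest that some files are kept per field, so a per-extension count may show where the number grows.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical diagnostic helper; not part of Lucene. */
final class IndexFileCensus {

    /** Counts the regular files in a directory, grouped by file extension. */
    static Map<String, Integer> countByExtension(File indexDir) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        File[] entries = indexDir.listFiles();
        if (entries == null) {
            return counts; // not a directory, or it could not be read
        }
        for (File entry : entries) {
            if (!entry.isFile()) {
                continue; // skip sub-directories
            }
            String name = entry.getName();
            int dot = name.lastIndexOf('.');
            // Files without a dot (for example "segments") are grouped under "".
            String ext = dot < 0 ? "" : name.substring(dot + 1);
            Integer seen = counts.get(ext);
            counts.put(ext, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    /** Prints one line per extension, then the total, for the given directory. */
    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        int total = 0;
        for (Map.Entry<String, Integer> e : countByExtension(dir).entrySet()) {
            System.out.println("." + e.getKey() + ": " + e.getValue());
            total += e.getValue();
        }
        System.out.println("total files: " + total);
    }
}
```

Running it against each index directory before and after a batch of updates gives a quick picture of how many files a reader or writer may have to open.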

> This was mentioned on the list once, too.
> I suggested using a shutdown hook in Runtime package, but then somebody
> responded with a drawback of that approach.

I have this one under control... Thanks.

> Not that I know.  If locking is getting in the way maybe you are not
> using Lucene properly.  I haven't downloaded your application yet, so I
> haven't had the chance to peek at the source.

Please feel free to do so... ;-)

> Yes, I believe so - I never encountered any problems with that.

Great. That was my assumption all along...

R.






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Otis Gospodnetic

Morning,

> I'm starting to wonder how "bullet proof" Lucene indexes are. Do they
> get corrupted easily? If so, is there a way to rebuild them?

There is no tool for detecting index corruption, fixing an index, or
rebuilding one.
The last of these everyone has to do on their own.

> I've started to get the following exception left and right...
> 
> "04/25 18:34:39 (Warning) Indexer.indexObjectWithValues: 
> java.io.IOException: _91.fnm already exists"

I've seen people asking about this on the list, but I never encountered
this particular exception.

> I built a little app (http://homepage.mac.com/zoe_info/) that uses 
> Lucene quite extensively, and I would like to keep it that way.
> However, I'm starting to have second thoughts about Lucene's
> reliability... :-(
> 
> I'm sure I'm doing something wrong somewhere, but I really cannot see
> what...

Maybe it's not a Lucene issue then, although I've seen this mentioned
so often, which means that documentation could be improved to prevent
people from making the same mistakes that others have already made.

Otis






Re: FileNotFoundException: Too many open files

2002-04-26 Thread Otis Gospodnetic

Hello,

> I'm running into this exception quite often while using Lucene (the 
> situation is so bad with the latest rc that I had to revert to the
> last com.lucene package). I'm sure I have my fair share of bugs in my
> app, but nonetheless, how can I "control" Lucene's usage of
> RandomAccessFile? The indexes are optimized and I try to keep a close
> eye on how many IndexWriter/Reader instances exist at any point in
> time... Nevertheless, I run into that exception much too often :-(
> Any help appreciated!
> 
> "04/26 00:07:11 (Warning) Finder.findObjectsWithSpecificationInStore:
> java.io.FileNotFoundException:  _la.f9 (Too many open files)"

I looked only at your application's screenshots and based on that my
guess is that you have a fairly high number of index fields, and if I
recall correctly that can cause the above error.
This was mentioned on one of the lists fairly recently, I believe.

> Also, on a somewhat related note, how do I "shut down" Lucene
> properly? E.g., do I need to do anything with the IndexWriter and so on?

This was mentioned on the list once, too.
I suggested using a shutdown hook in Runtime package, but then somebody
responded with a drawback of that approach.
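
For reference, the shutdown-hook idea can be sketched roughly as follows. This is a minimal illustration, not a recommendation from the list: `ShutdownCloser` is a hypothetical helper, and `java.io.Closeable` stands in for whatever small adapter calls `close()` on an IndexWriter. Only `Runtime.addShutdownHook(Thread)` is a real JDK call here.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; each Closeable stands in for an adapter around an IndexWriter. */
final class ShutdownCloser {
    private final List<Closeable> resources = new ArrayList<Closeable>();
    private boolean closed = false;

    /** Remembers a resource that must be closed before the JVM exits. */
    synchronized void register(Closeable resource) {
        resources.add(resource);
    }

    /** Closes everything once, newest first; returns how many closed without error. */
    synchronized int closeAll() {
        if (closed) {
            return 0; // already ran, nothing left to do
        }
        closed = true;
        int ok = 0;
        for (int i = resources.size() - 1; i >= 0; i--) {
            try {
                resources.get(i).close();
                ok++;
            } catch (IOException e) {
                // Keep going so one failure does not leave the rest open.
                System.err.println("close failed: " + e);
            }
        }
        return ok;
    }

    /** Asks the JVM to run closeAll() during an orderly shutdown. */
    void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            public void run() {
                closeAll();
            }
        }));
    }
}
```

A hook like this runs on a normal exit or `System.exit()`, but the JVM does not run hooks when the process is killed outright or halted with `Runtime.halt()`, so it cannot be the only safeguard.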

> Last, but not least, is there a way to turn off the file locking in
> the latest rc, as it's really getting in the way :-(

Not that I know.  If locking is getting in the way maybe you are not
using Lucene properly.  I haven't downloaded your application yet, so I
haven't had the chance to peek at the source.

> Finally, I just wanted to make sure: Lucene is fully multi-threaded 
> right? I can do search *and* write concurrently in different threads
> at the same time on the same index?

Yes, I believe so - I never encountered any problems with that.

> BTW, should I post this kind of question to user or dev?

I suggest -user until/unless we determine that there is something in
Lucene that we can fix or improve.

Otis






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille

> also the RAMDir must be kept in memory while indexing and merging, so 
> checking the system's free memory is easier than trying to calculate 
> memory usage

I see... I don't deal with XML so I guess I have a better grasp on the 
memory requirements of my objects. In any case, I'm afraid I might be 
abusing Lucene a bit, as I am building a kind of OODBMS on top of it... Oh, 
well...

Thanks for your help.

PA.






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille

> ah, now i see, what i have is a server with 512mb of ram, so i have 
> used two different approaches and both work ok;

Thanks a lot! I will give it a try...

PA.






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Karl Øie

forgot this:

it's a bit hard to determine a good balance while indexing XML 
documents because the internal relations of a DOM can make an XML document 
become nearly 21 times as big in memory as on disk (i am not lying, i 
have seen it myself)...

also the RAMDir must be kept in memory while indexing and merging, so checking 
the system's free memory is easier than trying to calculate memory usage

mvh karl øie







Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Karl Øie

ah, now i see, what i have is a server with 512mb of ram, so i have used two 
different approaches and both work ok;

1 - i index a fixed number of documents into a RAMDir, like 10 (each of the 
docs is an xml doc of about 1.5-2mb) and then i optimize the RAMDir and merge it 
into the FSDir and then optimize the FSDir...

2 - i use Runtime.freeMemory() and Runtime.totalMemory() to see if i have 
reached more than 80% of the available memory, if so i optimize the RAMDir, 
merge it and optimize the FSDir..., if not i just add more documents to the 
RAMDir

as far as i have tested i have never experienced a failure while merging a 
RAMDir into a FSDir regardless of size, so it's my system's memory that is the 
problem
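
The two flush heuristics above can be combined into one small check. The following is a sketch under stated assumptions: `FlushPolicy` and its method names are hypothetical, the values 10 and 0.8 simply mirror the figures quoted above, and the only real API calls are `Runtime.totalMemory()` and `Runtime.freeMemory()`.

```java
/** Hypothetical flush policy; names and thresholds are illustrative only. */
final class FlushPolicy {
    private final int maxBufferedDocs;
    private final double maxUsedFraction;

    FlushPolicy(int maxBufferedDocs, double maxUsedFraction) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxUsedFraction = maxUsedFraction;
    }

    /** Fraction of the currently allocated heap that is in use. */
    static double usedFraction(long totalMemory, long freeMemory) {
        return (double) (totalMemory - freeMemory) / (double) totalMemory;
    }

    /** True when either the document count or the memory ratio says "flush". */
    boolean shouldFlush(int bufferedDocs, long totalMemory, long freeMemory) {
        return bufferedDocs >= maxBufferedDocs
                || usedFraction(totalMemory, freeMemory) > maxUsedFraction;
    }

    /** Same decision, using the live figures reported by the running JVM. */
    boolean shouldFlushNow(int bufferedDocs) {
        Runtime rt = Runtime.getRuntime();
        return shouldFlush(bufferedDocs, rt.totalMemory(), rt.freeMemory());
    }
}
```

An indexing loop would call `new FlushPolicy(10, 0.8).shouldFlushNow(buffered)` after each document and merge the RAMDir when it returns true. Note that `totalMemory()` reports the heap currently allocated, which the JVM may still grow, so the ratio is a conservative signal rather than a hard limit.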

mvh karl øie


On Friday 26 April 2002 15:33, petite_abeille wrote:
> >> Thanks. What is your heuristic to flush the RAMDirectory?
> >
> > please explain this because i don't understand english that good :-(
>
> That's ok, I don't really understand English either :-)
>
> Simply put, when do you "flush" the RAMDirectory into the FSDirectory?
> Every five documents? Ten? A thousand? What is a good balance between
> RAM and FS?
>
> Thanks.
>
> PA.






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille

>> Thanks. What is your heuristic to flush the RAMDirectory?
> please explain this because i don't understand english that good :-(

That's ok, I don't really understand English either :-)

Simply put, when do you "flush" the RAMDirectory into the FSDirectory? 
Every five documents? Ten? A thousand? What is a good balance between 
RAM and FS?

Thanks.

PA.






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Karl Øie

that is a great problem with lucene: as it uses an FSDir for storage it has no 
sense of transaction handling. for critical indexes i serialize a RAMDir to a 
database blob, so i can perform a rollback if needed, but this is an enormous 
overhead
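
The serialize-to-blob step can be sketched with plain Java serialization. This is only an illustration: `BlobCodec` is a hypothetical helper, and any `Serializable` object (such as the SerializableRAMDirectory attached to this message) could be passed in. How the resulting bytes reach the database, for example through `PreparedStatement.setBytes`, is left out and is an assumption about the setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical helper that turns a serializable snapshot into bytes and back. */
final class BlobCodec {

    /** Serializes the snapshot into a byte array suitable for a blob column. */
    static byte[] toBytes(Serializable snapshot) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buffer);
        try {
            out.writeObject(snapshot);
        } finally {
            out.close(); // also flushes any buffered object data
        }
        return buffer.toByteArray();
    }

    /** Restores a snapshot previously produced by toBytes, e.g. for a rollback. */
    static Object fromBytes(byte[] blob) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }
}
```

The whole directory is copied on every snapshot, which is consistent with the overhead described above.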

> Thanks. What is your heuristic to flush the RAMDirectory? 
please explain this because i don't understand english that good :-(

mvh karl øie

On Friday 26 April 2002 14:23, petite_abeille wrote:
> > using a RAMDir as a middle man solved my problems...
>
> Thanks. What is your heuristic to flush the RAMDirectory? Also how do
> you deal with System.exit() or application death? E.g., you are indexing
> something and the application dies or is killed.
>
> Thanks for any input.
>
> R.


package org.apache.lucene.store;

/* 
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2001 The Apache Software Foundation.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *notice, this list of conditions and the following disclaimer in
 *the documentation and/or other materials provided with the
 *distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *if any, must include the following acknowledgment:
 *   "This product includes software developed by the
 *Apache Software Foundation (http://www.apache.org/)."
 *Alternately, this acknowledgment may appear in the software itself,
 *if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation" and
 *"Apache Lucene" must not be used to endorse or promote products
 *derived from this software without prior written permission. For
 *written permission, please contact [EMAIL PROTECTED]
 *
 * 5. Products derived from this software may not be called "Apache",
 *"Apache Lucene", nor may "Apache" appear in their name, without
 *prior written permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * 
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation.  For more
 * information on the Apache Software Foundation, please see
 * .
 */

import java.io.Serializable;
import java.io.IOException;
import java.util.Vector;
import java.util.Hashtable;
import java.util.Enumeration;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.InputStream;
import org.apache.lucene.store.OutputStream;

//--

final public class SerializableRAMDirectory extends Directory implements Serializable {

	Hashtable files = new Hashtable();

	public SerializableRAMDirectory() {
		}

	/** Returns an array of strings, one for each file in the directory. */
	public final String[] list() {
		String[] result = new String[files.size()];
		int i = 0;
		Enumeration names = files.keys();
		while (names.hasMoreElements()) {
			result[i++] = (String)names.nextElement();
			}
		return result;
		}

	/** Returns true iff the named file exists in this directory. */
	public final boolean fileExists(String name) {
		SerializableRAMFile file = (SerializableRAMFile)files.get(name);
		return file != null;
		}

	/** Returns the time the named file was last modified. */
	public final long fileModified(String name) throws IOException {
		SerializableRAMFile file = (SerializableRAMFile)files.get(name);
		return file.lastModified;
		}

	/** Returns the length in bytes of a file in the directory. */
	public final long fileLength(String name) {
		SerializableRAMFile file = (SerializableRAMFile)files.get(name);
		return file.length;
		}

	/** Removes an existing file in the directory. */
	p

Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille

> using a RAMDir as a middle man solved my problems...

Thanks. What is your heuristic to flush the RAMDirectory? Also how do 
you deal with System.exit() or application death? E.g., you are indexing 
something and the application dies or is killed.

Thanks for any input.

R.






Re: Italian web sites

2002-04-26 Thread Marco Ferrante

What does that mean? "Italian website" can be:
  - a site that uses the Italian language
  - a site owned by an Italian organization
  - a site hosted at an Italian geographical location
Each definition calls for a different solution.
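
As one concrete illustration, a spider could apply a cheap first filter on the host name's country-code suffix. This is a sketch only: `CountryTldFilter` is a hypothetical helper, it covers just the "registered under .it" reading, and it says nothing about the language of a page or where the server is located.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical filter: keeps URLs whose host ends in a given country-code suffix. */
final class CountryTldFilter {

    /** Returns true if the URL's host ends with "." + tld, ignoring case. */
    static boolean hasCountryTld(String url, String tld) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return false; // relative or otherwise host-less URL
            }
            String suffix = "." + tld.toLowerCase(Locale.ROOT);
            return host.toLowerCase(Locale.ROOT).endsWith(suffix);
        } catch (URISyntaxException e) {
            return false; // unparsable input is treated as "not matching"
        }
    }
}
```

For example, `CountryTldFilter.hasCountryTld("http://www.example.it/index.html", "it")` accepts the URL, while an Italian-language site under .com or .org would be missed by this check.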

Date sent:  Wed, 24 Apr 2002 11:02:32 +0200
From:   "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject:Italian web sites
To: [EMAIL PROTECTED]
Send reply to:  Lucene Users List <[EMAIL PROTECTED]>

> Hi all,
>
> I'm using Jobo for spidering web sites and lucene for indexing. The
> problem is that I'd like to spider only Italian web sites.
> How can I discover the country of a web site?
>
> Do you know of a method that you can suggest?
>
> Thanks
>
>
> Laura
>


--
Marco Ferrante ([EMAIL PROTECTED])
CSITA (Centro Servizi Informatici e Telematici d'Ateneo)
Università degli Studi di Genova - Italy
Via Brigata Salerno, ponte - 16147 Genova
tel (+39) 0103532621 (interno tel. 2621)
--






Re: Lucene index integrity... or lack of :-(

2002-04-26 Thread Karl Øie

there are some strange problems with FSDirectory, i have found that building 
chunks in a RAMDirectory and then merging these into a FSDirectory is more 
stable than indexing directly into the FSDirectory, i ran into your problem 
and the dreaded "too many open files" problem when indexing large documents 
with many fields

using a RAMDir as a middle man solved my problems...
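
The shape of that staging pattern can be shown without any Lucene classes. In the sketch below, which is purely illustrative, `StagedIndexer` is a hypothetical class, the two in-memory lists merely stand in for a RAMDirectory and an FSDirectory, and the chunk size is an arbitrary parameter. In the Lucene of that time the merge step would presumably be done with `IndexWriter.addIndexes(Directory[])`, which is an assumption about the setup rather than something stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for "index into RAM, then merge chunks to disk". */
final class StagedIndexer {
    private final int chunkSize;
    private final List<String> staged = new ArrayList<String>();  // plays the RAMDirectory
    private final List<String> durable = new ArrayList<String>(); // plays the FSDirectory
    private int merges = 0;

    StagedIndexer(int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        this.chunkSize = chunkSize;
    }

    /** Stages one document and merges as soon as a full chunk has built up. */
    void add(String doc) {
        staged.add(doc);
        if (staged.size() >= chunkSize) {
            merge();
        }
    }

    /** Moves everything staged so far into the durable store in one step. */
    void merge() {
        if (staged.isEmpty()) {
            return;
        }
        durable.addAll(staged);
        staged.clear();
        merges++;
    }

    int stagedCount() { return staged.size(); }
    int durableCount() { return durable.size(); }
    int mergeCount() { return merges; }
}
```

The point of the pattern is that the durable store is touched once per chunk instead of once per document; a final `merge()` call is needed so that a partly filled chunk is not lost.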

mvh karl øie

On Friday 26 April 2002 13:54, petite_abeille wrote:
> Hello,
>
> I'm starting to wonder how "bullet proof" Lucene indexes are. Do they
> get corrupted easily? If so, is there a way to rebuild them?
>
> I've started to get the following exception left and right...
>
> "04/25 18:34:39 (Warning) Indexer.indexObjectWithValues:
> java.io.IOException: _91.fnm already exists"
>
> I built a little app (http://homepage.mac.com/zoe_info/) that uses
> Lucene quite extensively, and I would like to keep it that way. However,
> I'm starting to have second thoughts about Lucene's reliability... :-(
>
> I'm sure I'm doing something wrong somewhere, but I really cannot see
> what...
>
> Any help or insight greatly appreciated.
>
> Thanks.
>
> PA.






Lucene index integrity... or lack of :-(

2002-04-26 Thread petite_abeille

Hello,

I'm starting to wonder how "bullet proof" Lucene indexes are. Do they 
get corrupted easily? If so, is there a way to rebuild them?

I've started to get the following exception left and right...

"04/25 18:34:39 (Warning) Indexer.indexObjectWithValues: 
java.io.IOException: _91.fnm already exists"

I built a little app (http://homepage.mac.com/zoe_info/) that uses 
Lucene quite extensively, and I would like to keep it that way. However, 
I'm starting to have second thoughts about Lucene's reliability... :-(

I'm sure I'm doing something wrong somewhere, but I really cannot see 
what...

Any help or insight greatly appreciated.

Thanks.

PA.






FileNotFoundException: Too many open files

2002-04-26 Thread petite_abeille

Hello,

I'm running into this exception quite often while using Lucene (the 
situation is so bad with the latest rc that I had to revert to the last 
com.lucene package). I'm sure I have my fair share of bugs in my app, 
but nonetheless, how can I "control" Lucene's usage of RandomAccessFile? 
The indexes are optimized and I try to keep a close eye on how many 
IndexWriter/Reader instances exist at any point in time... Nevertheless, I run 
into that exception much too often :-( Any help appreciated!

"04/26 00:07:11 (Warning) Finder.findObjectsWithSpecificationInStore: 
java.io.FileNotFoundException:  _la.f9 (Too many open files)"

Also, on a somewhat related note, how do I "shut down" Lucene properly? 
E.g., do I need to do anything with the IndexWriter and so on?

Last, but not least, is there a way to turn off the file locking in the 
latest rc, as it's really getting in the way :-(

Finally, I just wanted to make sure: Lucene is fully multi-threaded 
right? I can do search *and* write concurrently in different threads at 
the same time on the same index?

Any insight much appreciated!

Thanks.

PA.

BTW, should I post this kind of question to user or dev?

