Re: Latest StopAnalyzer.java

2004-07-06 Thread Erik Hatcher
On Jul 6, 2004, at 1:08 AM, Karthik N S wrote:
Can somebody tell me where I can find the latest copy of StopAnalyzer.java
that can be used with Lucene 1.4 final? I am not able to find it in the
Lucene Sandbox.

[ My company prohibits me from using CVS ]
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/src/java/org/apache/lucene/analysis/StopAnalyzer.java?rev=1.6&view=auto

All of Jakarta's CVS can be browsed this way.
Erik


Re: Running OutOfMemory while optimizing and searching

2004-07-06 Thread Otis Gospodnetic
Note that "force" is really just "suggest".  Regardless, I have seen apps
running under a 1.3.1 JVM where this worked.

Otis

--- David Spencer [EMAIL PROTECTED] wrote:
 In theory this should not help, but just in case: the idea is to call
 gc() periodically to force garbage collection.  This is the code I use,
 which tries to force it...
 
 
 public static long gc()
   {
   long bef = mem();
   System.gc();
   sleep( 100);
   System.runFinalization();
   sleep( 100);
   System.gc();
   long aft = mem();
   return aft - bef;   // negative means memory was reclaimed
   }
 
 // mem() and sleep() were not shown in the original post; plausible
 // implementations:
 private static long mem()
   { Runtime rt = Runtime.getRuntime(); return rt.totalMemory() - rt.freeMemory(); }
 
 private static void sleep( long ms)
   { try { Thread.sleep( ms); } catch ( InterruptedException ie) { } }
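 
 A typical call site would be (a sketch):
 
 // gc() returns the heap delta, so negate it to report bytes freed
 System.out.println( "gc() freed " + ( -gc()) + " bytes");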
 
 Mark Florence wrote:
 
  Thanks, Jim. I'm pretty sure I'm throwing OOM for real,
  and not because I've run out of file handles. I can easily
  recreate the latter condition, and it is always reported
  accurately. I've also monitored the OOM as it occurs using
  top and I can see memory usage climbing until it is
  exhausted -- if you will excuse the pun!
  
  I'm not familiar with the new compound file format. Where
  can I look to find more information?
  
  -- Mark
  
  -Original Message-
  From: James Dunn [mailto:[EMAIL PROTECTED]
  Sent: Friday, July 02, 2004 01:29 pm
  To: Lucene Users List
  Subject: Re: Running OutOfMemory while optimizing and searching
  
  
  Ah yes, I don't think I made that clear enough.  From
  Mark's original post, I believe he mentioned that he
  used separate readers for each simultaneous query.
  
  His other issue was that he was getting an OOM during
  an optimize, even when he set the JVM heap to 2GB.  He
  said his index was about 10.5GB spread over ~7000
  files on Linux.  
  
  My guess is that the OOM might actually be a "too many
  open files" error.  I have seen that type of error
  reported by the JVM as an OutOfMemory error on
  Linux before.  I had the same problem, but once I
  switched to the new Lucene compound file format, I
  haven't had it since.
  
  Mark, have you tried switching to the compound file
  format?  A minimal sketch follows.
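  
  Enabling it is a one-liner on the writer.  A sketch (the path and
  analyzer here are placeholders):
  
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  
  // Compound format stores each segment as a single .cfs file instead
  // of many per-field files, so far fewer file handles are needed.
  IndexWriter writer = new IndexWriter("/path/to/index",
                                       new StandardAnalyzer(), true);
  writer.setUseCompoundFile(true);  // off by default in 1.3, on in 1.4
  // ... addDocument() calls ...
  writer.close();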
  
  Jim
  
  
  
  
  --- Doug Cutting [EMAIL PROTECTED] wrote:
  
  What do your queries look like?  The memory required for a query can be
  computed by the following equation:

    1 byte * Number of fields in your query * Number of docs in your index

  So if your query searches on all 50 fields of your 3.5 million document
  index, then each search would take about 175MB.  If your 3-4 searches run
  concurrently, then that's about 525MB to 700MB chewed up at once.
 
 That's not quite right.  If you use the same IndexSearcher (or IndexReader)
 for all of the searches, then only 175MB are used.  The arrays in question
 (the norms) are read-only and can be shared by all searches.
 
 In general, the amount of memory required is:
 
   1 byte * Number of searchable fields in your index * Number of docs in your index
 
 plus
 
   1 KB * number of terms in the query
 
 plus
 
   1 KB * number of phrase terms in the query
 
 The latter two are for I/O buffers.  There are a few other things, but
 these are the major ones.
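 
 As a quick worked example with the numbers from this thread (a sketch:
 3.5M docs, 50 searchable fields, a 4-term query with no phrase terms):
 
 long docs   = 3500000L;   // documents in the index
 int  fields = 50;         // searchable fields
 int  terms  = 4;          // terms in the query
 int  phrase = 0;          // phrase terms in the query
 
 long bytes = 1L * fields * docs   // norms: 1 byte per field per doc, shared per reader
            + 1024L * terms        // I/O buffers for query terms
            + 1024L * phrase;      // I/O buffers for phrase terms
 // -> 175,004,096 bytes, i.e. roughly 175MB per IndexReader, not per query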
 
 Doug
 
 
 
  
 





Search Hit Score

2004-07-06 Thread Karthik N S

Hi dev guys,

Apologies.

I have 3 questions for you.

1)
  I have a situation here where I am supposed to group unique indexed
documents depending upon the number of hits per document.

  To briefly explain this: all documents with n hits for a search word
would be grouped under Category A, and all documents with n+1 hits for the
same search word should be grouped under Category B.

  Can Lucene provide some means internally to handle this situation?
(A sketch follows after question 3.)


2) What is the weight/boost factor available for hits, and how can it be
used effectively?


3) Is there anything in the Lucene core that reveals the version number of
the currently used jar files, something like "java -version" on the command
prompt displaying the version? (See the second sketch below.)
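
For question 1, per-document term frequencies are available through
IndexReader.termDocs(); freq() returns how many times the term occurs in
each matching document. A minimal sketch (the field name "contents", the
search word, and the boundary n are hypothetical):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

IndexReader reader = IndexReader.open("/path/to/index");
TermDocs td = reader.termDocs(new Term("contents", "searchword"));
int n = 5;                    // hypothetical category boundary
while (td.next()) {
    int freq = td.freq();     // occurrences of the term in document td.doc()
    if (freq <= n) {
        // goes in Category A
    } else {
        // goes in Category B
    }
}
td.close();
reader.close();

For question 2, boosts can be applied at query time with QueryParser's
caret syntax (e.g. "title:word^4"), which multiplies that clause's score
contribution.

For question 3, the jar's manifest may carry a version attribute; if the
build set one (an assumption, since not every 1.x jar does), it can be read
with plain java.lang.Package:

Package p = org.apache.lucene.index.IndexWriter.class.getPackage();
System.out.println("Lucene version: " + p.getImplementationVersion());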





with regards
Karthik




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 06, 2004 4:22 PM
To: Lucene Users List
Subject: Re: Latest StopAnalyzer.java


On Jul 6, 2004, at 2:53 AM, Morus Walter wrote:
 Karthik N S writes:

 Can somebody tell me where I can find the latest copy of StopAnalyzer.java
 that can be used with Lucene 1.4 final? I am not able to find it in the
 Lucene Sandbox.

 [ My company prohibits me from using CVS ]

 There is no Lucene 1.4 final, but
 org.apache.lucene.analysis.StopAnalyzer
 is part of the Lucene core.

Actually Doug did create Lucene 1.4 final:

http://jakarta.apache.org/lucene/docs/index.html

I'll try to squeeze in some time today to make it more official by
ensuring the binaries are mirrored and such.

Erik





upgrade from Lucene 1.3 final to 1.4rc3 problem

2004-07-06 Thread Alex Aw Seat Kiong
Hi!

I'm currently using Lucene 1.3 final and everything was working fine.
But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply swapping the
old jar for lucene-1.4-rc3.jar and recompiling), everything recompiles
successfully, but when I try to index a document it gives the error below:
java.lang.NullPointerException
at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:126)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
What's wrong? Please help.
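
(One thing worth checking: FSDirectory.create() lists the index directory
to clear old files, and java.io.File.list() returns null when the path is
not a readable directory, which would produce exactly this kind of
NullPointerException. That is an assumption, not a confirmed diagnosis.
A defensive sketch with a placeholder path:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

File indexDir = new File("/path/to/index");   // placeholder path
if (indexDir.exists() && !indexDir.isDirectory())
    throw new RuntimeException(indexDir + " exists but is not a directory");
IndexWriter writer = new IndexWriter(indexDir.getPath(),
                                     new StandardAnalyzer(), true);
)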

Thanks.

Regards,
Alex





Upgrade from Lucene 1.3 final to 1.4 problem

2004-07-06 Thread Karthik N S
Hey,

Apologies.

  Same with me too...

  The number of hits on a set of documents indexed using 1.3-final is not
the same on the 1.4-final version.
  [ The only modification done to the source is that I upgraded my
CustomAnalyzer based on the StopAnalyzer available in 1.4 ]
  Does doing this affect the results?

  Can somebody please explain?


with regards
Karthik

-Original Message-
From: Alex Aw Seat Kiong [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 9:50 AM
To: Lucene Users List
Subject: upgrade from Lucene 1.3 final to 1.4rc3 problem


Hi!

I'm currently using Lucene 1.3 final and everything was working fine.
But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply swapping the
old jar for lucene-1.4-rc3.jar and recompiling), everything recompiles
successfully, but when I try to index a document it gives the error below:
java.lang.NullPointerException
at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:126)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
What's wrong? Please help.

Thanks.

Regards,
Alex








Most efficient way to index 14M documents (out of memory/file handles)

2004-07-06 Thread Kevin A. Burton
I'm trying to burn an index of 14M documents.
I have two problems.
1.  I have to run optimize() every 50k documents or I run out of file 
handles.  This takes TIME and of course is linear in the size of the 
index, so it just gets slower as I go.  It starts to crawl 
at about 3M documents.

2.  I eventually will run out of memory in this configuration.
I KNOW this has been covered before but for the life of me I can't find 
it in the archives, the FAQ or the wiki. 

I'm using an IndexWriter with a mergeFactor of 5k and then optimizing 
every 50k documents.

Does it make sense to just create a new IndexWriter for every 50k docs 
and then do one big optimize() at the end?
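
For what it's worth, a common alternative is to keep mergeFactor modest so
the number of live segments (and open files) stays bounded, buffer more
documents in RAM via minMergeDocs, use the compound file format, and run a
single optimize() at the end. A sketch against the 1.4 API (the path,
analyzer, and exact numbers are placeholders to tune for your heap):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter("/path/to/index",
                                     new StandardAnalyzer(), true);
writer.mergeFactor  = 10;     // segments per merge level (public field in 1.4)
writer.minMergeDocs = 1000;   // docs buffered in RAM before a segment is flushed
writer.setUseCompoundFile(true);
// ... add all 14M documents ...
writer.optimize();            // one big optimize at the end
writer.close();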

Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
