Re: Re-Indexing a moving target???

2005-02-01 Thread Nader Henein
Details?
Yousef Ourabi wrote:
Saad,
Here is what I got. I will post again, and be more specific.
-Y
--- Nader Henein <[EMAIL PROTECTED]> wrote:

We'll need a little more detail to help you: what are the sizes of your
updates and how often are they applied?

1) No, just re-open the IndexWriter every time you re-index. Since it's a
moderately changing index, just keep a flag on the rows and batch the
indexing every so often.
2) It all comes down to your needs; more detail would help us help you.

Nader Henein
Yousef Ourabi wrote:

Hey,
We are using Lucene to index a moderately changing
database, and I have a couple of questions about a
performance strategy.
1) Should we keep one IndexWriter open until the
system comes down, or create a new IndexWriter each
time we re-index our data set?
2) Does anyone have any thoughts on multi-threading and
segments instead of one index?
Thanks for your time and help.
Best,
Yousef

--
Nader S. Henein
Senior Applications Developer
Bayt.com

Re: Re-Indexing a moving target???

2005-01-28 Thread Nader Henein
We'll need a little more detail to help you: what are the sizes of your
updates and how often are they applied?

1) No, just re-open the IndexWriter every time you re-index. Since it's a
moderately changing index, just keep a flag on the rows and batch the
indexing every so often.
2) It all comes down to your needs; more detail would help us help you.

Nader Henein
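A minimal sketch of that flag-and-batch approach, assuming a JDBC source
table named docs with an is_dirty column and a stored id keyword field in
the index; every table, column and field name here is illustrative rather
than from the thread:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class BatchIndexer {

    // Run from a scheduled task, e.g. every few minutes.
    public void runBatch(Connection conn, String indexPath) throws Exception {
        List ids = new ArrayList();
        List docs = new ArrayList();

        Statement st = conn.createStatement();
        ResultSet rs = st.executeQuery(
                "SELECT id, title, body FROM docs WHERE is_dirty = 1");
        while (rs.next()) {
            Document doc = new Document();
            doc.add(Field.Keyword("id", rs.getString("id"))); // exact-match handle
            doc.add(Field.Text("title", rs.getString("title")));
            doc.add(Field.UnStored("body", rs.getString("body")));
            ids.add(rs.getString("id"));
            docs.add(doc);
        }
        rs.close();

        // Delete any stale copies first, so an update never duplicates.
        IndexReader reader = IndexReader.open(indexPath);
        for (int i = 0; i < ids.size(); i++) {
            reader.delete(new Term("id", (String) ids.get(i)));
        }
        reader.close();

        // A fresh writer per batch; closed as soon as the batch is in.
        IndexWriter writer = new IndexWriter(indexPath,
                new StandardAnalyzer(), false);
        for (int i = 0; i < docs.size(); i++) {
            writer.addDocument((Document) docs.get(i));
        }
        writer.close();

        // Reset only the rows this batch actually indexed.
        PreparedStatement ps = conn.prepareStatement(
                "UPDATE docs SET is_dirty = 0 WHERE id = ?");
        for (int i = 0; i < ids.size(); i++) {
            ps.setString(1, (String) ids.get(i));
            ps.executeUpdate();
        }
        ps.close();
        st.close();
    }
}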
Yousef Ourabi wrote:
Hey,
We are using Lucene to index a moderately changing
database, and I have a couple of questions about a
performance strategy.
1) Should we keep one IndexWriter open until the
system comes down, or create a new IndexWriter each
time we re-index our data set?
2) Does anyone have any thoughts on multi-threading and
segments instead of one index?
Thanks for your time and help.
Best,
Yousef
--
Nader S. Henein
Senior Applications Developer
Bayt.com


Re: QUERYPARSIN & BOOSTING

2005-01-11 Thread Nader Henein
From the text on the Lucene Jakarta Site : 
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

Lucene provides the relevance level of matching documents based on the 
terms found. To boost a term use the caret, "^", symbol with a boost 
factor (a number) at the end of the term you are searching. The higher 
the boost factor, the more relevant the term will be.

   Boosting allows you to control the relevance of a document by
   boosting its term. For example, if you are searching for


jakarta apache


   and you want the term "jakarta" to be more relevant boost it using
   the ^ symbol along with the boost factor next to the term. You would
   type:


jakarta^4 apache


   This will make documents with the term jakarta appear more relevant.
   You can also boost Phrase Terms as in the example:


"jakarta apache"^4 "jakarta lucene"


   By default, the boost factor is 1. Although the boost factor must be
   positive, it can be less than 1 (e.g. 0.2).
Regards.
Nader Henein
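A sketch of how the boosted query below could be built programmatically,
using the 1.4-era BooleanQuery.add(query, required, prohibited) signature;
the field names and boost factor are taken from Karthik's message:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Programmatic equivalent of  +contents:shoes +vendor:nike^10
BooleanQuery query = new BooleanQuery();

TermQuery contents = new TermQuery(new Term("contents", "shoes"));
TermQuery vendor = new TermQuery(new Term("vendor", "nike"));
vendor.setBoost(10f); // documents matching this vendor score higher

query.add(contents, true, false); // required = true, prohibited = false
query.add(vendor, true, false);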
Karthik N S wrote:
Hi Guys

Apologies...
This question may have been asked a million times on this forum; I need
some clarification.
1) FieldType = keyword, name = vendor
2) FieldType = text, name = contents
Questions:
1) How do I construct a query so that hits available for the VENDOR
appear first?
2) If boosting is to be applied, how?
3) Is the query constructed below correct?
+Contents:shoes +((vendor:nike)^10)

Please advise.
Thx in advance.
WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]



Re: Advice on indexing content from a database

2005-01-05 Thread Nader Henein
Hibernate + Lucene:
use Hibernate to read from your DB; this will pull out the data you need
in nice clean objects. Then loop through your object collection and
create Lucene documents. You can add Quartz to the equation and have this
process run on a schedule over chunks of your data until it has all been
indexed, and then continue on with incremental updates / deletes.


Nader Henein
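A sketch of that pipeline with Hibernate 2.x-era APIs; the Article class,
its properties, and the chunked scheduling are assumptions made for
illustration (Quartz wiring omitted):

import java.util.Iterator;
import java.util.List;
import net.sf.hibernate.Session;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ChunkIndexer {

    // Index one chunk of rows; a scheduler calls this repeatedly,
    // advancing 'first' until the whole table is covered.
    public void indexChunk(Session session, String indexPath,
                           int first, int chunkSize) throws Exception {
        List articles = session.createQuery("from Article order by id")
                .setFirstResult(first)
                .setMaxResults(chunkSize)
                .list();

        IndexWriter writer = new IndexWriter(indexPath,
                new StandardAnalyzer(), first == 0); // create only on the first chunk
        for (Iterator it = articles.iterator(); it.hasNext();) {
            Article a = (Article) it.next();
            Document doc = new Document();
            doc.add(Field.Keyword("id", a.getId().toString()));
            doc.add(Field.Text("title", a.getTitle()));
            doc.add(Field.UnStored("body", a.getBody()));
            writer.addDocument(doc);
        }
        writer.close();
    }
}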
[EMAIL PROTECTED] wrote:
Hi
I'm working on integrating Lucene with a CMS. All the data is stored in a
database. I'm looking at about 2 million records. Any advice on an
effective technique to index this (incrementally or using threads) that
would not overload my server?
Thanks
Aneesha


Re: time of indexer

2004-12-28 Thread Nader Henein
Download Luke; it makes life easy when you inspect the index, so you can
actually look at what you've indexed, as opposed to what you may think
you indexed.

Nader
Daniel Cortes wrote:
Hi everybody, and merry Christmas to all (especially people who, like me,
are "working" today instead of staying with the family).

I don't understand why my search on the index gives these bad results:
I indexed 112 PHP files as text,
on this machine:
Pentium 4 2.4 GHz, 512 MB RAM, running Windows XP and Eclipse during indexing.
Total search time: 80882 ms
The fields that I use are:
doc.add(Field.Keyword("filename", file.getCanonicalPath()));
doc.add(Field.UnStored("body", bodyText));
doc.add(Field.Text("titulo", title));
What am I doing wrong?
thks


Re: index question

2004-12-27 Thread Nader Henein
OK, so you can index the whole document in one shot, but you should store
certain fields in the index, like what you display in the search results,
to avoid a round trip to the DB.

So, for example, you would store "title", "synopsis", "link", "doc_id"
and "date", and then just index what you want to be searchable. The
reason you would have the title stored in one field and indexed again in
another is that if you stem that field it becomes useless for display
purposes. So the logical representation of your index would look
something like this:

title:    stored / indexed
synopsis: stored / un-indexed
link:     stored / un-indexed
doc_id:   stored / indexed
body:     indexed / un-stored

Enjoy
Nader Henein
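That layout, sketched with the 1.4-era Field helpers:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

static Document makeDoc(String title, String synopsis, String link,
                        String docId, String bodyText) {
    Document doc = new Document();
    doc.add(Field.Text("title", title));            // stored + indexed: search and display
    doc.add(Field.UnIndexed("synopsis", synopsis)); // stored only: shown in results
    doc.add(Field.UnIndexed("link", link));         // stored only
    doc.add(Field.Keyword("doc_id", docId));        // stored + indexed as a single term
    doc.add(Field.UnStored("body", bodyText));      // indexed (and stemmable), never displayed
    return doc;
}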
Daniel Cortes wrote:
thks nader
I need a general search of documents; that's why I ask for your
recommendations, since the fields are only for info in the search
results. Typically, a search like Google's, for example:

search: casa
La casa roja
...había una vez una casa roja que tenía...
http://go.to/casa  Modification date: 25-12-04

To do this, what fields and options (Keyword, Text, UnIndexed, UnStored)
should I use?
thks
Nader Henein wrote:
It comes down to your searching needs: do you need your documents to be
searchable by these fields, or do you need a general search of the whole
document? Your decision will impact the size of the index and the speed
of indexing and searching, so give it due thought. Start from your GUI
requirements and design the index that responds to your users' needs best.

Nader
Daniel Cortes wrote:
I want to know, in the case that you use Lucene to index files as a
general searcher, what fields (or keys) you use to index.
For example, in my case they are html, pdf, doc, ppt and txt, and I'm
thinking of using field author, field title, field url, field content,
field modification date.
Something more? Any recommendation?
thks
and Merry Xmas to all.



Re: index question

2004-12-27 Thread Nader Henein
It comes down to your searching needs: do you need your documents to be
searchable by these fields, or do you need a general search of the whole
document? Your decision will impact the size of the index and the speed
of indexing and searching, so give it due thought. Start from your GUI
requirements and design the index that responds to your users' needs best.

Nader
Daniel Cortes wrote:
I want to know, in the case that you use Lucene to index files as a
general searcher, what fields (or keys) you use to index.
For example, in my case they are html, pdf, doc, ppt and txt, and I'm
thinking of using field author, field title, field url, field content,
field modification date.
Something more? Any recommendation?
thks
and Merry Xmas to all.



Re: MergerIndex + Searchables

2004-12-21 Thread Nader Henein
As obvious as it may seem, you could always store the ID of the index in
which you are indexing the document in the document itself and have that
fetched with the search results; or is there something stopping you from
doing that?

Nader Henein
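A sketch of that tagging, with 'origin' and 'isbn' as illustrative field
names; doc, searcher and query are assumed to be in scope:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Hits;

// While building each merger index, tag every document with its source:
doc.add(Field.Keyword("origin", "MGR1")); // stored, exact-match, not tokenized

// At search time the tag comes back with the stored fields:
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    Document d = hits.doc(i);
    System.out.println(d.get("isbn") + " came from " + d.get("origin"));
}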
Karthik N S wrote:
Hi Guys
Apologies...
I have several MERGERINDEXES [MGR1, MGR2, MGR3].
For searching across these MERGERINDEXES I use the following code:
IndexSearcher[] indexToSearch = new IndexSearcher[CNTINDXDBOOK];
for (int all = 0; all < CNTINDXDBOOK; all++)
    indexToSearch[all] = new IndexSearcher(...);
MultiSearcher searcher = new MultiSearcher(indexToSearch);
Question:
When searching, how do I display which MRG a relevant document ID
originated from?
[Something like: the search word 'ISBN12345' is available from "MRGx"]

 WITH WARM REGARDS
 HAVE A NICE DAY
 [ N.S.KARTHIK]



Re: LUCENE1.4.1 - LUCENE1.4.2 - LUCENE1.4.3 Exception

2004-12-15 Thread Nader Henein
This is an OS file-system error, not a Lucene issue (so not one for this
board). Google it for Gentoo specifically and you get a whole bunch of
results, one of which is this thread on the Gentoo forums:
http://forums.gentoo.org/viewtopic.php?t=9620

Good Luck
Nader Henein
Karthik N S wrote:
Hi Guys
Can somebody tell me why I am getting this exception? Please.
Sys Specifications
O/s Linux Gentoo
Appserver Apache Tomcat/4.1.24
Jdk build 1.4.2_03-b02
Lucene 1.4.1 ,2, 3
Note: this exception is displayed on every 2nd query after Tomcat is
started.
java.io.IOException: Stale NFS file handle
   at java.io.RandomAccessFile.readBytes(Native Method)
   at java.io.RandomAccessFile.read(RandomAccessFile.java:307)
   at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:420)
   at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
   at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:220)
   at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
   at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
   at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
   at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:142)
   at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
   at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:143)
   at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:137)
   at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:253)
   at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:69)
   at org.apache.lucene.search.Similarity.idf(Similarity.java:255)
   at org.apache.lucene.search.TermQuery$TermWeight.sumOfSquaredWeights(TermQuery.java:47)
   at org.apache.lucene.search.Query.weight(Query.java:86)
   at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
   at org.apache.lucene.search.MultiSearcherThread.run(ParallelMultiSearcher.java:251)


 WITH WARM REGARDS
 HAVE A NICE DAY
 [ N.S.KARTHIK]



Re: Opinions: Using Lucene as a thin database

2004-12-14 Thread Nader Henein
How big do you expect it to get, and how often do you expect to update
it? We've been using Lucene for about 1M records (19 fields each) with
incremental updates every 10 minutes. Performance during updates wasn't
wonderful, so it took some seriously intense code to sort that out. As
you mentioned, it comes down to what you need the thin DB for: Lucene is
a wonderful search engine, but if I were looking for a fast and dirty
relational DB, MySQL wins hands down. Put them both together and you've
really got something.

My 2 cents
Nader Henein
Kevin L. Cobb wrote:
I use Lucene as a legitimate search engine, which is cool. But I am also
using it as a simple database too. I build an index with a couple of
keyword fields that allows me to retrieve values based on exact matches
in those fields. This is all I need, so it works just fine for my
purposes. I also love the speed: the index is small enough that it is
wicked fast. I was wondering if anyone out there is doing the same, or if
there are any dissenting opinions on using Lucene for this purpose.
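A minimal sketch of that lookup pattern; field names, values and the path
are illustrative, and exception handling is omitted:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// Exact-match lookup on a Keyword field, used like a key/value fetch.
IndexSearcher searcher = new IndexSearcher("/path/to/index");
Hits hits = searcher.search(new TermQuery(new Term("userId", "42")));
String value = hits.length() > 0
        ? hits.doc(0).get("payload") // the stored field holds the value
        : null;
searcher.close();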




 



Re: HITCOLLECTOR+SCORE+DELIMMA

2004-12-13 Thread Nader Henein
Dude, and I say this with love: it's open source, you've got the code.
Take the initiative, DIY, be creative, and share your findings with the
rest of us.

Personally, I would be interested to see how you do this; keep your
changes documented and share.

Nader Henein
Karthik N S wrote:
Hi Erik

Apologies...
I got confused with the last mail.

Iterating over Hits returns a large number of hits, and iterating over
Hits for scores consumes time, so how do I limit my search to between
[X.xf and Y.yf] prior to getting the Hits?
Note: the search is being done on a field of type 'Text' consisting of
'contents' from various HTML documents.
Please advise me
Karthik

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, December 13, 2004 5:05 PM
To: Lucene Users List
Subject: Re: HITCOLLECTOR+SCORE+DELIMA

On Dec 13, 2004, at 1:16 AM, Karthik N S wrote:

> So u say I have to Build a Filter to Collect all the Scores between
> the 2 Ranges [ 0.2f to 1.0f]

My message is being misinterpreted.  I said "filter" as a verb, not a
noun.  :)  In other words, I was not intending to mean write a Filter -
a Filter would not be able to filter on score.

> so the API for the same would be
> Hits hit = search(Query query, Filter filtertoGetScore)
> But while writing the Filter, Score again depends on Hits => Score =
> hits.score(x);

Again, you cannot write a Filter (capital 'F') to deal with score.
Please re-read what I said below...

> Hits are in descending score
> order, so you may just want to use Hits and filter based on the score
> provided by hits.score(i).

Iterate over Hits... when you encounter scores below your desired
range, stop iterating.  Why is this simple procedure not good enough
for what you are trying to achieve?
Erik
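A sketch of the iteration Erik describes, using Karthik's lower bound;
searcher and query are assumed to be in scope:

import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;

// Hits arrive in descending score order, so stop at the lower bound.
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++) {
    if (hits.score(i) < 0.2f) {
        break; // everything after this is below the desired range
    }
    Document doc = hits.doc(i); // still within [0.2f, 1.0f]
    // ... render the hit ...
}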


Re: SEARCH CRITERIA

2004-11-30 Thread Nader Henein
They probably create a list of similar results by doing some sort of
data mining on the search criteria that people use in succession. Or
they have a list of searches that are too general (a search for the word
"kid" is at best stupid), but you can't call your users stupid, so you
try to guess what they're searching for based on other searches
conducted (kid rock, kid games, star wars kid, karate kid) that contain
the initial search string "kid". You can use fuzzy search in Lucene, but
that won't really do this; the short answer is DIY, depending on your
needs.

My two galiuns
Nader Henein
Karthik N S wrote:
Hi Guys
Apologies.
On Yahoo and AltaVista, a search on a word like 'kid' returns results
with suggestions like:
  Also try: kid rock, kid games, star wars kid, karate kid   More...

How do I obtain similar search suggestions using Lucene?
Thx in advance
Warm regards
Karthik


Re: disadvantages

2004-11-21 Thread Nader Henein
You may singe your fingers if you touch the keyboard during indexing
Nader
Miguel Angel wrote:
What are the disadvantages of Lucene?
 



Re: Optimized??

2004-11-20 Thread Nader Henein
The down and dirty answer is that it's like defragmenting your hard
drive: you're basically compacting and sorting out index references.
What you need to know is that it makes searching much faster after
you've updated the index.

Nader Henein
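The operation itself is a single call on the writer, typically issued
after a batch of updates; the path is illustrative:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// false = open the existing index rather than create a new one
IndexWriter writer = new IndexWriter("/path/to/index",
        new StandardAnalyzer(), false);
writer.optimize(); // merge all segments into one; searches get faster afterwards
writer.close();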
Miguel Angel wrote:
What does it mean to optimize an index in Lucene?
 



Re: Need help with filtering

2004-11-16 Thread Nader Henein
Well, if the document ID is a number (even if it isn't stored as one)
you could use a range query, or just rebuild your index using that
specific field as a sorted field; but if it's numeric, be aware that
using an integer limits how high your numbers can get.

nader
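One caveat with IDs in Keyword fields is that range comparisons are
lexicographic, so numeric IDs are usually zero-padded to a fixed width.
A sketch combining that with RangeFilter (added in the 1.4 series),
which walks the terms directly and so avoids TooManyClauses; doc, docId,
lastSearchedId, parsedQuery and searcher are assumed to be in scope:

import java.text.DecimalFormat;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.RangeFilter;

DecimalFormat pad = new DecimalFormat("0000000000");

// At indexing time: fixed-width IDs make lexicographic order match numeric order.
doc.add(Field.Keyword("id", pad.format(docId)));

// At search time: the filter enumerates terms instead of expanding into
// a BooleanQuery, so it cannot throw TooManyClauses.
Hits docs = searcher.search(parsedQuery,
        new RangeFilter("id", pad.format(lastSearchedId), null, false, false));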
Edwin Tang wrote:
Hello,
I have been using DateFilter to limit my search results to a certain date
range. I am now asked to replace this filter with one where my search results
have document IDs greater than a given document ID. This document ID is
assigned during indexing and is a Keyword field.
I've browsed around the FAQs and archives and see that I can either use
QueryFilter or BooleanQuery. I've tried both approaches to limit the document
ID range, but am getting the BooleanQuery.TooManyClauses exception in both
cases. I've also tried bumping max number of clauses via setMaxClauseCount(),
but that number has gotten pretty big.
Is there another approach to this? Or am I setting this up incorrectly? Snippet
of one of my approaches follows:
queryFilter = new QueryFilter(new RangeQuery(
        new Term("id", sLastSearchedId), null, false));
docs = searcher.search(parser.parse(sSearchPhrase), queryFilter,
        utility.iMaxResults, new Sort(sortFields));
Thanks in advance,
Ed


Re: _4c.fnm missing

2004-11-16 Thread Nader Henein
That's it: you need to batch your updates. It comes down to whether you
need to give your users search accuracy to the second. Put an is_dirty
column on the master table of the object you're indexing, run a scheduled
task every x minutes that reads the objects flagged dirty, and reset the
flag once they've been indexed correctly.
my two cents
Nader

Otis Gospodnetic wrote:
'Concurrent' and 'updates' in the same sentence sounds like a possible
source of the problem.  You have to use a single IndexWriter and it
should not overlap with an IndexReader that is doing deletes.
Otis
--- Luke Shannon <[EMAIL PROTECTED]> wrote:
It consistently breaks when I run more than 10 concurrent incremental
updates.
I can post the code on Bugzilla (hopefully when I get to the site it
will be obvious how I can post things).
Luke
- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, November 16, 2004 3:20 PM
Subject: Re: _4c.fnm missing

Field names are stored in the field info file, with suffix .fnm - see
http://jakarta.apache.org/lucene/docs/fileformats.html
The .fnm should be inside the .cfs file (cfs files are compound files
that contain all index files described at the above URL).  Maybe you
can provide the code that causes this error in Bugzilla for somebody to
look at.  Does it consistently break?
Otis
--- Luke Shannon <[EMAIL PROTECTED]> wrote:
I received the error below when I was attempting to overwhelm my
system with incremental update requests.
What is this file it is looking for? I checked the index. It contains:
_4c.del
_4d.cfs
deletable
segments
Where does _4c.fnm come from?
Here is the error:
Unable to create the create the writer and/or index new content
/usr/tomcat/fb_hub/WEB-INF/index/_4c.fnm (No such file or directory).
Thanks,
Luke


Re: _4c.fnm missing

2004-11-16 Thread Nader Henein
What kind of incremental updates are you doing? We update our index
every 15 minutes with 100-200 documents, writing to a 6 GB memory-resident
index, and the IndexWriter runs one instance at a time. So what kind of
increments are we talking about? It takes a bit of doing to overwhelm
Lucene. What's your update schedule, how big is the index, and after how
many updates does the system crash?
Nader Henein

Luke Shannon wrote:
It consistently breaks when I run more than 10 concurrent incremental
updates.
I can post the code on Bugzilla (hopefully when I get to the site it
will be obvious how I can post things).
Luke
- Original Message - 
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, November 16, 2004 3:20 PM
Subject: Re: _4c.fnm missing

 

Field names are stored in the field info file, with suffix .fnm. - see
http://jakarta.apache.org/lucene/docs/fileformats.html
The .fnm should be inside the .cfs file (cfs files are compound files
that contain all index files described at the above URL).  Maybe you
can provide the code that causes this error in Bugzilla for somebody to
look at.  Does it consistently break?
Otis
--- Luke Shannon <[EMAIL PROTECTED]> wrote:
   

I received the error below when I was attempting to overwhelm my
system with incremental update requests.
What is this file it is looking for? I checked the index. It
contains:
_4c.del
_4d.cfs
deletable
segments
Where does _4c.fnm come from?
Here is the error:
Unable to create the create the writer and/or index new content
/usr/tomcat/fb_hub/WEB-INF/index/_4c.fnm (No such file or directory).
Thanks,
Luke


Re: Backup strategies

2004-11-16 Thread Nader Henein
We've recently implemented something similar: the backup process creates
a file (much like the lock files during indexing) that the IndexWriter
recognizes (a small tweak), so it doesn't attempt to start an indexing
run or a delete while the file is there. It wasn't that much work,
actually.

Nader
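A sketch of the marker-file handshake on the indexing side, assuming the
backup job drops a backup.lock file into the index directory while it
runs; the file name and polling interval are illustrative:

import java.io.File;

// Block until no backup marker is present, then proceed with indexing.
static void waitForBackup(String indexPath) throws InterruptedException {
    File marker = new File(indexPath, "backup.lock"); // created by the backup job
    while (marker.exists()) {
        Thread.sleep(5000); // backup in progress: wait and re-check
    }
    // ...safe to open an IndexWriter here...
}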
Doug Cutting wrote:
Christoph Kiehl wrote:
I'm curious about your strategy to backup indexes based on 
FSDirectory. If I do a file based copy I suspect I will get corrupted 
data because of concurrent write access.
My current favorite is to create an empty index and use 
IndexWriter.addIndexes() to copy the current index state. But I'm not 
sure about the performance of this solution.

How do you make your backups?

A safe way to backup is to have your indexing process, when it knows 
the index is stable (e.g., just after calling IndexWriter.close()), 
make a checkpoint copy of the index by running a shell command like 
"cp -lpr index index.YYYMMDDHHmmSS".  This is very fast and requires 
little disk space, since it creates only a new directory of hard 
links.  Then you can separately back this up and subsequently remove it.

This is also a useful way to replicate indexes.  On the master 
indexing server periodically perform "cp -lpr" as above.  Then search 
slaves can use rsync to pull down the latest version of the index.  If 
a very small mergefactor is used (e.g., 2) then the index will have 
only a few segments, so that searches are fast.  On the slave, 
periodically find the latest index.YYYMMDDHHmmSS, use "cp -lpr index/ 
index.YYYMMDDHHmmSS" and 'rsync --delete master:index.YYYMMDDHHmmSS 
index.YYYMMDDHHmmSS' to efficiently get a local copy, and finally "ln 
-fsn index.YYYMMDDHHmmSS index" to publish the new version of the index.

Doug


Re: How to efficiently get # of search results, per attribute

2004-11-13 Thread Nader Henein
It depends on how many results they're looking through. Here are the two
scenarios I see:

1] If you don't have that many records, you can fetch all the results
and then do a post-parsing step to determine the totals.

2] If you have a lot of entries in each category and you're worried
about fetching thousands of records every time, you can keep separate
indices per category and search them in parallel (not Lucene's parallel
search), fetching up to 100 hits from each one (for efficiency) while
still getting each search's total to display.

Either way, you can boost speed with a RAMDirectory if you need more
from the search; but whichever approach you choose, I would recommend
that you sit down and do some number crunching to figure out which way
to go.

Hope this helps
Nader Henein
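A sketch of the second scenario: one index per entity type, with the
totals read straight from each Hits object (paths and the shared query
are illustrative):

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;

IndexSearcher classes = new IndexSearcher("/indexes/classes");
IndexSearcher professors = new IndexSearcher("/indexes/professors");
IndexSearcher departments = new IndexSearcher("/indexes/departments");

Hits classHits = classes.search(query);
Hits profHits = professors.search(query);
Hits deptHits = departments.search(query);

// The totals cost nothing extra; only the hits actually displayed
// need to be fetched from the indexes.
System.out.println("Found: " + classHits.length() + " classes - "
        + profHits.length() + " professors - "
        + deptHits.length() + " departments");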

Chris Lamprecht wrote:
I'd like to implement a search across several types of "entities",
let's say, classes, professors, and departments.  I want the user to
be able to enter a simple, single query and not have to specify what
they're looking for.  Then I want the search results to be something
like this:
Search results for: "philosophy boyer"
Found: 121 classes - 5 professors - 2 departments

I know I could iterate through every hit returned and count them up
myself, but that seems inefficient if there are lots of results.  Is
there some other way to get this kind of information from the search
result set?  My other ideas are: doing a separate search each result
type, or storing different types in different indexes.  Any
suggestions?  Thanks for your help!
-Chris


Re: UPDATION+MERGERINDEX

2004-11-07 Thread Nader Henein
Well, if you do all the steps in one run, I guess optimizing once at the
end would be faster overall, but all you have to do is test it out and
time it. Performance-wise, I don't think that step 3 (OPTIMIZE) in
scenario (a) will really improve the performance of the new index merge.

my 2 cents
Nader Henein
Karthik N S wrote:
Hi Guys
Apologies.
a) 

1) SEARCH FOR SUBINDEX IN A  OPTIMISED MERGED INDEX
2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX
3) OPTIMISE THE MERGERINDEX
4) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGER INDEX
5) OPTIMISE THE MERGERINDEX

b)
1) SEARCH FOR SUBINDEX IN A  OPTIMISED MERGED INDEX
2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX
3) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGER INDEX
4) OPTIMISE THE MERGERINDEX
a OR b: WHICH IS THE BETTER CHOICE?


THX IN ADVANCE
   
 WITH WARM REGARDS 
 HAVE A NICE DAY 
 [ N.S.KARTHIK] 





Re: commit lock, graceful handler

2004-11-02 Thread Nader Henein
Graceful, no. I started a discussion on this about two years ago. What
I'm doing is batched indexing, so if a crash occurs, the next time the
application starts up a LuceneInit class ensures that no index holds a
lock, by simply deleting the lock file and optimizing the index. This has
worked well for us in a production environment for the past two years;
the next indexing run picks up the same batch and re-indexes it, which
doesn't hurt the index, because every time I add a document I actually
delete it first to ensure there are no repetitions. We've never had an
index go corrupt on us, and we have six indices being updated in
parallel, in addition to nightly backups by our hosting facility during
a one-hour window in which we do no updates/deletes on the index, to
ensure that the backup is kosher.

It may not be as graceful as Oracle rollback tables, but it's functional
and a lot less complicated.

Nader
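A sketch of such an init step, using the 1.4-era IndexReader.isLocked and
unlock utilities in place of deleting the lock file by hand; the path
handling is illustrative:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Run once at application startup, before any writer or searcher opens.
static void initIndex(String indexPath) throws java.io.IOException {
    Directory dir = FSDirectory.getDirectory(indexPath, false);
    if (IndexReader.isLocked(dir)) {
        IndexReader.unlock(dir); // clear a stale lock left by a crash
    }
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
    writer.optimize(); // compact the index after the unclean shutdown
    writer.close();
}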
Jackson Earnst wrote:
I'm testing the fault-tolerance aspects of an application using Lucene.
Consider if power is pulled from the server/workstation and it
immediately shuts down hard or crashes.

I'm faced with a situation of a commit.lock file existing in the temp
directory. Lucene throws an exception when a writer is first created
against this index: an IOException with a "Lock obtain timed out" error.
Reading some docs and FAQs, I see that the lock file could be deleted
and the index will be in a usable state.
Any advice/comments/thoughts?
Is there a graceful way to handle this?

Thanks


Re: Atomicity in Lucene operations

2004-10-18 Thread Nader Henein
As soon as I've cleaned up the code I'll publish it; it needs a little
more documentation as well.

Nader
Roy Shan wrote:
Maybe you can contribute it to the sandbox?
On Mon, 18 Oct 2004 08:31:30 -0700 (PDT), Yonik Seeley
<[EMAIL PROTECTED]> wrote:
Hi Nader,
I would greatly appreciate it if you could CC me on
the docs or the code.
Thanks!
Yonik
--- Nader Henein <[EMAIL PROTECTED]> wrote:
It's pretty integrated into our system at this point. I'm working on
packaging it and cleaning up my documentation, and then I'll make it
available. I can give you the documents, and if you still want the code
I'll slap together a rough copy for you and ship it across.
Nader Henein
Roy Shan wrote:
Hello, Nader:
I am very interested in how you implement the atomicity. Could you
send me a copy of your code?
Thanks in advance.
Roy


Re: simultanous search and indexing

2004-10-17 Thread Nader Henein
You can do both at the same time; it's thread safe. You will face
different issues depending on the frequency of your indexing and the
load on the search, but that shouldn't come into play until your index
gets nice and heavy. So basically, code on.

Nader Henein
Miro Max wrote:
hi,
i'm using a servlet to search my index, and i wish to be able to create
an index at the same time.
do i have to use threads? i'm a beginner
thx





Re: Atomicity in Lucene operations

2004-10-17 Thread Nader Henein
It's pretty integrated into our system at this point. I'm working on
packaging it and cleaning up my documentation, and then I'll make it
available. I can give you the documents, and if you still want the code
I'll slap together a rough copy for you and ship it across.
Nader Henein
Roy Shan wrote:
Hello, Nader:
I am very interested in how you implement the atomicity. Could you
send me a copy of your code?
Thanks in advance.
Roy

On Sat, 16 Oct 2004 01:20:09 +0400, Nader Henein <[EMAIL PROTECTED]> wrote:
 

We use Lucene over 4 replicated indecies and we have to maintain
atomicity on deletion and updates with multiple fallback points. I'll
send you the right up, it's too big to CC the entire board.
nader henein

Christian Rodriguez wrote:
   

Hello guys,
I need additions and deletions of documents to the index to be ATOMIC
(they either happen to completion or not at all).
On top of this, I need updates (which I currently implement with a
deletion of the document followed by an addition) to be ATOMIC and
DURABLE (once I return from the "update" function, it's because the
operation happened to completion and stays in the index).
Notice that I don't really need all the ACID properties for all the
operations.
I have tried to solve the problem by using the Lucene + BDB package
written by Andi Vajda and using transactions, but the BDB database
gets corrupted if I insert random System.exit() calls to simulate a
crash of the application before aborting or committing transactions.
So I have two questions:
1. Has anyone been able to use Lucene + BDB WITH transactions and
simulate random crashes at different points in the process of adding
items and found it to be robust (especially, have you been able to
always recover after a crash, with uncommitted txns rolled back and
committed ones present in the DB)?
2. Can anyone suggest other solutions (besides using BDB) that may
work? For example: are any of these operations already atomic in
Lucene (using an FSDirectory)?
Thanks for any help you can give me!
Xtian


Re: Atomicity in Lucene operations

2004-10-15 Thread Nader Henein
We use Lucene over 4 replicated indices, and we have to maintain
atomicity on deletions and updates with multiple fallback points. I'll
send you the write-up; it's too big to CC the entire board.

nader henein
Christian Rodriguez wrote:
Hello guys,
I need additions and deletions of documents to the index to be ATOMIC
(they either happen to completion or not at all).
On top of this, I need updates (which I currently implement with a
deletion of the document followed by an addition) to be ATOMIC and
DURABLE (once I return from the "update" function, it's because the
operation happened to completion and stays in the index).
Notice that I don't really need all the ACID properties for all the
operations.
I have tried to solve the problem by using the Lucene + BDB package
written by Andi Vajda and using transactions, but the BDB database
gets corrupted if I insert random System.exit() calls to simulate a
crash of the application before aborting or committing transactions.
So I have two questions:
1. Has anyone been able to use Lucene + BDB WITH transactions and
simulate random crashes at different points in the process of adding
items and found it to be robust (especially, have you been able to
always recover after a crash, with uncommitted txns rolled back and
committed ones present in the DB)?
2. Can anyone suggest other solutions (besides using BDB) that may
work? For example: are any of these operations already atomic in
Lucene (using an FSDirectory)?
Thanks for any help you can give me!
Xtian


Re: Encrypted indexes

2004-10-13 Thread Nader Henein
Well, are you "storing" any data for retrieval from the index? Because
you could encrypt the actual stored data and then encrypt the search
string public-key style.

Nader Henein
Weir, Michael wrote:
We need to have index files that can't be reverse engineered, etc. An
obvious approach would be to write an 'FSEncryptedDirectory' class, but
that sounds like a performance killer.
Does anyone have experience in making an index secure?
Thanks for any help,
Michael Weir 
 


Re: sorting and score ordering

2004-10-12 Thread Nader Henein
As far as my testing showed, the sort will take priority, because it's
basically an opt-in sort as opposed to the default score sort. So you're
basically displaying a sorted set over all your results, as opposed to
sorting the most relevant results.

Hope this helps
Nader Henein
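A sketch of the two modes side by side; searcher and query are assumed
to be in scope, and "date" is an illustrative sort field:

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Sort;

// With an explicit Sort, result order follows the sort field, not the score:
Hits byField = searcher.search(query, new Sort("date"));

// Without one, results come back in descending score order:
Hits byScore = searcher.search(query);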
Chris Fraschetti wrote:
If I use a Sort instance on my searcher, what takes priority, score or
sort? Assuming I have pages with .9, .9, and .5 scores, if the .5 page
has a higher 'sort' value, will it be returned above one of the .9 pages
even though its Lucene score is lower?
 



Re: Arabic analyzer

2004-10-07 Thread Nader Henein
I'd be happy to help anyone test this out, my Arabic is pretty good.
Nader
Andrzej Bialecki wrote:
Dawid Weiss wrote:

...furthermore, Arabic uses phonetic indicators on each letter, called
diacritics, that change the way you pronounce the word, which in turn
changes its meaning, so two words spelled exactly the same way with
different diacritics will mean two separate things.

Just to point out the fact: most slavic languages also use diacritic 
marks (above, like 'acute', or 'dot' marks, or below, like the Polish 
'ogonek' mark). Some people argue that they can be stripped off the 
text upon indexing and that the queries usually disambiguate the 
context of the word.

Hmm. This brings up a question: the algorithmic stemmer package from 
Egothor works quite well for Polish (http://www.getopt.org/stempel), 
wouldn't it work well for Arabic, too?

I lack the necessary expertise to evaluate results (knowing only two 
or three arabic words ;-) ), but I can certainly help someone to get 
started with testing...



Re: Arabic analyzer

2004-10-07 Thread Nader Henein
There is a way of writing an Arabic stemmer; it's just not a weekend
project. I've seen the translate/stem option as well, and even tried it
with Lucene. We've implemented Lucene on our database, and we have about
a million records with 19 indexed fields each (some of which are CLOBs);
the free-text fields are in many cases Arabic. We do not provide
stemming on those, simply because I couldn't find a valid stemming or
translation option that held up to proper testing. Some were OK, but
after collecting data from user searches (averaging out at 5 searches
per second), the Arabic stemming options would not have been able to
manage user expectations, which is what it comes down to; sometimes
theory does not translate well into practice.

Nader Henein
Dawid Weiss wrote:

...furthermore, Arabic uses phonetic indicators on each letter, called
diacritics, that change the way you pronounce the word, which in turn
changes its meaning, so two words spelled exactly the same way with
different diacritics will mean two separate things.

Just to point out the fact: most slavic languages also use diacritic 
marks (above, like 'acute', or 'dot' marks, or below, like the Polish 
'ogonek' mark). Some people argue that they can be stripped off the 
text upon indexing and that the queries usually disambiguate the 
context of the word.

It is just a digression. Now back to the arabic stemmer -- there has 
to be a way of doing it. I know Vivisimo has clustering options for 
arabic. They must be using a stemmer (and an English translation 
dictionary), although it might be a commercial one. Take a look:

http://vivisimo.com/search?v:file=cnnarabic
D.



Re: Arabic analyzer

2004-10-06 Thread Nader Henein
I worked on trying to develop one, and it became a colossal pain. A
comprehensive Arabic dictionary is about 20 volumes, roughly the size of
an encyclopedia. To give you some background: when you look a word up,
you have to reduce it to its two- or three-letter root, and then you can
look for your desired word underneath that root. Reducing words to that
root as part of stemming is useless, because words belonging to the same
root more often than not have nothing to do with each other. Furthermore,
Arabic uses phonetic indicators on each letter, called diacritics, that
change the way you pronounce a word, which in turn changes its meaning,
so two words spelled exactly the same way with different diacritics will
mean two separate things. I've seen Arabic stemmers that kind of work,
but none of them are open source. This is a good paper from Berkeley
that outlines the work and the challenges:
http://metadata.sims.berkeley.edu/papers/trec2002.pdf, hope it helps.

Nader Henein
Scott Smith wrote:
Is anyone aware of an open source (non-GPL, i.e. free for commercial
use) Arabic analyzer for Lucene? Does Arabic really require a stemmer as
well? (Some of the reading I've seen on the web suggests that a stemmer
is almost a necessity with Arabic to get anything useful, where it is
not with other languages.)

Scott 



 



Re: Moving from a single server to a cluster

2004-09-08 Thread Nader Henein
It would be a pleasure; I just didn't want to mislead someone down the
wrong path. Give me a few days and I'll have the new version up.
Nader


Re: Moving from a single server to a cluster

2004-09-08 Thread Nader Henein
Hey Ben,
We've been using a distributed environment with three servers and three
separate indices for the past two years, since the first stable Lucene
release, and it has been great. For the past two months I've been
working on a redesign of our Lucene app, and I've shared my findings and
plans with Otis, Doug and Erik; they pointed out a few faults in my
logic, which you will probably come across soon enough, mainly to do
with keeping your updates atomic (not too hard) and your deletes atomic
(a little more tricky). Give me a few days and I'll send you both the
early document and the newer version that deals squarely with Lucene in
a distributed environment with a high-volume index.

Regards.
Nader Henein
Ben Sinclair wrote:
My application currently uses Lucene with an index living on the
filesystem, and it works fine. I'm moving to a clustered environment
soon and need to figure out how to keep my indexes together. Since the
index is on the filesystem, each machine in the cluster will end up
with a different index.
I looked into JDBC Directory, but it's not tested under Oracle and
doesn't seem like a very mature project.
What are other people doing to solve this problem?
 



Re: Most efficient way to index 14M documents (out of memory/file handles)

2004-07-06 Thread Nader Henein
Here's the thread you want :
http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1722573
Nader Henein
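For reference, a sketch of the batching pattern Kevin asks about below;
the 50k batch size is his, while mergeFactor 50 is an assumed
illustrative value (his mergeFactor of 5k is what multiplies the number
of open segment files):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

// One writer per 50k-document batch, one optimize() at the very end.
static void buildIndex(String indexPath, Document[] docs)
        throws java.io.IOException {
    IndexWriter writer = null;
    for (int i = 0; i < docs.length; i++) {
        if (i % 50000 == 0) { // start a fresh writer for each batch
            if (writer != null) writer.close();
            writer = new IndexWriter(indexPath, new StandardAnalyzer(), i == 0);
            writer.mergeFactor = 50; // modest value keeps open-file counts bounded
        }
        writer.addDocument(docs[i]);
    }
    if (writer != null) {
        writer.optimize(); // single big merge once everything is in
        writer.close();
    }
}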
Kevin A. Burton wrote:
I'm trying to burn an index of 14M documents.
I have two problems.
1.  I have to run optimize() every 50k documents or I run out of file
handles. This takes TIME and of course is linear in the size of the
index, so it just gets slower as the index grows. It starts to crawl at
about 3M documents.

2.  I eventually will run out of memory in this configuration.
I KNOW this has been covered before but for the life of me I can't 
find it in the archives, the FAQ or the wiki.
I'm using an IndexWriter with a mergeFactor of 5k and then optimizing 
every 50k documents.

Does it make sense to just create a new IndexWriter for every 50k docs 
and then do one big optimize() at the end?

Kevin


Re: incrementally indexing a million documents

2004-06-15 Thread Nader Henein
How are your documents named? Is it alphabetical or numerical? Mine were
numerical, so I created n directories like so:
11, 12, 13, 14, ... 19, 21, 22, 23 ... 99 (you get the idea)
and I stored the files into the directories each belonged to, depending
on the last two digits of the file name (you could use file size to
shuffle the files around as well, i.e. use the two rightmost digits of
the file size in bytes). At this point you'll have shuffled your million
docs into 100 directories, and Lucene can then spider through each
directory, indexing, say, 5000 files at a time and then deleting them or
moving them to another location. If you get to 100 million files, simply
up the precision to a 3- or 4-digit setup (once you automate it, the
sky's the limit).
Hope this helps
Nader Henein
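A sketch of that bucketing, keyed on the last two digits of a numeric
file name; the method and directory names are illustrative:

import java.io.File;

// Map a file into one of 100 bucket directories by the last two digits
// of its name, so each bucket can be indexed (then cleared) in turn.
public static File bucketFor(File bucketRoot, File f) {
    String digits = f.getName().replaceAll("\\D", ""); // keep numerals only
    String bucket = digits.length() >= 2
            ? digits.substring(digits.length() - 2)
            : "0" + digits; // pad very short names
    return new File(bucketRoot, bucket);
}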
Michael Wechner wrote:
I try to index around a million documents. The problem is
that I run out of memory during sorting by uid when I go through
the directory recursively.
Well, I could add more memory, but this wouldn't really solve my problem,
because at some point I will always run out of memory (e.g. 10 million 
documents).

Is there another approach than sorting by uid?
Thanks
Michi


Re: Devnagari Search?

2004-06-10 Thread Nader Henein
Have faith in the UNICODE standard; it's well thought out. If you have
any internationalization queries, there was an excellent JavaWorld
article entitled "End-to-end internationalization"; here's the link:
http://www.javaworld.com/javaworld/jw-05-2004/jw-0524-i18n_p.html
Have a read; it helps clear up some myths.

Nader Henein


RE: Disappearing segments

2004-04-30 Thread Nader Henein
Could you share your indexing code? And just to make sure: is there
anything running on your machine that could delete these files, like a
cron job that backs up the index?

You could go by process of elimination and shut down your server to see
if the files still disappear, because if the problem is contained within
the server you know you can safely go on a DEBUG rampage.

Nader 

-Original Message-
From: Kelvin Tan [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 30, 2004 9:15 AM
To: Lucene Users List
Subject: Re: Disappearing segments

An update:

Daniel Naber suggested using IndexWriter.setUseCompoundFile() to see if it
happens with the compound index format. Before I had a chance to try it out,
this happened: 

java.io.FileNotFoundException: C:\index\segments (The system cannot find the
file specified)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:200)
    at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:321)
    at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:329)
    at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:71)
    at org.apache.lucene.index.IndexWriter$1.doBody(IndexWriter.java:154)
    at org.apache.lucene.store.Lock$With.run(Lock.java:116)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:149)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:131)

so even the segments file somehow got deleted. Hoping someone can shed some
light on this...

Kelvin

On Thu, 29 Apr 2004 11:45:36 +0800, Kelvin Tan said:
> Errr, sorry for the cross-post to lucene-dev as well, but I realized 
> this mail really belongs on lucene-user...
> 
> I've been experiencing intermittent disappearing segments which result 
> in the following stacktrace:
> 
> Caused by: java.io.FileNotFoundException: C:\index\_1ae.fnm (The
> system cannot find the file specified)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:200)
>     at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:321)
>     at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:329)
>     at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
>     at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:78)
>     at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:104)
>     at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
>     at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:112)
>     at org.apache.lucene.store.Lock$With.run(Lock.java:116)
>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:103)
>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
>     at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
> 
> The segment that disappears (_1ae.fnm) varies.
> 
> I can't seem to reproduce this error consistently, so don't have a 
> clue what might cause it, but it usually happens after the application 
> has been running for some time. Has anyone experienced something 
> similar, or can anyone point me in the right direction?
> 
> When this occurs, I need to rebuild the entire index for it to be 
> usable. Very troubling indeed...
> 
> Kelvin
> 
> 











RE: read only file system

2004-04-30 Thread Nader Henein
I hate to speak after Otis, but the way we deal with this is by clearing
locks on server restart, in case a server crash occurred mid-indexing;
we also optimize on server restart. It doesn't happen often (God bless
Resin), but when it has, we've faced no problems from Lucene.

Just for the record, we have a validate function that LuceneInit calls;
it looks something like this:

try {
    Directory directory = FSDirectory.getDirectory(indexPath, false);
    if (directory.list().length == 0) clear();
    Lock writeLock = directory.makeLock(writeFileName);
    if (!writeLock.obtain()) {
        IndexReader.unlock(directory);
    } else {
        writeLock.release();
    }
} catch (IOException e) {
    logger.error("Index Validate", e);
}


Nader 

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 30, 2004 4:09 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: read only file system

If you have a very recent Lucene, then you can disable locks with command
line parameters.  I believe a page describing various command line
parameters is on Lucene's Wiki.

Otis

--- Supun Edirisinghe <[EMAIL PROTECTED]> wrote:
> I think I'm a little confused about how an index is put into use on a
> read-only file system.
> 
> I'm using Lucene in my web application. Our indexes are built off our
> database nightly and copied onto our web app servers.
> 
> I think our web app dies from time to time, and sometimes a lock is
> left behind by Lucene in /tmp/.
> 
> I have read that there is a disableLuceneLocks system property (is that
> the full name, or is it something like
> org.apache.jakarta...disableLuceneLocks?). But I'm still not sure how
> I can set it. Do I give it as a command-line arg to the Java VM?
> 
> thanks
> 
> 
> 









Re: Multi-Threading

2003-08-19 Thread Nader Henein
Why do you have concurrency problems? Are you trying to have each user
initiate the indexing himself? Because that will create issues. How
about you put all the new files you want to index in a directory and
then have a scheduled procedure on the web server run the Lucene indexer
on that directory? Our application hasn't had any concurrency problems
at all, because we index based on a pull system, rather than users
pushing documents to the indexer.

I hope I understood your problem correctly, so that the answer is useful.

Nader

On Tue, 19 Aug 2003 12:55:09 +0200, Damien Lust wrote:

> Hello,
> 
> I developed a client-server application on the web, with a search
> module using Lucene. In the same application, users can index new
> text.
> 
> So multiple sessions can access the index, and concurrency problems
> are possible.
> 
> I used threads in Java. Is that the best solution?
> 
> I call:
> 
> IndexFiles indexFiles = new IndexFiles();
> indexFiles.run();
> 
> Here is an extract of my code.
> 
> Thanks.
> 
> public class IndexFiles extends Thread {
>     public IndexFiles() {
>     }
> 
>     public void run() {
>         SynchronizedIndexWriter.insertDocument(currentIndexDocument(),
>                 "tmp/IndexPath", new MainAnalyser());
>     }
> }
> 
> public class SynchronizedIndexWriter {
> 
>     static synchronized void insertDocument(IndexDocument document,
>             String indexLocValue, Analyzer analyzerValue) {
>         File f = new File(indexLocValue);
>         if (f.exists())
>             addDocumentToIndex(document, indexLocValue, analyzerValue, false);
>         else
>             addDocumentToIndex(document, indexLocValue, analyzerValue, true);
>     }
> 
>     static synchronized void addDocumentToIndex(IndexDocument document,
>             String indexLocValue, Analyzer analyzerValue,
>             boolean createNewIndex) {
>         try {
>             IndexWriter indexWriter = new IndexWriter(indexLocValue,
>                     analyzerValue, createNewIndex);
>             indexWriter.addDocument(document.getDocument());
>             indexWriter.optimize();
>             indexWriter.close();
>         } catch (IOException io) {
>             // If the IndexWriter can't write because the index is locked,
>             // recall the function => it's not very safe
>             addDocumentToIndex(document, indexLocValue, analyzerValue,
>                     createNewIndex);
>         } catch (Exception e) {
>         }
>     }
> }
