Mark,

Thanks for sharing your valuable exp. and thoughts.
Frankly our system already has most of the functionalities LuceneIndexAcessor offers. The only thing I am looking for is to sync the searchers' close. That's why I am little worried about the way accessor handles the searcher sync.
I will probably give it a try to see how it performs in our system.

Thanks!

Jay

Mark Miller wrote:
The method is synched, but this is because each thread *does* share the same Searcher. To maintain a cache of searchers across multiple threads, you've got to sync -- to reference count, you've got to sync. The performance hit of LuceneIndexAcessor is pretty minimal for its functionality, and frankly, for the functionality you want, you have to pay a cost. Thats not even the end of it really...your going to need to maintain a cache of Accessor objects for each index as well...and if you dont know all the indexes at startup time, access to this will also need to be synched. I wouldn't worry though -- searches are still lightening fast...that won't be the bottleneck. I'll work on getting you some code, but if your worried, try some benchmarking on the original code.

Also, to be clear, I don't have the code in front of me, but getting a Searcher does not require waiting for a Writer to be released. Searchers are cached and resused (and instantly available) until a Writer is released. When this happens, the release Writer method waits for all the Searchers to return (happens pretty quick as searches are pretty quick), the Searcher cache is cleared, and then subsequent calls to getSearcher create new Searchers that can see what the Writer added.

The key is use your Writer/Searcher/Reader quickly and then release it (unless your bulk loading). I've had such a system with 5+ million docs on a standard machine and searches where still well below a second after the first Searcher is cached (and even the first search is darn quick). And that includes a lot of extra crap I am doing.

- Mark

Jay Yu wrote:
Mark,

After reading the implementation of LuceneIndexAccessor.getSearcher(),
I realized that the method is synchronized and wait for writingDirector to be released. That means if we getSearcher for each query in each thread, there might be a contention and performance hit. In fact, even the method of release(searcher) is costly. On the other hand, if multiple threads share share one searcher then it'd defeat the
purpose of using LuceneIndexAccessor.
Do I miss sth here? What's your suggested use case for LuceneIndexAccessor?

Thanks!

Jay
Mark Miller wrote:
Ill respond a point at a time:

1.

****************************** Hi Maik,

So what happens in this case:

IndexAccessProvider accessProvider = new IndexAccessProvider(directory,

analyzer);

LuceneIndexAccessor accessor = new LuceneIndexAccessor(accessProvider);

accessor.open();

IndexWriter writer = accessor.getWriter();

// reference to the same instance?

IndexWriter writer2 = accessor.getWriter();

writer.addDocument(....);

writer2.addDocument(....);



// I didn't release the writer yet

// will this block?

IndexReader reader = accessor.getReader();

reader.delete(....);

************

This is not really an issue. First, if you are going to delete with a Reader you need to call getWritingReader and not getReader. When you do that, the getWritingReader call will block until writer and writer2 are released. If you are just adding a couple docs before releasing the writers, this is no
problem because the block will be very short. If you are loading tons of
docs and you want to be able to delete with a Reader in a timely manner, you should release the writers every now and then (release and re-get the Writer
every 100 docs or something). An interactive index should not hog the
Writer, while something that is just loading a lot could hog the Writer.
This is no different than normal…you cannot delete with a Reader while
adding with a Writer with Lucene. This code just enforces those semantics.
The best solution is to just use a Writer to delete – I never get a
ReadingWriter.

2. http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3

This is no big deal either. I just added another getWriter call that takes a
create Boolean.

3. I don't think there is a latest release. This has never gotten much
official attention and is not in the sandbox. I worked straight from the
originally submitted code.

4. I will look into getting together some code that I can share. The
multisearcher changes that are need are a couple of one liners really, so at
a minimum I will give you the changes needed.



-       Mark



On 9/19/07, Jay Yu <[EMAIL PROTECTED]> wrote:

Mark,



thanks for sharing your insight and experience about LuceneIndexAccessor!

I remember seeing some people reporting some issues about it, such as:

http://www.archivum.info/[EMAIL PROTECTED]/2005-05/msg00114.html

http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3



Have those issues been resolved?



Where did you get the latest release? It is not in the official Lucene

sandbox/contrib.



Finally, are you willing to share your extended version to include your

tweak relating to the MultiSearcher?



Thanks a lot!



Jay



Mark Miller wrote:

I use option 3 extensivley and find it very effective. There is a tweak or

two required to get it to work right with MultiSearchers, but other than

that, the code is great. I have built a lot on top of it. I'm on the list

all the time and would be happy to answer any questions you have in
regards

to LuceneIndexAccessor. Frankly, I think its overlooked far too much.


- Mark



On 9/19/07, Jay Yu <[EMAIL PROTECTED]> wrote:


In a multithread app like web app, a shared IndexSearcher could throw a

AlreadyClosedException when another thread is trying to update the

underlying IndexReader by closing the shared searcher after the index is

updated. Searching over the past discussions on this mailing list, I

found several approaches to solve the problem.

1. use solr

2. use DelayCloseIndexSearcher

3. use LuceneIndexAccessor



the first one is not feasible for us; some people seemed to have

problems with No. 2 and I do not find a lot of discussions around No.3.


I wonder if anyone has good experience on No 2 and 3?

Or do I miss other better solutions?


Thanks for any suggestion/comment!


Jay


---------------------------------------------------------------------

To unsubscribe, e-mail: [EMAIL PROTECTED]

For additional commands, e-mail: [EMAIL PROTECTED]






---------------------------------------------------------------------

To unsubscribe, e-mail: [EMAIL PROTECTED]

For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to