Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar

Mark,

  We deployed our indexer (using DefaultIndexAccessor) on one of our
production sites and are getting this error:

Caused by: java.util.concurrent.RejectedExecutionException
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
at org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:514)


This is happening repeatedly every time the indexer runs.

This is running your latest IndexAccessor-021508 code.  Any ideas
(it's kind of urgent for us)?

Thanks,
-vivek


On Fri, Feb 15, 2008 at 6:50 PM, vivek sar [EMAIL PROTECTED] wrote:
 Mark,

  Thanks for the quick fix. Actually, it is possible that there might
  have been simultaneous queries using the MultiSearcher. I assumed it
  was thread-safe and was therefore re-using the same instance. I'll
  update my application code as well.

  Thanks,
  -vivek



  On Feb 15, 2008 5:56 PM, Mark Miller [EMAIL PROTECTED] wrote:
   Here is the fix: https://issues.apache.org/jira/browse/LUCENE-1026
  
  
   vivek sar wrote:
Mark,
   
   There seems to be some issue with DefaultMultiIndexAccessor.java. I
got following NPE exception,
   
 2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] 
 ReportServiceImpl -
java.lang.NullPointerException
at 
 org.apache.lucene.indexaccessor.DefaultMultiIndexAccessor.release(DefaultMultiIndexAccessor.java:89)
   
Looks like the IndexAccessor for one of the Searcher in the
MultiSearcher returned null. Not sure how is that possible, any ideas
how is that possible?
   
In my case it caused a critical error as the writer thread was stuck
forever (we found out after couple of days) because of this,
   
PS thread 9 prio=1 tid=0x2aac70eb95d0 nid=0x6ba in Object.wait()
[0x47533000..0x47533b80]
at java.lang.Object.wait(Native Method)
- waiting on 0x2aab3e5c7700 (a
org.apache.lucene.indexaccessor.DefaultIndexAccessor)
at java.lang.Object.wait(Unknown Source)
at 
 org.apache.lucene.indexaccessor.DefaultIndexAccessor.waitForReadersAndCloseCached(DefaultIndexAccessor.java:593)
at 
 org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:510)
- locked 0x2aab3e5c7700 (a
org.apache.lucene.indexaccessor.DefaultIndexAccessor)
   
The only way to recover was to re-start the application.
   
I use both MultiSearcher and IndexSearcher in my application, I've
looked at your code but not able to pinpoint how can it go wrong? Of
course, you do have to check for null in the
MultiIndexAccessor.release, but how could you get null index accessor
at first place?
   
I do call IndexAccessor.close during partitioning of indexes, but the
close should wait for all Searchers to close before doing anything.
   
Do you have any updates to your code since 02/04/2008?
   
Thanks,
-vivek
   
On Feb 6, 2008 8:37 AM, Jay [EMAIL PROTECTED] wrote:
   
Thanks for your clarifications, Mark!
   
   
Jay
   
   
Mark Miller wrote:
   
5. Although currently IndexSearcher.close() does almost nothing except
to close the internal index reader, it might be a safer to close
searcher itself as well in closeCachedSearcher(), just in case, the
searcher may have other resources to release in the future version of
Lucene.
   
Didn't catch that as well. You are right, great idea Jay, thanks.
   
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
   

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar

Mark,

 Some more information,

  1) I run the IndexWriter every 5 minutes.
  2) After every cycle I check whether I need to partition (based on
     the index size).
  3) In the partition interface:
     a) I first call close on the index accessor (so all the
        searchers can close before I move that index):
          accessor = IndexAccessorFactory.getInstance().getAccessor(dir.getFile());
          accessor.close();
     b) Then I re-open the index accessor:
          accessor = indexFactory.getAccessor(dir.getFile());
          accessor.open();
     c) I optimize my indexes using the IndexWriter (that
        I get from the accessor):
          masterWriter = this.indexAccessor.getWriter(false);
          masterWriter.optimize(optimizeSegment);
     d) Once the optimization is done I release the masterWriter:
          this.indexAccessor.release(masterWriter);

 This is where I get the RejectedExecutionException.
Reading up a little more on this exception,
http://pveentjer.wordpress.com/2008/02/06/are-you-dealing-with-the-rejectedexecutionexception/,
I see this might be happening because something got stuck during the
close cycle, so the ExecutorService is not accepting any new tasks.
I'm not sure how this would happen.

The critical problem is that once I get this exception, every release
call throws the same exception (it looks like shutdown never completes).
Because of this my readers are never refreshed and I cannot read any
new indexes.

Maybe I have to check whether the accessor is completely closed before
re-opening? Could you check in your release whether the pool
(ExecutorService) is in the shutdown state? Anything else I can check?

Thanks,
-vivek

On Thu, Feb 28, 2008 at 1:26 PM, vivek sar [EMAIL PROTECTED] wrote:
 Mark,

   We deployed our indexer (using defaultIndexAccessor) on one of the
  production site and getting this error,

  Caused by: java.util.concurrent.RejectedExecutionException
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown
  Source)
 at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
 at 
 org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:514)


  This is happening repeatedly every time the indexer runs.

  This is running your latest IndexAccessor-021508 code.  Any ideas
  (it's kind of urgent for us)?

  Thanks,
  -vivek




  On Fri, Feb 15, 2008 at 6:50 PM, vivek sar [EMAIL PROTECTED] wrote:
   Mark,
  
Thanks for the quick fix. Actually, it is possible that there might
had been simultaneous queries using the MultiSearcher. I assumed it
was thread-safe, thus was re-using the same instance. I'll update my
application code as well.
  
Thanks,
-vivek
  
  
  
On Feb 15, 2008 5:56 PM, Mark Miller [EMAIL PROTECTED] wrote:
 Here is the fix: https://issues.apache.org/jira/browse/LUCENE-1026


 vivek sar wrote:
  Mark,
 
 There seems to be some issue with DefaultMultiIndexAccessor.java. I
  got following NPE exception,
 
   2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] 
 ReportServiceImpl -
  java.lang.NullPointerException
  at 
 org.apache.lucene.indexaccessor.DefaultMultiIndexAccessor.release(DefaultMultiIndexAccessor.java:89)
 
  Looks like the IndexAccessor for one of the Searcher in the
  MultiSearcher returned null. Not sure how is that possible, any ideas
  how is that possible?
 
  In my case it caused a critical error as the writer thread was stuck
  forever (we found out after couple of days) because of this,
 
  PS thread 9 prio=1 tid=0x2aac70eb95d0 nid=0x6ba in Object.wait()
  [0x47533000..0x47533b80]
  at java.lang.Object.wait(Native Method)
  - waiting on 0x2aab3e5c7700 (a
  org.apache.lucene.indexaccessor.DefaultIndexAccessor)
  at java.lang.Object.wait(Unknown Source)
  at 
 org.apache.lucene.indexaccessor.DefaultIndexAccessor.waitForReadersAndCloseCached(DefaultIndexAccessor.java:593)
  at 
 org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:510)
  - locked 0x2aab3e5c7700 (a
  org.apache.lucene.indexaccessor.DefaultIndexAccessor)
 
  The only way to recover was to re-start the application.
 
  I use both MultiSearcher and IndexSearcher in my application, I've
  looked at your code but not able to pinpoint how can it go wrong? Of
  course, you do have to check for null in the
  MultiIndexAccessor.release, but how could you get null index accessor
  at first place?
 
  I do call IndexAccessor.close during partitioning of indexes,

Re: DefaultIndexAccessor

2008-02-28 Thread Mark Miller


Hey vivek,

Sorry you ran into this. I believe the problem is that I had just not 
foreseen the use case of closing and then reopening the Accessor. The 
only time I ever close the Accessors is when I am shutting down the JVM.


What do you do about all of the IndexAccessor requests while it is in a 
closed state? Could there be a better way of accomplishing this without 
closing the Accessor? Would a new method that just stalled everything be 
better? Then you possibly wouldn't have to recreate any resources.


In any case, the problem is that after the Executor gets shut down it is 
not reopened in the open method. I can certainly change this, but I need 
to look for any other issues as well. I will add an open-after-shutdown 
test to investigate. I am going to think about the issue 
further and I will get back to you soon.
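
The failure mode described above can be reproduced with plain java.util.concurrent, independent of the IndexAccessor code. The sketch below is only an illustration: ReopenableAccessor and its methods are hypothetical stand-ins, not the real DefaultIndexAccessor. It shows that a pool that has been shut down rejects new tasks, and that creating the pool in open() rather than in the constructor makes a close()/open() cycle work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical stand-in for an accessor that owns a thread pool.
// This is NOT the real DefaultIndexAccessor, only a sketch of the idea.
public class ReopenableAccessor {
    private ExecutorService pool;

    // Create the pool here (not in the constructor) so that
    // close() followed by open() yields a usable pool again.
    public synchronized void open() {
        if (pool == null || pool.isShutdown()) {
            pool = Executors.newCachedThreadPool();
        }
    }

    // Shut the pool down; it will reject any task submitted afterwards.
    public synchronized void close() {
        if (pool != null) {
            pool.shutdown();
        }
    }

    // Returns true if the task was accepted, false if there is no pool
    // or the pool rejected the task because it has been shut down.
    public synchronized boolean release(Runnable task) {
        if (pool == null) {
            return false;
        }
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ReopenableAccessor accessor = new ReopenableAccessor();
        accessor.open();
        accessor.close();
        System.out.println("after close: accepted=" + accessor.release(() -> {}));
        accessor.open();
        System.out.println("after reopen: accepted=" + accessor.release(() -> {}));
        accessor.close();
    }
}
```

If open() did not recreate the pool, the second release would be rejected as well, which matches the repeated exception reported in this thread.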


Thanks for all of the details.

- Mark

vivek sar wrote:

Mark,

 Some more information,

  1) I run indexwriter every 5 mins
  2) After every cycle I check if I need to partition (based on
the index size)
  3) In the partition interface,
a)  I first call close on the index accessor (so all the
searchers can close before I move that index)
  accessor =
IndexAccessorFactory.getInstance().getAccessor(dir.getFile());
  accessor.close();
b) Then I re-open the index accessor,
   accessor = indexFactory.getAccessor(dir.getFile());
   accessor.open();
c) I optimized the my indexes using the Index Writer (that
I get from the accessor).
   masterWriter = this.indexAccessor.getWriter(false);
   masterWriter.optimize(optimizeSegment);
d) Once the optimization is done I release the masterWriter,
this.indexAccessor.release(masterWriter);

 Now here is where I get the RejectedExecutionException.
Reading up little more on this exception,
http://pveentjer.wordpress.com/2008/02/06/are-you-dealing-with-the-rejectedexecutionexception/,
I see this might be happening because something got stuck during the
close cycle, so the ExecutorSerivce is not accepting any new tasks.
I'm not sure how would this happen.

The critical problem is once I get this exception, every release call
throws the same exception (looks like shutdown never gets done).
Because of this my readers are never refreshed and I can not read any
new indexes.

May be I've to check whether the accessor is completely closed before
re-opening?  Could you in your release check whether the pool
(ExecutorService) is in shutdown state? Any thing else I can check?

Thanks,
-vivek

On Thu, Feb 28, 2008 at 1:26 PM, vivek sar [EMAIL PROTECTED] wrote:
  

Mark,

  We deployed our indexer (using defaultIndexAccessor) on one of the
 production site and getting this error,

 Caused by: java.util.concurrent.RejectedExecutionException
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown
 Source)
at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
at 
org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:514)


 This is happening repeatedly every time the indexer runs.

 This is running your latest IndexAccessor-021508 code.  Any ideas
 (it's kind of urgent for us)?

 Thanks,
 -vivek




 On Fri, Feb 15, 2008 at 6:50 PM, vivek sar [EMAIL PROTECTED] wrote:
  Mark,
 
   Thanks for the quick fix. Actually, it is possible that there might
   had been simultaneous queries using the MultiSearcher. I assumed it
   was thread-safe, thus was re-using the same instance. I'll update my
   application code as well.
 
   Thanks,
   -vivek
 
 
 
   On Feb 15, 2008 5:56 PM, Mark Miller [EMAIL PROTECTED] wrote:
Here is the fix: https://issues.apache.org/jira/browse/LUCENE-1026
   
   
vivek sar wrote:
 Mark,

There seems to be some issue with DefaultMultiIndexAccessor.java. I
 got following NPE exception,

  2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] 
ReportServiceImpl -
 java.lang.NullPointerException
 at 
org.apache.lucene.indexaccessor.DefaultMultiIndexAccessor.release(DefaultMultiIndexAccessor.java:89)

 Looks like the IndexAccessor for one of the Searcher in the
 MultiSearcher returned null. Not sure how is that possible, any ideas
 how is that possible?

 In my case it caused a critical error as the writer thread was stuck
 forever (we found out after couple of days) because of this,

 PS thread 9 prio=1 tid=0x2aac70eb95d0 nid=0x6ba in Object.wait()
 [0x47533000..0x47533b80]
 at java.lang.Object.wait(Native Method)
 - waiting on 0x2aab3e5c7700 (a
 org.apache.lucene.indexaccessor.DefaultIndexAccessor)

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar

Mark,

Just for my clarification,

1) Would you have indexStop and indexStart methods? If that's the case
then I don't have to call close() at all. These new methods would
serve as just cleaning up the caches and not closing the thread pool.

I would prefer not to call close() and init() again if possible.

The reason we have to partition is that our index grows by
over 50G a week, and then optimization takes hours. I had a thread going
on this topic on the mailing list,
http://www.gossamer-threads.com/lists/lucene/java-user/57366?search_string=partition;#57366.

Thanks,
-vivek

On Thu, Feb 28, 2008 at 5:01 PM, Mark Miller [EMAIL PROTECTED] wrote:
I added the Thread Pool recently, so things did probably work before
that. I am certainly willing to put the Thread Pool init in the open
call instead of the constructor.

As for the best method to use, I was thinking of something along the
same lines as what you suggest.

One of the decisions will be how to handle shutting down method calls on
the Accessor. Throw an Exception or block?

In any case, I will put up code that makes the above change and your
code should work as it did. I'll be sure to add this to the test cases.

Just as a question of personal interest, what has led you to set up your
index this way? Adding partitions as it grows, that is.

- Mark

vivek sar wrote:
Mark,

Yes, I think that's precisely what is happening. I call
accessor.close, which shuts down the ExecutorService. I was
assuming accessor.open would re-open it (I think that's how it
worked in an older version of your IndexAccessor).

Basically, I need a way to stop (or close) all the IndexSearchers for
a specific IndexAccessor and do not allow them to re-open until I flag
the indexAccessor that it's safe to give out new index searchers. So I
am able to optimize the index, rename it and move it to somewhere else
during partitioning. Right now without closing the searchers I can not
rename the index as it wouldn't allow me to if some other thread has a
file handle to that index.

I don't know if there is a way to get an exclusive writer thread to an
index using IndexAccessor. I would think a better way for me would be
to,

1) Call a method on IndexAccessor, let's say stopIndex(). This
   would clear all the caches (close all the open searchers, readers and
   writers) and flag the index accessor so that no other reader or writer
   can be obtained from it.
2) Use my own IndexWriter (not obtained from IndexAccessor) to
   optimize the index that needs to be partitioned, then release it.
3) Once done with the partition, call another method on
   IndexAccessor, let's say startIndex(). This would simply clear the
   flag so that the IndexAccessor again hands out searchers, readers and
   writers. The start would have to reopen all the searchers and readers.

Not sure if this is a good design for what I am trying to do. This
would require two new methods on IndexAccessor - stopIndex() and
startIndex(). Any thoughts?
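
As a rough illustration of the proposed stopIndex()/startIndex() pair, here is a minimal gate built on a plain Java monitor. Everything in it is hypothetical: IndexGate, acquire(), tryAcquire() and release() are names invented for this sketch and are not part of IndexAccessor. It only shows the coordination: stopIndex() waits until nothing is checked out, and while stopped, callers either block (acquire) or are refused (tryAcquire).

```java
// Hypothetical gate sketching the proposed stopIndex()/startIndex() behaviour.
// None of these names come from the IndexAccessor code.
public class IndexGate {
    private boolean stopped = false;
    private int inUse = 0;

    // Blocking variant: waits while the index is stopped.
    public synchronized void acquire() throws InterruptedException {
        while (stopped) {
            wait();
        }
        inUse++;
    }

    // Non-blocking variant: refuses immediately while the index is stopped.
    public synchronized boolean tryAcquire() {
        if (stopped) {
            return false;
        }
        inUse++;
        return true;
    }

    // Hand a searcher/reader/writer back and wake up a pending stopIndex().
    public synchronized void release() {
        if (inUse == 0) {
            throw new IllegalStateException("release without acquire");
        }
        inUse--;
        notifyAll();
    }

    // Refuse new users, then wait until everything checked out is released.
    public synchronized void stopIndex() throws InterruptedException {
        stopped = true;
        while (inUse > 0) {
            wait();
        }
    }

    // Allow users again and wake up blocked acquire() calls.
    public synchronized void startIndex() {
        stopped = false;
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        IndexGate gate = new IndexGate();
        System.out.println("before stop: " + gate.tryAcquire());
        gate.release();
        gate.stopIndex();   // safe to optimize, rename or move the index here
        System.out.println("while stopped: " + gate.tryAcquire());
        gate.startIndex();
        System.out.println("after start: " + gate.tryAcquire());
        gate.release();
    }
}
```

Whether callers should block or get an exception while the index is stopped is a design choice; the sketch offers both variants so either can be tried.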

Thanks,
-vivek

On Thu, Feb 28, 2008 at 3:55 PM, Mark Miller [EMAIL PROTECTED] wrote:

Hey vivek,

Sorry you ran into this. I believe the problem is that I had just not
foreseen the use case of closing and then reopening the Accessor. The
only time I ever close the Accessors is when I am shutting down the JVM.

What do you do about all of the IndexAccessor requests while it is in a
closed state? Could their be a better way of accomplishing this without
closing the Accessor? Would a new method that just stalled everything be
better? Then you wouldn't have to recreate any resources possibly?

In any case, the problem is that after the Executor gets shutdown it is
not reopened in the open method. I can certainly change this, but I need
to look for any other issues as well. I will add an open after a
shutdown test to investigate. I am going to think about the issue
further and I will get back to you soon.

Thanks for all of the details.

- Mark

vivek sar wrote:
Mark,

Some more information,

1) I run indexwriter every 5 mins
2) After every cycle I check if I need to partition (based on
the index size)
3) In the partition interface,
a) I first call close on the index accessor (so all the
searchers can close before I move that index)
accessor =
IndexAccessorFactory.getInstance().getAccessor(dir.getFile());
accessor.close();
b) Then I re-open the index accessor,
accessor =
indexFactory.getAccessor(dir.getFile());
accessor.open();
c) I optimized the my indexes using the Index Writer (that
I get from the

Re: DefaultIndexAccessor

2008-02-28 Thread Mark Miller

vivek sar wrote:

Mark,

Just for my clarification,

Yes. This is the approach I agree with. I am putting up new code that
allows the close(), open() calls anyway, though. There is nothing keeping
it from working, and it used to work, so it's a good idea to make it work
again. It is also a quick fix for you.

https://issues.apache.org/jira/browse/LUCENE-1026

I will be adding the new stop start calls quickly, but I don't want to
rush it out.

I would prefer not to call close() and init() again if possible.

Gotchya. A comment I have on that is that you might try keeping the
mergeFactor really low as well. This will keep searches faster, make
optimization much faster (it's amortized), and not slow down writes that
much in my experience. Since IndexAccessor hands the writes off (spawning
a new thread) anyway, slightly longer writes shouldn't be a big deal at
all. I'd try out even as low as 2 or 3. I run some fairly large
interactive indexes and the writes, even when blocking until the write
is done, are pretty darn responsive.

- Mark

Thanks,
-vivek

On Thu, Feb 28, 2008 at 5:01 PM, Mark Miller [EMAIL PROTECTED] wrote:

I added the Thread Pool recently, so things did probably work before
that. I am certainly willing to put the Thread Pool init in the open
call instead of the constructor.

As for the best method to use, I was thinking of something along the
same lines as what you suggest.

One of the decisions will be how to handle shutting down method calls on
the Accessor. Throw an Exception or block?

In any case, I will put up code that makes the above change and your
code should work as it did. I'll be sure to add this to the test cases.

Just as a personal interest question, what has led you to setup your
index this way? Adding partitions as it grows that is.

- Mark

vivek sar wrote:
Mark,

I don't know if there is a way to get an exclusive writer thread to an
index using IndexAccessor. I would think a better way for me would be
to,

Not sure if this is a good design for what I am trying to do. This
would require two new methods on IndexAccessor - stopIndex() and
startIndex(). Any thoughts?

Thanks,
-vivek

On Thu, Feb 28, 2008 at 3:55 PM, Mark Miller [EMAIL PROTECTED] wrote:

Hey vivek,

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar

Thanks Mark. I'll wait for your enhancements in IndexAccessor on the
new methods.

I use mergeFactor = 100. I've read about the merge factor, and it's
hard to balance read and write performance. What number do
you use?

Thanks again.
-vivek

On Thu, Feb 28, 2008 at 7:14 PM, Mark Miller [EMAIL PROTECTED] wrote:

vivek sar wrote:
Mark,

Just for my clarification,

https://issues.apache.org/jira/browse/LUCENE-1026

I will be adding the new stop start calls quickly, but I don't want to
rush it out.

I would prefer not to call close() and init() again if possible.

The reason we have to do partition is because our index size grows
over 50G a week and then optimization takes hours. I'd a thread going
on this topic in the mailing list,

http://www.gossamer-threads.com/lists/lucene/java-user/57366?search_string=partition;#57366.

- Mark

Thanks,
-vivek

On Thu, Feb 28, 2008 at 5:01 PM, Mark Miller [EMAIL PROTECTED] wrote:

I added the Thread Pool recently, so things did probably work before
that. I am certainly willing to put the Thread Pool init in the open
call instead of the constructor.

As for the best method to use, I was thinking of something along the
same lines as what you suggest.

One of the decisions will be how to handle shutting down method calls on
the Accessor. Throw an Exception or block?

In any case, I will put up code that makes the above change and your
code should work as it did. I'll be sure to add this to the test cases.

Just as a personal interest question, what has led you to setup your
index this way? Adding partitions as it grows that is.

- Mark

vivek sar wrote:
Mark,

I don't know if there is a way to get an exclusive writer thread to an
index using IndexAccessor. I would think a better way for me would be
to,

Not sure if this is a good design for what I am trying to do. This
would require two new methods on IndexAccessor - stopIndex() and
startIndex(). Any thoughts?

Thanks,
-vivek

On Thu, Feb 28, 2008 at 3:55 PM, Mark Miller [EMAIL PROTECTED] wrote:

Hey vivek,

What do you do about

Re: DefaultIndexAccessor

2008-02-15 Thread vivek sar

Mark,

   There seems to be an issue with DefaultMultiIndexAccessor.java. I
got the following NullPointerException:

2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] ReportServiceImpl -
java.lang.NullPointerException
at org.apache.lucene.indexaccessor.DefaultMultiIndexAccessor.release(DefaultMultiIndexAccessor.java:89)

It looks like the IndexAccessor for one of the Searchers in the
MultiSearcher returned null. I'm not sure how that is possible. Any
ideas?

In my case it caused a critical error, as the writer thread was stuck
forever (we found out after a couple of days) because of this:

PS thread 9 prio=1 tid=0x2aac70eb95d0 nid=0x6ba in Object.wait() [0x47533000..0x47533b80]
at java.lang.Object.wait(Native Method)
- waiting on 0x2aab3e5c7700 (a org.apache.lucene.indexaccessor.DefaultIndexAccessor)
at java.lang.Object.wait(Unknown Source)
at org.apache.lucene.indexaccessor.DefaultIndexAccessor.waitForReadersAndCloseCached(DefaultIndexAccessor.java:593)
at org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:510)
- locked 0x2aab3e5c7700 (a org.apache.lucene.indexaccessor.DefaultIndexAccessor)

The only way to recover was to re-start the application.

I use both MultiSearcher and IndexSearcher in my application. I've
looked at your code but am not able to pinpoint how it can go wrong. Of
course, you do have to check for null in
MultiIndexAccessor.release, but how could you get a null index accessor
in the first place?

I do call IndexAccessor.close during partitioning of indexes, but the
close should wait for all Searchers to close before doing anything.

Do you have any updates to your code since 02/04/2008?

Thanks,
-vivek

On Feb 6, 2008 8:37 AM, Jay [EMAIL PROTECTED] wrote:
 Thanks for your clarifications, Mark!


 Jay


 Mark Miller wrote:
 
 
  5. Although currently IndexSearcher.close() does almost nothing except
  to close the internal index reader, it might be a safer to close
  searcher itself as well in closeCachedSearcher(), just in case, the
  searcher may have other resources to release in the future version of
  Lucene.
  Didn't catch that as well. You are right, great idea Jay, thanks.
 

Re: DefaultIndexAccessor

2008-02-15 Thread Mark Miller

Hey vivek, sorry to hear you are having problems.

I am trying to figure out how you may be seeing this problem. The
IndexAccessor cannot return null because you would get an
IllegalStateException not a NullPointerException. Also, the released
MultiSearcher cannot be null because the Exception would have been
thrown sooner. Releasing a null Searcher throws no Exception. So a
possibility is that you are returning a foreign MultiSearcher?

Unlikely, but I don't see anything else at the moment.

The MultiSearcher code is really pretty simple and actually recreates a
MultiSearcher on every request...it did not appear to be worth it to
coordinate closed sub Accessors with a cache for the MultiSearcher (I
wrote the code at one point, and later got rid of it). So really the
MultiSearcher is just a simple class that gets cached sub Searchers for
each index and creates a one time use MultiSearcher. A simple cache is
kept around that identifies which Accessor needs to release which sub
Searcher. It's all rather simple, and I am struggling to see another
possibility beyond returning a foreign MultiSearcher somehow.

I will keep looking and keep you posted. In the meantime, do you have
any other data or code snippets to share?

vivek sar wrote:

Mark,

There seems to be some issue with DefaultMultiIndexAccessor.java. I
got following NPE exception,

2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] ReportServiceImpl -
java.lang.NullPointerException
at
org.apache.lucene.indexaccessor.DefaultMultiIndexAccessor.release(DefaultMultiIndexAccessor.java:89)

Looks like the IndexAccessor for one of the Searcher in the
MultiSearcher returned null. Not sure how is that possible, any ideas
how is that possible?

In my case it caused a critical error as the writer thread was stuck
forever (we found out after couple of days) because of this,

PS thread 9 prio=1 tid=0x2aac70eb95d0 nid=0x6ba in Object.wait()
[0x47533000..0x47533b80]
at java.lang.Object.wait(Native Method)
- waiting on 0x2aab3e5c7700 (a
org.apache.lucene.indexaccessor.DefaultIndexAccessor)
at java.lang.Object.wait(Unknown Source)
at
org.apache.lucene.indexaccessor.DefaultIndexAccessor.waitForReadersAndCloseCached(DefaultIndexAccessor.java:593)
at
org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:510)
- locked 0x2aab3e5c7700 (a
org.apache.lucene.indexaccessor.DefaultIndexAccessor)

The only way to recover was to re-start the application.

I use both MultiSearcher and IndexSearcher in my application, I've
looked at your code but not able to pinpoint how can it go wrong? Of
course, you do have to check for null in the
MultiIndexAccessor.release, but how could you get null index accessor
at first place?

I do call IndexAccessor.close during partitioning of indexes, but the
close should wait for all Searchers to close before doing anything.

Do you have any updates to your code since 02/04/2008?

Thanks,
-vivek

On Feb 6, 2008 8:37 AM, Jay [EMAIL PROTECTED] wrote:

Thanks for your clarifications, Mark!

Jay

Mark Miller wrote:

5. Although currently IndexSearcher.close() does almost nothing except
to close the internal index reader, it might be a safer to close
searcher itself as well in closeCachedSearcher(), just in case, the
searcher may have other resources to release in the future version of
Lucene.

Didn't catch that as well. You are right, great idea Jay, thanks.


Re: DefaultIndexAccessor

2008-02-15 Thread Mark Miller

Okay, sorry about this one vivek. Added to the unit tests to expose
this. When I took out the MultiSearcher caching, I kept the concept of
sharing a single MultiIndexAccessor. Unfortunately, this meant that
multiple threads were sharing the same Searcher to Accessor Map that was
used to track which Accessor needs to release which Searcher. Because of
this, a thread might come along and pull a Searcher out of that Map
right before another Thread tries to release that same cached Searcher
instance. The result is that it is not there, and hence the
NullPointerException.

Nice to have this added to the Unit Tests. The fix is to create a new
MultiIndexAccessor on every request, and recommend you get one for each
Thread as the class is now not thread safe. Construction of a
MultiIndexAccessor is pretty much nothing in terms of time. This way
each thread has its own Map of Searcher to Accessors.
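
The effect of sharing one Searcher-to-Accessor map can be illustrated without Lucene. In the sketch below, SearcherTracker is a hypothetical class and accessors are represented by plain strings; it is not the MultiIndexAccessor code. With one shared tracker, the second release of the same cached searcher finds nothing in the map (the null that led to the NullPointerException); with one tracker per request, each release finds its own entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the Searcher-to-Accessor map.
// Accessors are represented by their names to keep the sketch self-contained.
public class SearcherTracker {
    private final Map<Object, String> owners = new HashMap<>();

    // Remember which accessor handed out this sub-searcher.
    public void track(Object searcher, String accessorName) {
        owners.put(searcher, accessorName);
    }

    // Returns the owning accessor's name, or null if the entry is already gone.
    public String release(Object searcher) {
        return owners.remove(searcher);
    }

    public static void main(String[] args) {
        Object cachedSearcher = new Object();

        // Shared tracker: two requests use the same cached searcher and the same map.
        SearcherTracker shared = new SearcherTracker();
        shared.track(cachedSearcher, "accessorA");
        shared.track(cachedSearcher, "accessorA");
        System.out.println("shared, first release:  " + shared.release(cachedSearcher));
        System.out.println("shared, second release: " + shared.release(cachedSearcher));

        // One tracker per request: each request releases from its own map.
        SearcherTracker request1 = new SearcherTracker();
        SearcherTracker request2 = new SearcherTracker();
        request1.track(cachedSearcher, "accessorA");
        request2.track(cachedSearcher, "accessorA");
        System.out.println("request1 release: " + request1.release(cachedSearcher));
        System.out.println("request2 release: " + request2.release(cachedSearcher));
    }
}
```

The sketch runs on a single thread, so it shows only the outcome of the interleaving, not the race itself.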

A good tip is to make a simple page that simply prints out the
Searcher/Writer use counts. You can then check this page occasionally.
If it appears that Writers or Searchers are stuck checked out, you know
you have a problem. Under normal circumstances that should not be
possible, and it indicates a bug somewhere.
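
A use-count report of that kind could look roughly like the sketch below. UseCounts and its method names are hypothetical and not part of IndexAccessor; the resource labels are made up for the example. Calls to acquired() and released() would sit wherever searchers and writers are handed out and returned, and report() produces the text a status page would print.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical use-count tracker for a simple status page.
public class UseCounts {
    // TreeMap keeps the report sorted by resource name.
    private final Map<String, Integer> counts = new TreeMap<>();

    public synchronized void acquired(String resource) {
        counts.merge(resource, 1, Integer::sum);
    }

    public synchronized void released(String resource) {
        counts.merge(resource, -1, Integer::sum);
    }

    public synchronized int inUse(String resource) {
        return counts.getOrDefault(resource, 0);
    }

    // One "name=count" line per resource; a count that stays above zero
    // while the application is idle suggests a missing release.
    public synchronized String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UseCounts counts = new UseCounts();
        counts.acquired("searcher:/idx/a");
        counts.acquired("writer:/idx/a");
        counts.released("writer:/idx/a");
        System.out.print(counts.report());
    }
}
```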

I will post the fix shortly.

- Mark

vivek sar wrote:

Mark,

There seems to be some issue with DefaultMultiIndexAccessor.java. I
got following NPE exception,

Looks like the IndexAccessor for one of the Searcher in the
MultiSearcher returned null. Not sure how is that possible, any ideas
how is that possible?

In my case it caused a critical error as the writer thread was stuck
forever (we found out after couple of days) because of this,

The only way to recover was to re-start the application.

I do call IndexAccessor.close during partitioning of indexes, but the
close should wait for all Searchers to close before doing anything.

Do you have any updates to your code since 02/04/2008?

Thanks,
-vivek

On Feb 6, 2008 8:37 AM, Jay [EMAIL PROTECTED] wrote:

Thanks for your clarifications, Mark!

Jay

Mark Miller wrote:

Didn't catch that as well. You are right, great idea Jay, thanks.


Re: DefaultIndexAccessor

2008-02-15 Thread vivek sar

Mark,

  Here is the scenario when I saw this exception,

1) A search was run which uses MultiSearcher. This search took more
than 3 minutes to complete (due to index size and multiple indices).
2) Just a minute after the search was started, we started writing (in
a separate thread) to one of the indexes being searched.
3) The writer finished in a few seconds and called writer.release.
4) Since the MultiSearcher was still running, the writer.release
waited for all MultiSearchers to complete.
5) The MultiSearcher finally completed and called
MultiIndexAccessor.release for the MultiSearcher.
6) At this point, for some reason, the MultiSearcher release threw the
NPE. I'm not sure whether the NPE was on the index that was being
updated by the writer or on some other index.
7) Because of the NPE, the index that the writer was writing to never got
released and the writer got stuck.

I haven't been able to reproduce this again.

Is IndexAccessor-02.07.2008.zip
(https://issues.apache.org/jira/browse/LUCENE-1026) the most up-to-date
code you have? You mentioned that with the new jar,

Releasing a Writer never blocks for a reopen now - so after adding a
doc it may be a second or two before its visible to new Searchers

Do you think this would help with the case I ran into, where the writer
was stuck because of the searcher release?

 I think you may still want to check for null in the
MultiIndexAccessor.release(), i.e.,

public synchronized void release(Searcher multiSearcher) {
  Searchable[] searchers = ((MultiSearcher) multiSearcher).getSearchables();
  for (Searchable searchable : searchers) {
    if (searchable != null) {
      // remove() returns null when no accessor is tracked for this searcher
      IndexAccessor accessor = multiSearcherAccessors.remove(searchable);
      if (accessor != null) {
        accessor.release((Searcher) searchable);
      }
    }
  }
}

Thanks,
-vivek

On Feb 15, 2008 5:02 PM, Mark Miller [EMAIL PROTECTED] wrote:
 Hey vivek, sorry to hear you are having problems.

 I am trying to figure out how you may be seeing this problem. The
 IndexAccessor cannot return null because you would get an
 IllegalStateException not a NullPointerException. Also, the released
 MultiSearcher cannot be null because the Exception would have been
 thrown sooner. Releasing a null Searcher throws no Exception. So a
 possibility is that you are returning a foreign MultiSearcher?

 Unlikely, but I don't see anything else at the moment.

 The MultiSearcher code is really pretty simple and actually recreates a
 MultiSearcher on every request...it did not appear to be worth it to
 coordinate closed sub Accessors with a cache for the MultiSearcher (I
 wrote the code at one point, and later got rid of it). So really the
 MultiSearcher is just a simple class that gets cached sub Searchers for
 each index and creates a one time use MultiSearcher. A simple cache is
 kept around that identifies which Accessor needs to release which sub
 Searcher. It's all rather simple, and I am struggling to see another
 possibility beyond returning a foreign MultiSearcher somehow.

 I will keep looking and keep you posted. In the mean time, do you have
 any other data or code snippets to share?


 vivek sar wrote:
  Mark,
 
 There seems to be some issue with DefaultMultiIndexAccessor.java. I
  got following NPE exception,
 
   2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] ReportServiceImpl 
  -
  java.lang.NullPointerException
  at 
  org.apache.lucene.indexaccessor.DefaultMultiIndexAccessor.release(DefaultMultiIndexAccessor.java:89)
 
  Looks like the IndexAccessor for one of the Searcher in the
  MultiSearcher returned null. Not sure how is that possible, any ideas
  how is that possible?
 
  In my case it caused a critical error as the writer thread was stuck
  forever (we found out after couple of days) because of this,
 
  PS thread 9 prio=1 tid=0x2aac70eb95d0 nid=0x6ba in Object.wait()
  [0x47533000..0x47533b80]
  at java.lang.Object.wait(Native Method)
  - waiting on 0x2aab3e5c7700 (a
  org.apache.lucene.indexaccessor.DefaultIndexAccessor)
  at java.lang.Object.wait(Unknown Source)
  at 
  org.apache.lucene.indexaccessor.DefaultIndexAccessor.waitForReadersAndCloseCached(DefaultIndexAccessor.java:593)
  at 
  org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:510)
  - locked 0x2aab3e5c7700 (a
  org.apache.lucene.indexaccessor.DefaultIndexAccessor)
 
  The only way to recover was to re-start the application.
 
  I use both MultiSearcher and IndexSearcher in my application, I've
  looked at your code but not able to pinpoint how can it go wrong? Of
  course, you do have to check for null in the
  MultiIndexAccessor.release, but how could you get null index accessor
  at first place?
 
  I do call IndexAccessor.close during partitioning of indexes, but the
  close should wait for all Searchers to close before doing anything.
 
  Do you have any updates to your code since

Re: DefaultIndexAccessor

2008-02-15 Thread Mark Miller


Here is the fix: https://issues.apache.org/jira/browse/LUCENE-1026

vivek sar wrote:

Mark,

   There seems to be some issue with DefaultMultiIndexAccessor.java. I
got following NPE exception,

 2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] ReportServiceImpl -
java.lang.NullPointerException
at 
org.apache.lucene.indexaccessor.DefaultMultiIndexAccessor.release(DefaultMultiIndexAccessor.java:89)

Looks like the IndexAccessor for one of the Searcher in the
MultiSearcher returned null. Not sure how is that possible, any ideas
how is that possible?

In my case it caused a critical error as the writer thread was stuck
forever (we found out after couple of days) because of this,

PS thread 9 prio=1 tid=0x2aac70eb95d0 nid=0x6ba in Object.wait()
[0x47533000..0x47533b80]
at java.lang.Object.wait(Native Method)
- waiting on 0x2aab3e5c7700 (a
org.apache.lucene.indexaccessor.DefaultIndexAccessor)
at java.lang.Object.wait(Unknown Source)
at 
org.apache.lucene.indexaccessor.DefaultIndexAccessor.waitForReadersAndCloseCached(DefaultIndexAccessor.java:593)
at 
org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:510)
- locked 0x2aab3e5c7700 (a
org.apache.lucene.indexaccessor.DefaultIndexAccessor)

The only way to recover was to re-start the application.

I use both MultiSearcher and IndexSearcher in my application, I've
looked at your code but not able to pinpoint how can it go wrong? Of
course, you do have to check for null in the
MultiIndexAccessor.release, but how could you get null index accessor
at first place?

I do call IndexAccessor.close during partitioning of indexes, but the
close should wait for all Searchers to close before doing anything.

Do you have any updates to your code since 02/04/2008?

Thanks,
-vivek

On Feb 6, 2008 8:37 AM, Jay [EMAIL PROTECTED] wrote:
  

Thanks for your clarifications, Mark!


Jay


Mark Miller wrote:


5. Although currently IndexSearcher.close() does almost nothing except
to close the internal index reader, it might be a safer to close
searcher itself as well in closeCachedSearcher(), just in case, the
searcher may have other resources to release in the future version of
Lucene.


Didn't catch that as well. You are right, great idea Jay, thanks.


Re: DefaultIndexAccessor

2008-02-15 Thread vivek sar

Mark,

Thanks for the quick fix. Actually, it is possible that there might
have been simultaneous queries using the MultiSearcher. I assumed it
was thread-safe, and so was re-using the same instance. I'll update my
application code as well.

Thanks,
-vivek

On Feb 15, 2008 5:56 PM, Mark Miller [EMAIL PROTECTED] wrote:
 Here is the fix: https://issues.apache.org/jira/browse/LUCENE-1026


 vivek sar wrote:
  Mark,
 
 There seems to be some issue with DefaultMultiIndexAccessor.java. I
  got following NPE exception,
 
   2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] ReportServiceImpl 
  -
  java.lang.NullPointerException
  at 
  org.apache.lucene.indexaccessor.DefaultMultiIndexAccessor.release(DefaultMultiIndexAccessor.java:89)
 
  Looks like the IndexAccessor for one of the Searcher in the
  MultiSearcher returned null. Not sure how is that possible, any ideas
  how is that possible?
 
  In my case it caused a critical error as the writer thread was stuck
  forever (we found out after couple of days) because of this,
 
  PS thread 9 prio=1 tid=0x2aac70eb95d0 nid=0x6ba in Object.wait()
  [0x47533000..0x47533b80]
  at java.lang.Object.wait(Native Method)
  - waiting on 0x2aab3e5c7700 (a
  org.apache.lucene.indexaccessor.DefaultIndexAccessor)
  at java.lang.Object.wait(Unknown Source)
  at 
  org.apache.lucene.indexaccessor.DefaultIndexAccessor.waitForReadersAndCloseCached(DefaultIndexAccessor.java:593)
  at 
  org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:510)
  - locked 0x2aab3e5c7700 (a
  org.apache.lucene.indexaccessor.DefaultIndexAccessor)
 
  The only way to recover was to re-start the application.
 
  I use both MultiSearcher and IndexSearcher in my application, I've
  looked at your code but not able to pinpoint how can it go wrong? Of
  course, you do have to check for null in the
  MultiIndexAccessor.release, but how could you get null index accessor
  at first place?
 
  I do call IndexAccessor.close during partitioning of indexes, but the
  close should wait for all Searchers to close before doing anything.
 
  Do you have any updates to your code since 02/04/2008?
 
  Thanks,
  -vivek
 
  On Feb 6, 2008 8:37 AM, Jay [EMAIL PROTECTED] wrote:
 
  Thanks for your clarifications, Mark!
 
 
  Jay
 
 
  Mark Miller wrote:
 
  5. Although currently IndexSearcher.close() does almost nothing except
  to close the internal index reader, it might be a safer to close
  searcher itself as well in closeCachedSearcher(), just in case, the
  searcher may have other resources to release in the future version of
  Lucene.
 
  Didn't catch that as well. You are right, great idea Jay, thanks.
 

Re: DefaultIndexAccessor

2008-02-06 Thread Mark Miller


Thanks for the feedback jay. One at a time:

Jay wrote:

Great effort for much improved indexaccessor, Mark!
A couple questions and observations:

1. In release(Searcher), you removed a check if the given searcher is
the cached one from an earlier version. This could potentially cause
problems for some people.

This is something that I meant to come back to. The problem is that the
Searcher you are returning may have already been replaced in the
Searcher cache...so retired searchers must be checked too. Will consider
again.

2. The createdSearchers variable is not really used: you just populate
it and print it out. What's the purpose for it?

This was a debug check that I had been using. Will clean up.

3. The variable numSearchersForRetirment is used in
WarmingIndexAccessor, not DefaultIndexAccessor.

Thanks. Will move.

4. I wish that in the next release of Lucene, they will add a searcher
reopen API so that we do not have to work around it.

Not sure how this would play out, but an interesting thought...

5. Although currently IndexSearcher.close() does almost nothing except
to close the internal index reader, it might be safer to close the
searcher itself as well in closeCachedSearcher(), just in case the
searcher has other resources to release in a future version of
Lucene.

The problem is that because I am supplying the reader, calling
Searcher.close() won't close it. The Searcher has to be created without
a supplied Reader for it to be able to close itself. I got the same itch
though...


Thanks!

Jay

I appreciate the feedback! I'll be working more on this. I think more
can be done.


- Mark


Re: DefaultIndexAccessor

2008-02-06 Thread Mark Miller



 
5. Although currently IndexSearcher.close() does almost nothing except 
to close the internal index reader, it might be a safer to close 
searcher itself as well in closeCachedSearcher(), just in case, the 
searcher may have other resources to release in the future version of 
Lucene.

Didn't catch that as well. You are right, great idea Jay, thanks.


Re: DefaultIndexAccessor

2008-02-06 Thread Jay


Thanks for your clarifications, Mark!


Jay

Mark Miller wrote:


 
5. Although currently IndexSearcher.close() does almost nothing except 
to close the internal index reader, it might be a safer to close 
searcher itself as well in closeCachedSearcher(), just in case, the 
searcher may have other resources to release in the future version of 
Lucene.

Didn't catch that as well. You are right, great idea Jay, thanks.


Re: DefaultIndexAccessor

2008-02-05 Thread Jay


Great effort for much improved indexaccessor, Mark!
A couple questions and observations:

1. In release(Searcher), you removed a check if the given searcher is
the cached one from an earlier version. This could potentially cause
problems for some people.
2. The createdSearchers variable is not really used: you just populate
it and print it out. What's the purpose for it?
3. The variable numSearchersForRetirment is used in
WarmingIndexAccessor, not DefaultIndexAccessor.
4. I wish that in the next release of Lucene, they will add a searcher
reopen API so that we do not have to work around it.
5. Although currently IndexSearcher.close() does almost nothing except
to close the internal index reader, it might be safer to close the
searcher itself as well in closeCachedSearcher(), just in case the
searcher has other resources to release in a future version of
Lucene.


Thanks!

Jay

Mark Miller wrote:
For anyone following this thread who would like to check this out, I put 
up the new code with the warming capability:


https://issues.apache.org/jira/browse/LUCENE-1026
IndexAccessor-02.04.2008.zip (32 kb):
https://issues.apache.org/jira/secure/attachment/12374729/IndexAccessor-02.04.2008.zip


See the comment at the bottom.

Cam Bazz wrote:

Hello Mark,

Thank you for your lengthy and valuable clarification. I have the case -
before adding to the index, i must check if a document exist with the
same key (actually, double key) - or before deleting a document - I must
ensure it exists in the index.

Currently I am doing it with my custom caching routine. It works quite
well up to 32M documents, but after that something happens and it really
slows down.

I will experiment with your implementation, as soon as I can. It is very
cool by the way. Will it be included in the next release?

Best,
-C.B.

On Feb 4, 2008 7:15 PM, Mark Miller [EMAIL PROTECTED] wrote:

 

The purpose of IndexAccessor is to coordinate Readers/Writers for a
Lucene index. Readers and Writers in Lucene are multi-threaded in that
multiple threads may use them at the same time, but they must/should be
shared and there are special rules (You cannot delete with a Reader
while a Writer is working on the index). Also, you need to refresh
Reader views every so often; this is expensive (though usually much less
so with the new reopen method).

IndexAccessor enforces the rules and controls Reader refreshing. Instead
of worrying about caching or index interaction rules, you just ask for
your Reader/Writer, use it to search or add a doc, and then return it.
The rest is taken care of for you.

This is done by keeping a cached Writer and Searcher(s) that all threads
share. References to the Searchers are counted so that after a Writer is
returned (and no other thread has a reference to the Writer),
IndexAccessor waits for all of the current Searchers to come back and
then reopens their Readers.

In this regard, you get a  similar setup to what Solr might give: from
any thread you just add docs and run searches -- you don't have to worry
about refreshing Readers or sharing Writers/Readers or one thread
deleting with a Reader while another thread tries to write with a 
Writer.


This setup allows you to do other cool things, like warm Searchers
before putting them into action. That's what the code I am posting soon
will be capable of: when the Readers are reopened, search requests will
still be handled by the old Readers while the new Searchers run a sample
query with optional sort fields. This makes sure the Reader is open
and its sort caches are loaded before the first thread tries to use it,
which gives applications much faster responses.

You must  open a new Reader or reopen a Reader to see recently added
docs...IndexAccessor provides no real way around that. But it does make
the reopening much easier -- and your application that just wants to add
docs and search at will from multiple threads, won't have to worry about
it.

You can bail out here, or if you want further clarification I will
include an alternate attempt at what IndexAccessor is below.

- Mark


 


When accessing a Lucene index from multiple threads, there are a variety
of issues that you must address.

1. The Readers/Writer should be shared across threads.
2. Readers must periodically be refreshed, either by creating new
instances or using the new reopen method.
3. A Reader that writes needs to be properly coordinated with a Writer,
e.g. they cannot be used at the same time.

IndexAccessor addresses each of these issues.

How it works:

A single Writer is shared among threads that try to concurrently
retrieve and use a Writer. Once all of these threads release their
reference
to the Writer, it is closed and upon the next request a new one is
created.

A single Searcher for each Similarity is also

Re: DefaultIndexAccessor

2008-02-04 Thread Mark Miller


IndexAccessor-1.26.2008.zip is the latest one. I will be dating the zips from
now on.

I hope to post new code with the warming either tonight or tomorrow night. I
would be ecstatic to have some help vetting that.

Also, I am thinking of making a change so that when you release the Writer, the
releasing thread does not block until the reopen completes. I think the original
author made it block so that if you add a doc with a thread and then immediately
search from the same thread, you are guaranteed to find the doc. However, this
guarantee did not hold -- if another thread had a reference to the Writer and a
new thread grabbed a Writer and then quickly released it before the first thread,
you will have added a doc but it will not be visible until the first thread
releases its reference to the Writer. Since the concept is not enforced
anyway, you might as well not block for the final thread that releases the
Writer either. Instead I will grab a thread from a thread pool to do the
reopening, and return right after closing the Writer. The result is that you
cannot add a doc and search and expect to find it without waiting a second or
two. But this way things will be consistent, and an app that adds docs will be
a bit more responsive, e.g. it won't hang as Readers are being reopened.
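
For anyone who wants to picture the non-blocking release, here is a rough sketch in plain java.util.concurrent. It is not the IndexAccessor code: AsyncReopenSketch, acquireWriter, releaseWriter, currentGeneration and shutdownAndAwait are made-up names, and a counter bump stands in for closing the Writer and reopening the Readers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an accessor; these names are not IndexAccessor's API.
final class AsyncReopenSketch {
    private final ExecutorService reopenPool = Executors.newSingleThreadExecutor();
    private final AtomicInteger readerGeneration = new AtomicInteger();
    private int writerRefs; // guarded by "this"

    synchronized void acquireWriter() {
        writerRefs++;
    }

    // The last releaser hands the reopen to the pool and returns immediately.
    synchronized void releaseWriter() {
        if (writerRefs == 0) {
            throw new IllegalStateException("no Writer reference is held");
        }
        writerRefs--;
        if (writerRefs == 0) {
            // Stands in for "close the Writer, then reopen the Readers".
            reopenPool.execute(readerGeneration::incrementAndGet);
        }
    }

    // Number of reopens that have completed so far.
    int currentGeneration() {
        return readerGeneration.get();
    }

    // Stops the pool and waits for queued reopens. After shutdown(), execute()
    // rejects new tasks, so this belongs after the final release.
    boolean shutdownAndAwait(long timeoutMillis) {
        reopenPool.shutdown();
        try {
            return reopenPool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncReopenSketch accessor = new AsyncReopenSketch();
        accessor.acquireWriter();
        accessor.releaseWriter(); // returns without waiting for the reopen
        accessor.shutdownAndAwait(1000);
        System.out.println("reopens completed: " + accessor.currentGeneration());
    }
}
```

The trade-off is the one described above: the releasing thread never waits, so a search issued right after the release may still see the old view.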

I also have to bring the AccessProvider classes back. No easy way to use your 
own custom Readers without it...I shouldn't have stripped it out.

- Mark



Cam Bazz wrote:

Hello,

Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this seems
very interesting. I have read the discussion on the page, but I could not
figure out which set of files is the latest.
Is it the IndexAccessor-1.26.2008.zip file?

I will read through the code, make my own tests, and send some feedback.

Best.
-C.B.

  




Re: DefaultIndexAccessor

2008-02-04 Thread Cam Bazz

Hello Mark,

I have been reading the code - and honestly I have not understood how it
works. I was hoping that this was a solution to the case when you are adding
documents - in a multithreaded way, it allows other non-writer threads to be
able to see documents added without refreshing the indexsearcher - by using
some caching mechanism.

Could you elaborate what IndexAccessor does and how it does it a little bit
more?

Best Regards,
-C.B.

On Feb 4, 2008 3:06 PM, Mark Miller [EMAIL PROTECTED] wrote:

 IndexAccessor-1.26.2008.zip is the latest one. I will be dating a zip from
 now on.

 I hope to post new code with the warming either tonight or tomorrow night.
 I would be ecstatic to have some help vetting that.

 Also, I am thinking of making a change so that when you release the Writer,
 the releasing thread does not block until the reopen completes. I think the
 original author made it block so that if you add a doc with a thread and then
 immediately search from the same thread, you are guaranteed to find the doc.
 However, this guarantee did not hold -- if another thread had a reference to
 the Writer and a new thread grabbed a Writer and then quickly released it
 before the first thread, you will have added a doc but it will not be visible
 until the first thread releases its reference to the Writer. Since the concept
 is not enforced anyway, you might as well not block for the final thread that
 releases the Writer either. Instead I will grab a thread from a thread pool
 to do the reopening, and return right after closing the Writer. The result is
 that you cannot add a doc and search and expect to find it without waiting a
 second or two. But this way things will be consistent, and an app that adds
 docs will be a bit more responsive, e.g. it won't hang as Readers are being
 reopened.

 I also have to bring the AccessProvider classes back. No easy way to use
 your own custom Readers without it...I shouldn't have stripped it out.

 - Mark



 Cam Bazz wrote:
  Hello,
 
  Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this seems
  very interesting. I have read the discussion on the page, but I could
 not
  figure out which set of files is the latest.
  Is it the IndexAccessor-1.26.2008.zip file?
 
  I will read through the code, make my own tests, and send some feedback.
 
  Best.
  -C.B.
 
 



Re: DefaultIndexAccessor

2008-02-04 Thread Mark Miller

The purpose of IndexAccessor is to coordinate Readers/Writers for a 
Lucene index. Readers and Writers in Lucene are multi-threaded in that 
multiple threads may use them at the same time, but they must/should be 
shared and there are special rules (You cannot delete with a Reader 
while a Writer is working on the index). Also, you need to refresh 
Reader views every so often; this is expensive (though usually much less 
so with the new reopen method).


IndexAccessor enforces the rules and controls Reader refreshing. Instead 
of worrying about caching or index interaction rules, you just ask for 
your Reader/Writer, use it to search or add a doc, and then return it. 
The rest is taken care of for you.


This is done by keeping a cached Writer and Searcher(s) that all threads 
share. References to the Searchers are counted so that after a Writer is 
returned (and no other thread has a reference to the Writer), 
IndexAccessor waits for all of the current Searchers to come back and 
then reopens their Readers.
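
A minimal sketch of that reference counting, assuming nothing about the real classes: SearcherRefCountSketch and its methods are hypothetical names, an int generation stands in for the Reader view, and the timeout on the wait is my own addition (not something the posted code is known to do) so that a Searcher that never comes back cannot block the reopen forever.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; class and method names are illustrative only.
final class SearcherRefCountSketch {
    private int searcherRefs;     // Searchers currently checked out
    private int readerGeneration; // bumped each time the Readers are "reopened"

    // Hands out the shared view and counts the reference.
    synchronized int acquireSearcher() {
        searcherRefs++;
        return readerGeneration;
    }

    synchronized void releaseSearcher() {
        if (searcherRefs == 0) {
            throw new IllegalStateException("no Searcher is checked out");
        }
        searcherRefs--;
        notifyAll(); // wake a thread waiting in reopenWhenIdle
    }

    // Called after the last Writer reference is returned: waits until every
    // Searcher is back, then reopens. Returns false if the wait timed out
    // or was interrupted, in which case nothing is reopened.
    synchronized boolean reopenWhenIdle(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (searcherRefs > 0) {
                long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remaining <= 0) {
                    return false;
                }
                wait(remaining);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        readerGeneration++;
        return true;
    }

    public static void main(String[] args) {
        SearcherRefCountSketch accessor = new SearcherRefCountSketch();
        accessor.acquireSearcher();
        System.out.println("reopened while busy: " + accessor.reopenWhenIdle(50));
        accessor.releaseSearcher();
        System.out.println("reopened when idle: " + accessor.reopenWhenIdle(50));
    }
}
```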


In this regard, you get a  similar setup to what Solr might give: from 
any thread you just add docs and run searches -- you don't have to worry 
about refreshing Readers or sharing Writers/Readers or one thread 
deleting with a Reader while another thread tries to write with a Writer.


This setup allows you to do other cool things, like warm Searchers
before putting them into action. That's what the code I am posting soon
will be capable of: when the Readers are reopened, search requests will
still be handled by the old Readers while the new Searchers run a sample
query with optional sort fields. This makes sure the Reader is open
and its sort caches are loaded before the first thread tries to use it,
which gives applications much faster responses.
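
The warm-then-swap idea can be sketched roughly like this. WarmSwapSketch is a hypothetical generic holder, not the warming code from the zip: the opener and warmer callbacks stand in for reopening a Reader and running the sample query.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical holder for the "live" Searcher; S is whatever searcher type you use.
final class WarmSwapSketch<S> {
    private final AtomicReference<S> live;

    WarmSwapSketch(S initial) {
        live = new AtomicReference<>(initial);
    }

    // What a search request would be handed right now.
    S current() {
        return live.get();
    }

    // Build the replacement, warm it, and only then publish it.
    void reopen(Supplier<S> opener, Consumer<S> warmer) {
        S fresh = opener.get();
        warmer.accept(fresh); // e.g. a sample query with sort fields
        live.set(fresh);      // until this line, current() returns the old one
    }

    public static void main(String[] args) {
        WarmSwapSketch<String> holder = new WarmSwapSketch<>("searcher-1");
        holder.reopen(() -> "searcher-2",
                s -> System.out.println("warming " + s + ", still serving " + holder.current()));
        System.out.println("now serving " + holder.current());
    }
}
```

Closing the old Searcher once its last reference is returned is left out here; that is where the reference counting comes in.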


You must  open a new Reader or reopen a Reader to see recently added 
docs...IndexAccessor provides no real way around that. But it does make 
the reopening much easier -- and your application that just wants to add 
docs and search at will from multiple threads, won't have to worry about it.


You can bail out here, or if you want further clarification I will 
include an alternate attempt at what IndexAccessor is below.


- Mark


When accessing a Lucene index from multiple threads, there are a variety 
of issues that you must address.


1. The Readers/Writer should be shared across threads.
2. Readers must periodically be refreshed, either by creating new
instances or using the new reopen method.
3. A Reader that writes needs to be properly coordinated with a Writer,
e.g. they cannot be used at the same time.


IndexAccessor addresses each of these issues.

How it works:

A single Writer is shared among threads that try to concurrently
retrieve and use a Writer. Once all of these threads release their
reference to the Writer, it is closed and upon the next request a new
one is created.

A single Searcher for each Similarity is also shared across threads.
Upon first request, a new Searcher is created. This Searcher is then
returned upon every request. A count of every Searcher reference
retrieved is maintained.


When all references to a Writer are released, the Writer is closed and 
after waiting for all of the Searchers to be returned, the Searchers are
reopened. Without warming enabled, new requests for Searchers/Readers 
must wait for this reopen to complete. If warming is enabled, the old
Searchers/Readers continue handling Searcher requests until the Readers 
have been reopened and any requested sort caches have been loaded.


If you ask for a writing Reader, you will not get it until a Writer is 
released and vice versa.


The result is that you can freely use Writers/Readers/Searchers from any 
thread without considering thread interactions. ***


If you want to add docs, just ask for a Writer, add the docs, and 
release the Writer. If you want to search, get a Searcher, search,
and release the Searcher. You don't have to worry about reopening 
Readers or coordinating access.



***
You still do have to consider things like hogging the Writer/Readers:
if you don't occasionally release them, things will not stay very
interactive. The best method is to just get the object, use it, and then
return it in a finally block. Batch load multiple docs, but if you're
just randomly adding a doc, get the Writer, add it, and then release the
Writer in a finally block. If you are batch loading a million docs and
you want to be able to see them as they are added: get the Writer and
add 10,000 docs (or something), release the Writer, get the Writer and
add 10,000 docs, etc.
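
The get/use/release-in-finally pattern with periodic releases might look something like the sketch below. The WriterAccessor and DocSink interfaces are hypothetical shapes invented for the example (the real IndexAccessor method names and types may differ), and CountingAccessor is just an in-memory stub so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchLoadSketch {
    // Hypothetical accessor shape; not the actual IndexAccessor interface.
    interface WriterAccessor {
        DocSink getWriter();
        void release(DocSink writer);
    }

    // Hypothetical stand-in for a Writer.
    interface DocSink {
        void addDocument(String doc);
    }

    // In-memory stub that records added docs and outstanding references.
    static final class CountingAccessor implements WriterAccessor {
        final List<String> added = new ArrayList<>();
        int outstanding;

        public DocSink getWriter() {
            outstanding++;
            return added::add;
        }

        public void release(DocSink writer) {
            outstanding--;
        }
    }

    // Adds docs in chunks and releases the writer after each chunk, even if
    // an add fails. Returns how many times the writer was released.
    static int loadInBatches(WriterAccessor accessor, List<String> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int releases = 0;
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            DocSink writer = accessor.getWriter();
            try {
                for (String doc : docs.subList(start, end)) {
                    writer.addDocument(doc);
                }
            } finally {
                accessor.release(writer); // lets Searchers reopen between chunks
                releases++;
            }
        }
        return releases;
    }

    public static void main(String[] args) {
        CountingAccessor accessor = new CountingAccessor();
        int releases = loadInBatches(accessor, Arrays.asList("a", "b", "c", "d", "e"), 2);
        System.out.println(accessor.added.size() + " docs added, " + releases + " releases");
    }
}
```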


Cam Bazz wrote:

Hello Mark,

I have been reading the code - and honestly I have not understood how it
works. I was hoping that this was a solution to the case when you are adding
documents - in a multithreaded way, it allows other non-writer threads to be
able to see documents

Re: DefaultIndexAccessor

2008-02-04 Thread Mark Miller


I replied to the wrong thread -- sorry about that:

You still have to be careful if you want to alternate a search and 
write. If you are loading a lot of docs this way, you would want to hold 
the Writer to batch the docs, but while you are holding it, you will not 
have a fresh view of the index - so you could add the same doc twice if 
it came twice in a batch. The only way to be sure you avoid this is to 
reopen readers after you add every doc. This is just not going to be a
fast way of doing things...but if you have a high mergefactor, the new
reopen method will probably make it *much* faster. Or if you are sure that
the batch won't contain duplicates, you can batch load.
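
One possible middle ground, which is my own suggestion rather than anything in IndexAccessor, is to remember the keys already added in the current batch and check both that set and the (possibly stale) index view. In this sketch BatchDedupSketch and keysToAdd are hypothetical names and existsInIndex stands in for a search against the current Reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class BatchDedupSketch {
    // Returns the keys that should be added, in order, skipping keys that are
    // already visible in the index and repeats within this batch.
    static List<String> keysToAdd(List<String> batchKeys, Predicate<String> existsInIndex) {
        Set<String> seenInBatch = new HashSet<>();
        List<String> toAdd = new ArrayList<>();
        for (String key : batchKeys) {
            if (existsInIndex.test(key)) {
                continue; // visible through the Reader opened before the batch
            }
            if (seenInBatch.add(key)) {
                toAdd.add(key); // first occurrence in this batch
            }
        }
        return toAdd;
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(Arrays.asList("k1"));
        List<String> batch = Arrays.asList("k1", "k2", "k3", "k2");
        System.out.println(keysToAdd(batch, indexed::contains));
    }
}
```

This only covers duplicates inside one batch from one thread; docs added concurrently by other threads would still need a reopen to become visible.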


Cam Bazz wrote:

Hello Mark,

Thank you for your lengthy and valuable clarification. I have the case -
before adding to the index, i must check if a document exist with the
same key (actually, double key) - or before deleting a document - I must
ensure it exists in the index.

Currently I am doing it with my custom caching routine. It works quite well
up to 32M documents, but after that something happens and it really slows
down.

I will experiment with your implementation, as soon as I can. It is very
cool by the way. Will it be included in the next release?

Best,
-C.B.

On Feb 4, 2008 7:15 PM, Mark Miller [EMAIL PROTECTED] wrote:

  

The purpose of IndexAccessor is to coordinate Readers/Writers for a
Lucene index. Readers and Writers in Lucene are multi-threaded in that
multiple threads may use them at the same time, but they must/should be
shared and there are special rules (You cannot delete with a Reader
while a Writer is working on the index). Also, you need to refresh
Reader views every so often; this is expensive (though usually much less
so with the new reopen method).

IndexAccessor enforces the rules and controls Reader refreshing. Instead
of worrying about caching or index interaction rules, you just ask for
your Reader/Writer, use it to search or add a doc, and then return it.
The rest is taken care of for you.

This is done by keeping a cached Writer and Searcher(s) that all threads
share. References to the Searchers are counted so that after a Writer is
returned (and no other thread has a reference to the Writer),
IndexAccessor waits for all of the current Searchers to come back and
then reopens their Readers.

In this regard, you get a  similar setup to what Solr might give: from
any thread you just add docs and run searches -- you don't have to worry
about refreshing Readers or sharing Writers/Readers or one thread
deleting with a Reader while another thread tries to write with a Writer.

This setup allows you to do other cool things, like warm Searchers
before putting them into action. That's what the code I am posting soon
will be capable of: when the Readers are reopened, search requests will
still be handled by the old Readers while the new Searchers run a sample
query with optional sort fields. This makes sure the Reader is open
and its sort caches are loaded before the first thread tries to use it,
which gives applications much faster responses.

You must  open a new Reader or reopen a Reader to see recently added
docs...IndexAccessor provides no real way around that. But it does make
the reopening much easier -- and your application that just wants to add
docs and search at will from multiple threads, won't have to worry about
it.

You can bail out here, or if you want further clarification I will
include an alternate attempt at what IndexAccessor is below.

- Mark



When accessing a Lucene index from multiple threads, there are a variety
of issues that you must address.

1. The Readers/Writer should be shared across threads.
2. Readers must periodically be refreshed, either by creating new
instances or using the new reopen method.
3. A Reader that writes needs to be properly coordinated with a Writer,
e.g. they cannot be used at the same time.

IndexAccessor addresses each of these issues.

How it works:

A single Writer is shared among threads that try to concurrently
retrieve and use a Writer. Once all of these threads release their
reference
to the Writer, it is closed and upon the next request a new one is
created.

A single Searcher for each Similarity is also shared across threads.
Upon first request, a new Searcher is created. This Searcher is then
returned
upon every request. A count of every Searcher reference retrieved is
maintained.

When all references to a Writer are released, the Writer is closed and
after waiting for all of the Searchers to be returned, the Searchers are
reopened. Without warming enabled, new requests for Searchers/Readers
must wait for this reopen to complete. If warming is enabled, the old
Searchers/Readers continue handling Searcher requests until the Readers
have been reopened and any requested sort caches have been loaded.

If you ask for a writing Reader, you will not get it

Re: DefaultIndexAccessor

2008-02-04 Thread Mark Miller

For anyone following this thread who would like to check this out, I put 
up the new code with the warming capability:


https://issues.apache.org/jira/browse/LUCENE-1026
IndexAccessor-02.04.2008.zip (32 kb):
https://issues.apache.org/jira/secure/attachment/12374729/IndexAccessor-02.04.2008.zip


See the comment at the bottom.

Cam Bazz wrote:

Hello Mark,

Thank you for your lengthy and valuable clarification. Here is my case:
before adding to the index, I must check whether a document with the
same key (actually, a double key) already exists, and before deleting a
document I must ensure it exists in the index.

Currently I am doing it with my custom caching routine. It works quite well
up to 32M documents, but after that something happens and it really slows
down.

I will experiment with your implementation, as soon as I can. It is very
cool by the way. Will it be included in the next release?

Best,
-C.B.

On Feb 4, 2008 7:15 PM, Mark Miller [EMAIL PROTECTED] wrote:

  

The purpose of IndexAccessor is to coordinate Readers/Writers for a
Lucene index. Readers and Writers in Lucene are multi-threaded in that
multiple threads may use them at the same time, but they must/should be
shared and there are special rules (You cannot delete with a Reader
while a Writer is working on the index). Also, you need to refresh
Reader views every so often; this is expensive (though usually much less
so with the new reopen method).

IndexAccessor enforces the rules and controls Reader refreshing. Instead
of worrying about caching or index interaction rules, you just ask for
your Reader/Writer, use it to search or add a doc, and then return it.
The rest is taken care of for you.

This is done by keeping a cached Writer and Searcher(s) that all threads
share. References to the Searchers are counted so that after a Writer is
returned (and no other thread has a reference to the Writer),
IndexAccessor waits for all of the current Searchers to come back and
then reopens their Readers.

In this regard, you get a similar setup to what Solr might give: from
any thread you just add docs and run searches -- you don't have to worry
about refreshing Readers or sharing Writers/Readers or one thread
deleting with a Reader while another thread tries to write with a Writer.

This setup allows you to do other cool things, like warm Searchers
before putting them into action. That's what the code I am posting soon
will be capable of: when the Readers are reopened, search requests will
still be handled by the old Readers while the new Searchers run a sample
query with optional sort fields. This will make sure the Reader is open
and its sort caches are loaded before the first thread tries to use it.
The result is much faster response for applications.
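
The warming hand-off described above can be sketched roughly as follows. This is only an illustration in plain Java, not the IndexAccessor code: the class name WarmingHolder and its methods are hypothetical, and the "searcher" is a generic type so the sketch runs without Lucene on the classpath.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch: keep serving the old view until the replacement is warm. */
final class WarmingHolder<S> {
    private final AtomicReference<S> live;

    WarmingHolder(S initial) {
        live = new AtomicReference<>(initial);
    }

    /** Search requests always go to whichever instance is currently live. */
    S current() {
        return live.get();
    }

    /**
     * Opens a replacement, warms it (for example with a sample query and
     * sort fields), then publishes it. Returns the retired instance so the
     * caller can close it once its outstanding references have come back.
     */
    S refresh(Supplier<S> opener, Consumer<S> warmer) {
        S fresh = opener.get();
        warmer.accept(fresh);          // current() still returns the old instance here
        return live.getAndSet(fresh);  // atomic swap: new requests now see the fresh one
    }
}
```

The point of the design is that the swap happens only after warming finishes, so no search thread ever pays the cost of loading the sort caches.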

You must open a new Reader or reopen a Reader to see recently added
docs...IndexAccessor provides no real way around that. But it does make
the reopening much easier -- and your application that just wants to add
docs and search at will from multiple threads, won't have to worry about
it.

You can bail out here, or if you want further clarification I will
include an alternate attempt at what IndexAccessor is below.

- Mark



When accessing a Lucene index from multiple threads, there are a variety
of issues that you must address.

1. The Readers/Writer should be shared across threads.
2. Readers must periodically be refreshed, either by creating new
instances or using the new reopen method.
3. A Reader that writes needs to be properly coordinated with a Writer,
e.g. they cannot be used at the same time.

IndexAccessor addresses each of these issues.

How it works:

A single Writer is shared among threads that try to concurrently
retrieve and use a Writer. Once all of these threads release their
reference to the Writer, it is closed and upon the next request a new
one is created.
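
As a rough illustration of the shared-Writer lifecycle just described (create on first request, share while in use, close on the last release), here is a minimal reference-counting sketch in plain Java. SharedResource and its methods are hypothetical names chosen for this example; the actual IndexAccessor code may be organized quite differently.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical helper: shares one resource and closes it when unused. */
final class SharedResource<T extends Closeable> {
    private final Supplier<T> factory;
    private T current;      // null while no thread holds the resource
    private int refCount;   // number of acquire() calls not yet released

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on the first request. */
    synchronized T acquire() {
        if (current == null) {
            current = factory.get();
        }
        refCount++;
        return current;
    }

    /** Gives back one reference; the last release closes the instance. */
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        if (--refCount == 0) {
            T toClose = current;
            current = null;  // the next acquire() creates a new instance
            try {
                toClose.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Callers would normally pair acquire() with release() in a try/finally block so that a failed operation cannot leak a reference and keep the resource open forever.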

A single Searcher for each Similarity is also shared across threads.
Upon first request, a new Searcher is created. This Searcher is then
returned upon every request. A count of every Searcher reference
retrieved is maintained.

When all references to a Writer are released, the Writer is closed and
after waiting for all of the Searchers to be returned, the Searchers are
reopened. Without warming enabled, new requests for Searchers/Readers
must wait for this reopen to complete. If warming is enabled, the old
Searchers/Readers continue handling Searcher requests until the Readers
have been reopened and any requested sort caches have been loaded.

If you ask for a writing Reader, you will not get it until a Writer is
released and vice versa.
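
One way to picture the Writer / writing-Reader exclusion above is a small gate that lets either kind be shared, but never both kinds at once. The sketch below is plain Java with hypothetical names (WriteGate and its methods); it is an assumption about how such coordination could be done, not a copy of the IndexAccessor implementation, and it makes no attempt at fairness.

```java
/** Hypothetical sketch: Writers and writing (deleting) Readers never overlap. */
final class WriteGate {
    private int writers;         // holders of the shared Writer
    private int writingReaders;  // holders of a writing Reader

    /** Non-blocking: succeeds only if no writing Reader is outstanding. */
    synchronized boolean tryAcquireWriter() {
        if (writingReaders > 0) return false;
        writers++;
        return true;
    }

    /** Blocking variant: waits until every writing Reader has been released. */
    synchronized void acquireWriter() throws InterruptedException {
        while (!tryAcquireWriter()) wait();
    }

    synchronized void releaseWriter() {
        if (writers == 0) throw new IllegalStateException("no Writer is held");
        if (--writers == 0) notifyAll();  // wake threads waiting for a writing Reader
    }

    /** Non-blocking: succeeds only if no Writer is outstanding. */
    synchronized boolean tryAcquireWritingReader() {
        if (writers > 0) return false;
        writingReaders++;
        return true;
    }

    /** Blocking variant: waits until every Writer reference has been released. */
    synchronized void acquireWritingReader() throws InterruptedException {
        while (!tryAcquireWritingReader()) wait();
    }

    synchronized void releaseWritingReader() {
        if (writingReaders == 0) throw new IllegalStateException("no writing Reader is held");
        if (--writingReaders == 0) notifyAll();  // wake threads waiting for a Writer
    }
}
```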

The result is that you can freely use Writers/Readers/Searchers from any
thread without considering thread interactions. ***

If you want to add docs, just ask for a Writer, add the docs, and
release the Writer. If you want to
