Re: ScorerDocQueue.HeapedScorerDoc

2008-07-28 Thread eks dev


... to change semantics of these iterators not to return boolean but
  rather document Id with sentinel values. This would definitely reduce
  number of method invocations by factor 2 at least: {next() doc()}
  -> next()
 
  It would be pretty easy to do that, it just requires one huge patch,
  but with only simple changes ... this is public api (wait for V3.0?).
  Would that make sense?
 
  Also, without measuring I could not say if that would bring
  something, but it looks like it would. I think MG4J people made this
  switch in their last version as well.
 
 I'm skeptical about this one, I think it will not be easy to beat
 the simplicity of the current next()/skipTo()/doc(), especially
 with good inlining.
 But when it improves performance, I'm all ears.
 
 Also, would sentinel testing keep its speed when doc numbers
 change from int to long? I really don't know...
 Regards,
 Paul Elschot
 


It is hard to measure this with micro benchmarks... someone with a built-in
disassembler like Yonik could probably make a much better qualified guess :)

What I am hypothesizing is the following:

Can this be faster:
while (-1 != (doc = i.next())) {  // next() returns doc number, or sentinel at the end
  // do something with doc
}


than this (what we have today):
while (i.next()) {
  doc = i.doc();
  // do something with doc
}

Theoretically, the first one is:
- one method call
- one equality test on int (in future probably long), and
- one assignment to the local variable

The second one is:
- two method calls
- one equality test on boolean
- one assignment to the local variable

So the question to answer (from a performance point of view) is whether
t(method call) + t(comparison on boolean) takes longer than t(comparison on
int/long). I know, it depends, but the question is: is it likely enough to
warrant the effort?
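
To make the comparison concrete, here is a minimal sketch of the two styles side by side. All names in it (OldStyleIterator, SentinelIterator, NO_MORE_DOCS and the array-backed implementations) are made up for illustration and are not the Lucene API; it only shows the shape of the two loops, not their relative speed.

```java
// Hypothetical interfaces for illustration only; not the Lucene API.
interface OldStyleIterator {
    boolean next();  // advance; returns false when exhausted
    int doc();       // current doc, valid only after next() returned true
}

interface SentinelIterator {
    int NO_MORE_DOCS = -1;  // assumed sentinel value
    int next();             // advance and return the doc, or NO_MORE_DOCS
}

final class ArrayOldStyleIterator implements OldStyleIterator {
    private final int[] docs;
    private int pos = -1;
    ArrayOldStyleIterator(int[] docs) { this.docs = docs; }
    public boolean next() { return ++pos < docs.length; }
    public int doc() { return docs[pos]; }
}

final class ArraySentinelIterator implements SentinelIterator {
    private final int[] docs;
    private int pos = 0;
    ArraySentinelIterator(int[] docs) { this.docs = docs; }
    public int next() { return pos < docs.length ? docs[pos++] : NO_MORE_DOCS; }
}

final class IterationStyles {
    // Today's style: two method calls per document.
    static long sumOld(OldStyleIterator it) {
        long sum = 0;
        while (it.next()) {
            int doc = it.doc();
            sum += doc;
        }
        return sum;
    }

    // Sentinel style: one method call plus an int comparison per document.
    static long sumSentinel(SentinelIterator it) {
        long sum = 0;
        int doc;
        while ((doc = it.next()) != SentinelIterator.NO_MORE_DOCS) {
            sum += doc;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 9, 16};
        System.out.println(sumOld(new ArrayOldStyleIterator(docs)));
        System.out.println(sumSentinel(new ArraySentinelIterator(docs)));
    }
}
```

Both loops visit the same documents; whether the second is actually faster depends on what the JIT inlines.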
  
By the way, how do we want to proceed with The Most Popular Bug Source on these
iterators: the off-by-one caused by calling doc() before calling
next()/skipTo()? If I remember well, someone smart (Doug or Paul) suggested
that we assert these cases... Before moving on with this, I would propose to do
it; it could save some debugging hours. If we opt for the first iterator option,
we do not have this problem. Also, once you are at the end of the iterator, you
cannot get a cached doc() (we had an off-by-one at the end as well).
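
One possible shape for such a check, as a sketch only: a wrapper that remembers whether the last next() succeeded and refuses doc() otherwise. DocIterator, ArrayDocIterator and CheckedIterator are hypothetical names, not existing Lucene classes. The sketch throws IllegalStateException so that it works without -ea; a plain Java assert would be the cheaper variant for production code.

```java
// Hypothetical types for illustration; not existing Lucene classes.
interface DocIterator {
    boolean next();
    int doc();
}

final class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int pos = -1;
    ArrayDocIterator(int[] docs) { this.docs = docs; }
    public boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }
    // Unchecked on purpose: misuse shows up here only as an array index error.
    public int doc() { return docs[pos]; }
}

// Wrapper that rejects doc() before the first next() and after exhaustion.
final class CheckedIterator implements DocIterator {
    private final DocIterator in;
    private boolean positioned = false;

    CheckedIterator(DocIterator in) { this.in = in; }

    public boolean next() {
        positioned = in.next();
        return positioned;
    }

    public int doc() {
        if (!positioned) {
            throw new IllegalStateException(
                "doc() called before next() or after the iterator was exhausted");
        }
        return in.doc();
    }
}

final class CheckedIteratorDemo {
    public static void main(String[] args) {
        DocIterator it = new CheckedIterator(new ArrayDocIterator(new int[] {3, 7}));
        try {
            it.doc();  // misuse: no next() yet
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        while (it.next()) {
            System.out.println(it.doc());
        }
    }
}
```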

I do not know, just brain dumping 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Deadlock when multi-threading DocumentsWriter

2008-07-28 Thread Jagadesh Nomula
Would anyone have any insight into deadlock issues when running
DocumentsWriter.java from multiple threads? I am trying to port
ParallelWriter.java code to the new codebase of DocumentsWriter.java and
IndexWriter. I am doing this by splitting the DocumentsWriter.addDocument call
into two unsynchronized methods, doGetThreadState and
finishDocWithThreadState. doGetThreadState just calls the synchronized
getThreadState method and returns a thread state to be used by
finishDocWithThreadState, which inverts the document and flushes it.  The code
is semantically equivalent to the addDocument method in DocumentsWriter; the
only variation is that the call to doGetThreadState is executed from a
synchronized block in ParallelWriter to maintain consistent doc-ids in
ParallelWriter.

You would imagine that this code would work without any issues, but it runs
into a deadlock. An excerpt of the suspicious calls:

== Thread ConnectionThreadGroup-26491.pool-8-thread-1 ===
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.lucene.index.DocumentsWriter.pauseAllThreads(DocumentsWriter.java:507)
org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2670)
org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2660)
org.apache.lucene.index.IndexWriter.finishDoc(IndexWriter.java:1601)
org.apache.lucene.index.ParallelWriter$ProcessWorker.run(ParallelWriter.java:464)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
java.lang.Thread.run(Thread.java:619)

== Thread ConnectionThreadGroup-26491.pool-3-thread-6 ===
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.lucene.index.DocumentsWriter.getThreadState(DocumentsWriter.java:2420)
org.apache.lucene.index.DocumentsWriter.doGetThreadState(DocumentsWriter.java:2532)
org.apache.lucene.index.IndexWriter.getThreadState(IndexWriter.java:1564)
org.apache.lucene.index.ParallelWriter$ThreadStateWorker.call(ParallelWriter.java:425)
org.apache.lucene.index.ParallelWriter$ThreadStateWorker.call(ParallelWriter.java:405)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
java.lang.Thread.run(Thread.java:619)

Any info that I might be overlooking, or any comments, would be of great help
to me in resolving this. Thanks in advance for your help.

Jagdish



[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1345:


Attachment: TestIteratorPerf.java

Hi Paul,
I gave micro benchmarking a try, and it looks like we could gain a lot by
switching to the sentinel approach for iterators. Apart from being faster, they
are also a bit more robust against off-by-one bugs.

This test is just a simulation made assuming docId is long (I have tried it
with int and the result is about the same).

Just attaching it here as I did not want to create a new issue for now, before
we identify if there are some design/performance knock-out criteria.

test on my setup:
32bit java version 1.6.0_10-rc
Java(TM) SE Runtime Environment (build 1.6.0_10-rc-b28)
Windows XP Professional 32bit
notebook, 3GB RAM,
CPU x86 Family 6 Model 15 Stepping 11 GenuineIntel ~2194 MHz

java -server -Xbatch


result (with docID long):
old  milliseconds=6938
old  milliseconds=6953
old  milliseconds=6890
old  milliseconds=6938
old  milliseconds=6906
old  milliseconds=6922
old  milliseconds=6906
old  milliseconds=6938
old  milliseconds=6906
old  milliseconds=6906
old total milliseconds=69203

new  milliseconds=5797
new  milliseconds=5703
new  milliseconds=5266
new  milliseconds=5250
new  milliseconds=5234
new  milliseconds=5250
new  milliseconds=5235
new  milliseconds=5250
new  milliseconds=5250
new  milliseconds=5250
new total milliseconds=53485
New/Old Time 53485/69203 (77.28711%)

All in all, more than 22% faster!

Of course, this type of benchmark does not mean all iterator ops in real life
are going to be 20% faster... other things probably dominate, but if it turns
out that this test does not have some flaws (easily possible)... well worth
pursuing.

cheers, eks 




 Allow Filter as clause to BooleanQuery
 --

 Key: LUCENE-1345
 URL: https://issues.apache.org/jira/browse/LUCENE-1345
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Paul Elschot
Priority: Minor
 Attachments: DisjunctionDISI.patch, DisjunctionDISI.patch, 
 LUCENE-1345.patch, TestIteratorPerf.java




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-1345:
-

Attachment: LUCENE-1345.patch

Patch of 20080729: all tests pass, but there are no test cases for filter clauses yet.

Added BooleanFilterClause class, usable as argument to BooleanQuery.add().

API change: made ReqExclScorer package private, added an arg to the constructor.

Removed the queueSize variable in DisjunctionSumScorer and in the added 
DisjunctionDISI. Left the doc caching in ScorerDocQueue and in the added 
DisiDocQueue.

It might be possible to subclass DisjunctionSumScorer from DisjunctionDISI,
and to subclass ScorerDocQueue from DisiDocQueue, I have not checked that.

Since ConjunctionScorer can handle DocIdSetIterators with this patch, it should
improve the speed for Filters when they are added to a BooleanQuery instead of
being used through the current search API.

Eks, thanks for DisjunctionDISI, I took it a bit further.





[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617603#action_12617603
 ] 

Eks Dev commented on LUCENE-1345:
-

Great! Will look into it at the weekend in more detail.

I have moved this part to the constructor on my local copy, and it passes all tests:

+if (disiDocQueue == null) {
+  initDisiDocQueue();
+}


In the patch it is in next() and skipTo().

This is practically the same as reported in
https://issues.apache.org/jira/browse/LUCENE-1145; with this, 1145 can be closed.






[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617604#action_12617604
 ] 

Paul Elschot commented on LUCENE-1345:
--

20080729 is the date here, the attachment is dated 20080728, never mind.

As to the sentinel for doc()/next() in the TestIteratorPerf patch: this will
need some real Scorers/DocIdSetIterators to see actual JIT compiler inlining in
both cases. In the patch, the Old and New classes are local private classes,
which are much easier to inline than separate (non-final) public classes.
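
A rough sketch of a harness that is less friendly to inlining, under the assumption that a call site which sees several receiver classes is harder for the JIT to inline than one that only ever sees a single private class. All class names here (SentinelIter, RangeIter, StepIter, ArrayIter, MixedReceiverHarness) are hypothetical and unrelated to TestIteratorPerf.java; timings are printed but no particular numbers are claimed.

```java
// Hypothetical classes for illustration; none of these are Lucene types.
abstract class SentinelIter {
    static final int NO_MORE_DOCS = -1;
    /** Advances and returns the next doc, or NO_MORE_DOCS when exhausted. */
    abstract int next();
}

class RangeIter extends SentinelIter {  // deliberately non-final
    private final int limit;
    private int doc = -1;
    RangeIter(int limit) { this.limit = limit; }
    @Override int next() {
        if (doc + 1 < limit) return ++doc;
        return NO_MORE_DOCS;
    }
}

class StepIter extends SentinelIter {   // step must be positive
    private final int limit;
    private final int step;
    private int doc;
    StepIter(int limit, int step) { this.limit = limit; this.step = step; this.doc = -step; }
    @Override int next() {
        if (doc + step < limit) { doc += step; return doc; }
        return NO_MORE_DOCS;
    }
}

class ArrayIter extends SentinelIter {
    private final int[] docs;
    private int pos = 0;
    ArrayIter(int[] docs) { this.docs = docs; }
    @Override int next() { return pos < docs.length ? docs[pos++] : NO_MORE_DOCS; }
}

final class MixedReceiverHarness {
    // The it.next() call site below sees three different receiver classes.
    static long drain(SentinelIter it) {
        long sum = 0;
        int doc;
        while ((doc = it.next()) != SentinelIter.NO_MORE_DOCS) {
            sum += doc;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] docs = new int[n];
        for (int i = 0; i < n; i++) docs[i] = i;
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            long sum = drain(new RangeIter(n))
                     + drain(new StepIter(n, 2))
                     + drain(new ArrayIter(docs));
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("round " + round + " sum=" + sum + " ms=" + ms);
        }
    }
}
```

An old-style next()/doc() hierarchy would need the same treatment before the two approaches can be compared fairly.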





[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617606#action_12617606
 ] 

Paul Elschot commented on LUCENE-1345:
--

Indeed, it makes sense to add the changes from LUCENE-1145 here.
I remembered some discussion about this, but not that there was an issue open...




Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread eks dev
From what I can tell, this just makes it harder for the new approach, but you
never know before you try it in production ...

just wanted to see if it could lead anywhere before spending real time on it









Re: Deadlock when multi-threading DocumentsWriter

2008-07-28 Thread Michael McCandless


Can you post a patch with your full changes to DocumentsWriter and  
IndexWriter?


That first thread is trying to flush, but is waiting for all threads  
to leave DocumentsWriter (finish adding docs).  The 2nd thread looks  
like it's waiting for the flush to finish before proceeding.  Are  
there any other threads?


Are you calling DocumentsWriter.finishDocument?  That method frees the  
thread state, which is what that first thread is waiting on...
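
To illustrate the protocol described above, here is a deliberately simplified model; it is not the DocumentsWriter code, and PausableGate, enter(), leave(), pauseAll() and resumeAll() are invented names. A flusher waits until no thread holds a state, and new threads wait while a pause is pending. If a thread calls pauseAll() while still holding its own state (that is, without the leave() that plays the role of finishDocument here), the model hangs with exactly the two wait() patterns in the stack traces.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of "pause all threads, then flush"; not the Lucene implementation.
final class PausableGate {
    private int active = 0;          // threads currently holding a "thread state"
    private boolean paused = false;  // true while a flush is pending or running

    synchronized void enter() throws InterruptedException {
        while (paused) wait();       // like waiting in getThreadState during a flush
        active++;
    }

    synchronized void leave() {      // like freeing the thread state
        active--;
        notifyAll();
    }

    // The caller must have called leave() first, or it waits for itself forever.
    synchronized void pauseAll() throws InterruptedException {
        while (paused) wait();       // one flusher at a time
        paused = true;
        while (active > 0) wait();   // wait for all threads to finish their documents
    }

    synchronized void resumeAll() {
        paused = false;
        notifyAll();
    }
}

final class GateDemo {
    static int run(int threads, int docsPerThread) {
        PausableGate gate = new PausableGate();
        AtomicInteger docs = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int i = 0; i < docsPerThread; i++) {
                        gate.enter();
                        try {
                            docs.incrementAndGet();  // "add the document"
                        } finally {
                            gate.leave();            // always release before flushing
                        }
                        if (i % 25 == 0) {           // occasionally "flush"
                            gate.pauseAll();
                            try {
                                // flush work would go here
                            } finally {
                                gate.resumeAll();
                            }
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return docs.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100));
    }
}
```

Moving the pauseAll() call inside the enter()/leave() pair reproduces the hang in this model, which is why the question about finishDocument matters.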


Mike









[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617631#action_12617631
 ] 

Yonik Seeley commented on LUCENE-1345:
--

Eks: just for grins, you can sometimes save a single cycle by changing id==-1
to id<0 (many operations on x86 automatically set status flags, hence
comparison to zero can often be free).  Not sure if the java optimizer will
catch it though, and if it does if it would actually rise above the noise level.




Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2008-07-28 Thread Chris Hostetter

: Eks: just for grins, you can sometimes save a single cycle by changing
: id==-1 to id<0 (many operations on x86 automatically set status

can you save anymore if you use 0>id ? :)


-Hoss

