Re: ScorerDocQueue.HeapedScorerDoc
... to change the semantics of these iterators so that they do not return boolean but rather the document id, with sentinel values. This would definitely reduce the number of method invocations by a factor of 2 at least ({next(), doc()} -> next()). It would be pretty easy to do that; it just requires one huge patch, but with only simple changes ... this is public API (wait for V3.0?). Would that make sense? Also, without measuring I could not say whether that would bring something, but it looks like it. I think the MG4J people made this switch in their last version as well.

I'm skeptical about this one. I think it will not be easy to beat the simplicity of the current next()/skipTo()/doc(), especially with good inlining. But if it improves performance, I'm all ears. Also, would sentinel testing keep its speed when doc numbers change from int to long? I really don't know...

Regards,
Paul Elschot

It is hard to measure this, micro benchmarks... someone with a built-in disassembler like Yonik could probably make a much better qualified guess :) What I am hypothesizing is the following: can this be faster

    while (-1 != (doc = i.next())) { // next() returns the doc number, or the sentinel at the end
      // do something with doc
    }

than this (what we have today):

    while (i.next()) {
      doc = i.doc();
      // do something with doc
    }

Theoretically, the first one is:
- one method call
- one equality test on int (in future probably long)
- one assignment to the local variable

and the second one is:
- two method calls
- one test on boolean
- one assignment to the local variable

So the question to answer (from a performance point of view) is whether t(method call) + t(test on boolean) takes longer than t(comparison on int/long). I know, it depends, but is it likely enough to warrant the effort?

By the way, how do we want to proceed with The Most Popular Bug Source on these iterators, the off-by-one due to calling doc() before calling next()/skipTo()... If I remember well, someone smart (Doug or Paul) suggested that we assert these cases...
Before moving on with this, I would propose doing that; it could save some hours of debugging. If we opt for the first style of iterator, we do not have this problem. Also, once you are at the end of the iterator, you cannot get a stale cached doc() (we had an off-by-one at the end as well). I do not know, just brain dumping.
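The two loop shapes and the "assert these cases" idea can be put side by side in a small sketch. BoolIterator and SentinelIterator are hypothetical names, not Lucene classes, doc ids are int here, and the sentinel is assumed to be -1 as in the loop above. The assert in doc() only fires when the JVM runs with -ea.

```java
/**
 * Contrasts the two iterator styles discussed above. BoolIterator and SentinelIterator are
 * hypothetical names, not Lucene classes.
 */
public class IteratorStyles {

    /** Current style: next() advances and reports success, doc() returns the cached doc id. */
    static final class BoolIterator {
        private final int[] docs;
        private int pos = -1;

        BoolIterator(int[] docs) { this.docs = docs; }

        boolean next() { return ++pos < docs.length; }

        int doc() {
            // Catches doc() before the first next() and doc() after exhaustion
            // (only when the JVM runs with -ea).
            assert pos >= 0 && pos < docs.length : "doc() called without a current document";
            return docs[pos];
        }
    }

    /** Proposed style: next() returns the doc id directly, or NO_MORE_DOCS at the end. */
    static final class SentinelIterator {
        static final int NO_MORE_DOCS = -1; // assumed sentinel value
        private final int[] docs;
        private int pos = -1;

        SentinelIterator(int[] docs) { this.docs = docs; }

        int next() { return ++pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
    }

    static long sumOld(int[] docs) {
        BoolIterator i = new BoolIterator(docs);
        long sum = 0;
        while (i.next()) {       // two calls per document: next() and doc()
            int doc = i.doc();
            sum += doc;
        }
        return sum;
    }

    static long sumNew(int[] docs) {
        SentinelIterator i = new SentinelIterator(docs);
        long sum = 0;
        int doc;
        while ((doc = i.next()) != SentinelIterator.NO_MORE_DOCS) { // one call per document
            sum += doc;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] docs = {0, 3, 7, 42};
        System.out.println(sumOld(docs) + " " + sumNew(docs)); // prints "52 52"
    }
}
```

In the sentinel style there is no separate doc() to call too early, which is the robustness argument made above; the price is that the sentinel must never be a valid doc id.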
Deadlock when multi-threading DocumentsWriter
Does anyone have any insight into deadlock issues when running DocumentsWriter.java from multiple threads? I am trying to port the ParallelWriter.java code to the new codebase of DocumentsWriter.java and IndexWriter. I am doing this by splitting the DocumentsWriter.addDocument call into two unsynchronized methods, doGetThreadState and finishDocWithThreadState. doGetThreadState just calls the synchronized getThreadState method and returns a thread state to be used by finishDocWithThreadState, which inverts the document and flushes it. The code is semantically equivalent to the addDocument method in DocumentsWriter; the only difference is that the call to doGetThreadState is executed from a synchronized block in ParallelWriter, to keep the doc ids consistent in ParallelWriter. You would imagine that this code would work without any issues, but it runs into a deadlock. The excerpt of suspicious calls is:

=== Thread ConnectionThreadGroup-26491.pool-8-thread-1 ===
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.lucene.index.DocumentsWriter.pauseAllThreads(DocumentsWriter.java:507)
org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2670)
org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2660)
org.apache.lucene.index.IndexWriter.finishDoc(IndexWriter.java:1601)
org.apache.lucene.index.ParallelWriter$ProcessWorker.run(ParallelWriter.java:464)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
java.lang.Thread.run(Thread.java:619)

=== Thread ConnectionThreadGroup-26491.pool-3-thread-6 ===
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.apache.lucene.index.DocumentsWriter.getThreadState(DocumentsWriter.java:2420)
org.apache.lucene.index.DocumentsWriter.doGetThreadState(DocumentsWriter.java:2532)
org.apache.lucene.index.IndexWriter.getThreadState(IndexWriter.java:1564)
org.apache.lucene.index.ParallelWriter$ThreadStateWorker.call(ParallelWriter.java:425)
org.apache.lucene.index.ParallelWriter$ThreadStateWorker.call(ParallelWriter.java:405)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
java.lang.Thread.run(Thread.java:619)

Any info that I might be overlooking, or any comments, would be of great help in resolving this. Thanks in advance for your help.
Jagdish
[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1345: Attachment: TestIteratorPerf.java

Hi Paul,
I gave micro-benchmarking a try, and it looks like we could gain a lot by switching to the sentinel approach for iterators. Apart from being faster, they are also a bit more robust against off-by-one bugs. This test is just a simulation that assumes docId is long (I have tried it with int and the result is about the same). I am just attaching it here, as I did not want to create a new issue for now, before we identify whether there are some design/performance knock-out criteria.

Test on my setup:
- 32-bit java version 1.6.0_10-rc, Java(TM) SE Runtime Environment (build 1.6.0_10-rc-b28)
- Windows XP Professional 32-bit notebook, 3 GB RAM, CPU x86 Family 6 Model 15 Stepping 11 GenuineIntel ~2194 MHz
- java -server -Xbatch

Result (with docID long):
old milliseconds: 6938, 6953, 6890, 6938, 6906, 6922, 6906, 6938, 6906, 6906
old total milliseconds = 69203
new milliseconds: 5797, 5703, 5266, 5250, 5234, 5250, 5235, 5250, 5250, 5250
new total milliseconds = 53485
New/Old Time 53485/69203 (77.28711%)

All in all, more than 22% faster!! Of course, this type of benchmark does not mean that all iterator operations in real life are going to be 20% faster... other things probably dominate, but if it turns out that this test has no flaws (which it easily could have)... it is well worth pursuing.

cheers, eks

Allow Filter as clause to BooleanQuery -- Key: LUCENE-1345 URL: https://issues.apache.org/jira/browse/LUCENE-1345 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Paul Elschot Priority: Minor Attachments: DisjunctionDISI.patch, DisjunctionDISI.patch, LUCENE-1345.patch, TestIteratorPerf.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
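The attached TestIteratorPerf.java is not reproduced in this thread, so the following is only a sketch of a timing harness in the same spirit: ten runs per variant, per-run and total milliseconds, and the New/Old ratio as a percentage. OldIter and NewIter are hypothetical stand-ins, doc ids are int rather than long, and the loop results are written to a volatile field so the JIT cannot discard the work.

```java
/**
 * Timing-harness sketch. Doc ids are int here, whereas the measurement above used long.
 * OldIter/NewIter are hypothetical stand-ins for the two iterator styles.
 */
public class IterBench {

    static final class OldIter {
        private final int max; private int doc = -1;
        OldIter(int max) { this.max = max; }
        boolean next() { return ++doc < max; }
        int doc() { return doc; }
    }

    static final class NewIter {
        private final int max; private int doc = -1;
        NewIter(int max) { this.max = max; }
        int next() { return ++doc < max ? doc : -1; } // -1 is the end sentinel
    }

    static volatile long sink; // results are stored here so the loops cannot be eliminated

    /** Runs the workload `runs` times, printing each run, and returns the total milliseconds. */
    static long timeRuns(String label, int runs, Runnable workload) {
        long total = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            workload.run();
            long ms = (System.nanoTime() - start) / 1_000_000L;
            System.out.println(label + " milliseconds=" + ms);
            total += ms;
        }
        System.out.println(label + " total milliseconds=" + total);
        return total;
    }

    /** New/Old time as a percentage; below 100 means the new style was faster. */
    static double percentOfOld(long newTotal, long oldTotal) {
        return 100.0 * newTotal / oldTotal;
    }

    public static void main(String[] args) {
        final int maxDoc = 50_000_000;
        long oldTotal = timeRuns("old", 10, () -> {
            OldIter it = new OldIter(maxDoc);
            long s = 0;
            while (it.next()) s += it.doc();
            sink = s;
        });
        long newTotal = timeRuns("new", 10, () -> {
            NewIter it = new NewIter(maxDoc);
            long s = 0;
            int doc;
            while ((doc = it.next()) != -1) s += doc;
            sink = s;
        });
        System.out.printf("New/Old Time %d/%d (%.5f%%)%n",
            newTotal, oldTotal, percentOfOld(newTotal, oldTotal));
    }
}
```

With the totals reported above, percentOfOld(53485, 69203) gives about 77.287, matching the quoted 77.28711%. The first runs of each variant include JIT warm-up, which is one reason such harnesses often discard them.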
[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-1345: Attachment: LUCENE-1345.patch

Patch of 20080729: all tests pass, but there are no test cases for filter clauses yet.
Added a BooleanFilterClause class, usable as an argument to BooleanQuery.add().
API change: made ReqExclScorer package private, and added an argument to its constructor.
Removed the queueSize variable in DisjunctionSumScorer and in the added DisjunctionDISI.
Left the doc caching in ScorerDocQueue and in the added DisiDocQueue.
It might be possible to subclass DisjunctionSumScorer from DisjunctionDISI, and to subclass ScorerDocQueue from DisiDocQueue; I have not checked that.
Since ConjunctionScorer can handle DocIdSetIterators with this patch, it should improve the speed for Filters when they are added to a BooleanQuery instead of being used through the current search API.
Eks, thanks for DisjunctionDISI, I took it a bit further.
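Why letting a conjunction consume a Filter's iterator can be cheaper than post-filtering is easiest to see in the "leapfrog" intersection a conjunction performs with skipTo(). The sketch below uses a hypothetical DocIter over sorted arrays, not Lucene's DocIdSetIterator or ConjunctionScorer; its skipTo() follows the "behaves as if" contract as I recall it from the Lucene 2.x javadoc (do { if (!next()) return false; } while (target > doc())), so it always advances at least once.

```java
import java.util.ArrayList;
import java.util.List;

/** Leapfrog intersection sketch; DocIter is a hypothetical stand-in, not the Lucene class. */
public class ConjunctionSketch {

    static final class DocIter {
        private final int[] docs; // sorted, distinct doc ids
        private int pos = -1;

        DocIter(int... docs) { this.docs = docs; }

        boolean next() { return ++pos < docs.length; }

        /** Advances at least once, to the first doc >= target; false when exhausted. */
        boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (target > doc());
            return true;
        }

        int doc() { return docs[pos]; }
    }

    /** Doc ids present in every iterator; requires at least one iterator. */
    static List<Integer> intersect(DocIter... its) {
        List<Integer> out = new ArrayList<>();
        for (DocIter it : its) {
            if (!it.next()) return out; // an empty clause means no matches at all
        }
        int target = its[0].doc();
        while (true) {
            boolean allMatch = true;
            for (DocIter it : its) {
                if (it.doc() < target && !it.skipTo(target)) return out;
                if (it.doc() > target) {   // this clause jumped past: new candidate
                    target = it.doc();
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {                // every clause sits on target
                out.add(target);
                if (!its[0].next()) return out;
                target = its[0].doc();
            }
        }
    }

    public static void main(String[] args) {
        DocIter queryClause = new DocIter(1, 3, 5, 7, 9);   // docs matching a query clause
        DocIter filterClause = new DocIter(3, 4, 5, 9, 10); // docs accepted by a filter
        System.out.println(intersect(queryClause, filterClause)); // prints [3, 5, 9]
    }
}
```

The filter clause is only ever asked to skip to candidates the other clauses already agree on, so documents the query rules out are skipped rather than scored and then discarded.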
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617603#action_12617603 ] Eks Dev commented on LUCENE-1345:

great! Will look into it at the weekend in more detail. I have moved this part to the constructor in my local copy, and it passes all tests:

    if (disiDocQueue == null) {
      initDisiDocQueue();
    }

It is in next() and skipTo(). This is practically the same as reported in https://issues.apache.org/jira/browse/LUCENE-1145; with this, 1145 can be closed.
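Moving the lazy initialization into the constructor means next() no longer pays for a null check on every call. A minimal sketch of a disjunction (union) iterator whose heap is filled eagerly, assuming sub-iterators over sorted arrays; the class and field names are hypothetical, this is not the DisjunctionDISI from the patch, and skipTo() is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Union iterator sketch with an eagerly built heap; names are hypothetical. */
public class DisjunctionSketch {

    /** One sub-iterator over a sorted array of distinct doc ids. */
    static final class Sub {
        final int[] docs;
        int pos = 0;
        Sub(int... docs) { this.docs = docs; }
        int doc() { return docs[pos]; }
    }

    private final PriorityQueue<Sub> queue =
        new PriorityQueue<Sub>((a, b) -> Integer.compare(a.doc(), b.doc()));
    private int doc = -1;

    DisjunctionSketch(Sub... subs) {
        // Eager initialization: next() needs no "if (queue == null)" check.
        for (Sub s : subs) {
            if (s.docs.length > 0) queue.add(s);
        }
    }

    /** Advances to the next doc id matched by any sub-iterator. */
    boolean next() {
        if (queue.isEmpty()) return false;
        doc = queue.peek().doc();
        // Pop every sub positioned on this doc, advance it, and re-insert it if not exhausted.
        while (!queue.isEmpty() && queue.peek().doc() == doc) {
            Sub top = queue.poll();
            top.pos++;
            if (top.pos < top.docs.length) queue.add(top);
        }
        return true;
    }

    int doc() { return doc; }

    static List<Integer> drain(DisjunctionSketch d) {
        List<Integer> out = new ArrayList<>();
        while (d.next()) out.add(d.doc());
        return out;
    }

    public static void main(String[] args) {
        DisjunctionSketch d =
            new DisjunctionSketch(new Sub(1, 4, 9), new Sub(2, 4, 10), new Sub());
        System.out.println(drain(d)); // prints [1, 2, 4, 9, 10]
    }
}
```

A sub is always polled before its position changes, so the heap ordering is never violated while an element sits in it.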
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617604#action_12617604 ] Paul Elschot commented on LUCENE-1345:

20080729 is the date here; the attachment is dated 20080728, never mind.

As to the sentinel for doc()/next() in the TestIteratorPerf patch: this will need some real Scorers/DocIdSetIterators to see the actual JIT compiler inlining in both cases. In the patch, the Old and New classes are local private classes, which are much easier to inline than separate (non-final) public classes.
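One way to make such a micro-benchmark less friendly to the JIT than private local classes is to drive the loop through an interface and feed the same call site several implementations. HotSpot typically inlines call sites that have seen one or two receiver types and falls back to a virtual call beyond that; this is a general observation about the JVM, not something measured in this thread. All names below are hypothetical.

```java
/** Sketch of a shared call site seeing three iterator implementations; names are hypothetical. */
public class MegamorphicBench {

    interface DocIter { int nextDoc(); } // returns -1 when exhausted (sentinel style)

    static final class ArrayIter implements DocIter {
        private final int[] docs; private int pos = -1;
        ArrayIter(int[] docs) { this.docs = docs; }
        public int nextDoc() { return ++pos < docs.length ? docs[pos] : -1; }
    }

    static final class RangeIter implements DocIter {
        private final int end; private int doc = -1;
        RangeIter(int end) { this.end = end; }
        public int nextDoc() { return ++doc < end ? doc : -1; }
    }

    static final class StepIter implements DocIter {
        private final int end, step; private int doc;
        StepIter(int end, int step) { this.end = end; this.step = step; this.doc = -step; }
        public int nextDoc() { doc += step; return doc < end ? doc : -1; }
    }

    /** The single call site shared by all implementations. */
    static long consume(DocIter it) {
        long sum = 0;
        int doc;
        while ((doc = it.nextDoc()) != -1) sum += doc;
        return sum;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int round = 0; round < 1000; round++) {
            sum += consume(new ArrayIter(new int[] {1, 5, 8}));
            sum += consume(new RangeIter(100));
            sum += consume(new StepIter(100, 3));
        }
        System.out.println(sum);
    }
}
```

Timing consume() only after all three types have passed through it is intended to approximate the situation Paul describes, where real Scorers are separate, non-final public classes.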
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617606#action_12617606 ] Paul Elschot commented on LUCENE-1345:

Indeed, it makes sense to add the changes from LUCENE-1145 here. I remembered some discussion about this, but not that there was an issue open...
Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
from what I can say, this just makes it harder for the new approach, but you never know until you try it in production... just wanted to see if it could lead anywhere before spending real time on it

----- Original Message -----
From: Paul Elschot (JIRA) [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Tuesday, 29 July, 2008 12:44:31 AM
Subject: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
Re: Deadlock when multi-threading DocumentsWriter
Can you post a patch with your full changes to DocumentsWriter and IndexWriter?

That first thread is trying to flush, but is waiting for all threads to leave DocumentsWriter (finish adding docs). The 2nd thread looks like it's waiting for the flush to finish before proceeding.

Are there any other threads?

Are you calling DocumentsWriter.finishDocument? That method frees the thread state, which is what that first thread is waiting on...

Mike

Jagadesh Nomula wrote:
Would anyone be having any insight into deadlock issues, when running DocumentsWriter.java from multiple threads ?. [...]
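The handshake Mike describes can be illustrated with a heavily simplified toy model. This is a hypothetical sketch, not Lucene's DocumentsWriter: a flush waits until every thread state that was handed out has been released, and obtaining a new state waits while a flush is pending. If some state is never released (for example because finishDocument is never called for it), the flushing thread waits forever and every thread asking for a new state waits behind it. Timeouts are added only so the demo cannot hang.

```java
/** Toy model of the thread-state/flush handshake; hypothetical and heavily simplified. */
public class ThreadStateModel {

    private int activeStates = 0;          // states handed out and not yet released
    private boolean flushPending = false;

    /** Obtain a state; waits while a flush is pending (like the second thread in the dump). */
    public synchronized boolean getThreadState(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (flushPending) {
            long left = deadline - System.currentTimeMillis();
            if (left <= 0) return false;
            wait(left);
        }
        activeStates++;
        return true;
    }

    /** Release the state; this is what lets a pending flush proceed. */
    public synchronized void finishDocument() {
        activeStates--;
        notifyAll();
    }

    /** Flush side: block new states and wait until all outstanding ones are released. */
    public synchronized boolean pauseAllThreads(long timeoutMs) throws InterruptedException {
        flushPending = true;
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (activeStates > 0) {
            long left = deadline - System.currentTimeMillis();
            if (left <= 0) return false;   // without the timeout this would wait forever
            wait(left);
        }
        return true;
    }

    /** End of flush: let waiting threads obtain states again. */
    public synchronized void resumeAllThreads() {
        flushPending = false;
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadStateModel w = new ThreadStateModel();
        w.getThreadState(10);
        System.out.println(w.pauseAllThreads(50)); // false: a state is still held
        System.out.println(w.getThreadState(50));  // false: blocked behind the pending flush
        w.finishDocument();                        // release the state obtained above
        System.out.println(w.pauseAllThreads(50)); // true
        w.resumeAllThreads();
        System.out.println(w.getThreadState(50));  // true
    }
}
```

In this model, pairing every getThreadState with a finishDocument (for instance in a finally block) is what keeps a flush from waiting indefinitely.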
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617631#action_12617631 ] Yonik Seeley commented on LUCENE-1345:

Eks: just for grins, you can sometimes save a single cycle by changing id==-1 to id<0 (many operations on x86 automatically set status flags, hence comparison to zero can often be free). Not sure if the Java optimizer will catch it though, and if it does, whether it would actually rise above the noise level.
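The two termination tests are interchangeable only under the assumption that valid doc ids are never negative and that -1 is the end sentinel. A small sketch with a hypothetical idAt() helper standing in for the iterator; whether the sign test actually saves a cycle depends on the code the JIT emits and is not shown here.

```java
/** Equality-with-sentinel vs. sign test as loop terminators; idAt() is hypothetical. */
public class SentinelCheck {

    /** Hypothetical iterator step: the id at position i, or the sentinel -1 past the end. */
    static int idAt(int[] ids, int i) { return i < ids.length ? ids[i] : -1; }

    /** Loop terminated by equality with the sentinel (id == -1). */
    static long sumEq(int[] ids) {
        long sum = 0;
        int id;
        for (int i = 0; (id = idAt(ids, i)) != -1; i++) sum += id;
        return sum;
    }

    /** Loop terminated by a sign test (stops when id < 0). */
    static long sumSign(int[] ids) {
        long sum = 0;
        int id;
        for (int i = 0; (id = idAt(ids, i)) >= 0; i++) sum += id;
        return sum;
    }

    public static void main(String[] args) {
        int[] ids = {0, 2, 5, 11};
        System.out.println(sumEq(ids) + " " + sumSign(ids)); // prints "18 18"
    }
}
```

If a negative value other than -1 could ever appear, the two loops diverge: for {3, -5, 4}, sumEq returns 2 while sumSign stops early and returns 3.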
Re: [jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
: Eks: just for grins, you can sometimes save a single cycle by changing
: id==-1 to id<0 (many operations on x86 automatically set status

can you save any more if you use 0>id ? :)

-Hoss