[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576723#action_12576723 ]

Doron Cohen commented on LUCENE-1209:

Mark you are right that setConfig is called just once, at start. At least for setting properties by round this should be sufficient. I wonder why this doesn't work for you.

I tried with this one:

{code}
compound=true
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=RamDirectory
doc.stored=true
doc.tokenized=true
doc.term.vector=termVec:false:true
doc.add.log.step=10
doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
task.max.depth.log=1

{ { Populate CreateIndex { AddDoc : 50 Optimize CloseIndex ResetSystemErase NewRound } : 2
RepSumByName
RepSelectByPref Populate
{code}

And got this output:

{code}
Working Directory: work
Running algorithm from: conf\termVecByRound.alg
config properties:
analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
compound = true
directory = RamDirectory
doc.add.log.step = 10
doc.maker = org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
doc.stored = true
doc.term.vector = termVec:false:true
doc.tokenized = true
task.max.depth.log = 1
work.dir = work
---
algorithm:
Seq { Seq_2 { Populate { CreateIndex Seq_50 { AddDoc * 50 Optimize CloseIndex ResetSystemErase NewRound } * 2 RepSumByName RepSelectByPref Populate }
starting task: Seq
starting task: Seq_2
-- 0.1 sec: main processed (add) 10 docs
-- 0.1 sec: main processed (add) 20 docs
-- 0.11 sec: main processed (add) 30 docs
-- 0.11 sec: main processed (add) 40 docs
-- 0.11 sec: main processed (add) 50 docs
SimpleDocMaker statistics (0):
num docs added since last inputs reset: 50
total bytes added since last inputs reset: 42,150
-- Round 0--1: doc.term.vector:false--true
-- 0 sec: main processed (add) 60 docs
-- 0 sec: main processed (add) 70 docs
-- 0 sec: main processed (add) 80 docs
-- 0 sec: main processed (add) 90 docs
-- 0 sec: main processed (add) 100 docs
SimpleDocMaker statistics (1):
num docs added since last inputs reset: 50
total bytes added since last inputs reset: 42,150
-- Round 1--2: doc.term.vector:true--false

Report Sum By (any) Name (2 about 3 out of 4)
Operation  round  termVec  runCnt  recsPerRun    rec/s  elapsedSec  avgUsedMem  avgTotalMem
Seq_2          0    false       1         106    530.0        0.20     639,912    5,177,344
Populate       -        -       2          53    706.7        0.15     839,552    5,177,344

Report Select By Prefix (Populate) (2 about 2 out of 4)
Operation  round  termVec  runCnt  recsPerRun    rec/s  elapsedSec  avgUsedMem  avgTotalMem
Populate       0    false       1          53    378.6        0.14     858,080    5,177,344
Populate       1     true       1          53  5,300.0        0.01     821,024    5,177,344

### D O N E !!! ###
{code}

Note in particular this line:

{code}
[java] -- Round 0--1: doc.term.vector:false--true
{code}

Note that a *NewRound* command is required in order for the round number to change.

{code}
NewRound
{code}

A possible cause for error is that the property definition parsing requires a property name prefix for multi-valued properties. So this would not work as expected:

{code}
doc.term.vector=false:true
{code}

But this will work:

{code}
doc.term.vector=termVec:false:true
{code}

If it still doesn't work for you, can you post here the algorithm?

If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Key: LUCENE-1209
URL: https://issues.apache.org/jira/browse/LUCENE-1209
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/benchmark
Affects Versions: 2.4
Reporter: Mark Miller
Priority: Trivial
Attachments: reset_config.patch

I want to be able to run one benchmark that tests things using term vectors and not using term vectors. Currently this is not easy because you cannot specify term vectors per round. While you do have to create a new index per round, this automation is preferable to me in comparison to running two separate tests. If it doesn't affect anything else, it would be great to
[jira] Created: (LUCENE-1210) IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception
IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception

Key: LUCENE-1210
URL: https://issues.apache.org/jira/browse/LUCENE-1210
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.4

If you're using CMS (the default) and mergeInit hits an exception (eg OOME), we are not properly clearing IndexWriter's internal tracking of running merges. This causes IW.close() to hang while it incorrectly waits for these non-started merges to finish.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1210) IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception
[ https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576725#action_12576725 ]

Michael McCandless commented on LUCENE-1210:

The fix is trivial: add a try/finally to mergeInit to clear the internal tracking on exception. I'll commit shortly.
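For readers following along, here is a minimal sketch of the try/finally pattern described in the comment above. It is not the committed patch: MergeTracker, the String merge names, and the Runnable setup step are hypothetical stand-ins for IndexWriter's internal bookkeeping.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for IndexWriter's tracking of running merges (not Lucene code).
class MergeTracker {
    private final Set<String> runningMerges = new HashSet<>();

    // Registers the merge, then runs its setup step. If setup throws,
    // the finally block unregisters the merge so nothing waits on it later.
    void mergeInit(String mergeName, Runnable setup) {
        boolean success = false;
        runningMerges.add(mergeName);
        try {
            setup.run();
            success = true;
        } finally {
            if (!success) {
                runningMerges.remove(mergeName);
            }
        }
    }

    // Called when a started merge completes normally.
    void mergeFinish(String mergeName) {
        runningMerges.remove(mergeName);
    }

    // A close() implementation would wait until this reaches zero.
    int pendingMergeCount() {
        return runningMerges.size();
    }
}
```

Without the cleanup in the finally block, a merge whose setup failed would stay registered forever, which matches the hang in close() that the issue describes.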
[jira] Resolved: (LUCENE-1210) IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception
[ https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1210.

Resolution: Fixed
[jira] Created: (LUCENE-1211) Small speedups to DocumentsWriter's quickSort
Small speedups to DocumentsWriter's quickSort

Key: LUCENE-1211
URL: https://issues.apache.org/jira/browse/LUCENE-1211
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.4

In working on LUCENE-510 I found that DocumentsWriter's quickSort can be further optimized to handle the common case of sorting only 2 values.

I ran with this alg:

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
docs.file=/Volumes/External/lucene/wiki.txt
doc.stored = true
doc.term.vector = true
doc.add.log.step=2000
doc.maker.forever = false
directory=FSDirectory
autocommit=false
compound=false
ram.flush.mb=64

{ Rounds ResetSystemErase { BuildIndex CreateIndex { AddDocs AddDoc : 20 - CloseIndex } NewRound } : 5
RepSumByPrefRound BuildIndex

Best of 5 was 857.3 docs/sec before the optimization and 881.6 after = 2.8% speedup, on a quad-core Mac Pro with 4-drive RAID 0 array.

The fix is trivial. I will commit shortly.
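As an illustration of the idea only, here is a sketch of a quicksort with a fast path for ranges of exactly two values. It sorts a plain int array and is not DocumentsWriter's actual code; the class name and the choice of partition scheme are assumptions made for the example.

```java
// Hypothetical illustration: quicksort over int[] with a two-value fast path.
class TwoValueQuickSort {

    // Sorts a[lo..hi], both bounds inclusive.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return; // zero or one value: nothing to do
        }
        if (hi == lo + 1) {
            // Fast path: two values need one compare and at most one swap,
            // with no pivot selection, partitioning, or recursion.
            if (a[lo] > a[hi]) {
                swap(a, lo, hi);
            }
            return;
        }
        // General case: partition around the middle value.
        int pivot = a[(lo + hi) >>> 1];
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                swap(a, i, j);
                i++;
                j--;
            }
        }
        quickSort(a, lo, j);
        quickSort(a, i, hi);
    }

    private static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }
}
```

The fast path matters when most sort calls see very short ranges, which is the common case the issue mentions.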
[jira] Resolved: (LUCENE-1211) Small speedups to DocumentsWriter's quickSort
[ https://issues.apache.org/jira/browse/LUCENE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1211.

Resolution: Fixed
[jira] Created: (LUCENE-1212) Basic refactoring of DocumentsWriter
Basic refactoring of DocumentsWriter

Key: LUCENE-1212
URL: https://issues.apache.org/jira/browse/LUCENE-1212
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.4

As a starting point for making DocumentsWriter more understandable, I've fixed its inner classes to be static, and then broke the classes out into separate sources, all in org.apache.lucene.index package.
[jira] Updated: (LUCENE-1212) Basic refactoring of DocumentsWriter
[ https://issues.apache.org/jira/browse/LUCENE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1212:

Attachment: LUCENE-1212.patch

Attached patch. All tests pass. I will commit in a day or two.

There is a small performance loss with this: 924.5 docs/sec vs 913.4 docs/sec = ~1.2%, best of 5 runs indexing first 200K docs of Wikipedia. But I think it's an acceptable tradeoff for cleaner code.
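To illustrate what making an inner class static involves, here is a small sketch. The class and field names are hypothetical and are not taken from the patch. A static nested class has no hidden reference to its enclosing instance, so any state it needs from the outer object has to be passed in explicitly, and that is what makes it possible to move the class into its own source file afterwards.

```java
// Hypothetical outer class; names are illustrative only.
class DocWriterSketch {
    int bufferSize = 16;

    // Static nested class: no implicit DocWriterSketch.this reference.
    // The dependency on the writer is visible in the method signature.
    static class FieldData {
        final String name;

        FieldData(String name) {
            this.name = name;
        }

        // The writer is an explicit parameter instead of an implicit outer instance.
        int bufferFor(DocWriterSketch writer) {
            return writer.bufferSize * 2;
        }
    }
}
```

Passing the outer object explicitly adds a little indirection, which is one plausible source of the small slowdown reported above, though the thread does not say where the cost comes from.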
[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770 ]

[EMAIL PROTECTED] edited comment on LUCENE-1209 at 3/9/08 6:44 AM:

My algorithm is below.

I see Round 0--1: doc.term.vector:false--true as well. However, if I put a debug print on what is returned from public boolean get (String name, boolean dflt), it is only ever called once for doc.term.vector, and likewise for the other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the work/index directory on the second run, there are certainly term vectors. That's how I noticed this to begin with: I was looking at the index and saw the term vector files on every round. It's possible I have something messed up, but every time I run through everything again it really does not seem to be working. If I set term vectors to false:true, they are never made in any round.

- Mark

{code}
ram.flush.mb=flush:32:32
compound=false
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000
docs.dir=reuters-out
doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker
query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker
# task at this depth or less would print when they start
task.max.depth.log=2
log.queries=true
# -
{ Rounds ResetSystemErase CreateIndex { MAddDocs AddDoc(60) } : 2 Optimize CloseIndex OpenReader { SrchTrvRetNewRdr SearchTravRet(10) : 1000 CloseReader OpenReader { SearchHlgtSameRdr SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) : 1000 CloseReader RepSumByPref SearchHlgtSameRdr NewRound } : 2
RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
{code}
[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770 ]

Mark Miller commented on LUCENE-1209:

My algorithm is below.

I see Round 0--1: doc.term.vector:false--true as well. However, if I put a debug print on what is returned from public boolean get (String name, boolean dflt), it is only ever called once for doc.term.vector, and likewise for the other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the work/index directory on the second run, there are certainly term vectors. That's how I noticed this to begin with: I was looking at the index and saw the term vector files on every round. It's possible I have something messed up, but every time I run through everything again it really does not seem to be working. If I set term vectors to false:true, they are never made in any round.

- Mark

{code}
ram.flush.mb=flush:32:32
compound=false
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000
docs.dir=reuters-out
doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker
query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker
# task at this depth or less would print when they start
task.max.depth.log=2
log.queries=true
# -
{ Rounds ResetSystemErase CreateIndex { MAddDocs AddDoc(60) } : 2 Optimize CloseIndex OpenReader { SrchTrvRetNewRdr SearchTravRet(10) : 1000 CloseReader OpenReader { SearchHlgtSameRdr SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) : 1000 CloseReader RepSumByPref SearchHlgtSameRdr NewRound } : 2
RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
{code}

If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Key: LUCENE-1209
URL: https://issues.apache.org/jira/browse/LUCENE-1209
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/benchmark
Affects Versions: 2.4
Reporter: Mark Miller
Priority: Trivial
Attachments: reset_config.patch

I want to be able to run one benchmark that tests things using term vectors and not using term vectors. Currently this is not easy because you cannot specify term vectors per round. While you do have to create a new index per round, this automation is preferable to me in comparison to running two separate tests.

If it doesn't affect anything else, it would be great to have setConfig(Config config) called in BasicDocMaker.resetInputs(). This would keep the term vector options up to date per round if you reset.

- Mark
[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770 ]

[EMAIL PROTECTED] edited comment on LUCENE-1209 at 3/9/08 6:51 AM:

My algorithm is below.

I see Round 0--1: doc.term.vector:false--true as well. However, if I put a debug print on what is returned from public boolean get (String name, boolean dflt), it is only ever called once for doc.term.vector, and likewise for the other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the work/index directory on the second run, there are certainly term vectors. That's how I noticed this to begin with: I was looking at the index and saw the term vector files on every round. It's possible I have something messed up, but every time I run through everything again it really does not seem to be working. If I set term vectors to false:true, they are never made in any round.

"Mark you are right that setConfig is called just once, at start. At least for setting properties by round this should be sufficient. I wonder why this doesn't work for you."

I think this admits the problem, right? The get for each property in setConfig is only called once. That call loads up false:true, returns false, and sets up true to be returned on the next call. The next time you call get on Config you will get true, but there is no next time. It's only done once, so the output correctly shows Round 0--1: doc.term.vector:false--true, but get is only ever called once and so only loads false.

- Mark

{code}
ram.flush.mb=flush:32:32
compound=false
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000
docs.dir=reuters-out
doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker
query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker
# task at this depth or less would print when they start
task.max.depth.log=2
log.queries=true
# -
{ Rounds ResetSystemErase CreateIndex { MAddDocs AddDoc(60) } : 2 Optimize CloseIndex OpenReader { SrchTrvRetNewRdr SearchTravRet(10) : 1000 CloseReader OpenReader { SearchHlgtSameRdr SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) : 1000 CloseReader RepSumByPref SearchHlgtSameRdr NewRound } : 2
RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
{code}
[jira] Assigned: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-1209:

Assignee: Doron Cohen
[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576781#action_12576781 ]

Doron Cohen commented on LUCENE-1209:

Ok I can see it now, you're right. So all doc maker per rounds settings were ignored - first round settings were used. I am updating TestPerfTasksLogic.testIndexWriterSettings() to catch this bug.

Thanks for catching this,
Doron
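The bug confirmed above can be modelled in a few lines. This is a simplified sketch, not the benchmark's real Config or BasicDocMaker: RoundConfig and DocMakerSketch are hypothetical classes, the value list is assumed to be indexed by round number and to wrap around, and the boolean flag on resetInputs exists only so the old and new behaviour can be compared side by side.

```java
import java.util.Arrays;

// Hypothetical model of one by-round property such as "termVec:false:true".
class RoundConfig {
    private final String[] values;
    private int round = 0;

    RoundConfig(String spec) {
        String[] parts = spec.split(":");
        // "name:v0:v1" drops the leading name; a plain "v0" is a single fixed value.
        // Note that "false:true" without a name would be read as name "false", value "true".
        values = parts.length > 1 ? Arrays.copyOfRange(parts, 1, parts.length) : parts;
    }

    // Models the NewRound task advancing the round number.
    void newRound() {
        round++;
    }

    // Assumption for this sketch: values wrap around when rounds outnumber them.
    boolean get() {
        return Boolean.parseBoolean(values[round % values.length]);
    }
}

// Hypothetical doc maker that caches the property value in setConfig().
class DocMakerSketch {
    private final RoundConfig termVectorProp;
    private boolean termVectors;

    DocMakerSketch(RoundConfig termVectorProp) {
        this.termVectorProp = termVectorProp;
        setConfig(); // read once at start
    }

    void setConfig() {
        termVectors = termVectorProp.get();
    }

    // With rereadConfig == false this models the old behaviour (stale first-round value);
    // with true it models the proposed fix of calling setConfig() on every reset.
    void resetInputs(boolean rereadConfig) {
        if (rereadConfig) {
            setConfig();
        }
    }

    boolean usesTermVectors() {
        return termVectors;
    }
}
```

In this model the round number changes correctly and a report would print the new value, yet a doc maker that never re-reads the property keeps using the first round's setting, which matches what Mark observed.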
[jira] Updated: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1209:

Attachment: reset_config.patch

same fix + test case that fails without the fix.
[jira] Updated: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1209:

Attachment: reset_config.patch

QualityTest fails with the previous patch, exposing a related bug in ReutersDocMaker: it does not reset its files list when setConfig() is called. This was not required before, but now that setConfig is called more than once, the list of collected files must be cleared. The attached file fixes this, and all benchmark tests pass.
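A small sketch of the kind of fix described here, under the assumption that the doc maker accumulates input file names into a list each time it is configured. FileCollectingDocMaker and its method signature are hypothetical and do not reproduce ReutersDocMaker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical doc maker that collects its input files during setConfig().
class FileCollectingDocMaker {
    private final List<String> inputFiles = new ArrayList<>();

    // Safe to call more than once: the list is cleared before it is refilled.
    // Without clear(), a second call would leave every file listed twice.
    void setConfig(List<String> filesInDocsDir) {
        inputFiles.clear();
        inputFiles.addAll(filesInDocsDir);
    }

    int fileCount() {
        return inputFiles.size();
    }
}
```

Making setConfig idempotent in this way is what allows it to be called from resetInputs() on every round.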
[jira] Resolved: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-1209.

Resolution: Fixed
Lucene Fields: [Patch Available] (was: [Patch Available, New])

Committed, thanks Mark!
[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
[ https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1213: Component/s: QueryParser MultiFieldQueryParser ignores slop parameter Key: LUCENE-1213 URL: https://issues.apache.org/jira/browse/LUCENE-1213 Project: Lucene - Java Issue Type: Bug Components: QueryParser Reporter: Trejkaz MultiFieldQueryParser.getFieldQuery(String, String, int) calls super.getFieldQuery(String, String), thus obliterating any slop parameter present in the query. It should probably be changed to call super.getFieldQuery(String, String, int), except doing only that will result in a recursive loop which is a side-effect of what may be a deeper problem in MultiFieldQueryParser -- getFieldQuery(String, String, int) is documented as delegating to getFieldQuery(String, String), yet what it actually does is the exact opposite. This also causes problems for subclasses which need to override getFieldQuery(String, String) to provide different behaviour.
[jira] Created: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
MultiFieldQueryParser ignores slop parameter Key: LUCENE-1213 URL: https://issues.apache.org/jira/browse/LUCENE-1213 Project: Lucene - Java Issue Type: Bug Reporter: Trejkaz MultiFieldQueryParser.getFieldQuery(String, String, int) calls super.getFieldQuery(String, String), thus obliterating any slop parameter present in the query. It should probably be changed to call super.getFieldQuery(String, String, int), except doing only that will result in a recursive loop which is a side-effect of what may be a deeper problem in MultiFieldQueryParser -- getFieldQuery(String, String, int) is documented as delegating to getFieldQuery(String, String), yet what it actually does is the exact opposite. This also causes problems for subclasses which need to override getFieldQuery(String, String) to provide different behaviour.
[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
[ https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1213: Attachment: multifield-fix.patch Attaching one possible fix. It's more verbose than I would like, but I couldn't think of a reliable way to make it delegate: that would require casting the result to BooleanQuery to get the clauses out, and a subclass may return something else entirely. MultiFieldQueryParser ignores slop parameter Key: LUCENE-1213 URL: https://issues.apache.org/jira/browse/LUCENE-1213 Project: Lucene - Java Issue Type: Bug Components: QueryParser Reporter: Trejkaz Attachments: multifield-fix.patch MultiFieldQueryParser.getFieldQuery(String, String, int) calls super.getFieldQuery(String, String), thus obliterating any slop parameter present in the query. It should probably be changed to call super.getFieldQuery(String, String, int), except doing only that will result in a recursive loop which is a side-effect of what may be a deeper problem in MultiFieldQueryParser -- getFieldQuery(String, String, int) is documented as delegating to getFieldQuery(String, String), yet what it actually does is the exact opposite. This also causes problems for subclasses which need to override getFieldQuery(String, String) to provide different behaviour.
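The delegation problem described in this issue can be modelled without Lucene. The sketch below is a toy model, not the contents of multifield-fix.patch and not the real QueryParser code: BaseParser, MultiFieldParser, the keepSlop flag, and the string-based "queries" are all hypothetical. It assumes, as the report states, that the base three-argument method delegates to the two-argument one while the subclass delegates in the opposite direction.

```java
final class SlopDelegationSketch {

    /** Appends a slop suffix to a toy phrase query; a slop of 0 leaves it unchanged. */
    static String withSlop(String query, int slop) {
        return slop > 0 ? query + "~" + slop : query;
    }

    /** Hypothetical base parser: the three-argument form delegates to the two-argument form. */
    static class BaseParser {
        String getFieldQuery(String field, String text) {
            return field + ":\"" + text + "\"";
        }

        String getFieldQuery(String field, String text, int slop) {
            // Virtual call: a subclass override of the two-argument form is invoked here.
            return withSlop(getFieldQuery(field, text), slop);
        }
    }

    /** Hypothetical multi-field parser: its two-argument form delegates the other way. */
    static class MultiFieldParser extends BaseParser {
        private final String[] fields;
        private final boolean keepSlop; // false reproduces the reported behaviour

        MultiFieldParser(String[] fields, boolean keepSlop) {
            this.fields = fields;
            this.keepSlop = keepSlop;
        }

        @Override
        String getFieldQuery(String field, String text) {
            return getFieldQuery(field, text, 0);
        }

        @Override
        String getFieldQuery(String field, String text, int slop) {
            if (field == null) {
                // Expand over all default fields, applying the slop to each clause.
                StringBuilder sb = new StringBuilder();
                for (String f : fields) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(withSlop(super.getFieldQuery(f, text), slop));
                }
                return sb.toString();
            }
            // Calling super.getFieldQuery(field, text, slop) here would recurse forever:
            // the base method calls the overridden two-argument form, which calls this method.
            String q = super.getFieldQuery(field, text); // non-virtual, so no loop
            return keepSlop ? withSlop(q, slop) : q;
        }
    }

    static String parse(boolean keepSlop, String field, String text, int slop) {
        MultiFieldParser parser = new MultiFieldParser(new String[] {"title", "body"}, keepSlop);
        return parser.getFieldQuery(field, text, slop);
    }

    public static void main(String[] args) {
        System.out.println(parse(false, "title", "a b", 2)); // title:"a b"
        System.out.println(parse(true, "title", "a b", 2));  // title:"a b"~2
        System.out.println(parse(true, null, "a b", 2));     // title:"a b"~2 body:"a b"~2
    }
}
```

Applying the slop directly in the subclass avoids the loop, at the cost of duplicating logic that the base class already has, which may be the verbosity the comment refers to.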
[jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads
[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576858#action_12576858 ] Asgeir Frimannsson commented on LUCENE-1026: Is there any specific reason why this IndexAccessor is limited to FSDirectory-based indexes? I see FSDirectory.getFile() is used as a unique key in the list of IndexAccessors in the factory. However, it seems more natural to use dir.getLockID() for this purpose. Then it would be possible to use a generic Directory rather than the file-system specific FSDirectory. Provide a simple way to concurrently access a Lucene index from multiple threads Key: LUCENE-1026 URL: https://issues.apache.org/jira/browse/LUCENE-1026 Project: Lucene - Java Issue Type: New Feature Components: Index, Search Reporter: Mark Miller Priority: Minor Attachments: DefaultIndexAccessor.java, DefaultMultiIndexAccessor.java, IndexAccessor-02.04.2008.zip, IndexAccessor-02.07.2008.zip, IndexAccessor-02.28.2008.zip, IndexAccessor-1.26.2008.zip, IndexAccessor-2.15.2008.zip, IndexAccessor.java, IndexAccessor.zip, IndexAccessorFactory.java, MultiIndexAccessor.java, shai-IndexAccessor-2.zip, shai-IndexAccessor.zip, shai-IndexAccessor3.zip, SimpleSearchServer.java, StopWatch.java, TestIndexAccessor.java For building interactive indexes accessed through a network/internet (multiple threads). This builds upon the LuceneIndexAccessor patch. That patch was not very newbie friendly and did not properly handle MultiSearchers (or at the least made it easy to get into trouble). This patch simplifies things and provides out of the box support for sharing the IndexAccessors across threads. There is also a simple test class and example SearchServer to get you started. Future revisions will be zipped. Works pretty solid as is, but could use the ability to warm new Searchers.
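The suggestion in the comment above, keying the factory's accessors by a lock ID rather than by a file, might look roughly like the following. This is only a sketch under stated assumptions: DirectoryLike is a hypothetical stand-in for a directory abstraction that exposes getLockID(), and Accessor and AccessorRegistrySketch are illustrative names that do not come from the attached IndexAccessor files.

```java
import java.util.HashMap;
import java.util.Map;

final class AccessorRegistrySketch {

    /** Hypothetical stand-in for a directory that can report a lock ID. */
    interface DirectoryLike {
        String getLockID();
    }

    /** Hypothetical accessor bound to one directory. */
    static final class Accessor {
        final DirectoryLike directory;

        Accessor(DirectoryLike directory) {
            this.directory = directory;
        }
    }

    // Keyed by lock ID, so any directory implementation can be registered.
    private final Map<String, Accessor> accessors = new HashMap<>();

    /** Returns the shared accessor for the directory's lock ID, creating it on first use. */
    synchronized Accessor getAccessor(DirectoryLike dir) {
        return accessors.computeIfAbsent(dir.getLockID(), id -> new Accessor(dir));
    }

    synchronized int size() {
        return accessors.size();
    }

    /** Two directory handles reporting the same lock ID should share one accessor. */
    static boolean sharesAccessorForSameLockId() {
        AccessorRegistrySketch registry = new AccessorRegistrySketch();
        DirectoryLike first = () -> "lock-a";
        DirectoryLike second = () -> "lock-a";
        return registry.getAccessor(first) == registry.getAccessor(second);
    }

    /** Directories with different lock IDs get separate accessors. */
    static int accessorCountForTwoLockIds() {
        AccessorRegistrySketch registry = new AccessorRegistrySketch();
        registry.getAccessor(() -> "lock-a");
        registry.getAccessor(() -> "lock-b");
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println(sharesAccessorForSameLockId()); // true
        System.out.println(accessorCountForTwoLockIds());  // 2
    }
}
```

Whether a lock ID is unique enough to serve as the key for every directory implementation is an assumption of this sketch and would need checking against the actual Directory classes.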