date:20080309

[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576723#action_12576723
 ] 

Doron Cohen commented on LUCENE-1209:
-

Mark, you are right that setConfig is called just once, at the start.
At least for setting properties by round, this should be sufficient.
I wonder why it doesn't work for you.

I tried this algorithm:

{code}
compound=true

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=termVec:false:true
doc.add.log.step=10

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
task.max.depth.log=1

{

    { "Populate"
        CreateIndex
        { AddDoc > : 50
        Optimize
        CloseIndex
    >

    ResetSystemErase
    NewRound

} : 2

RepSumByName
RepSelectByPref Populate
{code}

And got this output:
{code}
 Working Directory: work
 Running algorithm from: conf\termVecByRound.alg
  config properties:
 analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
 compound = true
 directory = RamDirectory
 doc.add.log.step = 10
 doc.maker = org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
 doc.stored = true
 doc.term.vector = termVec:false:true
 doc.tokenized = true
 task.max.depth.log = 1
 work.dir = work
 ---
  algorithm:
 Seq {
     Seq_2 {
         Populate {
             CreateIndex
             Seq_50 {
                 AddDoc
             > * 50
             Optimize
             CloseIndex
         >
         ResetSystemErase
         NewRound
     } * 2
     RepSumByName
     RepSelectByPref Populate
 }
 
  starting task: Seq
  starting task: Seq_2
 --> 0.1 sec: main processed (add) 10 docs
 --> 0.1 sec: main processed (add) 20 docs
 --> 0.11 sec: main processed (add) 30 docs
 --> 0.11 sec: main processed (add) 40 docs
 --> 0.11 sec: main processed (add) 50 docs
  SimpleDocMaker statistics (0): 
 num docs added since last inputs reset:   50
 total bytes added since last inputs reset: 42,150
 
 
 
 --> Round 0-->1:   doc.term.vector:false-->true
 
 --> 0 sec: main processed (add) 60 docs
 --> 0 sec: main processed (add) 70 docs
 --> 0 sec: main processed (add) 80 docs
 --> 0 sec: main processed (add) 90 docs
 --> 0 sec: main processed (add) 100 docs
  SimpleDocMaker statistics (1): 
 num docs added since last inputs reset:   50
 total bytes added since last inputs reset: 42,150
 
 
 
 --> Round 1-->2:   doc.term.vector:true-->false
 
 
  Report Sum By (any) Name (2 about 3 out of 4)
 Operation    round  termVec  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
 Seq_2            0    false       1         106     530.0        0.20     639,912    5,177,344
 Populate         -        -       2          53     706.7        0.15     839,552    5,177,344
 
 
  Report Select By Prefix (Populate) (2 about 2 out of 4)
 Operation    round  termVec  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
 Populate         0    false       1          53     378.6        0.14     858,080    5,177,344
 Populate         1     true       1          53   5,300.0        0.01     821,024    5,177,344
 
 
 ###  D O N E !!! ###
 
{code}

Note in particular this line:
{code}
[java] --> Round 0-->1:   doc.term.vector:false-->true
{code}

Note that a *NewRound* command is required for the round number to change:
{code}
NewRound
{code}

One possible cause of the error is that property definition parsing requires a
name prefix for multi-valued properties (here termVec, which is also the column
name shown in the reports above).
So this would not work as expected:
{code}
doc.term.vector=false:true
{code}

But this will work:
{code}
doc.term.vector=termVec:false:true
{code}
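To make the prefix rule concrete, here is a minimal Java model of how a by-round value such as termVec:false:true might be resolved. It assumes (this is not confirmed in the thread) that the text before the first ':' is always taken as the report column name and that the remaining values are picked round-robin by round number; the class and method names are hypothetical and are not the benchmark's real Config API.

```java
/** Hypothetical model of a by-round property value such as "termVec:false:true". */
public class ByRoundProperty {
    final String columnName;   // first token, assumed to be the report column header
    final boolean[] values;    // remaining tokens, cycled by round number (assumed)

    ByRoundProperty(String raw) {
        int k = raw.indexOf(':');
        if (k < 0) {
            // Single value: the same for every round, no report column.
            columnName = null;
            values = new boolean[] { Boolean.parseBoolean(raw) };
            return;
        }
        // Text before the first ':' is always treated as the column name,
        // which is why "false:true" silently loses its first value.
        columnName = raw.substring(0, k);
        String[] parts = raw.substring(k + 1).split(":");
        values = new boolean[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Boolean.parseBoolean(parts[i]);
        }
    }

    /** Value for a given round, assuming round-robin selection. */
    boolean valueForRound(int round) {
        return values[round % values.length];
    }

    public static void main(String[] args) {
        ByRoundProperty withPrefix = new ByRoundProperty("termVec:false:true");
        ByRoundProperty noPrefix = new ByRoundProperty("false:true");
        for (int round = 0; round < 2; round++) {
            System.out.println("round " + round
                + " with prefix: " + withPrefix.valueForRound(round)
                + ", without prefix: " + noPrefix.valueForRound(round));
        }
    }
}
```

In this model, "false:true" is read as a column named "false" holding the single value true, so term vectors would be on in every round; whether the real parser fails in exactly this way is an assumption.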

If it still doesn't work for you, can you post your algorithm here?

 If setConfig(Config config) is called in resetInputs(), you can turn term 
 vectors off and on by round
 -

 Key: LUCENE-1209
 URL: https://issues.apache.org/jira/browse/LUCENE-1209
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Affects Versions: 2.4
Reporter: Mark Miller
Priority: Trivial
 Attachments: reset_config.patch


 I want to be able to run one benchmark that tests things using term vectors 
 and not using term vectors.
 Currently this is not easy because you cannot specify term vectors per round.
 While you do have to create a new index per round, this automation is 
 preferable to me in comparison to running two separate tests.
 If it doesn't affect anything else, it would be great to

[jira] Created: (LUCENE-1210) IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception

2008-03-09 Thread Michael McCandless (JIRA)

IndexWriter  ConcurrentMergeScheduler deadlock case if starting a merge hits 
an exception
--

 Key: LUCENE-1210
 URL: https://issues.apache.org/jira/browse/LUCENE-1210
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.4


If you're using CMS (the default) and mergeInit hits an exception (e.g.
OOME), we are not properly clearing IndexWriter's internal tracking of
running merges.  This causes IW.close() to hang while it incorrectly
waits for these non-started merges to finish.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-1210) IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception

2008-03-09 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576725#action_12576725
 ] 

Michael McCandless commented on LUCENE-1210:


The fix is trivial: add a try/finally to mergeInit to clear the
internal tracking on exception.  I'll commit shortly.
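The try/finally cleanup described above can be sketched in isolation. This is a minimal model under stated assumptions, not IndexWriter's actual code: the class name, the string merge IDs, and the simulateFailure flag are all hypothetical. It shows why a merge that fails during init must be removed from the running-merges set, since otherwise close() would keep waiting for it.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a writer that tracks running merges. */
public class MergeTrackingSketch {
    private final Set<String> runningMerges = new HashSet<>();

    /** Registers a merge as running before it is initialized. */
    public synchronized void registerMerge(String merge) {
        runningMerges.add(merge);
    }

    /** Stand-in for the init step; throws when simulateFailure is true. */
    public void mergeInit(String merge, boolean simulateFailure) {
        boolean success = false;
        try {
            if (simulateFailure) {
                throw new OutOfMemoryError("simulated failure in mergeInit");
            }
            // ... real initialization work would happen here ...
            success = true;
        } finally {
            if (!success) {
                // Without this cleanup, close() would wait forever for a
                // merge that never started.
                synchronized (this) {
                    runningMerges.remove(merge);
                }
            }
        }
    }

    /** What a close() method would wait on before returning. */
    public synchronized boolean hasRunningMerges() {
        return !runningMerges.isEmpty();
    }

    public static void main(String[] args) {
        MergeTrackingSketch writer = new MergeTrackingSketch();
        writer.registerMerge("merge-1");
        try {
            writer.mergeInit("merge-1", true);
        } catch (OutOfMemoryError expected) {
            System.out.println("mergeInit failed: " + expected.getMessage());
        }
        System.out.println("close() would wait: " + writer.hasRunningMerges());
    }
}
```

The success flag, rather than a catch block, keeps the original exception propagating unchanged while still guaranteeing the cleanup runs.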


[jira] Resolved: (LUCENE-1210) IndexWriter ConcurrentMergeScheduler deadlock case if starting a merge hits an exception

2008-03-09 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1210.


Resolution: Fixed


[jira] Created: (LUCENE-1211) Small speedups to DocumentsWriter's quickSort

2008-03-09 Thread Michael McCandless (JIRA)

Small speedups to DocumentsWriter's quickSort
-

 Key: LUCENE-1211
 URL: https://issues.apache.org/jira/browse/LUCENE-1211
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.4


In working on LUCENE-510 I found that DocumentsWriter's quickSort can
be further optimized to handle the common case of sorting only 2
values.
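The idea of a fast path for the two-value case can be illustrated with a generic int quicksort. This is a sketch of the technique only, not DocumentsWriter's actual quickSort (which sorts postings rather than ints); the class name is hypothetical. When a range holds exactly two elements, a single compare-and-swap replaces pivot selection and partitioning.

```java
import java.util.Arrays;

/** Sketch of an int quicksort with a fast path for very small ranges. */
public class SmallRangeQuickSort {

    /** Sorts a[lo..hi] inclusive. */
    public static void quickSort(int[] a, int lo, int hi) {
        if (hi <= lo) {
            return;                       // 0 or 1 element: already sorted
        }
        if (hi == lo + 1) {               // common case: exactly 2 values
            if (a[lo] > a[hi]) {
                swap(a, lo, hi);
            }
            return;                       // no partitioning needed
        }
        // General case: median-of-three pivot, then Hoare-style partition.
        int mid = (lo + hi) >>> 1;
        if (a[lo] > a[mid]) swap(a, lo, mid);
        if (a[mid] > a[hi]) swap(a, mid, hi);
        if (a[lo] > a[mid]) swap(a, lo, mid);
        int pivot = a[mid];

        int i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                swap(a, i, j);
                i++;
                j--;
            }
        }
        quickSort(a, lo, j);
        quickSort(a, i, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] pair = {9, 3};
        int[] many = {5, 1, 4, 1, 5, 9, 2, 6};
        quickSort(pair, 0, pair.length - 1);
        quickSort(many, 0, many.length - 1);
        System.out.println(Arrays.toString(pair) + " " + Arrays.toString(many));
    }
}
```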

I ran with this alg:

  analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
  
  doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
  
  docs.file=/Volumes/External/lucene/wiki.txt
  doc.stored = true
  doc.term.vector = true
  doc.add.log.step=2000
  doc.maker.forever = false
  
  directory=FSDirectory
  autocommit=false
  compound=false
  
  ram.flush.mb=64
  
  { "Rounds"
    ResetSystemErase
    { "BuildIndex"
      CreateIndex
      { "AddDocs" AddDoc > : 20
      - CloseIndex
    }
    NewRound
  } : 5
  
  RepSumByPrefRound BuildIndex

Best of 5 runs was 857.3 docs/sec before the optimization and 881.6 docs/sec
after, a 2.8% speedup, on a quad-core Mac Pro with a 4-drive RAID 0 array.

The fix is trivial.  I will commit shortly.




[jira] Resolved: (LUCENE-1211) Small speedups to DocumentsWriter's quickSort

2008-03-09 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1211.


Resolution: Fixed


[jira] Created: (LUCENE-1212) Basic refactoring of DocumentsWriter

2008-03-09 Thread Michael McCandless (JIRA)

Basic refactoring of DocumentsWriter


 Key: LUCENE-1212
 URL: https://issues.apache.org/jira/browse/LUCENE-1212
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.4


As a starting point for making DocumentsWriter more understandable,
I've made its inner classes static and then broken them out into
separate source files, all in the org.apache.lucene.index package.
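Why making inner classes static is a natural first step before splitting them into separate files can be shown with a small, generic Java example. The class names here are hypothetical and unrelated to DocumentsWriter's real classes: a non-static inner class carries a hidden reference to its enclosing instance, while a static nested class receives everything explicitly and can therefore be moved to its own top-level source file in the same package without changing its logic.

```java
/** Hypothetical outer class standing in for a large writer class. */
public class OuterWriter {
    private int flushedDocs = 3;

    // Non-static inner class: every instance implicitly holds OuterWriter.this,
    // so moving it to its own file would require rewriting how it reaches
    // outer state.
    class InnerState {
        int docsSeen() {
            return flushedDocs;          // implicitly OuterWriter.this.flushedDocs
        }
    }

    // Static nested class: no hidden outer reference; the state it needs is
    // passed in explicitly, so it can become a top-level class unchanged.
    static class StaticState {
        private final int docs;

        StaticState(int docs) {
            this.docs = docs;
        }

        int docsSeen() {
            return docs;
        }
    }

    public static void main(String[] args) {
        OuterWriter writer = new OuterWriter();
        OuterWriter.InnerState inner = writer.new InnerState(); // needs an outer instance
        OuterWriter.StaticState nested = new OuterWriter.StaticState(writer.flushedDocs);
        System.out.println(inner.docsSeen() + " " + nested.docsSeen());
    }
}
```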




[jira] Updated: (LUCENE-1212) Basic refactoring of DocumentsWriter

2008-03-09 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1212:
---

Attachment: LUCENE-1212.patch

Attached patch.  All tests pass.  I will commit in a day or two.

There is a small performance loss with this: 924.5 docs/sec before vs.
913.4 docs/sec after (~1.2% slower), best of 5 runs indexing the first
200K docs of Wikipedia. But I think it's an acceptable tradeoff for cleaner code.
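The quoted percentages are relative throughput changes, (after - before) / before. A tiny helper (hypothetical name) reproduces both figures from this digest: the LUCENE-1211 quickSort speedup and the LUCENE-1212 refactoring slowdown.

```java
/** Computes relative throughput change between two benchmark results. */
public class ThroughputDelta {

    /** Percent change going from 'before' to 'after' (positive = faster). */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // docs/sec figures quoted in LUCENE-1211 and LUCENE-1212 (best of 5 runs).
        System.out.printf("LUCENE-1211 quickSort: %+.1f%%%n", percentChange(857.3, 881.6));
        System.out.printf("LUCENE-1212 refactor:  %+.1f%%%n", percentChange(924.5, 913.4));
    }
}
```

This prints +2.8% and -1.2%, matching the numbers stated in the two issues.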


[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576770#action_12576770
 ] 

[EMAIL PROTECTED] edited comment on LUCENE-1209 at 3/9/08 6:44 AM:
-

My algorithm is below.

I see Round 0-->1:   doc.term.vector:false-->true as well. However, if I put
a debug print on what is returned from public boolean get(String name, boolean
dflt), it is only ever called once for doc.term.vector, and likewise for the
other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the
work/index directory on the second run, there are certainly term vectors. That's
how I noticed this to begin with: I was looking at the index and saw the term
vector files in every round. It's possible I have something messed up, but every
time I run through everything again, it really does not seem to be working.
If I set term vectors to false:true, they are never made in any round.

- Mark


{code}
ram.flush.mb=flush:32:32
compound=false

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"

    ResetSystemErase

    CreateIndex
    { "MAddDocs" AddDoc(60) } : 2
    Optimize
    CloseIndex

    OpenReader
    { "SrchTrvRetNewRdr" SearchTravRet(10) > : 1000
    CloseReader
    OpenReader
    { "SearchHlgtSameRdr" SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) > : 1000

    CloseReader

    RepSumByPref SearchHlgtSameRdr

    NewRound

} : 2

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
{code}


[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576770#action_12576770
 ] 

Mark Miller commented on LUCENE-1209:
-


 If setConfig(Config config) is called in resetInputs(), you can turn term 
 vectors off and on by round
 -

 Key: LUCENE-1209
 URL: https://issues.apache.org/jira/browse/LUCENE-1209
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Affects Versions: 2.4
Reporter: Mark Miller
Priority: Trivial
 Attachments: reset_config.patch


 I want to be able to run one benchmark that tests things using term vectors 
 and not using term vectors.
 Currently this is not easy because you cannot specify term vectors per round.
 While you do have to create a new index per round, this automation is 
 preferable to me in comparison to running two separate tests.
 If it doesn't affect anything else, it would be great to have 
 setConfig(Config config) called in BasicDocMaker.resetInputs(). This would 
 keep the term vector options up to date per round if you reset.
 - Mark


[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576770#action_12576770
 ] 

[EMAIL PROTECTED] edited comment on LUCENE-1209 at 3/9/08 6:51 AM:
-

Mark you are right that setConfig is called just once, at start.
At least for setting properties by round this should be sufficient.
I wonder why this doesn't work for you. 

I think this points to the problem, right? The get call for every property in
setConfig happens only once. That call loads up false:true, returns false,
and sets up true to be returned on the next call. The next time you call get
on Config you would get true, but there is no next time. It's only done
once, so the output correctly shows Round 0-->1:   doc.term.vector:false-->true,
but get is only ever called once and so it only ever loads false.

- Mark
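The diagnosis above can be modeled in a few lines. The sketch below uses a hypothetical RoundConfig whose values are chosen round-robin by round number (an assumption about the benchmark's behavior) and a hypothetical doc maker that caches the term vector flag in setConfig(). It contrasts reading the flag once at startup with re-reading it in resetInputs(), which is the change this issue proposes; none of these classes are the benchmark's real API.

```java
/** Hypothetical per-round config: a multi-valued flag cycled by round number. */
class RoundConfig {
    private final boolean[] termVectorByRound;
    private int round = 0;

    RoundConfig(boolean... termVectorByRound) {
        this.termVectorByRound = termVectorByRound;
    }

    void newRound() {
        round++;
    }

    boolean termVector() {
        return termVectorByRound[round % termVectorByRound.length];
    }
}

/** Hypothetical doc maker that caches the flag when configured. */
class SketchDocMaker {
    private RoundConfig config;
    private final boolean refreshOnReset;  // true models the proposed fix
    boolean storeTermVectors;

    SketchDocMaker(boolean refreshOnReset) {
        this.refreshOnReset = refreshOnReset;
    }

    void setConfig(RoundConfig config) {
        this.config = config;
        this.storeTermVectors = config.termVector();  // value is read (and cached) here
    }

    void resetInputs() {
        if (refreshOnReset) {
            setConfig(config);                          // re-read for the current round
        }
    }
}

public class PerRoundConfigDemo {
    public static void main(String[] args) {
        for (boolean fix : new boolean[] { false, true }) {
            RoundConfig cfg = new RoundConfig(true, false);  // like vec:true:false
            SketchDocMaker maker = new SketchDocMaker(fix);
            maker.setConfig(cfg);                            // called once, at start
            StringBuilder seen = new StringBuilder();
            for (int r = 0; r < 2; r++) {
                maker.resetInputs();                         // start of each round
                seen.append(maker.storeTermVectors).append(' ');
                cfg.newRound();
            }
            System.out.println((fix ? "with fix:    " : "without fix: ") + seen.toString().trim());
        }
    }
}
```

Without the refresh, the cached value from round 0 (true) is used in every round, which matches the symptom of term vector files appearing in all rounds.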



[jira] Assigned: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-1209:
---

Assignee: Doron Cohen


[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Doron Cohen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576781#action_12576781
]

Doron Cohen commented on LUCENE-1209:
-

OK, I can see it now; you're right.
So all per-round doc maker settings were ignored, and the first round's settings
were used.
I am updating TestPerfTasksLogic.testIndexWriterSettings() to catch this bug.
Thanks for catching this,
Doron


[jira] Updated: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-1209:


Attachment: reset_config.patch

Same fix, plus a test case that fails without the fix.


[jira] Updated: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Doron Cohen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Doron Cohen updated LUCENE-1209:

Attachment: reset_config.patch

QualityTest fails with the previous patch, exposing a related bug in
ReutersDocMaker: it does not reset its files list when setConfig() is called.
That was not required before, but now that setConfig() is called more than
once, the list of collected files must be cleared.
The attached patch fixes this, and all benchmark tests pass.
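The ReutersDocMaker fix follows a general rule: once a configuration method can run more than once, any state it accumulates must be reset at the start. A minimal sketch of that rule, with a hypothetical class name and hypothetical file names (not ReutersDocMaker's actual code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical doc maker that collects input files when configured. */
public class FileCollectingDocMaker {
    private final List<String> inputFiles = new ArrayList<>();

    /** May now be called once per round, so it must start from an empty list. */
    public void setConfig(List<String> filesOnDisk) {
        inputFiles.clear();          // without this, each call appends duplicates
        inputFiles.addAll(filesOnDisk);
    }

    public int numInputFiles() {
        return inputFiles.size();
    }

    public static void main(String[] args) {
        FileCollectingDocMaker maker = new FileCollectingDocMaker();
        List<String> files = Arrays.asList("doc-0001.txt", "doc-0002.txt");
        maker.setConfig(files);      // first round
        maker.setConfig(files);      // later round, e.g. from resetInputs()
        System.out.println(maker.numInputFiles());   // prints 2, not 4
    }
}
```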


[jira] Resolved: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-1209.
-

   Resolution: Fixed
Lucene Fields: [Patch Available]  (was: [Patch Available, New])

Committed, thanks Mark!

 If setConfig(Config config) is called in resetInputs(), you can turn term 
 vectors off and on by round
 -

 Key: LUCENE-1209
 URL: https://issues.apache.org/jira/browse/LUCENE-1209
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Affects Versions: 2.4
Reporter: Mark Miller
Assignee: Doron Cohen
Priority: Trivial
 Attachments: reset_config.patch, reset_config.patch, 
 reset_config.patch


 I want to be able to run one benchmark that tests things using term vectors 
 and not using term vectors.
 Currently this is not easy because you cannot specify term vectors per round.
 While you do have to create a new index per round, this automation is 
 preferable to me in comparison to running two separate tests.
 If it doesn't affect anything else, it would be great to have 
 setConfig(Config config) called in BasicDocMaker.resetInputs(). This would 
 keep the term vector options up to date per round if you reset.
 - Mark


[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-09 Thread Trejkaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1213:


Component/s: QueryParser

 MultiFieldQueryParser ignores slop parameter
 

 Key: LUCENE-1213
 URL: https://issues.apache.org/jira/browse/LUCENE-1213
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Reporter: Trejkaz

 MultiFieldQueryParser.getFieldQuery(String, String, int) calls 
 super.getFieldQuery(String, String), thus obliterating any slop parameter 
 present in the query.
 It should probably be changed to call super.getFieldQuery(String, String, 
 int), except doing only that will result in a recursive loop which is a 
 side-effect of what may be a deeper problem in MultiFieldQueryParser -- 
 getFieldQuery(String, String, int) is documented as delegating to 
 getFieldQuery(String, String), yet what it actually does is the exact 
 opposite.  This also causes problems for subclasses which need to override 
 getFieldQuery(String, String) to provide different behaviour.


[jira] Created: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-09 Thread Trejkaz (JIRA)

MultiFieldQueryParser ignores slop parameter


 Key: LUCENE-1213
 URL: https://issues.apache.org/jira/browse/LUCENE-1213
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Trejkaz


MultiFieldQueryParser.getFieldQuery(String, String, int) calls 
super.getFieldQuery(String, String), thus obliterating any slop parameter 
present in the query.

It should probably be changed to call super.getFieldQuery(String, String, int), 
except doing only that will result in a recursive loop which is a side-effect 
of what may be a deeper problem in MultiFieldQueryParser -- 
getFieldQuery(String, String, int) is documented as delegating to 
getFieldQuery(String, String), yet what it actually does is the exact opposite. 
This also causes problems for subclasses which need to override 
getFieldQuery(String, String) to provide different behaviour.
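The delegation problem described above can be illustrated with a small, self-contained sketch. Every class in it (`BaseParser`, `MultiFieldParser`, `Phrase`, `AnyOf`) is a hypothetical stand-in, not Lucene's QueryParser, MultiFieldQueryParser or PhraseQuery, and it is not the patch attached to this issue. It shows why calling the three-argument super method would loop, and one way to keep the slop: call the two-argument super method, then apply the slop to each per-field result.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-ins illustrating the LUCENE-1213 delegation problem.
 * These are NOT Lucene's QueryParser / MultiFieldQueryParser / PhraseQuery.
 */
public class SlopDelegationDemo {

    interface Query {}

    static class Phrase implements Query {
        final String field;
        final String text;
        int slop;
        Phrase(String field, String text) { this.field = field; this.text = text; }
    }

    /** Stands in for a boolean "any of these clauses" query. */
    static class AnyOf implements Query {
        final List<Query> clauses = new ArrayList<>();
    }

    static class BaseParser {
        /** Builds a phrase query with no slop. */
        Query getFieldQuery(String field, String text) {
            return new Phrase(field, text);
        }

        /** Documented contract: delegate to the two-argument form, then apply slop. */
        Query getFieldQuery(String field, String text, int slop) {
            Query q = getFieldQuery(field, text); // virtual call: a subclass override runs here
            if (q instanceof Phrase) ((Phrase) q).slop = slop;
            return q;
        }
    }

    static class MultiFieldParser extends BaseParser {
        private final String[] fields;
        MultiFieldParser(String... fields) { this.fields = fields; }

        /** Delegates the "wrong" way round, as the report describes. */
        @Override
        Query getFieldQuery(String field, String text) {
            return getFieldQuery(field, text, 0);
        }

        @Override
        Query getFieldQuery(String field, String text, int slop) {
            if (field != null) {
                // super.getFieldQuery(field, text, slop) would recurse forever: it calls this
                // class's two-argument override, which calls back into this method.
                Query q = super.getFieldQuery(field, text); // BaseParser's version, no dispatch
                applySlop(q, slop);
                return q;
            }
            AnyOf any = new AnyOf();
            for (String f : fields) {
                Query q = super.getFieldQuery(f, text);
                applySlop(q, slop); // the step whose absence drops the slop
                any.clauses.add(q);
            }
            return any;
        }

        private static void applySlop(Query q, int slop) {
            if (q instanceof Phrase) ((Phrase) q).slop = slop;
        }
    }

    public static void main(String[] args) {
        MultiFieldParser parser = new MultiFieldParser("title", "body");
        AnyOf q = (AnyOf) parser.getFieldQuery(null, "quick fox", 3);
        for (Query c : q.clauses) {
            Phrase p = (Phrase) c;
            System.out.println(p.field + " slop=" + p.slop); // prints "title slop=3", then "body slop=3"
        }
    }
}
```

Note that this sketch does not address the second complaint: a subclass of `MultiFieldParser` that overrides the two-argument method is still bypassed by the `super` calls.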



[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-09 Thread Trejkaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trejkaz updated LUCENE-1213:


Attachment: multifield-fix.patch

Attaching one possible fix. It's more verbose than I'd like, but I couldn't 
think of a reliable way to make it delegate, as that would require casting the 
result to BooleanQuery to get the clauses out, and a subclass may return 
something else entirely.


 MultiFieldQueryParser ignores slop parameter
 

 Key: LUCENE-1213
 URL: https://issues.apache.org/jira/browse/LUCENE-1213
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Reporter: Trejkaz
 Attachments: multifield-fix.patch




[jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2008-03-09 Thread Asgeir Frimannsson (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576858#action_12576858
]

Asgeir Frimannsson commented on LUCENE-1026:

Is there any specific reason why this IndexAccessor is limited to FSDirectory-based
indexes? I see FSDirectory.getFile() is used as the unique key in the list
of IndexAccessors in the factory. However, it seems more natural to use
dir.getLockID() for this purpose. Then it would be possible to use a generic
Directory rather than the file-system-specific FSDirectory.
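A sketch of the suggestion: key the factory's shared accessors by a Directory-agnostic ID instead of a file path. It assumes that the lock ID identifies the underlying index, so two directory objects for the same index yield the same ID. `Dir`, `Accessor` and `AccessorFactory` are hypothetical names, not the classes in the LUCENE-1026 attachments, and the ID strings are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: one shared accessor per index, keyed by a lock ID rather than
 * FSDirectory.getFile(). Names are illustrative, not the LUCENE-1026 classes.
 */
public class AccessorFactoryDemo {

    /** Stand-in for a Lucene Directory; only the lock ID matters here. */
    interface Dir {
        String getLockID();
    }

    static class Accessor {
        final Dir dir;
        Accessor(Dir dir) { this.dir = dir; }
    }

    static class AccessorFactory {
        // Thread-safe map so concurrent callers share one accessor per index.
        private final Map<String, Accessor> accessors = new ConcurrentHashMap<>();

        /** Returns the shared accessor for this directory, creating it on first use. */
        Accessor getAccessor(Dir dir) {
            return accessors.computeIfAbsent(dir.getLockID(), id -> new Accessor(dir));
        }
    }

    public static void main(String[] args) {
        AccessorFactory factory = new AccessorFactory();
        Dir a = () -> "index-A";       // e.g. a file-system directory
        Dir sameAsA = () -> "index-A"; // a second handle on the same index
        Dir b = () -> "index-B";       // e.g. an in-memory directory
        System.out.println(factory.getAccessor(a) == factory.getAccessor(sameAsA)); // prints true
        System.out.println(factory.getAccessor(a) == factory.getAccessor(b));       // prints false
    }
}
```

Because the key comes only from the `Dir` abstraction, nothing in the factory depends on the directory being file-system based.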

Provide a simple way to concurrently access a Lucene index from multiple
threads

Key: LUCENE-1026
URL: https://issues.apache.org/jira/browse/LUCENE-1026
Project: Lucene - Java
Issue Type: New Feature
Components: Index, Search
Reporter: Mark Miller
Priority: Minor
Attachments: DefaultIndexAccessor.java,
DefaultMultiIndexAccessor.java, IndexAccessor-02.04.2008.zip,
IndexAccessor-02.07.2008.zip, IndexAccessor-02.28.2008.zip,
IndexAccessor-1.26.2008.zip, IndexAccessor-2.15.2008.zip, IndexAccessor.java,
IndexAccessor.zip, IndexAccessorFactory.java, MultiIndexAccessor.java,
shai-IndexAccessor-2.zip, shai-IndexAccessor.zip, shai-IndexAccessor3.zip,
SimpleSearchServer.java, StopWatch.java, TestIndexAccessor.java

For building interactive indexes accessed through a network/internet
(multiple threads).
This builds upon the LuceneIndexAccessor patch. That patch was not very
newbie-friendly and did not properly handle MultiSearchers (or at least
made it easy to get into trouble).
This patch simplifies things and provides out-of-the-box support for sharing
the IndexAccessors across threads. There is also a simple test class and an
example SearchServer to get you started.
Future revisions will be zipped.
Works pretty solidly as is, but could use the ability to warm new Searchers.

