[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770
 ] 

[EMAIL PROTECTED] edited comment on LUCENE-1209 at 3/9/08 6:51 AM:
-

My algorithm is below.

I see "Round 0-->1:   doc.term.vector:false-->true" as well. However, if I put 
a debug print on what is returned from public boolean get (String name, boolean 
dflt), it is only ever called once for "doc.term.vector", and likewise for the 
other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the 
work/index directory on the second run, there are certainly term vectors. That's 
how I noticed this to begin with: I was looking at the index and saw the term 
vector files on every round. It's possible I have something messed up, but every 
time I run through everything again, it really does not seem to be working. 
If I set term vectors to false:true, they are never made in any round.

>>Mark you are right that setConfig is called just once, at start.
>>At least for setting properties by round this should be sufficient.
>>I wonder why this doesn't work for you. 

I think this admits the problem, right? The get for each property in 
setConfig is only called once. That call loads up the "false:true", returns 
false, and sets up "true" to be returned on the next call. The next time you 
call get on Config you would get the "true", but there is no next time. It's 
only done once, so the output correctly shows "Round 0-->1:   
doc.term.vector:false-->true", but get is only ever called once and so only 
loads false.
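
To make the suspicion concrete, here is a minimal Java sketch of a flag that is read once and cached versus one that is re-read on every reset. It is not the benchmark code: StaleConfigSketch, DocMaker, and the BooleanSupplier standing in for the round-aware Config lookup are all hypothetical names, and whether the real doc maker caches its flags this way is exactly the assumption being illustrated.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical illustration only; these are not Lucene classes. */
final class StaleConfigSketch {

    /** A doc maker that caches a flag taken from a round-aware setting. */
    static final class DocMaker {
        private final boolean rereadOnReset;
        private BooleanSupplier setting;
        private boolean termVector;

        DocMaker(boolean rereadOnReset) {
            this.rereadOnReset = rereadOnReset;
        }

        /** Reads the setting once and caches the result. */
        void setConfig(BooleanSupplier setting) {
            this.setting = setting;
            this.termVector = setting.getAsBoolean();
        }

        /** Re-reads the setting only when this maker was built to do so. */
        void resetInputs() {
            if (rereadOnReset) {
                termVector = setting.getAsBoolean();
            }
        }

        boolean termVector() {
            return termVector;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a by-round property such as "vec:false:true".
        boolean[] byRound = {false, true};
        int[] round = {0};
        BooleanSupplier setting = () -> byRound[round[0] % byRound.length];

        DocMaker cached = new DocMaker(false);
        DocMaker refreshed = new DocMaker(true);
        cached.setConfig(setting);
        refreshed.setConfig(setting);

        for (int r = 0; r < 2; r++) {
            round[0] = r;
            cached.resetInputs();
            refreshed.resetInputs();
            System.out.println("round " + r
                + ": cached=" + cached.termVector()
                + ", refreshed=" + refreshed.termVector());
        }
    }
}
```

In this sketch the cached maker keeps the round 0 value forever, while the refreshed maker picks up the round 1 value after its reset.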

- Mark


{code}
ram.flush.mb=flush:32:32
compound=false

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -----

{ "Rounds"
  
ResetSystemErase

CreateIndex
{ "MAddDocs" AddDoc(60) } : 2
Optimize
CloseIndex
  
OpenReader
  { "SrchTrvRetNewRdr" SearchTravRet(10) > : 1000
CloseReader
OpenReader
  { "SearchHlgtSameRdr" SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) > : 1000

CloseReader

RepSumByPref SearchHlgtSameRdr

NewRound

} : 2

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
{code}

  
> If setConfig(Config config) is called in resetInputs(), you can turn term 
> vectors off and on by round
> ---

[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-09 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770
 ] 

[EMAIL PROTECTED] edited comment on LUCENE-1209 at 3/9/08 6:44 AM:
-

My algorithm is below.

I see "Round 0-->1:   doc.term.vector:false-->true" as well. However, if I put 
a debug print on what is returned from public boolean get (String name, boolean 
dflt), it is only ever called once for "doc.term.vector", and likewise for the 
other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the 
work/index directory on the second run, there are certainly term vectors. That's 
how I noticed this to begin with: I was looking at the index and saw the term 
vector files on every round. It's possible I have something messed up, but every 
time I run through everything again, it really does not seem to be working. 
If I set term vectors to false:true, they are never made in any round.

- Mark


{code}
ram.flush.mb=flush:32:32
compound=false

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -----

{ "Rounds"
  
ResetSystemErase

CreateIndex
{ "MAddDocs" AddDoc(60) } : 2
Optimize
CloseIndex
  
OpenReader
  { "SrchTrvRetNewRdr" SearchTravRet(10) > : 1000
CloseReader
OpenReader
  { "SearchHlgtSameRdr" SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) > : 1000

CloseReader

RepSumByPref SearchHlgtSameRdr

NewRound

} : 2

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
{code}

  
> If setConfig(Config config) is called in resetInputs(), you can turn term 
> vectors off and on by round
> -
>
> Key: LUCENE-1209
> URL: https://issues.apache.org/jira/browse/LUCENE-1209
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Affects Versions: 2.4
>Reporter: Mark Miller
>Priority: Trivial
> Attachments: reset_config.patch
>
>
> I want to be able to run one benchmark that tests things using term vectors 
> and not using term vectors.
> Currently this is not easy because you cannot specify term vectors per round.
> While you do have to create a n

[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

2008-03-08 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576615#action_12576615
 ] 

doronc edited comment on LUCENE-1209 at 3/8/08 1:17 PM:
-

Config maintains properties by round, so this should do the trick: 

{code}
doc.term.vector=tvf:true:false
{code}

It sets term-vectors to true in round 0, false in round 1, true in round 2, etc.
Also, a column is added to the reports with the value of this property ('tvf'). 
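
As a rough illustration of the by-round syntax, the sketch below parses a value of the form name:v0:v1:... and cycles through the values by round number. RoundConfigSketch and its methods are hypothetical and heavily simplified; this is not the benchmark's Config class, only an assumption about how the cycling behaves based on the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified by-round property holder; not Lucene's Config. */
final class RoundConfigSketch {
    private final Map<String, String> props = new HashMap<>();
    private int round = 0;

    void set(String name, String value) {
        props.put(name, value);
    }

    /** Advances to the next round. */
    void newRound() {
        round++;
    }

    /**
     * Returns the value for the current round. A plain value such as "true"
     * applies to every round; "col:v0:v1" cycles through v0, v1, v0, ...
     */
    boolean getBoolean(String name, boolean dflt) {
        String raw = props.get(name);
        if (raw == null) {
            return dflt;
        }
        if (raw.indexOf(':') < 0) {
            return Boolean.parseBoolean(raw);
        }
        // parts[0] is the report column name, parts[1..] are per-round values.
        String[] parts = raw.split(":");
        int valueCount = parts.length - 1;
        if (valueCount <= 0) {
            return dflt;
        }
        return Boolean.parseBoolean(parts[1 + (round % valueCount)]);
    }

    public static void main(String[] args) {
        RoundConfigSketch cfg = new RoundConfigSketch();
        cfg.set("doc.term.vector", "tvf:true:false");
        for (int i = 0; i < 4; i++) {
            System.out.println("round " + i + ": "
                + cfg.getBoolean("doc.term.vector", false));
            cfg.newRound();
        }
    }
}
```

Note that the caller only sees a new value if it asks again after the round changes; a value fetched once in round 0 stays whatever it was then.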

Unless you already tried this and it didn't work?


  was (Author: doronc):
Config maintains properties by round, so this should do the trick: 

{code}
doc.term.vector=tvf:true:false:true
{code}

It sets term-vectors to true in round 0, false in round 1, true in round 2, etc.
Also, a column is added to the reports with the value of this property ('tvf'). 

Unless you already tried this and it didn't work?

  
> If setConfig(Config config) is called in resetInputs(), you can turn term 
> vectors off and on by round
> -
>
> Key: LUCENE-1209
> URL: https://issues.apache.org/jira/browse/LUCENE-1209
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Affects Versions: 2.4
>Reporter: Mark Miller
>Priority: Trivial
> Attachments: reset_config.patch
>
>
> I want to be able to run one benchmark that tests things using term vectors 
> and not using term vectors.
> Currently this is not easy because you cannot specify term vectors per round.
> While you do have to create a new index per round, this automation is 
> preferable to me in comparison to running two separate tests.
> If it doesn't affect anything else, it would be great to have 
> setConfig(Config config) called in BasicDocMaker.resetInputs(). This would 
> keep the term vector options up to date per round if you reset.
> - Mark

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

