[ https://issues.apache.org/jira/browse/LUCENE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514600 ]
Michael McCandless commented on LUCENE-947:
-------------------------------------------
Thanks for the review, Doron!
> 1. TestPerfTasksParse - why do you prevent the testing of parsing of
> WriteLineDoc?
> I disabled the special handling of this and the test works as supposed.
Hmmm ... I was seeing a failure without that special handling because
WriteLineDoc requires the "line.file.out" config property to be set,
and that test didn't set it. I'll put WriteLineDoc back into the test
and set "line.file.out" for this task.
> 2. Documentation of new properties is missing:
> - In CreateIndexTask: ram.flush.mb [0], autocommit [true]
> - In byTask.package.html (same 2 props).
OK, I'll add this, and I'll also document
"doc.term.vector.{offsets,positions}" in BasicDocMaker.
> 3. ram.flush.mb & autocommit should also be added, used & documented in
> OpenIndexTask (currently only used in CreateIndexTask).
OK, I'll add this.
> 4. AddDocTask: flushAtRAMUsage - unused?
Yup, this was left over from before LUCENE-843, when you had to check
RAM usage after each doc and then flush. I'll remove it; in fact I'll
just revert to the current AddDocTask.java (I don't need any mods here).
> 5. build.xml - 1024m as default for running a benchmark seems too much?
> I mean, one of the nice things about Lucene is that it can run for you
> even if you only have a few MB of RAM to spare. For someone with a low-end
> machine, say 512M only, the JVM might fail to even start, right?
Whoops... I didn't mean to include this change (I was hitting OOM on
some Wikipedia algs). I'll leave the default where it was (140 MB) and
remove the "-server" jvmarg as well.
> 6. I like your change of factoring some of the field names into consts. We
> should probably do the same for the rest.
OK I'll pull out the remaining ones...
> 7. I didn't try the new WriteLineDocTask and LineDocMaker feed. Partly
> because there was no ready to use alg for that under conf/, and also no test
> for that. Do you think we should add at least one of these two (preferably
> both)? - I can help with this.
OK I'll do both of these.
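
As a rough picture of what such a test would exercise: the issue description below says each line written by WriteLineDocTask holds tab-separated TITLE, DATE, BODY fields. The sketch here round-trips one document through that layout in plain Java. The class and method names (LineDocFormat, toLine, fromLine) are hypothetical and not taken from the patch, and replacing embedded tabs and newlines with spaces is an assumption about how the fields are kept on one line.

```java
import java.util.Arrays;

// Hypothetical helper, NOT part of the LUCENE-947 patch: it only illustrates
// the "one document per line, tab-separated TITLE, DATE, BODY" layout.
final class LineDocFormat {
    private static final char SEP = '\t';

    // Join the three fields into a single line.
    public static String toLine(String title, String date, String body) {
        return clean(title) + SEP + clean(date) + SEP + clean(body);
    }

    // Assumption: tabs and line breaks inside a field are replaced with
    // spaces so they cannot be mistaken for field or document separators.
    private static String clean(String s) {
        return s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
    }

    // Split a line back into {title, date, body}. The limit of 3 keeps
    // trailing empty fields instead of silently dropping them.
    public static String[] fromLine(String line) {
        String[] parts = line.split("\t", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = toLine("Some title", "26-FEB-1987", "Body text\twith a tab");
        System.out.println(Arrays.toString(fromLine(line)));
    }
}
```

A test along these lines would write a few documents with the writer side, read them back with the reader side, and compare the fields.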
> Some improvements to contrib/benchmark
> --------------------------------------
>
> Key: LUCENE-947
> URL: https://issues.apache.org/jira/browse/LUCENE-947
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-947.patch, LUCENE-947.take2.patch
>
>
> I've made some small improvements to the contrib/benchmark, mostly
> merging in the ad-hoc benchmarking code I've been using in LUCENE-843:
> - Fixed thread safety of DirDocMaker's usage of SimpleDateFormat
> - Print the props in sorted order
> - Added new config "autocommit=true|false" to CreateIndexTask
> - Added new config "ram.flush.mb=int" to AddDocTask
> - Added new configs "doc.term.vector.positions=true|false" and
> "doc.term.vector.offsets=true|false" to BasicDocMaker
> - Added WriteLineDocTask.java, so you can make an alg that uses this
> to build up a single file containing one document per line. EG
> this alg converts the reuters-out tree into a single file that
> has ~1000 bytes per body field, saved to
> work/reuters.1000.txt:
>     docs.dir=reuters-out
>     doc.maker=org.apache.lucene.benchmark.byTask.feeds.DirDocMaker
>     line.file.out=work/reuters.1000.txt
>     doc.maker.forever=false
>     {WriteLineDoc(1000)}: *
> Each line has tab-separated TITLE, DATE, BODY fields.
> - Created feeds/LineDocMaker.java that creates documents read from
> the file created by WriteLineDocTask.java. EG this alg indexes
> all documents created above:
>     analyzer=org.apache.lucene.analysis.SimpleAnalyzer
>     directory=FSDirectory
>     doc.add.log.step=500
>     docs.file=work/reuters.1000.txt
>     doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
>     doc.tokenized=true
>     doc.maker.forever=false
>     ResetSystemErase
>     CreateIndex
>     {AddDoc}: *
>     CloseIndex
>     RepSumByPref AddDoc
> I'll attach initial patch shortly.
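
For readers unfamiliar with the SimpleDateFormat issue mentioned in the description above: a SimpleDateFormat instance keeps mutable state and is not safe to share between threads. One common remedy is to give each thread its own instance, sketched below. This is only an illustration and not necessarily how the patch fixes DirDocMaker. The class name PerThreadDateFormat and the date pattern are assumptions, and ThreadLocal.withInitial requires Java 8 or later.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, NOT the code from the patch.
final class PerThreadDateFormat {
    // Each thread lazily gets its own SimpleDateFormat, so no instance is
    // ever used by two threads at once. The pattern is an assumption.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US));

    public static Date parse(String text) throws ParseException {
        return FORMAT.get().parse(text);
    }

    public static String format(Date date) {
        return FORMAT.get().format(date);
    }

    public static void main(String[] args) throws InterruptedException {
        final String sample = "2007-07-20 12:34:56";
        final AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    try {
                        // Parse and re-format; a corrupted formatter would
                        // throw or produce a different string.
                        if (!format(parse(sample)).equals(sample)) {
                            failures.incrementAndGet();
                        }
                    } catch (ParseException | RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("failures: " + failures.get());
    }
}
```

Synchronizing on a single shared formatter would also work; the per-thread variant simply avoids lock contention when many threads create documents at once.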