[ 
https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698743#action_12698743
 ] 

Michael McCandless commented on LUCENE-1591:
--------------------------------------------

The test seems to assume you can take any Task in the source tree, and make an 
alg that simply creates that task.

I think that assumption is in fact wrong, because tasks like WriteLineDocTask 
indeed require certain configuration (line.file.out) be set, and the test can't 
know that.  Other tasks in the future will presumably hit the same issue.

Also, thinking about the test, I think it doesn't add much value?  Elsewhere we 
heavily test that the .alg parser works properly.  And all this test does is 
take every task, and stick it in either "XXX",  "[ XXX ] : 2"  or "{ XXX } : 
3", parse it, and verify it parsed properly.

I think we should simply turn those three tests off?  Or, if that seems too 
drastic, simply skipping WriteLineDocTask seems OK too?
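
For reference, here is roughly what the test does, plus the "skip" option. 
This is only a minimal sketch and not the actual test code: the class 
AlgTemplates, the NEEDS_CONFIG set and both method names are hypothetical, 
and the task name is treated as an opaque string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of contrib/benchmark. */
final class AlgTemplates {

  // Assumption: tasks that cannot be created without extra configuration.
  // WriteLineDocTask is the one named in this thread (it needs line.file.out).
  static final Set<String> NEEDS_CONFIG =
      new HashSet<>(Arrays.asList("WriteLineDocTask"));

  /** Builds the three alg texts described above for a single task name. */
  static List<String> algsFor(String taskName) {
    List<String> algs = new ArrayList<>();
    algs.add(taskName);                    // "XXX"
    algs.add("[ " + taskName + " ] : 2");  // "[ XXX ] : 2"
    algs.add("{ " + taskName + " } : 3");  // "{ XXX } : 3"
    return algs;
  }

  /** Builds alg texts for every task, skipping those that need configuration. */
  static List<String> algsForAll(List<String> taskNames) {
    List<String> algs = new ArrayList<>();
    for (String name : taskNames) {
      if (NEEDS_CONFIG.contains(name)) {
        continue; // the test cannot know which properties such a task requires
      }
      algs.addAll(algsFor(name));
    }
    return algs;
  }

  public static void main(String[] args) {
    for (String alg : algsForAll(Arrays.asList("WriteLineDocTask", "XXX"))) {
      System.out.println(alg);
    }
  }
}
```

The skip list has to be maintained by hand, which is the weak point: any 
future task with required configuration would need to be added to it.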

> Enable bzip compression in benchmark
> ------------------------------------
>
>                 Key: LUCENE-1591
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1591
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: commons-compress-dev20090413.jar, 
> commons-compress-dev20090413.jar, LUCENE-1591.patch, LUCENE-1591.patch, 
> LUCENE-1591.patch, LUCENE-1591.patch
>
>
> bzip compression can aid the benchmark package by not requiring extracting 
> bzip files (such as enwiki) in order to index them. The plan is to add a 
> config parameter bzip.compression=true/false and in the relevant tasks either 
> decompress the input file or compress the output file using the bzip streams.
> It will add a dependency on ant.jar which contains two classes similar to 
> GZIPOutputStream and GZIPInputStream which compress/decompress files using 
> the bzip algorithm.
> bzip is known to be superior in its compression performance to the gzip 
> algorithm (~20% better compression), although it does the 
> compression/decompression a bit slower.
> I will post a patch which adds this parameter and implements it in 
> LineDocMaker, EnwikiDocMaker and WriteLineDoc task. Maybe even add the 
> capability to DocMaker or some of the super classes, so it can be inherited 
> by all sub-classes.
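
The plan in the description above (one boolean config parameter that decides 
whether streams get wrapped) could look roughly like the sketch below. It is 
not the patch: the JDK has no bzip stream classes, so GZIPInputStream and 
GZIPOutputStream stand in for them here, and the class name, the method names 
and the default of "false" are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch; gzip streams stand in for the bzip streams. */
final class CompressionConfigSketch {

  /** Reads bzip.compression; treating a missing value as false is an assumption. */
  static boolean useCompression(Properties config) {
    return Boolean.parseBoolean(config.getProperty("bzip.compression", "false"));
  }

  /** Wraps the output stream in a compressing stream only when configured. */
  static OutputStream wrapOutput(OutputStream raw, Properties config) throws IOException {
    return useCompression(config) ? new GZIPOutputStream(raw) : raw;
  }

  /** Wraps the input stream in a decompressing stream only when configured. */
  static InputStream wrapInput(InputStream raw, Properties config) throws IOException {
    return useCompression(config) ? new GZIPInputStream(raw) : raw;
  }

  /** Writes text through wrapOutput and reads it back through wrapInput. */
  static String roundTrip(String text, Properties config) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try (OutputStream out = wrapOutput(buffer, config)) {
        out.write(text.getBytes(StandardCharsets.UTF_8));
      }
      ByteArrayOutputStream result = new ByteArrayOutputStream();
      try (InputStream in = wrapInput(new ByteArrayInputStream(buffer.toByteArray()), config)) {
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
          result.write(chunk, 0, n);
        }
      }
      return new String(result.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Properties config = new Properties();
    config.setProperty("bzip.compression", "true");
    System.out.println(roundTrip("one line doc", config));
  }
}
```

Keeping the decision in two small wrap methods is what would let a shared 
super class offer it to all sub-classes, as the description suggests.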
