[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010039#comment-13010039
 ] 

Doron Cohen commented on LUCENE-2980:
-

bq. Perhaps we should add a specific test in CSTest for this problem? I 
wouldn't use file.delete() as an indicator because on Linux it will pass

Agree, I'll add one.

 Benchmark's ContentSource should not rely on file suffixes to be lower cased 
 when detecting file type (gzip/bzip2/text)
 ---

 Key: LUCENE-2980
 URL: https://issues.apache.org/jira/browse/LUCENE-2980
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2980.patch, LUCENE-2980.patch


 file.gz is correctly handled as gzip, but file.GZ handled as text which is 
 wrong.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010043#comment-13010043
 ] 

Doron Cohen commented on LUCENE-2980:
-

bq. Perhaps we should add a specific test in CSTest for this problem? I 
wouldn't use file.delete() as an indicator because on Linux it will pass

I changed my mind about adding this test to ContentSourceTest. I think such a 
test fits better in the CommonCompress project, because it should call 
CompressorStreamFactory.createCompressorInputStream(in) directly. Our test 
invokes ContentSource.getInputStream(File), so we cannot pass in such a 
close-sensing stream.

But this is a valid point. In particular, the test case I provided to 
COMPRESS-127 will fail on Windows but will likely pass on Linux. I'll add a 
reference to your comment in COMPRESS-127.




[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010064#comment-13010064
 ] 

Shai Erera commented on LUCENE-2980:


Agreed.




[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009796#comment-13009796
 ] 

Shai Erera commented on LUCENE-2980:


Patch looks good. A few tiny comments:

* Should ContentSourceTest extend BenchmarkTestCase?
* I think that instead of assertTrue(testDir.isDirectory()); you can 
assertTrue(testDir.mkdirs());
* In case you wanted a second opinion about the nocommit lines, I think they 
can all go away :).




[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-22 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009869#comment-13009869
 ] 

Doron Cohen commented on LUCENE-2980:
-

Thanks Shai!

I fixed the super class and the assert as suggested.

As for those nocommits, they stand for a larger problem. I was ready for a 
trivial fix for this bug: just lower-case the extension in ContentSource 
before consulting the map. However, the test failed, and I found out that 
this is because the input stream returned by 
CompressorStreamFactory.createCompressorInputStream() does not close its 
underlying stream when it is exhausted or when its close method is called.
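
For illustration only, the trivial fix described above (lower-casing the 
extension before consulting the map) might look roughly like the sketch 
below. The class name ExtensionTypes, the FileType enum, and the two map 
entries are assumptions made for this example; this is not the actual 
ContentSource code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a file name to a file type by its extension.
final class ExtensionTypes {
    enum FileType { GZIP, BZIP2, TEXT }

    // Keys are stored lower-cased; lookups must lower-case too.
    private static final Map<String, FileType> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put(".gz", FileType.GZIP);
        BY_EXTENSION.put(".bz2", FileType.BZIP2);
    }

    static FileType detect(String fileName) {
        int idx = fileName.lastIndexOf('.');
        if (idx < 0) {
            return FileType.TEXT; // no extension: treat as plain text
        }
        // Locale.ROOT avoids locale-specific case mappings (e.g. Turkish 'I').
        String ext = fileName.substring(idx).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, FileType.TEXT);
    }
}
```

With this, ExtensionTypes.detect("file.gz") and 
ExtensionTypes.detect("file.GZ") both resolve to GZIP, while unknown 
extensions fall back to TEXT.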

I opened COMPRESS-127 for this.

As a workaround for this bug, ContentSource now returns a wrapper around the 
input stream created by the CsFactory. The wrapper delegates all methods to 
that stream, except that close() is also delegated to the underlying stream. 
This fix is required for the extension letter-case tests to pass, but it also 
fixes a more serious problem: leaking file handles in ContentSource.
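
A minimal sketch of a wrapper along these lines, assuming the decorated 
(decompressing) stream does not close the stream it reads from. The class 
name ClosingWrapperStream is hypothetical and this is not the code in the 
patch; it only shows the idea of closing both streams.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: FilterInputStream forwards every read/skip/mark call
// to the decorated stream; close() additionally closes the underlying stream.
final class ClosingWrapperStream extends FilterInputStream {
    private final InputStream underlying;

    ClosingWrapperStream(InputStream decorated, InputStream underlying) {
        super(decorated);
        this.underlying = underlying;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close(); // closes the decorated stream
        } finally {
            underlying.close(); // always release the underlying handle
        }
    }
}
```

The try/finally makes sure the underlying stream is closed even if closing 
the decorated stream throws.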

As Solr also makes use of CommonCompress I searched in it for references to 
CompressorStreamFactory.createCompressorInputStream() but found none, so it 
seems Solr is not affected by COMPRESS-127.




[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)

2011-03-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009986#comment-13009986
 ] 

Shai Erera commented on LUCENE-2980:


That's a serious problem - good catch!

Patch looks good. Perhaps we should add a specific test in CSTest for this 
problem? I wouldn't use file.delete() as an indicator because on Linux it will 
pass. Perhaps the test could write to a byte[] and then read it through an 
extension of ByteArrayInputStream that records whether close() was called, so 
the test can assert on it.
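
A minimal sketch of such a close-sensing stream, to make the suggestion 
concrete. The class name CloseSensingInputStream and the isClosed() accessor 
are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

// Hypothetical test helper: an in-memory stream that remembers
// whether close() was ever called on it.
final class CloseSensingInputStream extends ByteArrayInputStream {
    private boolean closed;

    CloseSensingInputStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }

    boolean isClosed() {
        return closed;
    }
}
```

A test would hand an instance to the code under test, read it to the end or 
close the returned stream, and then assert that isClosed() is true. Because 
nothing touches the file system, the result should not depend on whether the 
OS allows deleting an open file.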
