[jira] [Comment Edited] (LUCENE-5941) IndexWriter.forceMerge documentation error

2014-09-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131150#comment-14131150
 ] 

Shai Erera edited comment on LUCENE-5941 at 9/12/14 6:13 AM:
-

Patch modifies the test to assert up to 3X disk usage. I beasted it, and it 
fails with this:

{noformat}
  [beaster]   2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexWriterForceMerge -Dtests.method=testForceMergeTempSpaceUsage -Dtests.seed=AEA68CE694BB7732 -Dtests.slow=true -Dtests.locale=ru_RU -Dtests.timezone=US/Pacific-New -Dtests.file.encoding=Cp1255
  [beaster] [09:05:02.935] FAILURE 1.52s | TestIndexWriterForceMerge.testForceMergeTempSpaceUsage <<<
  [beaster]> Throwable #1: java.lang.AssertionError: forceMerge used too much temporary space: starting usage was 385138 bytes; max temp usage was 1216310 but should have been 1155414 (= 3X starting usage)
  [beaster]> at __randomizedtesting.SeedInfo.seed([AEA68CE694BB7732:B4644F15FAAB94F5]:0)
  [beaster]> at org.junit.Assert.fail(Assert.java:93)
  [beaster]> at org.junit.Assert.assertTrue(Assert.java:43)
  [beaster]> at org.apache.lucene.index.TestIndexWriterForceMerge.testForceMergeTempSpaceUsage(TestIndexWriterForceMerge.java:162)
{noformat}

I still haven't dug into this; I will do so later. But if anyone has an 
explanation for why we may consume up to 4X the starting disk usage, please 
post it here.
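
For reference, the numbers in the failure message can be checked with a few lines of arithmetic. This is only a sketch: the class and method names below are made up for illustration and are not part of Lucene or of the patch. It takes the starting usage and the max temp usage from the log and compares them against the 3X bound the patched test asserts.

```java
public class ForceMergeUsageCheck {
    // Hypothetical helper: ratio of peak temporary usage to starting usage.
    static double usageRatio(long startBytes, long maxTempBytes) {
        return (double) maxTempBytes / startBytes;
    }

    // Hypothetical helper: bytes by which the peak exceeds factor * starting usage
    // (negative if the peak stays under the bound).
    static long overshootBytes(long startBytes, long maxTempBytes, int factor) {
        return maxTempBytes - (long) factor * startBytes;
    }

    public static void main(String[] args) {
        long start = 385138L;     // "starting usage" from the failure message
        long maxTemp = 1216310L;  // "max temp usage" from the failure message

        System.out.println("3X bound:  " + 3 * start);
        System.out.println("overshoot: " + overshootBytes(start, maxTemp, 3));
        System.out.println("ratio:     " + usageRatio(start, maxTemp));
    }
}
```

For this seed the peak is 60896 bytes over the 3X bound, i.e. roughly 3.16X the starting usage, so the failure is a modest overshoot of 3X rather than anything close to 4X.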



> IndexWriter.forceMerge documentation error
> --
>
> Key: LUCENE-5941
> URL: https://issues.apache.org/jira/browse/LUCENE-5941
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>   Reporter: Shai Erera
>   Assignee: Shai Erera
> Attachments: LUCENE-5941.patch
>
>
> IndexWriter.forceMerge documents that it requires up to 3X *FREE* space in 
> order to run successfully. We even go further with it and test it in 
> TestIWForceMerge.testForceMergeTempSpaceUsage(). But I think that's wrong. I 
> cannot think of a situation where we consume 3X *additional* space during 
> merge:
> * 1X - that's the source segments to be merged
> * 2X - that's the result non-CFS merged segment
> * 3X - that's the CFS creation
> At no point do we publish the non-CFS merged segment, therefore the merge, as 
> I understand it, only consumes up to 2X additional space during that merge.
> And in any case, we only require 2X additional space for the *largest* merge 
> (or for the total batch of running merges, depending on your MergeScheduler), 
> not for the whole index. This is an important observation: users with, e.g., 
> a 500GB index shouldn't think they need to reserve an additional 1TB for 
> merging, since most of their big segments won't be merged by default anyway 
> (TieredMP defaults to 5GB largest segment).
> I'll post a patch which fixes the documentation and the test. If anyone can 
> think of a scenario where we consume up to 3X *additional* space, please 
> chime in, and I'll only modify the IW.forceMerge documentation to explain that.
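
The accounting in the description above can be written out as a small model. This is a sketch of the description's reasoning only, not measured Lucene behavior (the beaster failure in this thread suggests real usage can exceed it). The class and method names are hypothetical, and it assumes the merged segment is about as large as the sum of its source segments.

```java
import java.util.Arrays;
import java.util.List;

public class MergeSpaceEstimate {
    /**
     * Hypothetical model of the description's accounting: the source segments
     * already exist (1X), the merged non-CFS segment is written next to them,
     * and, if a compound file is built, the CFS copy is written on top of that.
     * Returns only the *additional* bytes beyond the source segments.
     */
    static long peakAdditionalBytes(List<Long> sourceSegmentBytes, boolean writesCompoundFile) {
        long sourceTotal = 0;
        for (long bytes : sourceSegmentBytes) {
            sourceTotal += bytes;
        }
        long mergedNonCfs = sourceTotal;  // assumption: merged size ~ sum of sources
        long cfsCopy = writesCompoundFile ? mergedNonCfs : 0;
        return mergedNonCfs + cfsCopy;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // One merge over three segments totaling 5 GB.
        List<Long> merge = Arrays.asList(2 * gb, 2 * gb, 1 * gb);
        System.out.println(peakAdditionalBytes(merge, true) / gb + " GB additional with CFS");
        System.out.println(peakAdditionalBytes(merge, false) / gb + " GB additional without CFS");
    }
}
```

Under this model the estimate scales with the size of the segments taking part in a merge, not with the size of the whole index, which is the point the description makes about the 500GB example.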



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5941) IndexWriter.forceMerge documentation error

2014-09-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131561#comment-14131561
 ] 

Shai Erera edited comment on LUCENE-5941 at 9/12/14 2:04 PM:
-

I added {{dir.setEnableVirusScanner(false);}}, but the test still fails (and I 
verified that the virus scanner is indeed disabled before the failure). I will 
disable the virus scanner for this test anyway, though, because it asserts disk 
usage and we don't want any surprises.


