[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963100#comment-14963100 ] Michael McCandless commented on LUCENE-6829: bq. Michael McCandless, looks like your commit to RAMDirector.listAll() on this issue is the problem: Sigh, sorry [~steve_rowe] and thanks for digging/fixing. Tricky concurrency... > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch, LUCENE-6829.patch, > LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962819#comment-14962819 ] Steve Rowe commented on LUCENE-6829: Another test failure my Jenkins found with the same root cause, only reproduces 8/100 iters for me: {noformat} [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterCommit -Dtests.method=testPrepareCommitRollback -Dtests.seed=98FCDFF6002C97E1 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/e nwiki.random.lines.txt -Dtests.locale=is -Dtests.timezone=Australia/Sydney -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.32s | TestIndexWriterCommit.testPrepareCommitRollback <<< [junit4]> Throwable #1: java.lang.NullPointerException [junit4]>at __randomizedtesting.SeedInfo.seed([98FCDFF6002C97E1:9 4F93159522F894E]:0) [junit4]>at java.util.ComparableTimSort.binarySort(ComparableTimS ort.java:258) [junit4]>at java.util.ComparableTimSort.sort(ComparableTimSort.ja va:203) [junit4]>at java.util.Arrays.sort(Arrays.java:1246) [junit4]>at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:660) [junit4]>at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:73) [junit4]>at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63) [junit4]>at org.apache.lucene.index.TestIndexWriterCommit.testPrepareCommitRollback(TestIndexWriterCommit.java:599) [junit4]>at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {content=Lucene50(blocksize=128)}, docValues:{}, sim=RandomSimilarityProvider(queryNorm=true,coord=crazy): {content=DFR I(ne)3(800.0)}, locale=is, timezone=Australia/Sydney [junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=401791056,total=514850816 [junit4] 2> NOTE: All tests run in this JVM: [TestIndexWriterCommit] {noformat} > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch, LUCENE-6829.patch, > LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958600#comment-14958600 ] ASF subversion and git services commented on LUCENE-6829: - Commit 1708760 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1708760 ] LUCENE-6829: OfflineSorter now uses Directory API; add Directory.createTempOutput and IndexOutput.getName > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch, LUCENE-6829.patch, > LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954520#comment-14954520 ] Dawid Weiss commented on LUCENE-6829: - This is a large patch and I only scanned it briefly. It looks good to me. I don't know how to avoid the virus checker special case (requiring odd hacks in the code to disable it). Also, blocks like this one: {code} + Path tempPath = Files.createTempDirectory(Dictionary.getDefaultTempDir(), "Hunspell"); + boolean success = false; + try (Directory tempDir = FSDirectory.open(tempPath)) { +this.dictionary = new Dictionary(tempDir, "hunspell", affix, dictionaries, ignoreCase); +success = true; + } finally { +// tempPath (directory) should be empty at this point: +if (success) { + Files.delete(tempPath); +} else { + IOUtils.deleteFilesIgnoringExceptions(tempPath); +} + } {code} Is there any reason why we shouldn't just let the regular exception suppression be used here? I know it'd reverse the precedence, but at least you'd get the full picture (temp. file couldn't be deleted too). Isn't this a leftover pattern from before 1.7 days? > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch, LUCENE-6829.patch, > LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954759#comment-14954759 ] Michael McCandless commented on LUCENE-6829: bq. Otherwise you'd have temp files piling up without any visible reason? Exactly! > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch, LUCENE-6829.patch, > LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954742#comment-14954742 ] Dawid Weiss commented on LUCENE-6829: - There was something odd about it. :) I think IOUtils.rm sounds good -- we always attempt to remove the temp. stuff. If we can't do it for some reason, we should fail explicitly. Otherwise you'd have temp files piling up without any visible reason? > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch, LUCENE-6829.patch, > LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954738#comment-14954738 ] Michael McCandless commented on LUCENE-6829: bq. So, to be clear, why isn't the above just: Woops: on exception I had wanted that code block to recursively remove the temp dir, and any files under it, because the temp dir is likely not empty on exception. But the patch does not do that, because {{IOUtils.deleteFilesIgnoringExceptions}} will not recursively delete. I think I need to change it to your new version, but use {{IOUtils.rm}} instead? > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch, LUCENE-6829.patch, > LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949983#comment-14949983 ] Dawid Weiss commented on LUCENE-6829: - bq. StandardOpenOption.CREATE_NEW Oh, I like this! > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949598#comment-14949598 ] Robert Muir commented on LUCENE-6829: - {quote} Second, this is an API break for things that secretly used OfflineSorter (suggesters, Hunspell Dictionary, etc.), but I think this is good because it makes it clear to the caller that disk space is going to be used by this method and lets the call control where, vs running the risk of e.g. filling up your tmp partition. Some things still wanted access to "the default temp dir", so I moved it from OfflineSorter to IOUtils, and kept the test infra initializing that to the mock fs wrapped version. Hmm I think maybe it's just Hunspell, maybe I can remove that from IOUtils and just put it (privately) in Hunspell. I think ideally nothing in Lucene should be secretly using /tmp anymore. {quote} Yeah, +1 for fixing that, I think the break is ok. It was a trap before. {quote} I also think somehow we should extend the "retry the file delete" that IW/IFD provides "down" to things like OfflineSorter {quote} Its more than that right? (with the current patch) Its also inflateGens too? :) Looks like something went horribly wrong with the formatting of TestRandomChains? {quote} I think to fix this more safely we need to add a new required (abstract) method to Directory: {quote} So would FSDirectory implement this with the obvious Files.createTempFile() ? That would be my vote. I guess some could argue its no longer technically write-once, since methods like that rely on the atomicity of creating a (zero-byte) file, but we are just going to append to that file after, once. And its a separate method, solely as a temporary file facility, so I think this is how it should be implemented? Deserves some thought I guess either way. > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6829) OfflineSorter should use Directory API
[ https://issues.apache.org/jira/browse/LUCENE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949608#comment-14949608 ] Robert Muir commented on LUCENE-6829: - and we always have the option of our own simple "temp file logic" that just uses StandardOpenOption.CREATE_NEW to ensure its really unique. I think i like this better...I don't think we have to worry about crazy security concerns like someone guessing what the filename will be. And really we should be opening all files with that flag anyway, its just that we don't yet today because files aren't always write-once in exceptional cases. > OfflineSorter should use Directory API > -- > > Key: LUCENE-6829 > URL: https://issues.apache.org/jira/browse/LUCENE-6829 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6829.patch, LUCENE-6829.patch > > > I think this is a blocker for LUCENE-6825, because the block KD-tree makes > heavy use of OfflineSorter and we don't want to fill up tmp space ... > This should be a straightforward cutover, but there are some challenges, e.g. > the test was failing because virus checker blocked deleting of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org