[ https://issues.apache.org/jira/browse/BEAM-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Weise updated BEAM-3272:
-------------------------------
    Priority: Minor  (was: Critical)

> ParDoTranslatorTest: Error creating local cluster while creating checkpoint file
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-3272
>                 URL: https://issues.apache.org/jira/browse/BEAM-3272
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-apex
>            Reporter: Eugene Kirpichov
>            Priority: Minor
>              Labels: flake, sickbay
>             Fix For: 2.11.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Failed build:
> https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-apex/5330/console
>
> Key output:
> {code}
> 2017-11-29T01:21:26.956 [ERROR] testAssertionFailure(org.apache.beam.runners.apex.translation.ParDoTranslatorTest)  Time elapsed: 2.007 s  <<< ERROR!
> java.lang.RuntimeException: Error creating local cluster
>     at org.apache.apex.engine.EmbeddedAppLauncherImpl.getController(EmbeddedAppLauncherImpl.java:122)
>     at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:71)
>     at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:46)
>     at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:197)
>     at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:57)
>     at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:31)
>     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:304)
>     at org.apache.beam.sdk.Pipeline.run(Pipeline.java:290)
>     at org.apache.beam.runners.apex.translation.ParDoTranslatorTest.runExpectingAssertionFailure(ParDoTranslatorTest.java:156)
> {code}
> ...
> {code}
> Caused by: ExitCodeException exitCode=1: chmod: cannot access ‘/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/com.datatorrent.stram.StramLocalCluster/checkpoints/2/_tmp’: No such file or directory
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
>     at org.apache.hadoop.util.Shell.run(Shell.java:479)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
>     at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
>     at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
>     at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
>     at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1017)
>     at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:99)
>     at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:352)
>     at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:399)
>     at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:584)
>     at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:686)
>     at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:682)
>     at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>     at org.apache.hadoop.fs.FileContext.create(FileContext.java:688)
>     at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:119)
>     ... 50 more
> {code}
>
> Judging by the code at these stack frames, it is trying to copy an
> operator's checkpoint "to HDFS" (which in this case is the local disk), but
> fails while creating the target file of the copy: the create step creates
> the file (successfully) and then chmods it writable (unsuccessfully). Barring
> something subtle (e.g. chmod not being allowed immediately after creating a
> FileOutputStream), this suggests the whole directory was deleted from under
> the process. I don't know why that would happen, though, or how to debug it.
>
> Either way, the path being accessed is odd: it sits under the module's
> target/ directory rather than in a per-test location:
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/...
> I think it would be better if this test used a "@Rule TemporaryFolder" to
> store Apex checkpoints. I don't know whether the Apex runner allows that, but
> I can see how it could help reduce interference between tests and potentially
> resolve this issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
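The "@Rule TemporaryFolder" suggestion in the report amounts to giving each test run its own freshly created directory. Below is a minimal sketch of that isolation, assuming only the JDK (no JUnit or Apex dependency). `CheckpointDirIsolation`, `newCheckpointRoot`, `touch` and `deleteRecursively` are hypothetical names, and how such a directory would be handed to the Apex runner is deliberately not shown, since the report itself does not know whether the runner allows it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch of per-test checkpoint isolation using only the JDK,
// mimicking what JUnit 4's TemporaryFolder rule does around each test.
// Wiring the directory into the Apex runner is NOT shown (assumption).
final class CheckpointDirIsolation {

  // Creates a fresh, uniquely named directory under java.io.tmpdir for one run,
  // instead of a fixed path shared by every test under target/.
  static Path newCheckpointRoot(String testName) {
    try {
      return Files.createTempDirectory("apex-checkpoints-" + testName + "-");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Creates an empty file (plus any missing parent directories) below root,
  // standing in for an operator checkpoint such as checkpoints/2/_tmp.
  static Path touch(Path root, String... segments) {
    Path file = root;
    for (String s : segments) {
      file = file.resolve(s);
    }
    try {
      Files.createDirectories(file.getParent());
      return Files.createFile(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Deletes root and everything below it, children before parents
  // (reverse lexicographic order), as TemporaryFolder does after a test.
  static void deleteRecursively(Path root) {
    try (Stream<Path> paths = Files.walk(root)) {
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Two runs of the same test still get distinct directories.
    Path runA = newCheckpointRoot("testAssertionFailure");
    Path runB = newCheckpointRoot("testAssertionFailure");
    Path checkpointA = touch(runA, "checkpoints", "2", "_tmp");
    // Cleaning up run B cannot remove run A's checkpoint file.
    deleteRecursively(runB);
    System.out.println("distinct roots: " + !runA.equals(runB));
    System.out.println("A's checkpoint survives B's cleanup: " + Files.exists(checkpointA));
    deleteRecursively(runA);
  }
}
```

In a real JUnit 4 test the same effect would come from `@Rule public TemporaryFolder tmp = new TemporaryFolder();` plus `tmp.newFolder()`. The sketch only illustrates why a per-run directory would keep one test's cleanup from deleting another test's `checkpoints/2/_tmp`, which is the failure mode hypothesized in the report.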