Eugene Kirpichov created BEAM-3272:
--------------------------------------

             Summary: ParDoTranslatorTest: Error creating local cluster while creating checkpoint file
                 Key: BEAM-3272
                 URL: https://issues.apache.org/jira/browse/BEAM-3272
             Project: Beam
          Issue Type: Bug
          Components: runner-apex
    Affects Versions: 2.3.0
            Reporter: Eugene Kirpichov
            Assignee: Thomas Weise
            Priority: Minor


Failed build: 
https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-apex/5330/console

Key output:

{code}
2017-11-29T01:21:26.956 [ERROR] testAssertionFailure(org.apache.beam.runners.apex.translation.ParDoTranslatorTest)  Time elapsed: 2.007 s  <<< ERROR!
java.lang.RuntimeException: Error creating local cluster
        at org.apache.apex.engine.EmbeddedAppLauncherImpl.getController(EmbeddedAppLauncherImpl.java:122)
        at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:71)
        at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:46)
        at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:197)
        at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:57)
        at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:31)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:304)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:290)
        at org.apache.beam.runners.apex.translation.ParDoTranslatorTest.runExpectingAssertionFailure(ParDoTranslatorTest.java:156)
{code}
...
{code}
Caused by: ExitCodeException exitCode=1: chmod: cannot access ‘/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/com.datatorrent.stram.StramLocalCluster/checkpoints/2/_tmp’: No such file or directory

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
        at org.apache.hadoop.util.Shell.run(Shell.java:479)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
        at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
        at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
        at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
        at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1017)
        at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:99)
        at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:352)
        at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:399)
        at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:584)
        at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:686)
        at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:682)
        at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
        at org.apache.hadoop.fs.FileContext.create(FileContext.java:688)
        at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:119)
        ... 50 more
{code}

Judging by the code at these stack frames, it is trying to copy an operator's 
checkpoint "to HDFS" (which in this case is the local disk), but fails while 
creating the target file of the copy: creation creates the file (successfully) 
and then chmods it writable (unsuccessfully). Barring something subtle (e.g. 
chmod not being allowed immediately after creating a FileOutputStream), this 
looks like the whole directory was deleted from under the process. I don't 
know why this would be the case, though, or how to debug it.
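
The suspected failure mode can be reproduced outside Hadoop and Apex. Below is 
a minimal sketch, assuming a POSIX file system. It uses java.nio's 
Files.setPosixFilePermissions as a stand-in for the chmod shell command that 
RawLocalFileSystem.setPermission runs in the trace above; the class and method 
names are made up for illustration and are not part of Beam, Apex or Hadoop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical stand-alone reproduction; not code from the Apex runner.
final class ChmodAfterDeleteSketch {

  /** Returns true if changing permissions fails because the path has vanished. */
  static boolean chmodFailsAfterDirectoryRemoved() {
    try {
      Path dir = Files.createTempDirectory("checkpoints");
      Path target = dir.resolve("_tmp");
      try (OutputStream out = Files.newOutputStream(target)) {
        out.write(1); // creating and writing the file succeeds

        // Simulate another actor (e.g. a concurrent cleanup) removing the
        // file and its directory while the stream is still open.
        Files.delete(target);
        Files.delete(dir);

        try {
          // Stand-in for the chmod call; the path no longer exists.
          Files.setPosixFilePermissions(
              target, PosixFilePermissions.fromString("rw-r--r--"));
          return false;
        } catch (NoSuchFileException e) {
          return true;
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        "chmod failed after delete: " + chmodFailsAfterDirectoryRemoved());
  }
}
```

If this behaves the same way on the Jenkins host, it would support the theory 
that something removed the checkpoint directory between file creation and the 
chmod, but it does not say what did the removing.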

Either way, the path being accessed is funky: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/...
I think it would be better if this test used a "@Rule TemporaryFolder" to 
store Apex checkpoints. I don't know whether the Apex runner allows that, but 
I can see how it could reduce interference between tests and potentially 
resolve this issue.
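
To illustrate the isolation idea without depending on JUnit or on any Apex 
configuration hook (whether the runner exposes one is exactly the open 
question), here is a minimal stdlib-only sketch of what a per-test temporary 
directory provides: a fresh, unique directory for each run that is deleted 
afterwards. The class and method names are hypothetical, and the body passed 
in stands for "run the pipeline with checkpoints stored under this directory".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical helper; mimics the lifecycle of a per-test temporary folder.
final class IsolatedCheckpointDirSketch {

  /** Runs body with a fresh directory, deletes it afterwards, returns its path. */
  static Path withFreshCheckpointDir(Consumer<Path> body) {
    try {
      Path dir = Files.createTempDirectory("apex-checkpoints-");
      try {
        body.accept(dir);
      } finally {
        deleteRecursively(dir);
      }
      return dir;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static void deleteRecursively(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      // Reverse order so that children are deleted before their parents.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }

  public static void main(String[] args) {
    Path first = withFreshCheckpointDir(d -> System.out.println("test 1 uses " + d));
    Path second = withFreshCheckpointDir(d -> System.out.println("test 2 uses " + d));
    System.out.println("distinct directories: " + !first.equals(second));
  }
}
```

In a JUnit 4 test the same lifecycle would come from the TemporaryFolder rule 
itself; the remaining work would be finding out how to point the local Apex 
cluster's checkpoint location at that folder.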



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
