[
https://issues.apache.org/jira/browse/SAMZA-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949501#comment-17949501
]
Jon Bringhurst commented on SAMZA-2804:
---------------------------------------
This was merged into trunk.
> run-class.sh concurrency issues when on samza-yarn
> --------------------------------------------------
>
> Key: SAMZA-2804
> URL: https://issues.apache.org/jira/browse/SAMZA-2804
> Project: Samza
> Issue Type: Bug
> Reporter: Jon Bringhurst
> Assignee: Jon Bringhurst
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Several possible issues were identified in run-class.sh, including:
> h2. Race condition in pathing jar manifest creation
> A race condition exists when setting up the classpath during container launch.
> During container launch using samza-yarn, run-class.sh creates a pathing jar
> file (which holds the classpath for the container launch). However, neither
> the pathing jar nor the temporary files used to create it are placed in a
> location unique to the container. As a result, multiple containers write to
> the same pathing jar and temporary file locations, which causes a race
> condition.
> This race condition may show up in several ways, such as when Yarn removes
> jars from a finished container (other containers will point to a classpath
> which no longer exists) or when multiple run-class.sh scripts attempt to
> write the manifest.txt or pathing jar at the same time.
> Note that enabling host affinity makes this problem worse. The
> pathing.jar is written to the usercache, so when the container which created
> the pathing.jar is finished and removed, any new container which launches on
> that host will point to jar files which no longer exist. With host
> affinity enabled, the new container will not move to a different host and
> will just keep failing.
> h2. Container logging directory fallback is not unique for each container
> The fallback log directory is the same among all containers running on the
> same host. It should be unique per-container.
> h2. Container tmp dir is not unique per-container
> The JAVA_TMP_DIR directory is the same for all containers. We should verify
> whether it is safe to share one directory across all containers and, if it
> is not, make it unique per-container.
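
For illustration only, the per-container isolation described above could be
sketched in shell roughly as follows. This is not the change that was merged.
The helper names container_dir and write_atomically are hypothetical, and the
sketch assumes the launcher exports a CONTAINER_ID environment variable (it
falls back to the shell PID when the variable is unset).

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: derive a directory unique to this container.
# Assumes CONTAINER_ID is exported by the launcher; falls back to the PID.
container_dir() {
  local base="$1"
  local id="${CONTAINER_ID:-pid-$$}"
  printf '%s/%s\n' "$base" "$id"
}

# Hypothetical helper: write a file so that readers never see a partial copy.
# The content goes to a temp file in the same directory, then is renamed.
write_atomically() {
  local target="$1" content="$2"
  local dir tmp
  dir="$(dirname "$target")"
  mkdir -p "$dir"
  tmp="$(mktemp "$dir/.tmp.XXXXXX")"
  printf '%s\n' "$content" > "$tmp"
  mv -f "$tmp" "$target"
}
```

A caller would compute the directory once, for example
dir="$(container_dir "$base")", and place manifest.txt, the pathing jar, the
fallback log directory, and the Java tmp dir under it. Renaming within a
single directory keeps the final step a same-filesystem move, so concurrent
writers cannot interleave partial content.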