[ 
https://issues.apache.org/jira/browse/SAMZA-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949501#comment-17949501
 ] 

Jon Bringhurst commented on SAMZA-2804:
---------------------------------------

This was merged into trunk.

> run-class.sh concurrency issues when on samza-yarn
> --------------------------------------------------
>
>                 Key: SAMZA-2804
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2804
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jon Bringhurst
>            Assignee: Jon Bringhurst
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Several possible issues were identified in run-class.sh, including:
> h2. Race condition in pathing jar manifest creation
> A race condition exists when setting up the classpath during container launch.
> During container launch using samza-yarn, run-class.sh creates a pathing jar 
> file (which holds the classpath for the container launch). However, during 
> the creation of this pathing jar, the temporary files and the pathing jar 
> itself are not placed in a location unique to the container. As a result, 
> multiple containers write to the same pathing jar location and temporary 
> file location, which creates a race condition.
> This race condition may show up in several ways, such as when Yarn removes 
> jars from a finished container (other containers will point to a classpath 
> which no longer exists) or when multiple run-class.sh scripts attempt to 
> write the manifest.txt or pathing jar at the same time.
> Note that host affinity being enabled will make this problem worse. The 
> pathing.jar is written to the usercache, so when the container which created 
> the pathing.jar is finished and removed, any new container which launches on 
> that host will point to jar files which no longer exist. When host 
> affinity is enabled, the container will not move to a new host and will 
> just keep failing.
> h2. Container logging directory fallback is not unique for each container
> The fallback log directory is the same among all containers running on the 
> same host. It should be unique per-container.
> h2. Container tmp dir is not unique per-container
> The JAVA_TMP_DIR directory is the same for all containers. We should make 
> sure that it's safe to use the same directory for all containers.
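
The per-container uniqueness described above could be sketched in shell roughly as follows. This is a minimal illustration under stated assumptions, not the change that was merged for SAMZA-2804: `container_scratch_dir` and `write_manifest` are hypothetical helper names, and the sketch assumes a `CONTAINER_ID` environment variable is available to the launch script, falling back to the shell PID when it is not.

```shell
#!/bin/sh
# Sketch only: NOT the patch merged for SAMZA-2804.
# Assumption: CONTAINER_ID is set in the container's environment.

# Hypothetical helper: print a directory path unique to this container.
# Falls back to the shell PID when CONTAINER_ID is unset or empty.
container_scratch_dir() {
  base_dir="$1"
  unique_id="${CONTAINER_ID:-pid-$$}"
  printf '%s/%s\n' "$base_dir" "$unique_id"
}

# Hypothetical helper: write manifest.txt without exposing a partial file.
# The temp file is created inside the target directory, so the final mv is
# a rename on the same filesystem rather than a copy.
# Simplified: a real jar manifest must also wrap long lines.
write_manifest() {
  target_dir="$1"
  classpath="$2"
  mkdir -p "$target_dir" || return 1
  tmp_file="$(mktemp "$target_dir/manifest.XXXXXX")" || return 1
  printf 'Class-Path: %s\n' "$classpath" > "$tmp_file" || return 1
  mv "$tmp_file" "$target_dir/manifest.txt"
}
```

A launch script might call `dir="$(container_scratch_dir "$base")"` followed by `write_manifest "$dir" "$classpath"`, so that two containers on the same host never share a manifest or temporary file. The same unique directory could also serve as the fallback log directory and the Java tmp dir.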



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
