[ 
https://issues.apache.org/jira/browse/SYSTEMML-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691648#comment-15691648
 ] 

Felix Schüler edited comment on SYSTEMML-1127 at 11/23/16 11:36 PM:
--------------------------------------------------------------------

So I see two ways to go here and would need some more info on what's going on 
to decide which one to choose:

1) Give each thread its own cache directory
2) Synchronize the LocalFileUtils.createLocalFileIfNotExist() (or the 
createWorkingDirectoryWithUUID) method and have threads share the cache

It seems like the parfor workers use the folder created in 
/tmp/systemml/pid_host as their cache. Is this a cache per process 
or per thread? If a worker spawns multiple threads, they run in the same 
process, so concurrent calls to create this directory can race and 
throw an error. [~mboehm7] could you give me some advice on this?
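
To make option 2 concrete, here is a minimal sketch of a directory creation that 
tolerates concurrent callers. It does not use SystemML code: the class 
SharedCacheDirSketch and the helper ensureDir are hypothetical names, and it 
assumes that "already exists" can safely be treated as success when threads 
share the cache. It relies on java.nio.file.Files.createDirectories, which does 
not throw if the directory is already there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedCacheDirSketch {

    // Hypothetical helper (not LocalFileUtils): creates the directory if it is
    // missing and treats an already existing directory as success.
    static Path ensureDir(Path dir) throws IOException {
        return Files.createDirectories(dir);
    }

    // Lets `threads` workers create the same directory at the same time and
    // returns how many of them completed without an exception.
    static int createConcurrently(Path dir, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Path> task = () -> ensureDir(dir);
            List<Future<Path>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(task));
            }
            int succeeded = 0;
            for (Future<Path> result : results) {
                result.get(); // rethrows any failure from the worker
                succeeded++;
            }
            return succeeded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for /tmp/systemml/pid_host; the real path layout is not assumed here.
        Path shared = Files.createTempDirectory("systemml_sketch").resolve("pid_host");
        System.out.println(createConcurrently(shared, 8) + " threads succeeded");
    }
}
```

With this approach no explicit synchronization is needed for the creation itself, 
but whether the threads can safely share the cache contents is a separate 
question.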


was (Author: fschueler):
So I see two ways to go here and would need some more info on what's going on 
to decide which one to choose:

1) Give each thread its own cache directory
2) Synchronize the LocalFileUtils.createLocalFileIfNotExist() method and have 
threads share the cache

It seems like the parfor workers use the folder created in 
/tmp/systemml/pid_host as their cache. Is this a cache per process 
or per thread? If a worker spawns multiple threads, they run in the same 
process, so concurrent calls to create this directory can race and 
throw an error. [~mboehm7] could you give me some advice on this?

> Distributed unique IDs are not unique
> -------------------------------------
>
>                 Key: SYSTEMML-1127
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1127
>             Project: SystemML
>          Issue Type: Bug
>          Components: ParFor
>            Reporter: Felix Schüler
>
> When executing a Spark parfor, the SparkParforWorker throws an exception 
> which states that the localtmpdir could not be created. This is due to the 
> fact that multiple executors are running multithreaded on the same worker. 
> The createDistributedUniqueID() method in IDHandler.java creates unique 
> IDs only per pid and host, not per thread. This could potentially be solved 
> by adding the thread ID to the unique ID. The question is whether every thread 
> should have its own cache or whether the logic should be changed so that the 
> first creation succeeds and the threads then share one cache.
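
The idea of adding the thread ID to the unique ID could look roughly like the 
sketch below. This is not the actual createDistributedUniqueID() code: the class 
ThreadScopedIdSketch, the method uniqueId, and the pid_host_threadId format are 
assumptions made for illustration, and pid and host are passed in as plain 
strings instead of being looked up.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ThreadScopedIdSketch {

    // Hypothetical variant of a per-process ID: appends the calling thread's ID
    // so that two threads in the same process get different IDs.
    static String uniqueId(String pid, String host) {
        return pid + "_" + host + "_" + Thread.currentThread().getId();
    }

    // Starts `threads` workers in one process and collects the ID each one computes.
    static Set<String> collectIds(String pid, String host, int threads)
            throws InterruptedException {
        Set<String> ids = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        // Construct all threads first, then start them.
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> ids.add(uniqueId(pid, host)));
        }
        for (Thread worker : workers) {
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        // "1234" and "worker1" are placeholder values, not real pid/host lookups.
        for (String id : collectIds("1234", "worker1", 4)) {
            System.out.println(id);
        }
    }
}
```

One caveat: the Java documentation allows a thread ID to be reused after the 
thread has terminated, so a per-thread cache directory named this way would 
need to be cleaned up or tolerated when a later thread gets the same ID.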


