[ https://issues.apache.org/jira/browse/SINGA-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangwei updated SINGA-35:
-------------------------
    Description: 
In SINGA-6, we added a thread-specific Singleton which is used to get an
Mshadow::Random object for each thread. Hence random number generation is
thread-safe if using Mshadow::Random and TSingleton.

But in other places, we are still using rand() without seeding it. One problem
is that two processes would generate the same random numbers if using rand().
One case is AllReduce training. If two processes (each with one worker) are
launched, they would randomly skip the same number of training records,
because the ShardDataLayer uses rand() to get the skip number. In other words,
the two workers' training runs are exactly the same. There is no difference
(in training time and accuracy) compared with non-distributed training.

A thread-specific or global random number generator should be properly seeded
to avoid the above case. The thread-specific generator can be passed to Mshadow
for random distribution generation.



> Add random number generators
> ----------------------------
>
>                 Key: SINGA-35
>                 URL: https://issues.apache.org/jira/browse/SINGA-35
>             Project: Singa
>          Issue Type: Bug
>            Reporter: wangwei
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
