[
https://issues.apache.org/jira/browse/SINGA-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wangwei updated SINGA-35:
-------------------------
Description:
In SINGA-6, we added a thread-specific Singleton that is used to get a
Mshadow::Random object for each thread. Hence random number generation is
thread-safe when using Mshadow::Random and TSingleton.
But in other places we are still using rand() without seeding it. One problem
is that two processes would then generate the same random numbers. One case is
AllReduce training. If two processes (each with one worker) are launched, they
skip the same number of training records, because the ShardDataLayer uses
rand() to compute the skip count. In other words, the two workers' training
runs are exactly identical; there is no difference (in training time or
accuracy) compared with non-distributed training.
A thread-specific or global random number generator should be properly
initialized to avoid the above case. The thread-specific generator can then be
passed to Mshadow for sampling random distributions.
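A minimal sketch of the proposed fix, assuming a seed scheme that mixes a global seed with the process id and thread id (the helper names MakeSeed and SkipCount are hypothetical, not SINGA's actual API; SINGA's real generator would be the Mshadow::Random held by TSingleton):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <thread>
#include <unistd.h>  // getpid()

// Hypothetical helper: derive a per-process, per-thread seed so that
// concurrently launched workers do not share a random sequence.
std::uint32_t MakeSeed(std::uint32_t global_seed) {
  std::hash<std::thread::id> hasher;
  std::uint32_t tid =
      static_cast<std::uint32_t>(hasher(std::this_thread::get_id()));
  return global_seed ^ static_cast<std::uint32_t>(getpid()) ^ tid;
}

// Thread-local generator, seeded once per thread. A generator like this
// could also be handed to Mshadow for sampling random distributions,
// instead of calling unseeded rand().
thread_local std::mt19937 tls_rng(MakeSeed(0x5eedu));

// Hypothetical stand-in for ShardDataLayer's skip computation: each
// worker now draws a different skip count.
int SkipCount(int max_skip) {
  std::uniform_int_distribution<int> dist(0, max_skip - 1);
  return dist(tls_rng);
}
```

With this scheme, two processes launched at the same time still get distinct seeds (their pids differ), so their workers skip different numbers of records and the distributed runs are no longer byte-identical replicas of each other.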
was:
In SINGA-6, we added thread specific Singleton which is used to get a
Mshadow::Random object for each thread. Hence the random number generation is
thead-safe if using Mshadow::Random and TSingleton.
But for other places, we are still using rand() without seeding it. One problem
is that two process would have the same random numbers if using rand(). One
case is the AllReduce training. If we processes (each has one worker) are
launched, then they would randomly skip the the number of training records
because the ShardDataLayer uses rand() to get the skip number. In other words,
the two workers' training are exactly the same. There is not difference
(training time and accuracy) compared with non-distributed training.
A thread specific or global random number generator should be well initialized
to avoid the above case. The thread specific generator can be passed to Mshadow
for random distribution generation.
> Add random number generators
> ----------------------------
>
> Key: SINGA-35
> URL: https://issues.apache.org/jira/browse/SINGA-35
> Project: Singa
> Issue Type: Bug
> Reporter: wangwei
>
> In SINGA-6, we added a thread-specific Singleton that is used to get a
> Mshadow::Random object for each thread. Hence random number generation is
> thread-safe when using Mshadow::Random and TSingleton.
> But in other places we are still using rand() without seeding it. One
> problem is that two processes would then generate the same random numbers.
> One case is AllReduce training. If two processes (each with one worker) are
> launched, they skip the same number of training records, because the
> ShardDataLayer uses rand() to compute the skip count. In other words, the
> two workers' training runs are exactly identical; there is no difference
> (in training time or accuracy) compared with non-distributed training.
> A thread-specific or global random number generator should be properly
> initialized to avoid the above case. The thread-specific generator can then
> be passed to Mshadow for sampling random distributions.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)