I didn't know about the no-HTML policy, sorry. The red ones were -

vishid[numItems][softmax][totalFeatures]; DistCache

visbiases[numItems][softmax]; DistCache

hidbiases[totalFeatures]; DistCache

CDinc[numItems][softmax][totalFeatures]; DistCache

hidbiasinc[totalFeatures]; DistCache

visbiasinc[numItems][softmax]; DistCache


And the green ones were -


CDpos[numItems][softmax][totalFeatures]; X 0

CDneg[numItems][softmax][totalFeatures]; X 0

poshidprobs[totalFeatures]; X Depend on DistCache variables

poshidstates[totalFeatures]; X Depend on DistCache variables

curposhidstates[totalFeatures]; X Depend on DistCache variables

poshidact[totalFeatures]; X 0

neghidact[totalFeatures]; X 0

neghidprobs[totalFeatures]; X Depend on DistCache variables

neghidstates[totalFeatures]; X Depend on DistCache variables

nvp2[numItems][softmax]; X 0

negvisprobs[numItems][softmax]; X 0

negvissoftmax[numItems]; X 0

posvisact[numItems][softmax]; X 0

negvisact[numItems][softmax]; X 0
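To make the red/green split concrete, here is a minimal Java sketch. The class names and my reading of the annotations are assumptions: I take "; DistCache" to mean shipped to the mappers via the DistributedCache, "X 0" to mean per-user scratch reset to zero, and "X Depend on DistCache variables" to mean derived from the cached state.

```java
// 'Red' structures: shared model state, redistributed via the DistributedCache.
class SharedRbmState {
    final double[][][] vishid, CDinc;          // [numItems][softmax][totalFeatures]
    final double[][] visbiases, visbiasinc;    // [numItems][softmax]
    final double[] hidbiases, hidbiasinc;      // [totalFeatures]

    SharedRbmState(int numItems, int softmax, int totalFeatures) {
        vishid     = new double[numItems][softmax][totalFeatures];
        CDinc      = new double[numItems][softmax][totalFeatures];
        visbiases  = new double[numItems][softmax];
        visbiasinc = new double[numItems][softmax];
        hidbiases  = new double[totalFeatures];
        hidbiasinc = new double[totalFeatures];
    }
}

// 'Green' structures: per-user scratch, either zeroed for each user or
// recomputed from the shared state, so never redistributed.
class PerUserScratch {
    final double[][][] CDpos, CDneg;           // [numItems][softmax][totalFeatures]
    final double[] poshidprobs, poshidstates, curposhidstates,
                   poshidact, neghidact, neghidprobs, neghidstates; // [totalFeatures]
    final double[][] nvp2, negvisprobs, posvisact, negvisact;       // [numItems][softmax]
    final double[] negvissoftmax;              // [numItems]

    PerUserScratch(int numItems, int softmax, int totalFeatures) {
        CDpos = new double[numItems][softmax][totalFeatures];
        CDneg = new double[numItems][softmax][totalFeatures];
        poshidprobs = new double[totalFeatures];
        poshidstates = new double[totalFeatures];
        curposhidstates = new double[totalFeatures];
        poshidact = new double[totalFeatures];
        neghidact = new double[totalFeatures];
        neghidprobs = new double[totalFeatures];
        neghidstates = new double[totalFeatures];
        nvp2 = new double[numItems][softmax];
        negvisprobs = new double[numItems][softmax];
        posvisact = new double[numItems][softmax];
        negvisact = new double[numItems][softmax];
        negvissoftmax = new double[numItems];
    }
}
```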


The redistributed 'red' data structures are proportional to numItems. I am
also trying a version where the MapReduce occurs operation-wise (using the
original algorithm, but replacing Matrices with DistributedRowMatrices and
so on); this version is being pushed to the rbm2 branch on my GitHub repo.
In this approach, I am handling some of the above 3D matrices as vectors of
DistributedRowMatrices (this is in my working copy).
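As a rough illustration of the vector-of-matrices layout, the sketch below splits a [numItems][softmax][totalFeatures] weight cube into 'softmax' item-by-feature slices. In the rbm2 branch each slice would be a DistributedRowMatrix; plain 2D arrays stand in here so only the indexing is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Represent vishid[numItems][softmax][totalFeatures] as one
// numItems x totalFeatures matrix per softmax (rating) value.
class SlicedWeights {
    static List<double[][]> slice(double[][][] vishid) {
        int numItems = vishid.length;
        int softmax = vishid[0].length;
        int totalFeatures = vishid[0][0].length;
        List<double[][]> slices = new ArrayList<>();
        for (int k = 0; k < softmax; k++) {
            double[][] s = new double[numItems][totalFeatures];
            for (int i = 0; i < numItems; i++)
                for (int f = 0; f < totalFeatures; f++)
                    s[i][f] = vishid[i][k][f];
            slices.add(s);
        }
        return slices;
    }
}
```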


I am not sure whether user-wise or operation-wise is better, though. The
paper does talk about parallelization, and it prefers to do it user-wise,
then take the average of the biases after each iteration, which is why I
didn't try the operation-wise MapReduce earlier. I am also not sure the
operation-wise approach will outperform the one they suggest. Jake also had
a look at the same thing earlier and he suggested user-wise parallelization.
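A minimal sketch of the bias averaging the paper suggests for user-wise parallelization: each partition of users trains its own copy of the visible biases, and the copies are averaged element-wise after the iteration. The name averageBiases is mine, not from the patch.

```java
import java.util.Arrays;

class BiasAveraging {
    // Average per-partition bias matrices [numItems][softmax] element-wise.
    static double[][] averageBiases(double[][][] perPartition) {
        int numItems = perPartition[0].length;
        int softmax = perPartition[0][0].length;
        double[][] avg = new double[numItems][softmax];
        for (double[][] partition : perPartition) {
            for (int i = 0; i < numItems; i++) {
                for (int k = 0; k < softmax; k++) {
                    avg[i][k] += partition[i][k] / perPartition.length;
                }
            }
        }
        return avg;
    }

    public static void main(String[] args) {
        // Two partitions, 2 items, 2 softmax units.
        double[][][] copies = {
            {{1.0, 2.0}, {3.0, 4.0}},
            {{3.0, 4.0}, {5.0, 6.0}}
        };
        System.out.println(Arrays.deepToString(averageBiases(copies)));
        // prints [[2.0, 3.0], [4.0, 5.0]]
    }
}
```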
