Didn't know about the no-HTML policy, sorry. The red ones were:

  vishid[numItems][softmax][totalFeatures];    DistCache
  visbiases[numItems][softmax];                DistCache
  hidbiases[totalFeatures];                    DistCache
  CDinc[numItems][softmax][totalFeatures];     DistCache
  hidbiasinc[totalFeatures];                   DistCache
  visbiasinc[numItems][softmax];               DistCache

And the green ones were:

  CDpos[numItems][softmax][totalFeatures];     X   0
  CDneg[numItems][softmax][totalFeatures];     X   0
  poshidprobs[totalFeatures];                  X   Depend on DistCache variables
  poshidstates[totalFeatures];                 X   Depend on DistCache variables
  curposhidstates[totalFeatures];              X   Depend on DistCache variables
  poshidact[totalFeatures];                    X   0
  neghidact[totalFeatures];                    X   0
  neghidprobs[totalFeatures];                  X   Depend on DistCache variables
  neghidstates[totalFeatures];                 X   Depend on DistCache variables
  nvp2[numItems][softmax];                     X   0
  negvisprobs[numItems][softmax];              X   0
  negvissoftmax[numItems];                     X   0
  posvisact[numItems][softmax];                X   0
  negvisact[numItems][softmax];                X   0

The redistributed 'red' data structures are proportional to numItems... I am also trying a version where the MapReduce happens operation-wise (using the original algorithm but replacing Matrices with DistributedRowMatrices, and so on); this version is being pushed to the rbm2 branch on my GitHub repo. In this approach I am handling some of the above 3D matrices as vectors of DistributedRowMatrices (this is in my working copy). I am not sure whether user-wise or operation-wise is better, though. The paper does talk about parallelization, and it prefers to do it user-wise and then take the average of the biases after each iteration, which is why I didn't try the operation-wise MapReduce earlier. I am also not perfectly sure this approach will outperform the one they suggest. Jake also had a look at the same thing earlier, and he suggested user-wise parallelization.
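For what it's worth, the bias-averaging step the paper describes for user-wise parallelization could look something like the sketch below: each worker trains on its own partition of users, and the resulting bias vectors (e.g. hidbiases) are averaged element-wise after the iteration. This is only an illustration with plain arrays; the class and method names are made up, not actual Mahout API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the user-wise scheme from the paper:
 * each worker produces its own copy of a bias vector after training
 * on its user partition, and the copies are averaged element-wise
 * at the end of the iteration. Names here are illustrative only.
 */
public class BiasAverager {

    /** Average per-worker bias vectors into a single shared vector. */
    static double[] averageBiases(double[][] perWorkerBiases) {
        int n = perWorkerBiases[0].length;
        double[] avg = new double[n];
        for (double[] worker : perWorkerBiases) {
            for (int j = 0; j < n; j++) {
                avg[j] += worker[j];
            }
        }
        for (int j = 0; j < n; j++) {
            avg[j] /= perWorkerBiases.length;
        }
        return avg;
    }

    public static void main(String[] args) {
        // e.g. hidbiases from two workers after one iteration
        double[][] hidbiasesPerWorker = {
            {0.2, 0.4},   // worker 1's user partition
            {0.4, 0.8}    // worker 2's user partition
        };
        System.out.println(Arrays.toString(averageBiases(hidbiasesPerWorker)));
    }
}
```

In a real job this averaging would presumably be the reduce step, with the averaged vector written back out (e.g. to the DistributedCache) for the next iteration's mappers.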
