Hi, I have been using SystemML for some time and am finding it extremely useful for scaling up my algorithm with Spark. However, there are a few aspects I do not fully understand, and I would like some clarification.
My system configuration: 244 GB RAM, 32 cores.

My Spark configuration:
'spark.executor.cores', '4'
'spark.driver.memory', '80g'
'spark.executor.memory', '20g'
'spark.memory.fraction', '0.75'
'spark.worker.cleanup.enabled', 'true'
'spark.default.parallelism', '1'

I have a process in R which I am trying to port. It is similar to randomForest in that it involves growing trees. In R I parallelize it with parLapply, so that n trees are grown in n parallel processes. I have implemented the algorithm in SystemML in an identical way and run the tree-growing step inside a parfor loop. There are two main issues I am facing:

1. In R, with ncore = 16, I get 30 trees in 10 minutes, but on Spark via SystemML the same process takes 1 hour.
2. I have also noticed that if one tree takes 2 minutes to run, 5 trees take 7-8 minutes, i.e. the runtime grows almost linearly with the number of trees. It seems I am unable to parallelize the process across trees in SystemML.

It would be great if someone could help me out with this.

Thank you,
Rajarshi
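P.S. For reference, here is a minimal sketch of how my parfor loop is structured (simplified; buildTree, X, and y are placeholders for my actual tree-growing function and data):

```dml
# Sketch of the tree-growing loop (placeholder names, not my full script).
numTrees = 30
# Each row of M holds the parameters of one grown tree.
M = matrix(0, rows=numTrees, cols=ncol(X))
parfor(i in 1:numTrees) {
    # buildTree stands in for my actual tree-growing logic;
    # each iteration is independent of the others.
    M[i,] = buildTree(X, y)
}
```

From the parfor documentation I understand the loop also accepts optimizer hints such as par, mode, and opt (e.g. parfor(i in 1:numTrees, par=16, opt=CONSTRAINED)) — should I be setting these explicitly to get the iterations to run in parallel?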