Re: Benchmarking col vs row similarities
I will increase memory for the job... that will also fix it, right?

On Apr 10, 2015 12:43 PM, Reza Zadeh <r...@databricks.com> wrote:

> You should pull in this PR: https://github.com/apache/spark/pull/5364
> It should resolve that. It is in master.
>
> Best,
> Reza
>
> On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>
>> Hi,
>>
>> I am benchmarking the row vs. column similarity flow on 60M x 10M matrices. Details are in this JIRA: https://issues.apache.org/jira/browse/SPARK-4823
>>
>> For testing I am using the Netflix data, since the structure is very similar: 50K x 17K near-dense similarities. There are only 17K items, so I have not activated the threshold in columnSimilarities yet (it is at 1e-4).
>>
>> I am running Spark on YARN with 20 nodes, 4 cores, 16 GB, shuffle threshold 0.6. I keep getting the following from the column similarity code on the 1.2 branch. Should I use master?
>>
>> 15/04/10 11:08:36 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, tblpmidn36adv-hdp.tdc.vzwcorp.com, 44410) with no recent heart beats: 50315ms exceeds 45000ms
>> 15/04/10 11:09:12 ERROR ContextCleaner: Error cleaning broadcast 1012
>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>   at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>   at scala.concurrent.Await$.result(package.scala:107)
>>   at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:137)
>>   at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:227)
>>   at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
>>   at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
>>   at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
>>   at scala.Option.foreach(Option.scala:236)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:138)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
>>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
>>   at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
>>   at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
>>
>> I know how to increase the 45 s heartbeat timeout to something higher, since this is a compute-heavy job, but on YARN I am not sure how to set that config. In any case, that is only a warning and should not affect the job.
>>
>> Any idea how to improve the runtime other than increasing the threshold to 1e-2? I will do that next.
>>
>> Was the Netflix dataset benchmarked with the column-based similarity flow before? The similarity output from this dataset becomes near dense, so it is interesting for stress testing.
>>
>> Thanks.
>> Deb
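[Editor's aside: since the columnSimilarities threshold comes up repeatedly in this thread, here is a tiny self-contained sketch of the quantity being benchmarked: thresholded all-pairs column cosine similarity. The object and method names are made up for illustration, and this is a naive dense brute force; MLlib's RowMatrix.columnSimilarities(threshold) instead uses DIMSUM sampling to approximate the same values at scale.]

```scala
// Naive all-pairs column cosine similarity with a threshold.
// For columns c_i, c_j: sim(i, j) = (c_i . c_j) / (|c_i| * |c_j|),
// keeping only pairs whose similarity exceeds the threshold.
object ColSim {
  def columnCosine(rows: Array[Array[Double]], threshold: Double): Map[(Int, Int), Double] = {
    val nCols = rows.head.length
    // Column norms |c_j|
    val norms = Array.tabulate(nCols) { j =>
      math.sqrt(rows.map(r => r(j) * r(j)).sum)
    }
    (for {
      i <- 0 until nCols
      j <- (i + 1) until nCols
      dot = rows.map(r => r(i) * r(j)).sum // dot product of columns i and j
      sim = dot / (norms(i) * norms(j))
      if sim > threshold
    } yield (i, j) -> sim).toMap
  }
}
```

On the 60M x 10M or Netflix-scale matrices in the thread this brute force is of course infeasible; it only pins down what the thresholded output means.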
Re: Benchmarking col vs row similarities
Depends... The missed heartbeat is probably due to GC pressure (likely a full GC pause). If you increase the memory too much, GCs may become less frequent, but each full GC may take longer. Try increasing the following confs:

spark.executor.heartbeatInterval
spark.core.connection.ack.wait.timeout

Best,
Burak

On Fri, Apr 10, 2015 at 8:52 PM, Debasish Das <debasish.da...@gmail.com> wrote:

> I will increase memory for the job... that will also fix it, right?
>
> [rest of the quoted thread trimmed; it appears in full above]
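[Editor's aside: on the "how do I set that config on YARN" question above, both settings are ordinary Spark confs, so they can be set at submit time with `--conf` or programmatically on the `SparkConf`. The sketch below uses illustrative placeholder values, not tuned recommendations, and the app name is hypothetical; in the Spark 1.x line `spark.executor.heartbeatInterval` is interpreted in milliseconds and `spark.core.connection.ack.wait.timeout` in seconds.]

```scala
import org.apache.spark.SparkConf

// Equivalent to passing the values on the command line, e.g.:
//   spark-submit --master yarn-client \
//     --conf spark.executor.heartbeatInterval=100000 \
//     --conf spark.core.connection.ack.wait.timeout=600 ...
val conf = new SparkConf()
  .setAppName("col-vs-row-similarity-benchmark")        // hypothetical app name
  .set("spark.executor.heartbeatInterval", "100000")    // milliseconds in 1.x
  .set("spark.core.connection.ack.wait.timeout", "600") // seconds in 1.x
```

This is a config fragment that assumes Spark on the classpath; the right values depend on how long your GC pauses actually are.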
Re: Benchmarking col vs row similarities
You should pull in this PR: https://github.com/apache/spark/pull/5364
It should resolve that. It is in master.

Best,
Reza

On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das <debasish.da...@gmail.com> wrote:

> Hi,
>
> I am benchmarking the row vs. column similarity flow on 60M x 10M matrices. Details are in this JIRA: https://issues.apache.org/jira/browse/SPARK-4823
>
> [rest of the quoted message and stack trace trimmed; quoted in full above]