Re: Benchmarking col vs row similarities

2015-04-10 Thread Debasish Das
I will increase memory for the job... that will also fix it, right?
On Apr 10, 2015 12:43 PM, Reza Zadeh r...@databricks.com wrote:

 You should pull in this PR: https://github.com/apache/spark/pull/5364
 It should resolve that. It is in master.
 Best,
 Reza




Re: Benchmarking col vs row similarities

2015-04-10 Thread Burak Yavuz
Depends... The heartbeat warning you received happens due to GC pressure (probably
a full GC). If you increase the memory too much, GCs may become less frequent,
but the full GCs may take longer. Try increasing the following confs:

spark.executor.heartbeatInterval
spark.core.connection.ack.wait.timeout
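
For example, a minimal sketch of bumping them on the job's SparkConf (not from the
original thread; the values are illustrative and the units are assumptions, so check
the docs for your Spark version):

import org.apache.spark.SparkConf

// Sketch only: larger heartbeat interval and ack timeout.
// On Spark 1.x, heartbeatInterval is documented in milliseconds and
// ack.wait.timeout in seconds (verify against your version).
val conf = new SparkConf()
  .set("spark.executor.heartbeatInterval", "60000")
  .set("spark.core.connection.ack.wait.timeout", "600")

On YARN the same keys can also be passed to spark-submit as --conf key=value.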

Best,
Burak

On Fri, Apr 10, 2015 at 8:52 PM, Debasish Das debasish.da...@gmail.com
wrote:

 I will increase memory for the job... that will also fix it, right?
 On Apr 10, 2015 12:43 PM, Reza Zadeh r...@databricks.com wrote:

 You should pull in this PR: https://github.com/apache/spark/pull/5364
 It should resolve that. It is in master.
 Best,
 Reza






Re: Benchmarking col vs row similarities

2015-04-10 Thread Reza Zadeh
You should pull in this PR: https://github.com/apache/spark/pull/5364
It should resolve that. It is in master.
Best,
Reza

On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das debasish.da...@gmail.com
wrote:

 Hi,

 I am benchmarking the row vs. column similarity flows on 60M x 10M matrices...

 Details are in this JIRA:

 https://issues.apache.org/jira/browse/SPARK-4823

 For testing I am using the Netflix data, since the structure is very similar:
 50K x 17K, with near-dense similarities.

 There are 17K items, so I have not activated the threshold in colSimilarities yet
 (it is at 1e-4).
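
 For reference, a minimal sketch of the MLlib call being discussed (not the actual
 benchmark code; rows is a hypothetical RDD[Vector] holding the 50K x 17K rating matrix):

 import org.apache.spark.mllib.linalg.distributed.RowMatrix

 val mat = new RowMatrix(rows)              // rows: hypothetical RDD[Vector]
 val exact = mat.columnSimilarities()       // brute force, no sampling
 val approx = mat.columnSimilarities(1e-4)  // DIMSUM sampling with the threshold above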

 Running Spark on YARN with 20 nodes, 4 cores, 16 GB, shuffle threshold 0.6.
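
 For concreteness, a rough sketch of such a setup (assuming "shuffle threshold"
 refers to spark.shuffle.memoryFraction; the numbers just mirror the ones above
 and are not a recommendation):

 import org.apache.spark.SparkConf

 val conf = new SparkConf()
   .set("spark.executor.instances", "20")
   .set("spark.executor.cores", "4")
   .set("spark.executor.memory", "16g")
   .set("spark.shuffle.memoryFraction", "0.6")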

 I keep getting the following from the column similarity code on the 1.2 branch.
 Should I use master?

 15/04/10 11:08:36 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(5, tblpmidn36adv-hdp.tdc.vzwcorp.com, 44410) with no recent heart beats: 50315ms exceeds 45000ms

 15/04/10 11:09:12 ERROR ContextCleaner: Error cleaning broadcast 1012
 java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
 at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
 at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at scala.concurrent.Await$.result(package.scala:107)
 at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:137)
 at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:227)
 at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
 at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
 at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:185)
 at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:147)
 at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:138)
 at scala.Option.foreach(Option.scala:236)
 at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:138)
 at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
 at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
 at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
 at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)

 I know how to increase the 45 s (45000 ms) timeout to something higher, since this
 is a compute-heavy job, but on YARN I am not sure how to set that config.

 But in any case that is a warning and should not affect the job...

 Any idea how to improve the runtime, other than increasing the threshold to 1e-2?
 I will do that next.

 Has the Netflix dataset been benchmarked on the column-based similarity flow before?
 The similarity output from this dataset becomes near dense, so it is interesting
 for stress testing...

 Thanks.

 Deb