count vs countByValue in for/yield

2014-07-15 Thread Ognen Duzlevski

Hello,

I am curious about something:

val result = for {
  (dt, evrdd) <- evrdds
  ct = evrdd.count
} yield dt -> ct

works.

val result = for {
  (dt, evrdd) <- evrdds
  ct = evrdd.countByValue
} yield dt -> ct

does not work. I get:
14/07/15 16:46:33 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/07/15 16:46:33 WARN TaskSetManager: Loss was due to java.lang.NullPointerException

java.lang.NullPointerException
    at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
    at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

What is the difference? Is it that countByValue returns a Map while count
returns a Long?
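
For concreteness, here is a minimal self-contained version of what I am
doing. The sample data and the Map[String, RDD[String]] type for evrdds
are just stand-ins for illustration; the real point is that count is an
action returning a single Long per RDD, while countByValue is an action
returning a Map of value -> count materialized on the driver:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CountVsCountByValue {
  def main(args: Array[String]): Unit = {
    // local[*] so the sketch runs without a cluster; the failure above
    // only shows up on a standalone cluster.
    val conf = new SparkConf().setAppName("count-vs-countByValue").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Stand-in for the real evrdds: a date string mapped to an RDD of events.
    val evrdds: Map[String, RDD[String]] = Map(
      "2014-07-14" -> sc.parallelize(Seq("a", "b", "a")),
      "2014-07-15" -> sc.parallelize(Seq("b", "c"))
    )

    // count returns one Long per RDD.
    val counts = for ((dt, evrdd) <- evrdds) yield dt -> evrdd.count()
    // Map(2014-07-14 -> 3, 2014-07-15 -> 2)

    // countByValue returns a Map[value, count] per RDD, collected to the driver.
    val valueCounts = for ((dt, evrdd) <- evrdds) yield dt -> evrdd.countByValue()
    // Map(2014-07-14 -> Map(a -> 2, b -> 1), 2014-07-15 -> Map(b -> 1, c -> 1))

    println(counts)
    println(valueCounts)
    sc.stop()
  }
}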


Thanks!
Ognen


Re: count vs countByValue in for/yield

2014-07-16 Thread Ognen Duzlevski

Hello all,

Can anyone offer any insight into the question above?

Both are "legal" Spark but the first one works, the latter one does not. 
They both work on a local machine but in a standalone cluster the one 
with countByValue fails.
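
A possible way to narrow it down (just a sketch, using the same names as
above): call countByValue on a single RDD on the cluster, outside the for
comprehension, to see whether the comprehension or the action itself is
at fault:

val (dt, evrdd) = evrdds.head
// If this also throws the NullPointerException on the standalone cluster,
// the for/yield is not the culprit; the action itself is failing there.
val byValue = evrdd.countByValue()
println(s"$dt -> $byValue")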


Thanks!
Ognen
