Re: reduceByKey as Action or Transformation

2016-04-25 Thread Sumedh Wale

  
  
On Monday 25 April 2016 11:28 PM, Weiping Qu wrote:


  
> Dear Ted,
>
> You are right. reduceByKey is a transformation. My fault.
> Let me rephrase my question using the following code snippet.
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> object ScalaApp {
>
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("ScalaApp").setMaster("local")
>     val sc = new SparkContext(conf)
>     val file = sc.textFile("/home/usr/test.dat")
>     // Word count: transformations only, nothing runs yet.
>     val output = file.flatMap(line => line.split(" "))
>       .map(word => (word, 1))
>       .reduceByKey(_ + _)
>
>     output.persist() // lazy: marks the RDD for caching
>     output.count()   // first action
>     output.collect() // second action
>   }
> }
>
> It's a simple code snippet.
> I understand that the first action, count(), triggers execution starting
> from the HadoopRDD and MapPartitionsRDD, and that the ShuffledRDD produced
> by reduceByKey is the input to the count.


The count() will trigger both the execution and the persistence of the
output RDD (as each partition is iterated).

> The second action, collect(), just performs the collect over the same
> ShuffledRDD.


It will use the persisted ShuffledRDD blocks.

> I think that if one partition of the persisted output is missing, the
> recalculation will also be carried out over the ShuffledRDD rather than
> by re-executing the preceding HadoopRDD and MapPartitionsRDD.
> Am I right?


Since it is a partition of the persisted ShuffledRDD that is missing, the
partition will have to be recreated from the base HadoopRDD. To
avoid that, one can checkpoint the ShuffledRDD if required.
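
As a rough sketch of that checkpointing suggestion (not taken from the original code): the example below uses in-memory input via sc.parallelize instead of the thread's test.dat, a temporary local directory as a hypothetical checkpoint location, and the object name CheckpointDemo, all of which are assumptions made for illustration. Like persist(), checkpoint() is lazy, so it is requested before the first action.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointDemo {
  // Returns (whether the RDD was checkpointed, number of distinct words).
  def run(): (Boolean, Long) = {
    val conf = new SparkConf().setAppName("CheckpointDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical checkpoint location: a temporary local directory.
      sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoint").toString)

      // In-memory stand-in for the text file used in the thread.
      val output = sc.parallelize(Seq("a b", "b c", "c c"))
        .flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)

      output.persist()    // lazy: marks the RDD for caching
      output.checkpoint() // lazy: marks the RDD for checkpointing
      val n = output.count() // first action: computes, caches, then checkpoints
      (output.isCheckpointed, n)
    } finally {
      sc.stop()
    }
  }
}
```

On a real cluster the checkpoint directory would normally point at reliable storage such as HDFS. Persisting before checkpointing is the usual recommendation, since otherwise writing the checkpoint requires computing the RDD a second time.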

 
> Thanks and Regards,
> Weiping


regards
-- 
Sumedh Wale
SnappyData (http://www.snappydata.io)
  


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: reduceByKey as Action or Transformation

2016-04-25 Thread Weiping Qu

Dear Ted,

You are right. reduceByKey is a transformation. My fault.
Let me rephrase my question using the following code snippet.

import org.apache.spark.{SparkConf, SparkContext}

object ScalaApp {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ScalaApp").setMaster("local")
    val sc = new SparkContext(conf)
    val file = sc.textFile("/home/usr/test.dat")
    // Word count: transformations only, nothing runs yet.
    val output = file.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    output.persist() // lazy: marks the RDD for caching
    output.count()   // first action
    output.collect() // second action
  }
}

It's a simple code snippet.
I understand that the first action, count(), triggers execution starting
from the HadoopRDD and MapPartitionsRDD, and that the ShuffledRDD produced
by reduceByKey is the input to the count.

The second action, collect(), just performs the collect over the same ShuffledRDD.
I think that if one partition of the persisted output is missing, the
recalculation will also be carried out over the ShuffledRDD rather than by
re-executing the preceding HadoopRDD and MapPartitionsRDD.

Am I right?

Thanks and Regards,
Weiping
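
One way to look at the lineage being asked about is RDD.toDebugString, which describes the chain of RDDs behind an RDD without running a job. The sketch below is only an illustration: it swaps the thread's text file for in-memory input via sc.parallelize (so the base RDD is a ParallelCollectionRDD rather than a HadoopRDD), and the object name LineageDemo is made up for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageDemo {
  // Returns the textual lineage of the word-count RDD.
  def lineage(): String = {
    val conf = new SparkConf().setAppName("LineageDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // In-memory stand-in for sc.textFile(...) from the thread.
      val output = sc.parallelize(Seq("a b", "b c"))
        .flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)

      // Describes the RDD and its ancestors; no action is triggered here.
      output.toDebugString
    } finally {
      sc.stop()
    }
  }
}
```

Printing the returned string, for example with println(LineageDemo.lineage()), shows the ShuffledRDD at the top with the MapPartitionsRDD stages and the base RDD beneath it.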


Re: reduceByKey as Action or Transformation

2016-04-25 Thread Ted Yu
Can you show a snippet of your code that demonstrates what you observed?

Thanks



Re: reduceByKey as Action or Transformation

2016-04-25 Thread Weiping Qu

Thanks.
I read that in the specification.
I thought actions and transformations were distinguished by whether
they are lazily executed or not.
As far as I can tell from my code, reduceByKey is executed even without
any operation from the Action category.

Please correct me if I am wrong.

Thanks,
Regards,
Weiping
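
A small experiment along these lines can check whether reduceByKey alone processes any records. The sketch below is an assumption-laden illustration rather than the code from this thread: it uses in-memory input via sc.parallelize, a made-up object name LazyReduceByKey, and sc.longAccumulator, which needs Spark 2.0 or later (the thread itself concerns Spark 1.6.1, which predates that method). The accumulator counts how many words the map function has actually handled.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyReduceByKey {
  // Returns (words processed before any action, words processed after count()).
  def run(): (Long, Long) = {
    val conf = new SparkConf().setAppName("LazyReduceByKey").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Incremented once for every word the map function processes.
      val seen = sc.longAccumulator("words seen")

      val counts = sc.parallelize(Seq("a b", "b c", "c c"))
        .flatMap(line => line.split(" "))
        .map { word => seen.add(1); (word, 1) }
        .reduceByKey(_ + _) // transformation: only extends the lineage

      val beforeAction = seen.sum // read before any action has been called
      counts.count()              // action: runs the map, shuffle, and reduce
      val afterAction = seen.sum  // read after the job has finished
      (beforeAction, afterAction)
    } finally {
      sc.stop()
    }
  }
}
```

If the first value comes back as zero and the second as non-zero, no records were processed until count() was called, which is consistent with reduceByKey being a transformation.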

On 25.04.2016 17:20, Chadha Pooja wrote:

> reduceByKey is a transformation:
>
> http://spark.apache.org/docs/latest/programming-guide.html#transformations
>
> Thanks
>
> Pooja Chadha
> Senior Architect
> THE BOSTON CONSULTING GROUP
> Mobile +1 617 794 3862
>
> -----Original Message-----
> From: Weiping Qu [mailto:q...@informatik.uni-kl.de]
> Sent: Monday, April 25, 2016 11:05 AM
> To: u...@spark.incubator.apache.org
> Subject: reduceByKey as Action or Transformation
>
> Hi,
>
> I'd just like to verify whether reduceByKey is a transformation or an
> action.
> As described in the RDD papers, a Spark flow is triggered only when an
> action is reached.
> In my test, however, the flow was executed as soon as there was a
> reduceByKey, although it is categorized as a transformation in the
> Spark 1.6.1 specification.
>
> Thanks and Regards,
> Weiping


__
The Boston Consulting Group, Inc.
  
This e-mail message may contain confidential and/or privileged information.

If you are not an addressee or otherwise authorized to receive this message,
you should not use, copy, disclose or take any action based on this e-mail or
any information contained in the message. If you have received this material
in error, please advise the sender immediately by reply e-mail and delete this
message. Thank you.


