Re: How Does aggregate work

2015-03-23 Thread Paweł Szulc
It is actually number of cores. If your processor has hyperthreading then
it will be more (number of processors your OS sees)

niedz., 22 mar 2015, 4:51 PM Ted Yu użytkownik 
napisał:

> I assume spark.default.parallelism is 4 in the VM Ashish was using.
>
> Cheers
>


Re: How Does aggregate work

2015-03-22 Thread Ted Yu
I assume spark.default.parallelism is 4 in the VM Ashish was using.

Cheers


Re: How Does aggregate work

2015-03-22 Thread Dean Wampler
2 is added every time the final partition aggregator is called. The result
of summing the elements across partitions is 9 of course. If you force a
single partition (using spark-shell in local mode):

scala> val data = sc.parallelize(List(2,3,4),1)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 11

The 2nd function is still called, even though there is only one partition
(presumably either x or y is set to 0).

For every additional partition you specify as the 2nd arg. to parallelize,
the 2nd function will be called again:


scala> val data = sc.parallelize(List(2,3,4),1)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 11

scala> val data = sc.parallelize(List(2,3,4),2)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 13

scala> val data = sc.parallelize(List(2,3,4),3)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 15

scala> val data = sc.parallelize(List(2,3,4),4)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 17

Hence, it appears that not specifying the 2nd argument resulted in 4
partitions, even though you only had three elements in the list.

If p_i is the ith partition, the final sum appears to be:

(2 + ... (2 + (2 + (2 + 0 + p_1) + p_2) + p_3) ...)


Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Sun, Mar 22, 2015 at 8:05 AM, ashish.usoni 
wrote:

> Hi ,
> I am not able to understand how aggregate function works, Can some one
> please explain how below result came
> I am running spark using cloudera VM
>
> The result in below is 17 but i am not able to find out how it is
> calculating 17
> val data = sc.parallelize(List(2,3,4))
> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
> *res21: Int = 17*
>
> Also when i try to change the 2nd parameter in sc.parallelize i get
> different result
>
> val data = sc.parallelize(List(2,3,4),2)
> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
> *res21: Int = 13*
>
> Thanks for the help.
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-Does-aggregate-work-tp22179.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


How Does aggregate work

2015-03-22 Thread ashish.usoni
Hi , 
I am not able to understand how aggregate function works, Can some one
please explain how below result came 
I am running spark using cloudera VM 

The result in below is 17 but i am not able to find out how it is
calculating 17
val data = sc.parallelize(List(2,3,4))
data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
*res21: Int = 17*

Also when i try to change the 2nd parameter in sc.parallelize i get
different result 

val data = sc.parallelize(List(2,3,4),2)
data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
*res21: Int = 13*

Thanks for the help.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-Does-aggregate-work-tp22179.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org