Re: Can Pyspark access Scala API?

2016-05-18 Thread Abi
> Can you tell us your use case?
>
> On Tue, May 17, 2016 at 9:16 PM, Abi <analyst.tech.j...@gmail.com> wrote:
>
>> Can PySpark access the Scala API? The accumulator in PySpark does not
>> have localValue available. The Scala API does have it available.

Can Pyspark access Scala API?

2016-05-17 Thread Abi
Can PySpark access the Scala API? The accumulator in PySpark does not have localValue available. The Scala API does have it available.

Re: My notes on Spark Performance & Tuning Guide

2016-05-17 Thread Abi
Please include me too.

On May 12, 2016 6:08:14 AM EDT, Mich Talebzadeh wrote:

> Hi All,
>
> Following the threads in the Spark forum, I decided to write up on
> configuration of Spark, including allocation of resources and
> configuration of driver, executors, threads,

Re: Pyspark accumulator

2016-05-13 Thread Abi
On Tue, May 10, 2016 at 2:24 PM, Abi <analyst.tech.j...@gmail.com> wrote:

> 1. How come PySpark does not provide the localValue function like Scala?
>
> 2. Why is PySpark more restrictive than Scala?

Re: pyspark mappartions ()

2016-05-13 Thread Abi
On Tue, May 10, 2016 at 2:20 PM, Abi <analyst.tech.j...@gmail.com> wrote:

> Is there any example of this? I want to see how you write the iterable
> example.

broadcast variable not picked up

2016-05-13 Thread abi
def kernel(arg):
    input = broadcast_var.value + 1
    # some processing with input

def foo():
    broadcast_var = sc.broadcast(var)
    rdd.foreach(kernel)

def main():
    # something

In this code, I get the following error:

NameError: global name 'broadcast_var' is not defined
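A minimal sketch of one way to avoid this NameError: bind the broadcast handle in a closure instead of relying on a global name. The helper make_kernel and the parameters of foo are illustrative assumptions, not from the original post; actual use requires a live SparkContext.

```python
def make_kernel(broadcast_var):
    # Bind the broadcast handle in a closure, so the function Spark
    # ships to the workers carries its own reference to it.
    def kernel(arg):
        value = broadcast_var.value + 1   # some processing with value
        return value
    return kernel

def foo(sc, rdd, var):
    # Create the broadcast variable and pass it explicitly to the
    # kernel factory, rather than reaching for a global name.
    broadcast_var = sc.broadcast(var)
    rdd.foreach(make_kernel(broadcast_var))
```

The original fails because broadcast_var is local to foo(), so the global lookup inside kernel() finds nothing on the workers; the closure makes the dependency explicit.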

pandas dataframe broadcasted. giving errors in datanode function called kernel

2016-05-13 Thread abi
The pandas dataframe is broadcast successfully, but it gives errors in the worker-side ("datanode") function called kernel.

Code:

dataframe_broadcast = sc.broadcast(dataframe)

def kernel():
    df_v = dataframe_broadcast.value

Error: I get this error when I try accessing the value member of the broadcast variable.
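One hedged sketch of a pattern that avoids worker-side lookup problems with a broadcast DataFrame: capture the broadcast handle in a closure and read .value inside a mapPartitions function, so the payload is deserialized once per partition. The name make_partition_fn and the toy computation are illustrative assumptions, not from the original post.

```python
def make_partition_fn(df_broadcast):
    # The closure carries the broadcast handle; Spark ships this
    # function to the workers, and .value is only read there.
    def process(iterator):
        df = df_broadcast.value          # broadcast payload, read once
        for row in iterator:
            yield (row, len(df))         # toy use of the broadcast frame
    return process

# Driver-side usage (hypothetical; requires pandas and a SparkContext `sc`):
# import pandas as pd
# bc = sc.broadcast(pd.DataFrame({"a": [1, 2, 3]}))
# sc.parallelize([0, 1], 2).mapPartitions(make_partition_fn(bc)).collect()
```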

Re: Pyspark accumulator

2016-05-10 Thread Abi
On May 10, 2016 2:24:41 PM EDT, Abi <analyst.tech.j...@gmail.com> wrote:

> 1. How come PySpark does not provide the localValue function like Scala?
>
> 2. Why is PySpark more restrictive than Scala?

Re: Accumulator question

2016-05-10 Thread Abi
On May 9, 2016 8:24:06 PM EDT, Abi <analyst.tech.j...@gmail.com> wrote:

> I am splitting an integer array into 2 partitions and using an
> accumulator to sum the array. The problem is:
>
> 1. I am not seeing execution time become half of a linear summing.
>
> 2. The s

Re: pyspark mappartions ()

2016-05-10 Thread Abi
On May 10, 2016 2:20:25 PM EDT, Abi <analyst.tech.j...@gmail.com> wrote:

> Is there any example of this? I want to see how you write the iterable
> example.

Hi test

2016-05-10 Thread Abi
Hello test

Pyspark accumulator

2016-05-10 Thread Abi
1. How come PySpark does not provide the localValue function like Scala?

2. Why is PySpark more restrictive than Scala?
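For completeness, a hedged workaround sketch: PySpark's accumulator exposes only a driver-side value, so a per-partition "local value" can instead be surfaced explicitly with mapPartitions. The function name is an illustrative assumption; actual use requires a live SparkContext.

```python
def partition_totals(iterator):
    # Emit this partition's local running total as a single element;
    # collecting then yields one total per partition, which stands in
    # for the worker-local value the Scala accumulator exposes.
    total = 0
    for x in iterator:
        total += x
    yield total

# With a live SparkContext `sc` (hypothetical):
# sc.parallelize([1, 2, 3, 4], 2).mapPartitions(partition_totals).collect()
```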

pyspark mappartions ()

2016-05-10 Thread Abi
Is there any example of this? I want to see how you write the iterable example.
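A minimal sketch of the mapPartitions contract, as one possible answer to the question above: the function receives an iterator over one partition's elements and must return or yield an iterable. The scaling factor here is a hypothetical stand-in for real per-partition setup.

```python
def scale_partition(iterator):
    # Per-partition setup runs once here (a real job might open a
    # database connection); the loop then yields one output element
    # per input element.
    factor = 10  # hypothetical stand-in for expensive setup
    for x in iterator:
        yield x * factor

# Usage with a live SparkContext `sc` (hypothetical):
# sc.parallelize([1, 2, 3], 2).mapPartitions(scale_partition).collect()
```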

Re: Accumulator question

2016-05-09 Thread Abi
it is "waiting" for the first node to finish. Hence, I am given the impression that using accumulator.sum() in the kernel and rdd.foreach(kernel) is making things sequential. Any API/setting suggestions for how I could make things parallel?

On Mon, May 9, 2016 at 8:24 PM, Abi <analyst.tech.j

Accumulator question

2016-05-09 Thread Abi
I am splitting an integer array into 2 partitions and using an accumulator to sum the array. The problem is:

1. I am not seeing execution time become half of a linear summing.

2. The second node (from looking at timestamps) takes 3 times as long as the first node. This gives the impression it is
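As a hedged alternative sketch, the same sum can be expressed as a reduce, which aggregates within each partition before combining the results, sidestepping the foreach-plus-accumulator pattern the thread suspects of serializing work. The helper parallel_sum is illustrative, not from the original thread.

```python
from operator import add

def parallel_sum(rdd):
    # reduce() combines elements within each partition first, then
    # merges the per-partition results on the driver, so the summing
    # itself is expressed as work Spark can run in parallel.
    return rdd.reduce(add)

# Usage with a live SparkContext `sc` (hypothetical):
# parallel_sum(sc.parallelize(range(1000), 2))  # 499500
```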