>
>Can you tell us your use case?
>
>On Tue, May 17, 2016 at 9:16 PM, Abi <analyst.tech.j...@gmail.com>
>wrote:
>
>> Can PySpark access the Scala API? The accumulator in PySpark does not
>> have localValue available. The Scala API does have it available.
Please include me too
On May 12, 2016 6:08:14 AM EDT, Mich Talebzadeh wrote:
>Hi All,
>
>
>Following the threads in the Spark forum, I decided to write up on
>configuration of Spark, including allocation of resources and
>configuration of driver, executors, threads,
On Tue, May 10, 2016 at 2:24 PM, Abi <analyst.tech.j...@gmail.com> wrote:
> 1. How come PySpark does not provide the localValue function like Scala?
>
> 2. Why is PySpark more restrictive than Scala?
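For background (hedged, from general PySpark behavior rather than anything stated in this thread): a PySpark accumulator is effectively write-only on the workers. Tasks can only add to it, and `.value` is meant to be read on the driver, whereas the old Scala `Accumulable` API exposed a worker-side `localValue`. The `ToyAccumulator` below is a purely hypothetical stand-in, not the pyspark implementation, meant only to illustrate that driver-only-read contract:

```python
class ToyAccumulator:
    """Hypothetical stand-in mimicking the driver-only .value contract."""

    def __init__(self, initial=0):
        self._value = initial
        self._on_driver = True      # would flip off when shipped to a task

    def add(self, term):
        self._value += term         # adding is allowed everywhere

    @property
    def value(self):
        # Reading is only allowed on the "driver" side.
        if not self._on_driver:
            raise RuntimeError("Accumulator.value is driver-side only")
        return self._value


acc = ToyAccumulator()
acc.add(3)
assert acc.value == 3               # fine on the "driver"

acc._on_driver = False              # pretend we are now inside a task
acc.add(4)                          # adding is still allowed
try:
    acc.value                       # reading is not
except RuntimeError as e:
    print("worker read blocked:", e)
```

This restriction is usually explained as a design choice: worker-side partial values are merge-order dependent and would expose nondeterministic intermediate state, so PySpark simply never exposes them.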
On Tue, May 10, 2016 at 2:20 PM, Abi <analyst.tech.j...@gmail.com> wrote:
> Is there any example of this? I want to see how you write the
> iterable example.
def kernel(arg):
    input = broadcast_var.value + 1
    # some processing with input

def foo():
    broadcast_var = sc.broadcast(var)
    rdd.foreach(kernel)

def main():
    # something
    pass
In this code, I get the following error:

NameError: global name 'broadcast_var' is not defined
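Spark aside, that NameError is plain Python scoping: `broadcast_var` is a local variable inside `foo`, so the module-level `kernel` cannot see it. A minimal pure-Python sketch (no Spark needed, illustrative names) reproducing the failure and one way to fix it:

```python
def kernel(arg):
    # Fails: broadcast_var is not bound at module scope, so Python
    # raises NameError when kernel actually runs.
    return broadcast_var + arg

def foo():
    broadcast_var = 41          # local to foo; invisible to kernel
    try:
        kernel(1)
    except NameError as e:
        print("got:", e)

# Fix: make the value an explicit parameter (or a module-level name)
def kernel_fixed(arg, bvar):
    return bvar + arg

def foo_fixed():
    broadcast_var = 41
    return kernel_fixed(1, broadcast_var)

foo()                           # prints the NameError message
print(foo_fixed())              # 42
```

In the Spark case, `rdd.foreach` takes a one-argument function, so the equivalent fixes are to define `kernel` inside `foo` (a closure over the broadcast handle) or to bind the handle with `functools.partial` before passing it to `foreach`.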
A pandas DataFrame is broadcast successfully, but it gives errors in the
worker (data node) function called kernel.
Code:

dataframe_broadcast = sc.broadcast(dataframe)

def kernel():
    df_v = dataframe_broadcast.value
Error:

I get this error when I try to access the value member of the broadcast
variable.
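One pattern that avoids this class of error is to make the worker-side function a closure over the broadcast handle, so the name is captured when the closure is serialized rather than looked up as a global at run time. Since a Spark cluster cannot be assumed here, `FakeBroadcast` below is a hypothetical stand-in exposing only the `.value` attribute; the closure pattern is what the sketch is meant to show:

```python
class FakeBroadcast:
    """Hypothetical stand-in for a broadcast handle; only .value matters."""

    def __init__(self, v):
        self.value = v


def run(records, bc_value):
    broadcast_var = FakeBroadcast(bc_value)

    # Defining kernel *inside* run means it closes over broadcast_var;
    # in Spark, rdd.foreach(kernel) would then ship the closure (and the
    # broadcast handle it captures) to the workers.
    def kernel(row):
        return broadcast_var.value + row

    return [kernel(r) for r in records]


print(run([1, 2, 3], 10))       # [11, 12, 13]
```

The alternative is to assign the broadcast handle at module scope before defining `kernel`, so the global lookup succeeds on the workers as well.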
On May 9, 2016 8:24:06 PM EDT, Abi <analyst.tech.j...@gmail.com> wrote:
>I am splitting an integer array in 2 partitions and using an
>accumulator to sum the array. The problem is:
>
>1. I am not seeing execution time becoming half of a linear summing.
>
>2. The second node (from looking at timestamps) takes 3 times as long
>as the first node.
On Mon, May 9, 2016 at 8:24 PM, Abi <analyst.tech.j...@gmail.com> wrote:

I am splitting an integer array in 2 partitions and using an accumulator to
sum the array. The problem is:

1. I am not seeing execution time becoming half of a linear summing.

2. The second node (from looking at timestamps) takes 3 times as long as the
first node. This gives the impression it is "waiting" for the first node to
finish.

Hence, I am given the impression that using accumulator.sum() in the kernel
and rdd.foreach(kernel) is making things sequential.

Any API/setting suggestions where I could make things parallel?
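One hedged suggestion (the timestamps alone cannot prove the accumulator is the cause; partition skew or scheduling could also explain it): for a plain sum there is no need for an accumulator at all, since `rdd.sum()` or `rdd.reduce(operator.add)` already computes a partial sum per partition and merges the partials. The shape of that computation, sketched in pure Python with `ThreadPoolExecutor` standing in for the two partitions (an analogy, not Spark code):

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(data, partitions=2):
    # Split the array into roughly equal chunks, one per partition.
    size = (len(data) + partitions - 1) // partitions
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    # Sum each chunk independently, then combine the partial sums --
    # the same shape as rdd.sum()/reduce, where each partition yields
    # a partial result and the driver merges them.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


print(parallel_sum(list(range(1, 101))))    # 5050
```

Caveat: CPython's GIL means this sketch will not actually run faster than a linear sum; the point is the partial-sum-plus-combine shape, which Spark does parallelize across executors.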