-Original Message-
From: Vadim Bichutskiy
[vadim.bichuts...@gmail.com]
Sent: Thursday, April 23, 2015 12:00 PM Eastern Standard Time
To: Tathagata Das
Cc: user@spark.apache.org
Subject: Re: Map Question
Here it is. How do I access a broadcastVar in a function that's in another
module (process_stuff.py below):
Thanks,
Vadim
main.py
---
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext
from process_stuff import
Is mylist present on every executor? If not, then you have to pass it
on, and broadcasts are the best way to do that. But note that once
broadcast, it will be immutable at the executors, and if you update the list
at the driver, you will have to broadcast it again.
TD
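A minimal sketch of the "broadcast it again" step described above, assuming the list is updated on the driver between jobs. The helper name refresh_broadcast is hypothetical and does not appear in the thread. The only Spark calls it makes are SparkContext.broadcast and Broadcast.unpersist.

```python
def refresh_broadcast(sc, old_broadcast, new_value):
    """Replace a broadcast after the driver-side data has changed.

    sc is a SparkContext; old_broadcast is the handle returned earlier by
    sc.broadcast(...), or None on the first call. (Hypothetical helper.)
    """
    if old_broadcast is not None:
        # Release the executors' cached copies of the stale value.
        old_broadcast.unpersist()
    # A broadcast is a read-only snapshot, so new data needs a new broadcast.
    return sc.broadcast(new_value)


# Driver-side usage (assumes sc and mylist already exist):
#     broadcastVar = refresh_broadcast(sc, None, mylist)
#     mylist.append("new item")   # executors do NOT see this change
#     broadcastVar = refresh_broadcast(sc, broadcastVar, mylist)
```

Any function passed to map after the refresh has to use the new handle. A function still holding the old handle keeps referring to the old snapshot.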
On Wed, Apr 22, 2015
Can I use broadcast vars in local mode?
On Wed, Apr 22, 2015 at 2:06 PM, Tathagata Das t...@databricks.com wrote:
Yep. Not efficient. Pretty bad actually. That's why broadcast variables
were introduced right at the very beginning of Spark.
On Wed, Apr 22, 2015 at 10:58 AM, Vadim
Absolutely. The same code would work for local as well as distributed mode!
On Wed, Apr 22, 2015 at 11:08 AM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
Can I use broadcast vars in local mode?
On Wed, Apr 22, 2015 at 2:06 PM, Tathagata Das t...@databricks.com
wrote:
Yep. Not efficient. Pretty bad actually. That's why broadcast variables were
introduced right at the very beginning of Spark.
On Wed, Apr 22, 2015 at 10:58 AM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
Thanks TD. I was looking into broadcast variables.
Right now I am running it
Can you give the full code, especially myfunc?
On Wed, Apr 22, 2015 at 2:20 PM, Vadim Bichutskiy
vadim.bichuts...@gmail.com wrote:
Here's what I did:
print 'BROADCASTING...'
broadcastVar = sc.broadcast(mylist)
print broadcastVar
print broadcastVar.value
print 'FINISHED BROADCASTING...'
The above works fine,
but when I call myrdd.map(myfunc) I get *NameError: global name
'broadcastVar' is not defined*
The myfunc function