Correct me if I'm wrong, but he can actually run this code without
broadcasting the users map; however, the code will be less efficient.
On Thu, Feb 26, 2015, 12:31 PM, Sean Owen so...@cloudera.com wrote:
Yes, but there is no concept of executors 'deleting' an RDD. And you
would want
Yes that's correct; it works but broadcasting would be more efficient.
On Thu, Feb 26, 2015 at 1:20 PM, Paweł Szulc paul.sz...@gmail.com wrote:
Correct me if I'm wrong, but he can actually run this code without
broadcasting the users map; however, the code will be less efficient.
Thu, 26
No. That code is just Scala code executing on the driver. usersMap is
a local object. This bit has nothing to do with Spark.
Yes, you would have to broadcast it to use it efficiently in functions
that run on the executors (not on the driver).
On Thu, Feb 26, 2015 at 10:24 AM, Guillermo Ortiz konstt2...@gmail.com wrote:
So,
I have a question,
If I execute this code,
val users = sc.textFile("/tmp/users.log").map(x => x.split(",")).map(
v => (v(0), v(1)))
val contacts = sc.textFile("/tmp/contacts.log").map(y =>
y.split(",")).map(v => (v(0), v(1)))
val usersMap = contacts.collectAsMap()
contacts.map(v => (v._1, (usersMap(v._1),
No, it exists only on the driver, not the executors. Executors don't
retain partitions unless they are supposed to be persisted.
Generally, broadcasting a small Map to accomplish a join 'manually' is
more efficient than a join, but you are right that this is mostly
because joins usually involve
So, in my example, when I execute:
val usersMap = contacts.collectAsMap() -- the Map goes to the driver and
just lives there in the beginning.
contacts.map(v => (v._1, (usersMap(v._1), v._2))).collect
When I execute usersMap(v._1),
does the driver have to send executorX the value it needs? I
Isn't contacts.map(v => (v._1, (usersMap(v._1), v._2))).collect()
executed on the executors? Why is it executed on the driver?
contacts is not a local object, right?
2015-02-26 11:27 GMT+01:00 Sean Owen so...@cloudera.com:
No. That code is just Scala code executing on the driver. usersMap
Yes, in that code, usersMap has been serialized to every executor.
I thought you were referring to accessing the copy in the driver.
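
To make the distinction concrete, here is a minimal sketch, assuming Spark 1.3 or later and a local master. The object name ClosureCaptureSketch, the method run, and the sample pairs are hypothetical stand-ins for the data in /tmp/contacts.log. It shows a lookup that stays on the driver next to one that forces usersMap to be serialized along with the tasks sent to the executors.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClosureCaptureSketch {
  // Hypothetical helper: takes an existing SparkContext and returns the mapped pairs.
  def run(sc: SparkContext): Array[(String, (String, String))] = {
    // Hypothetical sample data standing in for the parsed /tmp/contacts.log
    val contacts = sc.parallelize(Seq(("u1", "alice"), ("u2", "bob")))

    // collectAsMap() is an action: the result is an ordinary Scala Map on the driver.
    val usersMap = contacts.collectAsMap()

    // Plain Scala on the driver; nothing is shipped to the executors here.
    val onDriver = usersMap("u1")
    println(s"driver-side lookup: $onDriver")

    // usersMap is referenced inside the function passed to map(), so it is
    // captured by the closure and serialized together with the tasks.
    contacts.map(v => (v._1, (usersMap(v._1), v._2))).collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("closure-capture-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try run(sc).foreach(println)
    finally sc.stop()
  }
}
```
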
On Thu, Feb 26, 2015 at 10:47 AM, Guillermo Ortiz konstt2...@gmail.com wrote:
Isn't contacts.map(v => (v._1, (usersMap(v._1), v._2))).collect()
executed on the
One last time to be sure I got it right: does the execution sequence here
go like this?
val usersMap = contacts.collectAsMap()
#The contacts RDD is collected by the executors and sent to the
driver; the executors delete the RDD
contacts.map(v => (v._1, (usersMap(v._1), v._2))).collect()
#The userMap
Yes, but there is no concept of executors 'deleting' an RDD. And you
would want to broadcast the usersMap if you're using it this way.
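
A minimal sketch of the broadcast variant, assuming Spark 1.3 or later and a local master. The object name BroadcastSketch, the method run, the value usersMapBc, and the sample pairs are hypothetical and not from the thread. The only change from the original code is that the map is wrapped with sc.broadcast and read through .value inside the function.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast

object BroadcastSketch {
  // Hypothetical helper: takes an existing SparkContext and returns the mapped pairs.
  def run(sc: SparkContext): Array[(String, (String, String))] = {
    // Hypothetical sample data standing in for the parsed /tmp/contacts.log
    val contacts = sc.parallelize(Seq(("u1", "alice"), ("u2", "bob")))

    // Collect the pairs to the driver as an ordinary Scala Map.
    val usersMap = contacts.collectAsMap()

    // Wrap the driver-side map in a broadcast variable. Spark distributes the
    // broadcast value to executors and caches it there, rather than
    // serializing the map into every task closure.
    val usersMapBc: Broadcast[scala.collection.Map[String, String]] =
      sc.broadcast(usersMap)

    // Inside the function, read the map through .value on the broadcast handle.
    contacts.map(v => (v._1, (usersMapBc.value(v._1), v._2))).collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try run(sc).foreach(println)
    finally sc.stop()
  }
}
```

For a map this small the difference would not be measurable; the sketch only illustrates the pattern, which matters more as the map grows or is reused across several jobs.
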
On Thu, Feb 26, 2015 at 11:26 AM, Guillermo Ortiz konstt2...@gmail.com wrote:
One last time to be sure I got it right: does the execution sequence here
go like