Lifecycle of a map function

2020-04-07 Thread Vadim Vararu
Hi all,

I'm trying to understand the lifecycle of a map function in a
Spark/YARN context. My understanding is that the function is instantiated on the
master and then passed to each executor (serialized/deserialized).

What I'd like to confirm is whether the function is
initialized/loaded/deserialized once per executor (JVM in YARN) and lives as
long as the executor lives, rather than once per task (the logical unit of work).

Could you please explain or, better, give some links to the source code or
documentation? I've tried to look at Task.scala and ResultTask.scala, but
I'm not familiar with Scala and didn't find where exactly the function
lifecycle is managed.
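
To make the distinction concrete: the pattern below pins a resource to the
executor JVM with a static holder, regardless of how often the closure itself
is serialized and shipped. This is only a sketch of what I mean by "per
executor"; the Parser class and the input path are made up.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ClosureLifecycleSketch {

        // Hypothetical expensive resource we'd want once per executor JVM.
        static class Parser {
            String parse(String line) { return line.trim(); }
        }

        // Statics are not serialized with the closure, so this holder yields
        // one Parser per JVM (i.e. per executor), reused across its tasks.
        static class ParserHolder {
            private static Parser instance;
            static synchronized Parser get() {
                if (instance == null) {
                    instance = new Parser();
                }
                return instance;
            }
        }

        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("closure-lifecycle-sketch");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // The lambda is serialized on the driver and shipped with the
                // tasks; the static Parser behind it is created at most once
                // per executor JVM, however often the closure is deserialized.
                long n = sc.textFile("hdfs:///tmp/input.txt")  // made-up path
                           .map(line -> ParserHolder.get().parse(line))
                           .count();
                System.out.println(n);
            }
        }
    }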


Thanks in advance,
Vadim.



Re: Is JavaSparkContext.wholeTextFiles distributed?

2016-04-26 Thread Vadim Vararu
Spark can create distributed datasets from any storage source supported
by Hadoop, including your local file system, HDFS, Cassandra, HBase,
Amazon S3, etc. Spark supports text files, SequenceFiles, and any other
Hadoop InputFormat.


Text file RDDs can be created using SparkContext's textFile method.
This method takes a URI for the file (either a local path on the
machine, or an hdfs://, s3n://, etc. URI) and reads it as a collection
of lines. Here is an example invocation:
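
(The snippet itself did not survive the archive; a minimal Java equivalent,
with a placeholder file name and an existing JavaSparkContext sc, would be:)

    // Read a text file into an RDD, one element per line.
    JavaRDD<String> lines = sc.textFile("data.txt");  // "data.txt" is a placeholder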



I could not find a concrete statement saying whether a read of more than
one file is distributed or not.
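
One way to check empirically is to look at how many partitions the resulting
RDD has; a sketch, with a made-up bucket path and an existing JavaSparkContext
sc:

    // wholeTextFiles yields (path, content) pairs; each file stays one
    // record, but many files can be spread across partitions/executors.
    JavaPairRDD<String, String> files =
        sc.wholeTextFiles("s3n://my-bucket/input/", 8);  // 8 = minPartitions hint
    System.out.println("partitions: " + files.getNumPartitions());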


On 26.04.2016 18:00, Hyukjin Kwon wrote:

then this would not be distributed




Is JavaSparkContext.wholeTextFiles distributed?

2016-04-26 Thread Vadim Vararu

Hi guys,

I'm trying to read many files from S3 using
JavaSparkContext.wholeTextFiles(...). Is that executed in a distributed
manner? Please give me a link to the place in the documentation where this is
specified.


Thanks, Vadim.
