Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread John Zhuge
Sounds good. Should we add another paragraph after this paragraph in configuration.md to explain executor env as well? I will be happy to upload a simple patch.

> Note: When running Spark on YARN in cluster mode, environment variables
> need to be set using the
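For reference, a minimal sketch of the kind of configuration the quoted paragraph describes; the variable name and paths below are placeholders, not taken from the thread:

  # conf/spark-defaults.conf
  # Environment variable for the YARN Application Master container:
  spark.yarn.appMasterEnv.JAVA_HOME   /usr/lib/jvm/java-1.8.0
  # Environment variable for the executor containers:
  spark.executorEnv.JAVA_HOME         /usr/lib/jvm/java-1.8.0

The same properties can also be passed on the spark-submit command line with --conf.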

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
Because spark-env.sh is something that makes sense only on the gateway machine (where the app is being submitted from).

On Wed, Jan 3, 2018 at 6:46 PM, John Zhuge wrote:
> Thanks Jacek and Marcelo!
>
> Any reason it is not sourced? Any security consideration?
>
>> On Wed,

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread John Zhuge
Thanks Jacek and Marcelo!

Any reason it is not sourced? Any security consideration?

On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin wrote:
> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote:
> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop

Re: Apache Spark - Question about Structured Streaming Sink addBatch dataframe size

2018-01-03 Thread Tathagata Das
1. It is all the result data in that trigger. Note that it takes a DataFrame, which is a purely logical representation of data and has no association with partitions, etc., which are physical representations.

2. If you want to limit the amount of data that is processed in a trigger, then you should
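One common way to bound what each trigger processes is to rate-limit at the source. A minimal sketch assuming a Kafka source; the broker, topic, and cap are placeholders:

  // Cap how many Kafka offsets each micro-batch reads, so the DataFrame
  // handed to Sink.addBatch per trigger stays bounded.
  val events = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .option("maxOffsetsPerTrigger", "10000")
    .load()

File-based sources have an analogous option, e.g. .option("maxFilesPerTrigger", "10").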

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Marcelo Vanzin
On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote:
> I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is
> spark-env.sh sourced when starting the Spark AM container or the executor
> container?

No, it's not.

-- Marcelo
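One way to check this empirically from inside a running application is to read the environment on the executors themselves; a minimal sketch (the variable name and parallelism are arbitrary):

  // Report whether a given environment variable is visible inside the executor JVMs.
  val varName = "MY_TEST_VAR"
  val seen = spark.sparkContext
    .parallelize(1 to 4, 4)
    .map(_ => sys.env.getOrElse(varName, "<not set>"))
    .distinct()
    .collect()
  println(seen.mkString(", "))

Given the answer above, a variable exported only in spark-env.sh on the gateway would show up as "<not set>", while one set via spark.executorEnv would be visible.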

Apache Spark - Question about Structured Streaming Sink addBatch dataframe size

2018-01-03 Thread M Singh
Hi:

The documentation for Sink.addBatch is as follows:

  /**
   * Adds a batch of data to this sink. The data for a given `batchId` is deterministic and if
   * this method is called more than once with the same batchId (which will happen in the case of
   * failures), then `data` should only be
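For context, the method this comment documents is on the internal Sink trait; in Spark 2.x its signature is roughly the following (reproduced from memory, so treat it as approximate):

  // org.apache.spark.sql.execution.streaming.Sink (internal API in Spark 2.x)
  def addBatch(batchId: Long, data: DataFrame): Unit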

Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?

2018-01-03 Thread Jacek Laskowski
Hi,

My understanding is that the AM with the driver (in cluster deploy mode) and the executors are plain Java processes whose settings are set one by one when a Spark application is submitted for execution and the ContainerLaunchContext for launching the YARN containers is created. See