Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Mark Hamstra
The latency to start a Spark Job is nowhere close to 2-4 seconds under typical conditions. You appear to be creating a new Spark Application every time instead of running multiple Jobs in one Application. On Fri, Jul 6, 2018 at 3:12 AM Tien Dat wrote: > Dear Timothy, > > It works like a charm
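
For context, a minimal sketch of the pattern Mark describes: one long-lived application whose SparkSession is reused, so each subsequent job skips the application start-up cost. The app name and input path are hypothetical.

    import org.apache.spark.sql.SparkSession

    object LongLivedApp {
      def main(args: Array[String]): Unit = {
        // One application: the scheduler and executors are created once.
        val spark = SparkSession.builder()
          .appName("long-lived-app") // hypothetical
          .getOrCreate()

        val df = spark.read.parquet("/data/events") // hypothetical path

        // Each action below is a separate Spark job inside the SAME
        // application, so it avoids the per-application start-up latency.
        println(df.count())                     // job 1
        println(df.filter("value > 0").count()) // job 2

        spark.stop()
      }
    }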

Unable to see the table created using saveAsTable From Beeline. Please help!

2018-07-06 Thread anna stax
I am running Spark 2.1.0 on AWS EMR. In my Zeppelin note I am creating a table with df.write.format("parquet").saveAsTable("default.1test"), and I see the table when I run spark.catalog.listTables().show()
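
For reference, a runnable sketch of the scenario described. It assumes a Hive-enabled SparkSession; whether Beeline then sees the table additionally depends on the Thrift server pointing at the same Hive metastore, which the question does not confirm.

    import org.apache.spark.sql.SparkSession

    // Hive support is what lets saveAsTable register the table in a
    // metastore that external clients such as Beeline can query.
    val spark = SparkSession.builder()
      .appName("save-as-table-example") // hypothetical
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.read.parquet("/tmp/input") // hypothetical input

    // Table name taken verbatim from the question.
    df.write.format("parquet").saveAsTable("default.1test")

    spark.catalog.listTables().show()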

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Timothy Chen
I know there are some community efforts shown at Spark Summits before, mostly around reusing the same Spark context for multiple "jobs". I don't think reducing Spark job startup time is a community priority, AFAIK. Tim On Fri, Jul 6, 2018 at 7:12 PM Tien Dat wrote: > Dear Timothy, > > It works

Re: Retry option and range resource configuration for Spark job on Mesos

2018-07-06 Thread Timothy Chen
Hi Tien, There is no retry at the job level, as we expect the user to retry; and, as you mention, task retries are already tolerated. There is no request/limit-style resource configuration of the kind you describe in Mesos (yet). So for 2), that's not possible at the moment. Tim On Fri, Jul 6, 2018 at
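
For completeness, a minimal sketch of the task-level retry knob referred to above (spark.task.maxFailures is a standard Spark setting; the value and app name are illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("mesos-retry-example") // hypothetical
      // Retries apply at the TASK level only; there is no job-level
      // equivalent, so job retries must be handled by the submitter.
      .config("spark.task.maxFailures", "8")
      .getOrCreate()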

Re: How to avoid duplicate column names after join with multiple conditions

2018-07-06 Thread Gokula Krishnan D
Nirav, the withColumnRenamed() API might help, but it cannot distinguish between the duplicate columns: it renames all occurrences of the given column name. Alternatively, use the select() API and rename the columns as you want. Thanks & Regards, Gokula Krishnan (Gokul) On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel wrote: > Expr is `df1(a)
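
A short sketch of the select()-and-alias approach, using hypothetical frames df1 and df2 that both carry columns a and b:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("dedup-join-columns") // hypothetical
      .master("local[*]")
      .getOrCreate()

    val df1 = spark.createDataFrame(Seq((1, "x", 10))).toDF("a", "b", "v1")
    val df2 = spark.createDataFrame(Seq((1, "x", 20))).toDF("a", "b", "v2")

    // Joining on multiple conditions leaves two "a" and two "b" columns.
    val joined = df1.join(df2, df1("a") === df2("a") && df1("b") === df2("b"))

    // select() keeps one copy of each and renames the other, whereas
    // withColumnRenamed("a", ...) would rename every occurrence of "a".
    val deduped = joined.select(
      df1("a"), df1("b"), joined("v1"),
      df2("a").alias("a_right"), df2("b").alias("b_right"), joined("v2")
    )
    deduped.show()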

Retry option and range resource configuration for Spark job on Mesos

2018-07-06 Thread Tien Dat
Dear all, We are running Spark with Mesos as the resource manager. We are interested in some aspects, such as: 1. Is it possible to configure a specific job with a maximum number of retries? I mean here the retry at the job level, NOT spark.task.maxFailures, which is for the task with a

Spark 2.3 Kubernetes error

2018-07-06 Thread purna pradeep
Hello, When I try to set the options below on the spark-submit command on the k8s master, I get the error below in the spark-driver pod logs: --conf spark.executor.extraJavaOptions=" -Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2" \ --conf
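
As a hedged alternative, the same executor JVM options can be set programmatically instead of on the spark-submit line, which sidesteps one layer of shell quoting; this is a sketch only, with the proxy values taken from the question and an illustrative app name:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("k8s-proxy-example") // hypothetical
      .config("spark.executor.extraJavaOptions",
        "-Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 " +
        "-Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2")
      .getOrCreate()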

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Tien Dat
Dear Timothy, It works like a charm now. BTW (don't judge me if I am too greedy :-)), the latency to start a Spark job is around 2-4 seconds, unless I am not aware of some awesome optimization in Spark. Do you know if the Spark community is working on reducing this latency? Best

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Timothy Chen
Got it. Then you can have an extracted Spark directory on each host at the same location and not specify SPARK_EXECUTOR_URI. Instead, set spark.mesos.executor.home to that directory. This should effectively do what you want: it avoids fetching and extracting and just executes the command.
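
A sketch of that configuration, assuming Spark is pre-extracted at the same path on every agent (the master URL and path are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("mesos-local-spark-home")     // hypothetical
      .master("mesos://zk://zk1:2181/mesos") // hypothetical master
      // Point executors at a Spark directory already present on every
      // host; with no SPARK_EXECUTOR_URI / spark.executor.uri set,
      // nothing is fetched or extracted.
      .config("spark.mesos.executor.home", "/opt/spark") // hypothetical
      .getOrCreate()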

Re: How to branch a Stream / have multiple Sinks / do multiple Queries on one Stream

2018-07-06 Thread Amiya Mishra
Hi Tathagata, Is there any limitation in the code below when writing to multiple files? val inputdf: DataFrame = sparkSession.readStream.schema(schema).format("csv").option("delimiter", ",").csv("src/main/streamingInput") query1 =
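
For reference, a self-contained sketch of the branching pattern in question: one streaming source feeding two independent sinks, each query with its own checkpoint directory (the schema and all output paths are hypothetical):

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

    val spark = SparkSession.builder().appName("multi-sink").getOrCreate()

    val schema = new StructType()
      .add("id", IntegerType)
      .add("name", StringType)

    val inputdf: DataFrame = spark.readStream
      .schema(schema)
      .option("delimiter", ",")
      .csv("src/main/streamingInput")

    // Each query is independent: its own sink and its own checkpoint.
    val query1 = inputdf.writeStream
      .format("parquet")
      .option("path", "out/sink1")               // hypothetical
      .option("checkpointLocation", "chk/sink1") // hypothetical
      .start()

    val query2 = inputdf.filter("id > 0").writeStream
      .format("csv")
      .option("path", "out/sink2")               // hypothetical
      .option("checkpointLocation", "chk/sink2") // hypothetical
      .start()

    spark.streams.awaitAnyTermination()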

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Tien Dat
Thank you for your answer. The thing is, I actually pointed to a local binary file, and Mesos copied the binary file locally to a specific folder in /var/lib/mesos/... and extracted it every time it launched a Spark executor. With the fetch cache, the copy time is reduced, but the reduction is

Re: [SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Timothy Chen
If it's available locally on each host, then don't specify a remote URL but a local file URI instead. We added a fetcher cache in Mesos a while ago; I believe there is integration in the Spark framework if you look at the documentation as well. With the fetcher cache enabled, the Mesos agent will cache
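
A sketch combining the two suggestions: a local file URI for the executor binary plus the Mesos fetcher cache. spark.executor.uri and spark.mesos.fetcherCache.enable are standard Spark-on-Mesos settings; the archive path is hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("mesos-fetcher-cache") // hypothetical
      // Local file URI: the agent copies from the local filesystem
      // instead of downloading over the network.
      .config("spark.executor.uri",
        "file:///opt/dist/spark-2.3.0-bin-hadoop2.7.tgz") // hypothetical
      // Let the Mesos fetcher cache the archive so repeated executor
      // launches reuse it instead of re-copying.
      .config("spark.mesos.fetcherCache.enable", "true")
      .getOrCreate()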

[SPARK on MESOS] Avoid re-fetching Spark binary

2018-07-06 Thread Tien Dat
Dear all, We are running Spark with Mesos as the master for resource management. In our cluster, there are jobs that require very short response times (near-real-time applications), usually around 3-5 seconds. In order for Spark to execute with Mesos, one has to specify the
