On 6 Nov 2017 at 19:56, Nicolas Paris wrote:
> Can anyone clarify the driver memory aspects of pySpark?
> According to [1], spark.driver.memory limits JVM + python memory.
>
> In case:
> spark.driver.memory=2G
> Then does it mean the user won't be able to use more than 2G, whatever
> the
Hi,
The
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
file has a merge method, but it is defined private to the sql package and used for
Parquet schema merging. Is there a reason why this method cannot be public? If not
this method, is there an equivalent that is available to merge
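In the meantime, one rough workaround using only public APIs (a sketch; unlike the internal private[sql] merge it does not reconcile conflicting types for same-named fields and does not recurse into nested structs):

import org.apache.spark.sql.types.{StructField, StructType}

// Sketch: union the fields of two schemas by name, keeping left-hand fields first.
def mergeSchemas(left: StructType, right: StructType): StructType = {
  val extra = right.fields.filterNot(f => left.fieldNames.contains(f.name))
  StructType(left.fields ++ extra)
}

// e.g. mergeSchemas(dfA.schema, dfB.schema)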
You have to deal with failed jobs, for example with a try/catch in your code.
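Something along these lines (a minimal sketch; the app name, paths and column are placeholders, not taken from your job):

import org.apache.spark.sql.SparkSession

object FailSafeJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("fail-safe-job").getOrCreate()
    try {
      // Placeholder job: replace the input path, column and output with your own.
      val df = spark.read.json("/data/input")
      df.groupBy("key").count().write.parquet("/data/output")
    } catch {
      case e: Exception =>
        // Decide here whether to retry, alert, or rethrow and fail the application.
        System.err.println(s"Job failed: ${e.getMessage}")
        throw e
    } finally {
      spark.stop()
    }
  }
}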
Br Margus Roo
> On 9 Nov 2017, at 05:37, bing...@iflytek.com wrote:
>
> Dear All,
> I have a simple Spark job, as below. All tasks in stage 2 (something failed
> and was retried) have already finished, but the next stage never runs.
>
>
Hi,
I'm using Spark standalone on AWS EC2, and I'm using the Spark REST
API http::8080/Json to get completed apps, but the JSON returns the completed
apps as an empty array even though the job ran successfully.
In this case, the only way to check the status is via REST calls to the Spark
json API, accessible at http://:/json/
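For example, something like this (a sketch; the hostname is a placeholder and it assumes the default master web UI port 8080):

import scala.io.Source

// Sketch: fetch the standalone master's JSON status page and print it.
// "spark-master" is a placeholder hostname; 8080 is the default master UI port.
object MasterStatus {
  def main(args: Array[String]): Unit = {
    val body = Source.fromURL("http://spark-master:8080/json/").mkString
    println(body) // the response includes the active and completed application lists
  }
}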
Thank you for the reply.
I am currently not using SparkLauncher to launch my driver. Rather, I am
using the old-fashioned spark-submit, and moving to SparkLauncher is not an
option right now.
Do I have any options there?
The relevant config is spark.sql.shuffle.partitions. Note that once you
start a query, this number is fixed. The config will only affect queries
starting from an empty checkpoint.
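For example (a sketch; the schema, paths and the value 2000 are placeholders, and the checkpoint location must be a fresh one for the new setting to take effect):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}

object ShufflePartitionsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-partitions-example")
      // Placeholder value; only queries started from an empty checkpoint pick it up.
      .config("spark.sql.shuffle.partitions", "2000")
      .getOrCreate()

    // File stream source, as in the thread; path and schema are placeholders.
    val schema = StructType(Seq(
      StructField("user", StringType),
      StructField("ts", TimestampType)))
    val events = spark.readStream.schema(schema).json("/data/events")

    // Any shuffling/stateful step (here a toy aggregation) uses
    // spark.sql.shuffle.partitions partitions for its shuffles and state.
    events.groupBy("user").count()
      .writeStream
      .outputMode("complete")
      .format("console")
      .option("checkpointLocation", "/checkpoints/sessions-new") // fresh checkpoint
      .start()
      .awaitTermination()
  }
}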
On Wed, Nov 8, 2017 at 7:34 AM, Teemu Heikkilä wrote:
> I have spark structured streaming job
I have a Spark structured streaming job and I'm crunching through a few terabytes
of data.
I'm using the file stream reader and it works flawlessly; I can adjust its
partitioning with spark.default.parallelism.
However, I'm doing sessionization on the data after loading it, and I'm
currently
Hi,
Is there a way to measure idle time for a Spark executor? Are any metrics
or accumulators currently exposed for this?
Regards,
Samar
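One rough approach, sketched below under the assumption that summed task run time per executor is an acceptable proxy for busy time (idle time is then uptime minus busy time); the class and field names are made up:

import scala.collection.concurrent.TrieMap

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch: accumulate task run time per executor via a listener.
class ExecutorBusyTimeListener extends SparkListener {
  val busyTimeMs = new TrieMap[String, Long]()

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val execId = taskEnd.taskInfo.executorId
    // taskMetrics can be null for failed tasks, hence the Option wrapper.
    val runTime = Option(taskEnd.taskMetrics).map(_.executorRunTime).getOrElse(0L)
    busyTimeMs.put(execId, busyTimeMs.getOrElse(execId, 0L) + runTime)
  }
}

// Registration, e.g. right after creating the SparkContext/SparkSession:
//   spark.sparkContext.addSparkListener(new ExecutorBusyTimeListener)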