I have a Spark Structured Streaming job and I'm crunching through a few
terabytes of data.
I'm using the file stream reader and it works flawlessly; I can adjust its
partitioning with spark.default.parallelism.
However, I'm doing sessionization on the data after loading it, and I'm
currently
The relevant config is spark.sql.shuffle.partitions. Note that once you
start a query, this number is fixed. The config will only affect queries
starting from an empty checkpoint.
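As a sketch, the value can be passed when (re)starting the job from a fresh checkpoint; the partition count and job file name below are illustrative, not taken from the thread:

```shell
# spark.sql.shuffle.partitions controls the parallelism of stateful shuffle
# stages such as sessionization; it is pinned into the checkpoint once the
# query first starts. 400 is an arbitrary example value.
spark-submit \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.default.parallelism=400 \
  sessionization_job.py
```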
On Wed, Nov 8, 2017 at 7:34 AM, Teemu Heikkilä wrote:
> I have spark structured streaming job
Thank you for the reply.
I am currently not using SparkLauncher to launch my driver. Rather, I am
using the old-fashioned spark-submit, and moving to SparkLauncher is not an
option right now.
Do I have any options there?
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
In this case, the only way to check the status is via REST calls to the Spark
JSON API, accessible at http://<host>:<port>/json/
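As a sketch of consuming that endpoint, assuming the standalone master's payload shape (the `activeapps`/`completedapps` field names are assumptions; verify them against your cluster's actual output):

```python
import json

# Example payload shaped like a standalone master's http://<host>:8080/json
# response; field names are assumptions, check your own master's output.
sample = json.loads("""
{
  "url": "spark://master:7077",
  "status": "ALIVE",
  "activeapps": [{"id": "app-20171108-0001", "name": "sessionizer", "state": "RUNNING"}],
  "completedapps": []
}
""")

def app_states(payload):
    """Return {app id: state} for active and completed applications."""
    apps = payload.get("activeapps", []) + payload.get("completedapps", [])
    return {a["id"]: a.get("state", "UNKNOWN") for a in apps}

print(app_states(sample))  # {'app-20171108-0001': 'RUNNING'}
```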
Hi,
Is there a way to measure idle time for a Spark executor? Are any metrics
or accumulators currently exposed?
Regards,
Samar
You have to deal with failed jobs yourself, for example with a try/catch in your code.
Br Margus Roo
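A generic sketch of that pattern (plain Python so it runs anywhere; in a real job the action would be a Spark call such as a write or awaitTermination, and the retry count here is arbitrary):

```python
# Retry an action a fixed number of times, re-raising the last error if
# every attempt fails. Wrap your Spark action call the same way.
def run_with_retry(action, attempts=3):
    last_err = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as err:  # catch the job's failure exception
            last_err = err
    raise last_err

# A stand-in "flaky job" that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("stage failed")
    return "ok"

print(run_with_retry(flaky))  # ok
```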
> On 9 Nov 2017, at 05:37, bing...@iflytek.com wrote:
>
> Dear,All
> I have a simple Spark job, as below. All tasks in stage 2 (something
> failed and was retried) have already finished, but the next stage never runs.
>
>
Hi,
The
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
file has a merge method, but it is defined private to the sql package and used
for Parquet schema merging. Is there a reason why this method cannot be
public? If not this method, is there an equivalent that is available to merge
schemas?
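As an illustration of the merge semantics only (a hypothetical plain-Python stand-in, not the private StructType.merge API): fields are treated as a name-to-type mapping, the merge is their union, and one name with two different types is a conflict.

```python
# Hypothetical schema-merge sketch: a schema is a {field name: type} dict.
def merge_schemas(left, right):
    merged = dict(left)
    for name, dtype in right.items():
        if name in merged and merged[name] != dtype:
            raise ValueError(f"conflicting types for field {name!r}")
        merged[name] = dtype  # new field, or same type as before
    return merged

a = {"id": "long", "name": "string"}
b = {"id": "long", "score": "double"}
print(merge_schemas(a, b))  # {'id': 'long', 'name': 'string', 'score': 'double'}
```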
Hi,
I'm using Spark standalone on AWS EC2, and I'm using the Spark REST
API http://<master>:8080/json to get completed apps, but the JSON lists
completed apps as an empty array even though the job ran successfully.
On 06 Nov 2017 at 19:56, Nicolas Paris wrote:
> Can anyone clarify the driver memory aspects of pySpark?
> According to [1], spark.driver.memory limits JVM + python memory.
>
> In case:
> spark.driver.memory=2G
> Then does it mean the user won't be able to use more than 2G, whatever
> the