[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over

2022-05-07 Thread Itay Bittan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533313#comment-17533313
 ] 

Itay Bittan commented on SPARK-28594:
-

Hi,

 

Just want to highlight the monetary cost of the new feature.

I'm running tens of thousands of Spark jobs (in Kubernetes) every day.

I've noticed that I pay dozens of dollars for `ListBucket` operations in S3.

After debugging the Spark history server I found that every 10s 
([default|https://spark.apache.org/docs/latest/monitoring.html#spark-history-server-configuration-options])
 it performs O(N) `ListBucket` operations - one per application folder - to 
list each folder's contents.

A better solution could be to perform a single deep listing instead, as 
suggested [here|https://stackoverflow.com/a/71195428/1011253].

I tried to implement it, but the listing goes through an abstract file system 
class, so it would require a massive change.
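
In the meantime, polling less often reduces the bill linearly. A minimal 
sketch for spark-defaults.conf, assuming the default 10s interval from the 
docs linked above:
{code:java}
# History server: poll the event log directory every 10 minutes instead of
# the default 10s; with O(N) ListBucket calls per poll, this cuts the S3
# request count by roughly 60x.
spark.history.fs.update.interval  10min
{code}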

> Allow event logs for running streaming apps to be rolled over
> -
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Stephen Levett
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.0.0
>
>
> In all current Spark releases, when event logging is enabled for Spark 
> Streaming apps, the event logs grow massively. The files continue to grow 
> until the application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617 addresses .inprogress 
> files, but not the event logs of applications that are still running.
> Can we identify a mechanism to set a "max file" size, so that the file is 
> rolled over when it reaches that size?
>  
>  
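
The rolling mechanism asked for here shipped in 3.0 as event log rolling. A 
minimal configuration sketch, assuming Spark 3.0+ (property names from the 
monitoring docs):
{code:java}
# spark-defaults.conf for the long-running streaming app
spark.eventLog.enabled              true
# roll the event log into a new file once it reaches maxFileSize
spark.eventLog.rolling.enabled      true
spark.eventLog.rolling.maxFileSize  128m
{code}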



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23977) Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism

2022-02-16 Thread Itay Bittan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493293#comment-17493293
 ] 

Itay Bittan commented on SPARK-23977:
-

Hi all,

 

I followed the 
[recommendations|https://spark.apache.org/docs/latest/cloud-integration.html#parquet-io-settings]
 and I'm getting the following warning:
{code:java}
2022-02-16 15:22:03.292 WARN FlowThread0 ParquetOutputFormat: Setting 
parquet.enable.summary-metadata is deprecated, please use 
parquet.summary.metadata.level {code}
What would be the recommended value for `parquet.summary.metadata.level`?
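
If I'm reading parquet-mr's ParquetOutputFormat right, the accepted levels are 
ALL, COMMON_ONLY and NONE, and NONE matches the old 
`parquet.enable.summary-metadata=false` recommendation. A minimal sketch:
{code:java}
# spark-defaults.conf; equivalent of parquet.enable.summary-metadata=false
# (assuming a parquet-mr version that understands the new level property)
spark.hadoop.parquet.summary.metadata.level  NONE
{code}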
 

> Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism
> ---
>
> Key: SPARK-23977
> URL: https://issues.apache.org/jira/browse/SPARK-23977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.0.0
>
>
> Hadoop 3.1 adds a mechanism for job-specific and store-specific committers 
> (MAPREDUCE-6823, MAPREDUCE-6956), and one key implementation: the S3A 
> committers of HADOOP-13786.
> These committers deliver high-performance output of MR and Spark jobs to S3, 
> and offer the key semantics which Spark depends on: no output is visible 
> until job commit, and a failure of a task at any stage, including partway 
> through task commit, can be handled by executing and committing another task 
> attempt.
> In contrast, the FileOutputFormat commit algorithms on S3 have issues:
> * Awful performance, because files are copied by rename.
> * FileOutputFormat v1: weak task-commit failure-recovery semantics, as the 
> (v1) expectation that "directory renames are atomic" doesn't hold.
> * S3 metadata eventual consistency can cause renames to miss files or fail 
> entirely (SPARK-15849).
> Note also that the FileOutputFormat "v2" commit algorithm doesn't offer any 
> of the commit semantics w.r.t. observability of, or recovery from, task 
> commit failure, on any filesystem.
> The S3A committers address these by uploading all data to the destination 
> through multipart uploads, which are only completed in job commit.
> The new {{PathOutputCommitter}} factory mechanism allows applications to work 
> with the S3A committers and any others, by adding a plugin mechanism into the 
> MRv2 FileOutputFormat class, where job config and filesystem configuration 
> options can dynamically choose the output committer.
> Spark can use these with some binding classes to:
> # Add a subclass of {{HadoopMapReduceCommitProtocol}} which uses the MRv2 
> classes and {{PathOutputCommitterFactory}} to create the committers.
> # Add a {{BindingParquetOutputCommitter extends ParquetOutputCommitter}}
> to wire up Parquet output even when code requires the committer to be a 
> subclass of {{ParquetOutputCommitter}}.
> This patch builds on SPARK-23807 for setting up the dependencies.
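
A minimal wiring sketch on the Spark side, assuming the spark-hadoop-cloud 
module (which provides the two binding classes above) is on the classpath; 
property names as in the Spark cloud-integration docs:
{code:java}
# spark-defaults.conf
# route s3a:// output through the Hadoop 3.1+ committer factory
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a  org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
# the two Spark-side binding classes described above
spark.sql.sources.commitProtocolClass     org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class  org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
# choose a committer: "directory", "partitioned" or "magic"
spark.hadoop.fs.s3a.committer.name        directory
{code}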



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33288) Support k8s cluster manager with stage level scheduling

2021-03-05 Thread Itay Bittan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296125#comment-17296125
 ] 

Itay Bittan commented on SPARK-33288:
-

[~tgraves] you are right! I was using my own compiled version (3.0.1) instead 
of the official 3.1.1. Thanks!

> Support k8s cluster manager with stage level scheduling
> ---
>
> Key: SPARK-33288
> URL: https://issues.apache.org/jira/browse/SPARK-33288
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.1.0
>
>
> Kubernetes supports dynamic allocation via the 
> {{spark.dynamicAllocation.shuffleTracking.enabled}} config; we can add 
> support for stage level scheduling when that is turned on.
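
For reference, a minimal sketch of the configuration the description names, 
assuming Spark 3.1+ on Kubernetes:
{code:java}
# spark-defaults.conf
spark.dynamicAllocation.enabled                  true
# track shuffle data on executors instead of requiring an external shuffle
# service (which k8s does not provide)
spark.dynamicAllocation.shuffleTracking.enabled  true
{code}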



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33288) Support k8s cluster manager with stage level scheduling

2021-03-05 Thread Itay Bittan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296027#comment-17296027
 ] 

Itay Bittan commented on SPARK-33288:
-

Hi!

I just upgraded from Spark 3.0.1 to 3.1.1 and I'm having an issue with the 
resourceProfileId.

I saw that this [PR|https://github.com/apache/spark/pull/30204/files] added 
this argument, but I couldn't find any documentation about it.

I'm running a simple Spark application in client mode via Jupyter (as the 
driver pod) + pyspark.

I also asked on 
[SO|https://stackoverflow.com/questions/66482218/spark-executors-fails-on-kubernetes-resourceprofileid-is-missing].

I'd appreciate any clues.

> Support k8s cluster manager with stage level scheduling
> ---
>
> Key: SPARK-33288
> URL: https://issues.apache.org/jira/browse/SPARK-33288
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.1.0
>
>
> Kubernetes supports dynamic allocation via the 
> {{spark.dynamicAllocation.shuffleTracking.enabled}} config; we can add 
> support for stage level scheduling when that is turned on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2020-11-02 Thread Itay Bittan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224799#comment-17224799
 ] 

Itay Bittan commented on SPARK-26365:
-

thanks [~oscar.bonilla].

We ended up with a temporary workaround:
{code:java}
# grep -q exits 0 only when the driver log contains "exit code: 0", so the
# shell's final exit status reflects the Spark application's outcome
spark-submit .. 2>&1 | tee output.log ; grep -q "exit code: 0" output.log{code}

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Oscar Bonilla
>Priority: Minor
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a Kubernetes cluster, if the Spark 
> application fails (returns exit code = 1, for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know whether there's been a 
> problem with the Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2020-11-02 Thread Itay Bittan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224569#comment-17224569
 ] 

Itay Bittan commented on SPARK-26365:
-

Hi, we are having the same issue.

It's critical in scenarios where another job is triggered based on the first 
app's success or failure.

Any ideas for a workaround in the meantime?

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Oscar Bonilla
>Priority: Minor
>
> When launching apps using spark-submit in a Kubernetes cluster, if the Spark 
> application fails (returns exit code = 1, for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know whether there's been a 
> problem with the Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org