[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over
[ https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533313#comment-17533313 ]

Itay Bittan commented on SPARK-28594:
--------------------------------------

Hi,

Just want to highlight the cost (in terms of money) of the new feature. I'm running tens of thousands of Spark jobs (in Kubernetes) every day, and I have noticed that I pay dozens of dollars for `ListBucket` operations in S3. After debugging the Spark history server I found that every 10s ([default|https://spark.apache.org/docs/latest/monitoring.html#spark-history-server-configuration-options]) it performs O(N) `ListBucket` operations - one to get the contents of each folder. A better solution could be to perform a single deep listing, as suggested [here|https://stackoverflow.com/a/71195428/1011253]. I tried to do it, but it seems there's an abstract file system class involved and it would require a massive change.

> Allow event logs for running streaming apps to be rolled over
> -------------------------------------------------------------
>
>                 Key: SPARK-28594
>                 URL: https://issues.apache.org/jira/browse/SPARK-28594
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Stephen Levett
>            Assignee: Jungtaek Lim
>            Priority: Major
>              Labels: releasenotes
>             Fix For: 3.0.0
>
> In all current Spark releases, when event logging is enabled for Spark
> Streaming applications, the event logs grow massively. The files continue to
> grow until the application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> addresses .inprogress files, but not event log files of applications that
> are still running.
> Identify a mechanism to set a "max file" size so that the file is rolled
> over when it reaches this size.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
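To make the cost asymmetry concrete, here is a small self-contained sketch (plain Python, no AWS calls; the folder layout, file counts, and page size are illustrative assumptions, not Spark's actual listing code) comparing per-folder listing with a single paginated deep listing:

```python
import math

def shallow_listing_requests(keys):
    """One ListBucket request per application 'folder' (the prefix before
    the first '/'), which is roughly what listing each event-log directory
    separately costs per polling cycle."""
    folders = {key.split("/", 1)[0] for key in keys}
    return len(folders)

def deep_listing_requests(keys, page_size=1000):
    """A single recursive (no-delimiter) listing is paginated at ~1000 keys
    per response, so it costs ceil(total_keys / page_size) requests,
    independent of how many folders there are."""
    return math.ceil(len(keys) / page_size)

# Illustrative layout: 5000 app folders with 2 event-log files each.
keys = [f"app-{i}/events_{j}" for i in range(5000) for j in range(2)]
print(shallow_listing_requests(keys))  # 5000 requests, repeated every 10s
print(deep_listing_requests(keys))     # 10 requests for the same keys
```

With tens of thousands of short-lived jobs, the per-folder scheme pays per application, while the deep listing pays per thousand objects, which is the gist of the linked Stack Overflow suggestion.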
[jira] [Commented] (SPARK-23977) Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism
[ https://issues.apache.org/jira/browse/SPARK-23977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493293#comment-17493293 ]

Itay Bittan commented on SPARK-23977:
--------------------------------------

Hi all,

I followed the [recommendations|https://spark.apache.org/docs/latest/cloud-integration.html#parquet-io-settings] and I'm getting the following warning:
{code:java}
2022-02-16 15:22:03.292 WARN FlowThread0 ParquetOutputFormat: Setting parquet.enable.summary-metadata is deprecated, please use parquet.summary.metadata.level {code}
What would be the recommended value for `parquet.summary.metadata.level`?

> Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism
> -----------------------------------------------------------------------
>
>                 Key: SPARK-23977
>                 URL: https://issues.apache.org/jira/browse/SPARK-23977
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>             Fix For: 3.0.0
>
> Hadoop 3.1 adds a mechanism for job-specific and store-specific committers
> (MAPREDUCE-6823, MAPREDUCE-6956), and one key implementation, the S3A
> committers, HADOOP-13786.
> These committers deliver high-performance output of MR and Spark jobs to S3,
> and offer the key semantics which Spark depends on: no visible output until
> job commit, and a failure of a task at any stage, including partway through
> task commit, can be handled by executing and committing another task attempt.
> In contrast, the FileOutputFormat commit algorithms on S3 have issues:
> * Awful performance, because files are copied by rename.
> * FileOutputFormat v1: weak task commit failure recovery semantics, as the
> (v1) expectation that "directory renames are atomic" doesn't hold.
> * S3 metadata eventual consistency can cause rename to miss files or fail
> entirely (SPARK-15849).
> Note also that the FileOutputFormat "v2" commit algorithm doesn't offer any
> of the commit semantics w.r.t. observability of or recovery from task commit
> failure, on any filesystem.
> The S3A committers address these by uploading all data to the destination
> through multipart uploads, uploads which are only completed in job commit.
> The new {{PathOutputCommitter}} factory mechanism allows applications to
> work with the S3A committers and any others, by adding a plugin mechanism
> into the MRv2 FileOutputFormat class, where job config and filesystem
> configuration options can dynamically choose the output committer.
> Spark can use these with some binding classes to:
> # Add a subclass of {{HadoopMapReduceCommitProtocol}} which uses the MRv2
> classes and {{PathOutputCommitterFactory}} to create the committers.
> # Add a {{BindingParquetOutputCommitter extends ParquetOutputCommitter}}
> to wire up Parquet output even when code requires the committer to be a
> subclass of {{ParquetOutputCommitter}}.
> This patch builds on SPARK-23807 for setting up the dependencies.
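For reference, the binding described above is driven by a handful of configuration keys. A minimal spark-defaults.conf sketch, assuming Spark 3.x built with the hadoop-cloud module (key names are from the Spark cloud-integration docs; verify the exact values against your Spark/Hadoop versions):

```properties
# Route FileOutputFormat commits on s3a:// through the S3A committer factory
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a  org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory

# The two Spark-side binding classes from the description above
spark.sql.sources.commitProtocolClass      org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class   org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter

# Replacement for the deprecated parquet.enable.summary-metadata=false:
# the level takes ALL, COMMON_ONLY or NONE; NONE corresponds to "false",
# and summary files are not wanted on S3 anyway.
spark.hadoop.parquet.summary.metadata.level  NONE
```

The last key also appears to be the answer to the deprecation warning quoted in the SPARK-23977 comment above, assuming the goal is still to suppress summary-metadata files entirely.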
[jira] [Commented] (SPARK-33288) Support k8s cluster manager with stage level scheduling
[ https://issues.apache.org/jira/browse/SPARK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296125#comment-17296125 ]

Itay Bittan commented on SPARK-33288:
--------------------------------------

[~tgraves] you are right! I used my compiled version (3.0.1) instead of the official 3.1.1! Thanks!

> Support k8s cluster manager with stage level scheduling
> --------------------------------------------------------
>
>                 Key: SPARK-33288
>                 URL: https://issues.apache.org/jira/browse/SPARK-33288
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>            Priority: Major
>             Fix For: 3.1.0
>
> Kubernetes supports dynamic allocation via the
> {{spark.dynamicAllocation.shuffleTracking.enabled}} config, so we can add
> support for stage level scheduling when that is turned on.
[jira] [Commented] (SPARK-33288) Support k8s cluster manager with stage level scheduling
[ https://issues.apache.org/jira/browse/SPARK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296027#comment-17296027 ]

Itay Bittan commented on SPARK-33288:
--------------------------------------

Hi!

I just upgraded from Spark 3.0.1 to 3.1.1 and I'm having an issue with the resourceProfileId. I saw that this [PR|https://github.com/apache/spark/pull/30204/files] added this argument, but I didn't find documentation about it. I'm running a simple Spark application in client mode via Jupyter (as the driver pod) + PySpark. I also asked on [SO|https://stackoverflow.com/questions/66482218/spark-executors-fails-on-kubernetes-resourceprofileid-is-missing]. I'd appreciate any clue.
[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224799#comment-17224799 ]

Itay Bittan commented on SPARK-26365:
--------------------------------------

Thanks [~oscar.bonilla]. We ended up with a temporary workaround:
{code:java}
spark-submit .. 2>&1 | tee output.log ; grep -q "exit code: 0" output.log{code}

> spark-submit for k8s cluster doesn't propagate exit code
> ---------------------------------------------------------
>
>                 Key: SPARK-26365
>                 URL: https://issues.apache.org/jira/browse/SPARK-26365
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Core, Spark Submit
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Oscar Bonilla
>            Priority: Minor
>         Attachments: spark-2.4.5-raise-exception-k8s-failure.patch,
> spark-3.0.0-raise-exception-k8s-failure.patch
>
> When launching apps using spark-submit in a Kubernetes cluster, if the Spark
> application fails (returns exit code 1, for example), spark-submit will
> still exit gracefully and return exit code 0.
> This is problematic, since there's no way to know if there's been a problem
> with the Spark application.
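A slightly more robust version of that grep, as a hypothetical Python wrapper. The "exit code: N" line format is taken from the workaround above; the function names and the idea of defaulting to failure when no status line appears are my own illustrative assumptions, not Spark behavior:

```python
import re
import subprocess

EXIT_RE = re.compile(r"exit code:\s*(\d+)")

def extract_exit_code(log_text):
    """Recover the driver's real exit code from the 'exit code: N' line in
    the spark-submit output. Takes the last reported code; defaults to 1 if
    none was printed, so a crash before the status line still counts as a
    failure (unlike grep -q, which only distinguishes 0 from everything else)."""
    matches = EXIT_RE.findall(log_text)
    return int(matches[-1]) if matches else 1

def spark_submit_exit_code(cmd):
    """Run spark-submit (cmd is the full argv list), then parse its output,
    since the spark-submit process itself exits 0 even when the k8s
    application failed."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return extract_exit_code(proc.stdout + proc.stderr)
```

A job orchestrator could then call `sys.exit(spark_submit_exit_code([...]))` to make the wrapper's own exit status reflect the application's, which is what downstream triggers need.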
[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code
[ https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224569#comment-17224569 ]

Itay Bittan commented on SPARK-26365:
--------------------------------------

Hi, we are having the same issue. It's critical in scenarios that trigger another job based on the first app's success or failure. Any idea for a workaround in the meantime?