[ https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533313#comment-17533313 ]
Itay Bittan commented on SPARK-28594:
-------------------------------------

Hi,

Just want to highlight the monetary cost of the new feature. I run tens of thousands of Spark jobs per day on Kubernetes, and I've noticed that I pay dozens of dollars for the `ListBucket` operation in S3. After debugging the history server, I found that every 10s ([the default|https://spark.apache.org/docs/latest/monitoring.html#spark-history-server-configuration-options]) it performs O(N) `ListBucket` operations, one to list the contents of each folder. A better solution could be a single deep (recursive) listing, as suggested [here|https://stackoverflow.com/a/71195428/1011253]. I tried to implement this, but the listing goes through an abstract file system class, so it would require a substantial change.

> Allow event logs for running streaming apps to be rolled over
> -------------------------------------------------------------
>
>                 Key: SPARK-28594
>                 URL: https://issues.apache.org/jira/browse/SPARK-28594
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Stephen Levett
>            Assignee: Jungtaek Lim
>            Priority: Major
>              Labels: releasenotes
>             Fix For: 3.0.0
>
>
> In all current Spark releases, when event logging is enabled for Spark Streaming applications, the event logs grow massively: the files keep growing until the application is stopped or killed, and the Spark history server then has difficulty processing them.
> https://issues.apache.org/jira/browse/SPARK-8617 addresses .inprogress files, but not event-log files for applications that are still running.
> Identify a mechanism to set a "max file" size so that the file is rolled over when it reaches that size.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
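For reference, the rolling behaviour this issue added in 3.0.0 is opt-in and is controlled by configuration along these lines (the values shown are illustrative, not defaults; see the monitoring documentation for the authoritative list):

```properties
# spark-defaults.conf (sketch)
spark.eventLog.enabled                              true
spark.eventLog.rolling.enabled                      true
spark.eventLog.rolling.maxFileSize                  128m
# History-server side: compact older rolled files, keeping the most recent N
spark.history.fs.eventLog.rolling.maxFilesToRetain  10
```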
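The cost difference the comment describes can be sketched with a back-of-the-envelope request count. This is a minimal simulation, not Spark or S3 code: the folder count, files per folder, and helper names are illustrative assumptions; only the 1000-keys-per-request page size is the documented S3 listing limit.

```python
PAGE_SIZE = 1000  # S3 returns at most 1000 keys per ListBucket (ListObjectsV2) request
N_APPS = 5000     # illustrative number of application log folders


def per_folder_calls(n_apps: int) -> int:
    """Current behaviour: one delimiter listing to discover app folders,
    then one ListBucket call per folder to fetch its contents -> O(N)."""
    discover = -(-n_apps // PAGE_SIZE)  # ceil(n_apps / PAGE_SIZE)
    return discover + n_apps


def deep_listing_calls(n_apps: int, files_per_app: int = 2) -> int:
    """Suggested behaviour: one recursive listing of the whole log directory;
    S3 pages through every key, so cost is O(total_files / PAGE_SIZE)."""
    total_files = n_apps * files_per_app
    return -(-total_files // PAGE_SIZE)  # ceil(total_files / PAGE_SIZE)


print(per_folder_calls(N_APPS))    # 5005
print(deep_listing_calls(N_APPS))  # 10
```

With these (made-up) numbers, the per-folder scheme issues roughly 500x more `ListBucket` requests per polling cycle than a single deep listing, which is where the recurring S3 charge comes from.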