[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over

2022-05-07 Thread Itay Bittan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533313#comment-17533313
 ] 

Itay Bittan commented on SPARK-28594:
-

Hi,

 

I just want to highlight the monetary cost of the new feature.

I'm running tens of thousands of Spark jobs (in Kubernetes) every day.

I have noticed that I pay dozens of dollars for `ListBucket` operations in S3.

After debugging the Spark history server, I found that every 10s 
([default|https://spark.apache.org/docs/latest/monitoring.html#spark-history-server-configuration-options])
 it performs O(N) `ListBucket` operations, one to get the contents of each folder.

A better solution could be to perform a deep listing as suggested 
[here|https://stackoverflow.com/a/71195428/1011253].

I tried to do it, but it seems there's an abstract file system class involved, 
so it would require a massive change.
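A back-of-the-envelope sketch of why this adds up, in Python. It assumes (hypothetically) one `ListBucket` request per application folder per scan, versus one recursive "deep" listing that pages through every key under the prefix at up to 1,000 keys per request (the S3 `ListObjectsV2` page limit), with a scan every 10 seconds (the default of `spark.history.fs.update.interval`). The folder and file counts and the function names are illustrative, not measurements.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
S3_MAX_KEYS_PER_PAGE = 1000  # ListObjectsV2 returns at most 1,000 keys per request


def per_folder_requests_per_scan(files_per_folder):
    """One listing per folder, plus extra pages for folders with >1,000 keys."""
    return sum(max(1, math.ceil(n / S3_MAX_KEYS_PER_PAGE)) for n in files_per_folder)


def deep_listing_requests_per_scan(files_per_folder):
    """A single recursive listing that pages through every key under the prefix."""
    total_keys = sum(files_per_folder)
    return max(1, math.ceil(total_keys / S3_MAX_KEYS_PER_PAGE))


def requests_per_day(requests_per_scan, scan_interval_s=10):
    """Scale a per-scan request count to one day of scans (10s interval assumed)."""
    scans_per_day = SECONDS_PER_DAY // scan_interval_s
    return requests_per_scan * scans_per_day


if __name__ == "__main__":
    # Hypothetical: 20,000 application folders, each holding 3 event log files.
    folders = [3] * 20_000
    per_folder = requests_per_day(per_folder_requests_per_scan(folders))
    deep = requests_per_day(deep_listing_requests_per_scan(folders))
    print(f"per-folder listing: {per_folder:,} requests/day")
    print(f"deep listing:       {deep:,} requests/day")
```

With these assumed numbers, per-folder listing comes to 172,800,000 requests per day versus 518,400 for a deep listing. A deep listing still grows with the total number of keys, but each request can return up to 1,000 of them instead of covering a single folder.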

> Allow event logs for running streaming apps to be rolled over
> -
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Stephen Levett
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.0.0
>
>
> In all current Spark releases, when event logging is enabled for Spark 
> Streaming, the event logs grow massively. The files continue to grow until 
> the application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> addresses .inprogress files, but not the event log files of applications that are still running.
> Could we identify a mechanism to set a "max file" size so that the file is 
> rolled over when it reaches this size?
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over

2020-06-07 Thread Shuai Lu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127791#comment-17127791
 ] 

Shuai Lu commented on SPARK-28594:
--

Got it. Thanks. We manage a multi-tenant cluster, and it is a relatively big 
disruption when many users run Spark streaming and we don't have an elegant way 
to prevent the event logs from gradually filling up HDFS. In that case, we may just 
ask users to disable event logs for their streaming applications.




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over

2020-06-07 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127788#comment-17127788
 ] 

Jungtaek Lim commented on SPARK-28594:
--

Unfortunately, that is most likely the only guaranteed way if you're suffering from 
issues with the event log in a streaming application. I see some other tricky 
alternatives as well, like periodically stopping the application, removing/moving 
the event log, and restarting the application, but yes, it feels odd to have 
some downtime just because of the event log.




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over

2020-06-07 Thread Shuai Lu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127783#comment-17127783
 ] 

Shuai Lu commented on SPARK-28594:
--

I see. It looks like the only way to work around this in Spark 2.4 is to disable 
event logging for Spark streaming?




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over

2020-06-07 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127776#comment-17127776
 ] 

Jungtaek Lim commented on SPARK-28594:
--

Actually, it has been an issue in almost all Spark versions (streaming 
support + event log), but the Spark community is unlikely to backport a new 
feature. I don't think it can land in the 2.4.x version line. I'm not sure 
there's interest in having a 2.5 release either, so at least for now the only 
way to get it is to try out Spark 3.0.0.
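For readers who do move to Spark 3.0.0, here is a small Python sketch that collects the relevant settings. The property names `spark.eventLog.rolling.enabled` and `spark.eventLog.rolling.maxFileSize` (application side) and `spark.history.fs.eventLog.rolling.maxFilesToRetain` (history server side) come from the Spark 3.0 documentation; the helper functions and the chosen values are illustrative assumptions, so check the docs for your version before relying on them. Per those docs, compaction is lossy: it discards events the UI will no longer show.

```python
"""Sketch: event log rolling settings for Spark 3.0 (values are examples)."""

# Application-side properties, passed when submitting the streaming app.
APP_CONF = {
    "spark.eventLog.enabled": "true",
    "spark.eventLog.rolling.enabled": "true",  # roll the log while the app runs
    "spark.eventLog.rolling.maxFileSize": "128m",  # size at which a file rolls over
}

# History-server-side property: how many non-compacted files to retain.
# Older files get compacted, which drops some events.
HISTORY_SERVER_CONF = {
    "spark.history.fs.eventLog.rolling.maxFilesToRetain": "2",
}


def to_submit_args(conf):
    """Hypothetical helper: render properties as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


def to_defaults_lines(conf):
    """Hypothetical helper: render properties as spark-defaults.conf lines."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    print("spark-submit " + " ".join(to_submit_args(APP_CONF)) + " ...")
    print(to_defaults_lines(HISTORY_SERVER_CONF))
```

The split into two dictionaries reflects that rolling happens in the running application, while retention and compaction are applied by the history server that reads the logs.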




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over

2020-06-06 Thread Shuai Lu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17127119#comment-17127119
 ] 

Shuai Lu commented on SPARK-28594:
--

Hi, [~kabhwan], are we planning to support this feature in Spark 2.4? It has 
been an issue in Spark 2.4 as well.




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over

2020-03-17 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061101#comment-17061101
 ] 

Dongjoon Hyun commented on SPARK-28594:
---

The `releasenotes` label has been added for the 3.0.0 release.




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2020-03-10 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056417#comment-17056417
 ] 

Dongjoon Hyun commented on SPARK-28594:
---

I assigned this umbrella to [~kabhwan] since he has been actively leading it.

> Allow event logs for running streaming apps to be rolled over.
> --
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: This has been reported on 2.0.2.22 but affects all 
> currently available versions.
>Reporter: Stephen Levett
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> In all current Spark releases, when event logging is enabled for Spark 
> Streaming, the event logs grow massively. The files continue to grow until 
> the application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> addresses .inprogress files, but not the event log files of applications that are still running.
> Could we identify a mechanism to set a "max file" size so that the file is 
> rolled over when it reaches this size?
>  
>  






[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2020-01-29 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026307#comment-17026307
 ] 

Jungtaek Lim commented on SPARK-28594:
--

While I commented on some tasks for improvement, technically this issue is 
resolved, as all sub-tasks are resolved. I may file new JIRA issues for those 
items; as there aren't many of them, it shouldn't bother any of us.

> Allow event logs for running streaming apps to be rolled over.
> --
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: This has been reported on 2.0.2.22 but affects all 
> currently available versions.
>Reporter: Stephen Levett
>Priority: Major
>
> In all current Spark releases, when event logging is enabled for Spark 
> Streaming, the event logs grow massively. The files continue to grow until 
> the application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> addresses .inprogress files, but not the event log files of applications that are still running.
> Could we identify a mechanism to set a "max file" size so that the file is 
> rolled over when it reaches this size?
>  
>  






[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2020-01-10 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012562#comment-17012562
 ] 

Jungtaek Lim commented on SPARK-28594:
--

I'm enumerating the items which are "good to do". It might be better to file 
JIRA issues for them once we decide we should do them, or once all required 
functionality is done and we have the resources to deal with them.

For now, the items I have are below:
 * Retain a specific number of jobs / executions, which allows the compact file to 
keep some of the finished jobs / executions
 ** [https://github.com/apache/spark/pull/27085#discussion_r363428336]
 * Separate compaction from cleaning to allow leaving some old event log files 
after compaction
 ** [https://github.com/apache/spark/pull/27085#issuecomment-572792067]
 * Cache the state of the compactor to avoid replaying event log files that were 
already loaded
 ** [https://github.com/apache/spark/pull/26416#discussion_r358260674]
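As a rough illustration of the first item, here is a simplified Python sketch of compacting a replayed event stream while retaining a fixed number of finished jobs. It is not Spark's compactor: it assumes a toy model where each event is a JSON line shaped like a Spark event log entry (an `"Event"` name plus a `"Job ID"`), only job start/end events are filtered, and the function name and `retain_finished` parameter are hypothetical.

```python
import json

JOB_START = "SparkListenerJobStart"
JOB_END = "SparkListenerJobEnd"


def compact_job_events(lines, retain_finished=0):
    """Toy compaction over JSON event lines (hypothetical, not Spark's compactor).

    Start/end events of finished jobs are dropped, except for the most recent
    `retain_finished` finished jobs. Events of still-running jobs and every
    other event type are kept in their original order.
    """
    events = [json.loads(line) for line in lines]

    # Job IDs in the order their end events appear in the log.
    finished = [e["Job ID"] for e in events if e.get("Event") == JOB_END]
    retained = set(finished[-retain_finished:]) if retain_finished > 0 else set()
    dropped = set(finished) - retained

    compacted = []
    for e in events:
        if e.get("Event") in (JOB_START, JOB_END) and e["Job ID"] in dropped:
            continue  # finished job that we chose not to retain
        compacted.append(json.dumps(e))
    return compacted


SAMPLE_LOG = [
    '{"Event": "SparkListenerJobStart", "Job ID": 0}',
    '{"Event": "SparkListenerJobEnd", "Job ID": 0}',
    '{"Event": "SparkListenerJobStart", "Job ID": 1}',
    '{"Event": "SparkListenerJobEnd", "Job ID": 1}',
    '{"Event": "SparkListenerJobStart", "Job ID": 2}',  # job 2 is still running
]

if __name__ == "__main__":
    for line in compact_job_events(SAMPLE_LOG, retain_finished=1):
        print(line)
```

With `retain_finished=1`, only job 1 (the latest finished job) and the running job 2 survive; with the default of 0, only job 2's start event remains.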

 




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-10-23 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958412#comment-16958412
 ] 

Jungtaek Lim commented on SPARK-28594:
--

Please note that SPARK-29579 and SPARK-29581 could be moved out of SPARK-28594, 
as the reason for splitting these issues out of the existing one is that we couldn't 
find a good way to do them. Things may change if we get a brilliant idea 
before finishing SPARK-28870, but if not, I'd rather set SPARK-28870 as the finish 
line of this issue and move both issues out of it.




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-09-01 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920534#comment-16920534
 ] 

Jungtaek Lim commented on SPARK-28594:
--

Thanks [~felixcheung] for reviewing and volunteering to shepherd this 
work!

Could you also take a look at [https://github.com/apache/spark/pull/25577], which is 
coupled with this issue?




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-09-01 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920476#comment-16920476
 ] 

Felix Cheung commented on SPARK-28594:
--

Reviewed. Looks reasonable to me. I can help shepherd this work.

 

ping [~srowen] [~vanzin] [~irashid] for feedback.




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-08-25 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915383#comment-16915383
 ] 

Jungtaek Lim commented on SPARK-28594:
--

To keep PRs smaller (and reviews easier), I would split this issue into 
two sub-issues:

1) just roll event log files (no compaction)

2) compact old event log files

Note that even rolling event log files without compaction could help in some 
extreme cases where the log file of a running application gets really huge, so you 
decide to drop some old logs, accepting that this loses the ability to replay the 
log file. Currently there's no way to do this: deleting an event log file that 
is open for writing would bring unexpected issues, and we would end up 
stopping the application.
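The two sub-issues can be illustrated with a toy Python writer. This is a minimal sketch, not Spark's implementation: it appends lines to numbered files, starts a new file once the current one would exceed `max_bytes`, and can optionally delete the oldest closed files, which is the "drop some old logs" case and is exactly what gives up the ability to replay the full log. The class name, the `events_<index>` file naming and the parameters are hypothetical.

```python
import os
import tempfile


class RollingLogWriter:
    """Toy size-based rolling writer (hypothetical, not Spark's implementation)."""

    def __init__(self, directory, max_bytes, max_files_to_keep=None):
        if max_files_to_keep is not None and max_files_to_keep < 1:
            raise ValueError("max_files_to_keep must be at least 1")
        self.directory = directory
        self.max_bytes = max_bytes
        self.max_files_to_keep = max_files_to_keep  # None keeps every file
        self.index = 0
        self.current_size = 0
        os.makedirs(directory, exist_ok=True)
        self._file = open(self._path(self.index), "wb")

    def _path(self, index):
        return os.path.join(self.directory, f"events_{index}")

    def write_line(self, line):
        data = (line + "\n").encode("utf-8")
        # Roll over when this write would push a non-empty file past max_bytes.
        if self.current_size > 0 and self.current_size + len(data) > self.max_bytes:
            self._roll()
        self._file.write(data)
        self.current_size += len(data)

    def _roll(self):
        self._file.close()
        self.index += 1
        self.current_size = 0
        self._file = open(self._path(self.index), "wb")
        self._drop_old_files()

    def _drop_old_files(self):
        # Deletes only closed files; the file currently being written is kept.
        # Deleted files can no longer be replayed.
        if self.max_files_to_keep is None:
            return
        oldest_kept = self.index - self.max_files_to_keep + 1
        for old in range(oldest_kept):
            path = self._path(old)
            if os.path.exists(path):
                os.remove(path)

    def close(self):
        self._file.close()


def demo(max_files_to_keep=None):
    """Write five 5-byte lines with a 10-byte limit; return the remaining files."""
    with tempfile.TemporaryDirectory() as directory:
        writer = RollingLogWriter(directory, max_bytes=10,
                                  max_files_to_keep=max_files_to_keep)
        for _ in range(5):
            writer.write_line("abcd")  # 5 bytes including the newline
        writer.close()
        return sorted(os.listdir(directory))


if __name__ == "__main__":
    print(demo())
    print(demo(max_files_to_keep=2))
```

Without retention, the demo leaves `events_0` to `events_2`; with `max_files_to_keep=2`, `events_0` is deleted at the second rollover. Because only closed files are ever removed, the writer never touches the file that is still open, which is the situation the comment describes as currently causing trouble.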

> Allow event logs for running streaming apps to be rolled over.
> --
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 3.0.0
> Environment: This has been reported on 2.0.2.22 but affects all 
> currently available versions.
>Reporter: Stephen Levett
>Priority: Major
>
> In all current Spark releases, when event logging is enabled for Spark 
> Streaming, the event logs grow massively. The files continue to grow until 
> the application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> addresses .inprogress files, but not the event log files of applications that are still running.
> Could we identify a mechanism to set a "max file" size so that the file is 
> rolled over when it reaches this size?
>  
>  






[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-08-25 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915358#comment-16915358
 ] 

Jungtaek Lim commented on SPARK-28594:
--

I've raised the priority, as many end users are suffering from this issue, especially 
those who run long-running queries.




[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-08-25 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915357#comment-16915357
 ] 

Jungtaek Lim commented on SPARK-28594:
--

Coincidentally, I have been working on the design of this feature for 2 weeks. Since 
the reporter doesn't seem to be working on this feature, I'll take up this issue 
and move it forward.

Only the POC is done; I've just started implementing. Here's the design doc describing the 
approach:

[https://docs.google.com/document/d/12bdCC4nA58uveRxpeo8k7kGOI2NRTXmXyBOweSi4YcY/edit#heading=h.7bmfccqq7ozy]

 

> Allow event logs for running streaming apps to be rolled over.
> --
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 3.0.0
> Environment: This has been reported on 2.0.2.22 but affects all 
> currently available versions.
>Reporter: Stephen Levett
>Priority: Minor
>
> In all current Spark releases, when event logging is enabled for Spark 
> Streaming, the event logs grow massively. The files continue to grow until 
> the application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> addresses .inprogress files, but not the event log files of applications that are still running.
> Could we identify a mechanism to set a "max file" size so that the file is 
> rolled over when it reaches this size?
>  
>  


