[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852541#comment-16852541 ]
Karthik Palaniappan edited comment on SPARK-24815 at 5/31/19 2:26 AM:
----------------------------------------------------------------------

I was starting to update the JIRA description with a problem statement, then realized I am unfamiliar with some of the challenges you guys mentioned in the comments, in particular how state is managed in structured streaming.

I was imagining that processing rate was the correct heuristic, assuming the goal is to just keep up with the input, even at the expense of processing time. Continuous processing seems to solve the separate case where you need ultra-low-latency processing.

[~skonto] [~kabhwan] [~gsomogyi] if you guys can help with a design, I'd be happy to help with the implementation, but for now I will drop this JIRA.

was (Author: karthik palaniappan):
I was starting to update the JIRA description with a problem statement, then realized I am unfamiliar with some of the challenges you guys mentioned in the comments, in particular how state is managed in structured streaming.

I was imagining that processing rate was the correct heuristic, assuming the goal is to keep up with the input. Continuous processing seems to solve the separate case where you need ultra-low latency.

[~skonto] [~kabhwan] [~gsomogyi] if you guys can help with a design, I'd be happy to help with the implementation, but for now I will drop this JIRA.

> Structured Streaming should support dynamic allocation
> ------------------------------------------------------
>
>                 Key: SPARK-24815
>                 URL: https://issues.apache.org/jira/browse/SPARK-24815
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>
> For batch jobs, dynamic allocation is very useful for adding and removing
> containers to match the actual workload. On multi-tenant clusters, it ensures
> that a Spark job is taking no more resources than necessary. In cloud
> environments, it enables autoscaling.
>
> However, if you set spark.dynamicAllocation.enabled=true and run a structured
> streaming job, the batch dynamic allocation algorithm kicks in. It requests
> more executors if the task backlog reaches a certain size, and removes
> executors if they are idle for a certain period of time.
>
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a
> particular implementation in SparkContext.scala (this should be a separate
> JIRA).
> 2) We should make a structured streaming algorithm that's separate from the
> batch algorithm. Eventually, continuous processing might need its own
> algorithm.
> 3) Spark should print a warning if you run a structured streaming job when
> Core's dynamic allocation is enabled.
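To make the batch behavior described in the issue concrete, here is a rough model of it. This is a simplified sketch, not Spark's actual allocation manager: the class and method names are hypothetical, it assumes a fixed number of task slots per executor, and it ignores running tasks, locality, and `spark.dynamicAllocation.sustainedSchedulerBacklogTimeout`. It does reproduce the two documented behaviors: executors are added in exponentially growing rounds (1, 2, 4, ...) while a backlog persists, and they are released once they have been idle longer than `spark.dynamicAllocation.executorIdleTimeout`. The usage at the end shows why this fits streaming poorly, since an idle gap between micro-batches releases the whole cluster.

```python
import math
from dataclasses import dataclass, field
from typing import Sequence


@dataclass
class BatchAllocatorSketch:
    """Simplified model of Core's batch dynamic-allocation heuristic.

    Hypothetical class, not Spark's allocation manager. Assumes a fixed number
    of task slots per executor and ignores running tasks, locality and
    spark.dynamicAllocation.sustainedSchedulerBacklogTimeout.
    """
    tasks_per_executor: int = 4      # roughly spark.executor.cores / spark.task.cpus
    min_executors: int = 0           # spark.dynamicAllocation.minExecutors
    max_executors: int = 20          # spark.dynamicAllocation.maxExecutors
    backlog_timeout_s: float = 1.0   # spark.dynamicAllocation.schedulerBacklogTimeout (default 1s)
    idle_timeout_s: float = 60.0     # spark.dynamicAllocation.executorIdleTimeout (default 60s)
    executors: int = field(default=0, init=False)
    _to_add: int = field(default=1, init=False)

    def __post_init__(self) -> None:
        self.executors = self.min_executors

    def step(self, pending_tasks: int, backlog_age_s: float,
             idle_ages_s: Sequence[float]) -> int:
        """Apply one scheduling round and return the new executor target."""
        if pending_tasks > 0 and backlog_age_s >= self.backlog_timeout_s:
            # Never ask for more executors than the backlog can keep busy.
            needed = math.ceil(pending_tasks / self.tasks_per_executor)
            target = min(self.executors + self._to_add, needed, self.max_executors)
            if target > self.executors:
                self.executors = target
                self._to_add *= 2  # exponential ramp-up: add 1, then 2, 4, 8, ...
        else:
            self._to_add = 1  # backlog cleared: restart the ramp at 1
        # Release executors idle past the timeout, but never go below the minimum.
        expired = sum(1 for age in idle_ages_s if age >= self.idle_timeout_s)
        self.executors -= max(0, min(expired, self.executors - self.min_executors))
        return self.executors


alloc = BatchAllocatorSketch()
# A backlog of 100 tasks persisting for five rounds ramps 1, 3, 7, 15, then caps at 20.
print([alloc.step(100, 2.0, []) for _ in range(5)])  # [1, 3, 7, 15, 20]
# A 90 s idle gap between micro-batches releases all 20 executors again.
print(alloc.step(0, 0.0, [90.0] * 20))  # 0
```

With a long trigger interval, a streaming query can cycle through this ramp-up and release on every micro-batch, which is one reason a separate streaming heuristic is being proposed.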
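The processing-rate heuristic from the comment, combined with the description's ideas of a pluggable policy and a streaming-specific algorithm, could look roughly like the sketch below. Everything in it is hypothetical: `AllocationPolicy`, `RateBasedStreamingPolicy` and `next_target` are not Spark APIs. The sketch assumes throughput scales linearly with executor count, and it ignores state-store placement, which the comment identifies as an open challenge. Its two inputs are meant to mirror the `inputRowsPerSecond` and `processedRowsPerSecond` fields that Structured Streaming reports in each query progress update.

```python
import math
from dataclasses import dataclass
from typing import Protocol


class AllocationPolicy(Protocol):
    """Hypothetical pluggable interface: map the current executor count to a target."""

    def target_executors(self, current: int) -> int: ...


@dataclass
class RateBasedStreamingPolicy:
    """Hypothetical streaming policy: provision just enough executors to keep up with input.

    input_rows_per_s and processed_rows_per_s mirror the inputRowsPerSecond and
    processedRowsPerSecond fields of a streaming query progress report.
    Assumes throughput scales linearly with executor count; ignores state stores.
    """
    input_rows_per_s: float
    processed_rows_per_s: float
    headroom: float = 1.25   # provision 25% above the observed input rate
    min_executors: int = 1
    max_executors: int = 50

    def target_executors(self, current: int) -> int:
        if current <= 0 or self.processed_rows_per_s <= 0:
            # No throughput signal yet: hold the current size, but at least the minimum.
            return max(current, self.min_executors)
        per_executor = self.processed_rows_per_s / current
        needed = math.ceil(self.input_rows_per_s * self.headroom / per_executor)
        return max(self.min_executors, min(needed, self.max_executors))


def next_target(policy: AllocationPolicy, current: int) -> int:
    """Caller that depends only on the interface, so policies can be swapped."""
    return policy.target_executors(current)


# 10 executors process 40,000 rows/s, i.e. about 4,000 rows/s each.
print(next_target(RateBasedStreamingPolicy(50_000, 40_000), 10))  # 16: input outpaces processing
print(next_target(RateBasedStreamingPolicy(10_000, 40_000), 10))  # 4: over-provisioned, scale down
```

In the first call, each executor handles about 4,000 rows/s, so the policy asks for ceil(50,000 × 1.25 / 4,000) = 16 executors. Because the target depends on throughput rather than on task backlog or idle time, idle gaps between micro-batches do not by themselves shrink the cluster.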