[ https://issues.apache.org/jira/browse/SPARK-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tathagata Das updated SPARK-6752:
---------------------------------
Description:
Currently, if you want to create a StreamingContext from checkpoint information,
the system will create a new SparkContext. This prevents a StreamingContext from
being recreated from checkpoints in managed environments where the SparkContext
is precreated.
Proposed solution: Introduce the following methods on StreamingContext
1. {{new StreamingContext(checkpointDirectory, sparkContext)}}
- Recreate StreamingContext from checkpoint using the provided SparkContext
2. {{new StreamingContext(checkpointDirectory, hadoopConf, sparkContext)}}
- Recreate StreamingContext from checkpoint using the provided SparkContext and
hadoop conf to read the checkpoint
3. {{StreamingContext.getOrCreate(checkpointDirectory, sparkContext,
createFunction: SparkContext => StreamingContext)}}
- If a checkpoint file exists, recreate the StreamingContext using the provided
SparkContext (that is, call 1.); otherwise, create the StreamingContext using
the provided createFunction
The corresponding Java and Python APIs have to be added as well.
was:
Currently, if you want to create a StreamingContext from checkpoint information,
the system will create a new SparkContext. This prevents a StreamingContext from
being recreated from checkpoints in managed environments where the SparkContext
is precreated.
Proposed solution: Introduce the following methods on StreamingContext
1. {{ new StreamingContext(checkpointDirectory, sparkContext) }}
- Recreate StreamingContext from checkpoint using the provided SparkContext
2. {{ new StreamingContext(checkpointDirectory, hadoopConf, sparkContext) }}
- Recreate StreamingContext from checkpoint using the provided SparkContext and
hadoop conf to read the checkpoint
3. {{StreamingContext.getOrCreate(checkpointDirectory, sparkContext,
createFunction: SparkContext => StreamingContext)}}
- If a checkpoint file exists, recreate the StreamingContext using the provided
SparkContext (that is, call 1.); otherwise, create the StreamingContext using
the provided createFunction
The corresponding Java and Python APIs have to be added as well.
> Allow StreamingContext to be recreated from checkpoint and existing
> SparkContext
> --------------------------------------------------------------------------------
>
> Key: SPARK-6752
> URL: https://issues.apache.org/jira/browse/SPARK-6752
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Critical
>
> Currently, if you want to create a StreamingContext from checkpoint
> information, the system will create a new SparkContext. This prevents a
> StreamingContext from being recreated from checkpoints in managed environments
> where the SparkContext is precreated.
> Proposed solution: Introduce the following methods on StreamingContext
> 1. {{new StreamingContext(checkpointDirectory, sparkContext)}}
> - Recreate StreamingContext from checkpoint using the provided SparkContext
> 2. {{new StreamingContext(checkpointDirectory, hadoopConf, sparkContext)}}
> - Recreate StreamingContext from checkpoint using the provided SparkContext
> and hadoop conf to read the checkpoint
> 3. {{StreamingContext.getOrCreate(checkpointDirectory, sparkContext,
> createFunction: SparkContext => StreamingContext)}}
> - If a checkpoint file exists, recreate the StreamingContext using the
> provided SparkContext (that is, call 1.); otherwise, create the
> StreamingContext using the provided createFunction
> The corresponding Java and Python APIs have to be added as well.
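The control flow of proposed method 3 can be sketched roughly as follows. This is a minimal, self-contained model and not Spark code: `SparkContextStub`, `StreamingContextStub`, `GetOrCreateSketch`, and `hasCheckpoint` are hypothetical stand-ins introduced only for illustration, and treating "a checkpoint exists" as "the directory is non-empty" is an assumption rather than how Spark actually detects checkpoints.

```scala
import java.nio.file.{Files, Path}

// Stand-in types, NOT the real Spark classes: just enough to show the control flow.
final class SparkContextStub(val appName: String)
final class StreamingContextStub(val sc: SparkContextStub, val restored: Boolean)

object GetOrCreateSketch {
  // Hypothetical model of proposed method 3 (getOrCreate with an existing context).
  def getOrCreate(
      checkpointDirectory: Path,
      sparkContext: SparkContextStub,
      createFunction: SparkContextStub => StreamingContextStub): StreamingContextStub = {
    if (hasCheckpoint(checkpointDirectory)) {
      // Models proposed method 1: recreate from the checkpoint while reusing
      // the caller's context instead of constructing a new one.
      new StreamingContextStub(sparkContext, restored = true)
    } else {
      // No checkpoint yet: delegate to the user-supplied factory.
      createFunction(sparkContext)
    }
  }

  // Assumption: a checkpoint "exists" when the directory has at least one entry.
  def hasCheckpoint(dir: Path): Boolean = {
    if (!Files.isDirectory(dir)) false
    else {
      val entries = Files.list(dir)
      try entries.findAny().isPresent
      finally entries.close()
    }
  }
}
```

In both branches the caller's precreated context is the one that ends up inside the returned object, which is the property the managed-environment use case depends on.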