
[jira] [Issue Comment Deleted] (SPARK-1647) Prevent data loss when Streaming driver goes down

2014-08-28 Thread Giulio De Vecchi (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Giulio De Vecchi updated SPARK-1647:

Comment: was deleted

(was: Not sure if this makes sense, but it might be nice to have some kind of
flag available within the code that tells me whether I'm running in a normal
situation or during a recovery.
To explain this better, consider the following scenario:
I am processing data, let's say from a Kafka stream, and I am updating a
database based on the computations. During the recovery I don't want to update
the database again (for many reasons; let's just assume that), but I want my
system to be in the same state as before; thus I would like to know whether my code
is running for the first time or during a recovery, so I can avoid updating the
database again.

More generally, I want to know this whenever I'm interacting with external
entities.

)
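One way to get the behaviour asked for in the comment above without a "recovery" flag is to make the external update idempotent. The following is a minimal Python sketch under stated assumptions, not a Spark API: an in-memory SQLite database stands in for the external database, each batch is identified by a monotonically increasing `batch_time`, and `open_store` / `apply_batch` are hypothetical helper names. The last committed batch time is stored in the same transaction as the data update, so a batch replayed after a restart is detected and skipped.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (if needed) a toy external store: per-key counts plus a progress marker.

    Hypothetical stand-in for the external database in the scenario above."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS counts (key TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )
        # Single-row table holding the last batch time whose updates were committed.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS progress "
            "(id INTEGER PRIMARY KEY CHECK (id = 0), last_batch INTEGER NOT NULL)"
        )
        conn.execute("INSERT OR IGNORE INTO progress (id, last_batch) VALUES (0, -1)")
    return conn


def apply_batch(conn, batch_time, records):
    """Apply one batch's updates at most once.

    Returns True if the batch was applied, False if it had already been committed
    (for example, because it was replayed after a driver restart)."""
    with conn:  # commits on normal exit, rolls back if an exception is raised
        # Advance the progress marker only if this batch is newer than the last one.
        cur = conn.execute(
            "UPDATE progress SET last_batch = ? WHERE id = 0 AND last_batch < ?",
            (batch_time, batch_time),
        )
        if cur.rowcount == 0:
            return False  # already applied: skip instead of updating the database again
        for key in records:
            conn.execute("INSERT OR IGNORE INTO counts (key, n) VALUES (?, 0)", (key,))
            conn.execute("UPDATE counts SET n = n + 1 WHERE key = ?", (key,))
    return True
```

For example, after `apply_batch(conn, 1000, ["a", "b", "a"])` and `apply_batch(conn, 2000, ["b"])`, calling `apply_batch(conn, 2000, ["b"])` again returns `False` and the counts stay at `a = 2, b = 2`. The sketch assumes batches reach the store in increasing `batch_time` order and that the data and the progress marker live in the same transactional store; with a non-transactional sink the two could diverge.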

Prevent data loss when Streaming driver goes down
-

Key: SPARK-1647
URL: https://issues.apache.org/jira/browse/SPARK-1647
Project: Spark
Issue Type: Bug
Components: Streaming
Reporter: Hari Shreedharan
Assignee: Hari Shreedharan

Currently, when the driver goes down, any data that has not been checkpointed
is lost from within Spark. If the system from which messages are pulled can
replay messages, the data may still be available, but for some systems, like
Flume, this is not the case.
Also, all windowing information is lost for windowing functions.
We must persist the raw data somehow and be able to replay it if
required. We must also persist windowing information along with the data itself.
This will likely require quite a bit of work to complete and will probably
have to be split into several sub-JIRAs.
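The description above asks for raw data, and the windowing state derived from it, to survive a driver restart. A minimal Python sketch of one way to do that is a write-ahead log: every received batch is appended durably to a log before it is processed, and on restart the log is replayed to rebuild the window state. Assumptions: a local JSON-lines file stands in for reliable storage (such as HDFS), the window is a simple record count over the last few batches, and `WriteAheadLog`, `SlidingCount` and `recover` are hypothetical names, not Spark's implementation.

```python
import json
import os
from collections import deque


class WriteAheadLog:
    """Append-only JSON-lines log of received batches (hypothetical sketch)."""

    def __init__(self, path):
        self.path = path

    def append(self, batch_time, records):
        line = json.dumps({"t": batch_time, "records": records})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the batch durable before it is processed

    def replay(self):
        """Yield (batch_time, records) in the order they were logged."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn write at the tail from a crash: stop replaying here
                yield entry["t"], entry["records"]


class SlidingCount:
    """Number of records seen in the last `window` batches."""

    def __init__(self, window):
        self.batches = deque(maxlen=window)

    def update(self, records):
        self.batches.append(len(records))
        return sum(self.batches)


def recover(path, window):
    """Rebuild window state after a restart by replaying the whole log."""
    state = SlidingCount(window)
    for _batch_time, records in WriteAheadLog(path).replay():
        state.update(records)
    return state
```

Replaying the log reproduces exactly the per-batch counts the live window held before the crash. In a real system the log would presumably be truncated after each checkpoint so that replay only covers batches since then; this sketch replays everything for simplicity.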

--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Comment Edited] (SPARK-1647) Prevent data loss when Streaming driver goes down

2014-08-27 Thread Giulio De Vecchi (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110942#comment-14110942
]

Giulio De Vecchi edited comment on SPARK-1647 at 8/27/14 10:33 AM:
---

Not sure if this makes sense, but it might be nice to have some kind of flag
available within the code that tells me whether I'm running in a normal situation
or during a recovery.
To explain this better, consider the following scenario:
I am processing data, let's say from a Kafka stream, and I am updating a
database based on the computations. During the recovery I don't want to update
the database again (for many reasons; let's just assume that), but I want my
system to be in the same state as before; thus I would like to know whether my
code is running for the first time or during a recovery, so I can avoid
updating the database again.

More generally, I want to know this whenever I'm interacting with external
entities.

was (Author: gadv):
Not sure if this makes sense, but it might be nice to have some kind of flag
available within the code that tells me whether I'm running in a normal situation
or during a recovery.
To explain this better, consider the following scenario:
I am processing data, let's say from a Kafka stream, and I am updating a
database based on the computations. During the recovery I don't want to update
the database again (for many reasons; let's just assume that), but I want my
system to be in the same state as before, so I would be able to know whether my
code is running for the first time or during a recovery and avoid
updating the database again.

More generally, I want to know this whenever I'm interacting with external
entities.


[jira] [Comment Edited] (SPARK-1647) Prevent data loss when Streaming driver goes down

2014-08-27 Thread Giulio De Vecchi (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110942#comment-14110942
]

Giulio De Vecchi edited comment on SPARK-1647 at 8/27/14 10:34 AM:
---

Not sure if this makes sense, but it might be nice to have some kind of flag
available within the code that tells me whether I'm running in a normal situation
or during a recovery.
To explain this better, consider the following scenario:
I am processing data, let's say from a Kafka stream, and I am updating a
database based on the computations. During the recovery I don't want to update
the database again (for many reasons; let's just assume that), but I want my
system to be in the same state as before; thus I would like to know whether my code
is running for the first time or during a recovery, so I can avoid updating the
database again.

More generally, I want to know this whenever I'm interacting with external
entities.

was (Author: gadv):
Not sure if this makes sense, but it might be nice to have some kind of flag
available within the code that tells me whether I'm running in a normal situation
or during a recovery.
To explain this better, consider the following scenario:
I am processing data, let's say from a Kafka stream, and I am updating a
database based on the computations. During the recovery I don't want to update
the database again (for many reasons; let's just assume that), but I want my
system to be in the same state as before; thus I would like to know whether my
code is running for the first time or during a recovery, so I can avoid
updating the database again.

More generally, I want to know this whenever I'm interacting with external
entities.


[jira] [Commented] (SPARK-1647) Prevent data loss when Streaming driver goes down

2014-08-26 Thread Giulio De Vecchi (JIRA)

[
https://issues.apache.org/jira/browse/SPARK-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110942#comment-14110942
]

Giulio De Vecchi commented on SPARK-1647:
-

Not sure if this makes sense, but it might be nice to have some kind of flag
available within the code that tells me whether I'm running in a normal situation
or during a recovery.
To explain this better, consider the following scenario:
I am processing data, let's say from a Kafka stream, and I am updating a
database based on the computations. During the recovery I don't want to update
the database again (for many reasons; let's just assume that), but I want my
system to be in the same state as before, so I would be able to know whether my
code is running for the first time or during a recovery and avoid
updating the database again.

More generally, I want to know this whenever I'm interacting with external
entities.

