[ https://issues.apache.org/jira/browse/SPARK-22163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192352#comment-16192352 ]
Michael N edited comment on SPARK-22163 at 10/5/17 2:11 AM:
------------------------------------------------------------

It is obvious that you don't understand the difference between design flaws and coding bugs; in particular, you have not been able to provide answers to these questions:

1. Why does Spark serialize the application's objects *asynchronously* in the first place, while the streaming application is running continuously from batch to batch?
2. If Spark needs to do this type of serialization at all, why does it not do it *synchronously* at the end of each batch?

Instead of blindly closing tickets, you need to either find the answers and post them here or let someone else who is capable address them.

Btw, your response to the ticket https://issues.apache.org/jira/browse/SPARK-21999, where you said "Your app is modifying a collection asynchronously w.r.t. Spark. Right", confirmed that you do not understand the issue. *This issue occurs on both the slave nodes and the driver.* My app is *not* modifying a collection asynchronously w.r.t. Spark, yet you kept making the same invalid claim and kept closing a ticket that you do not understand. My Spark Streaming application is run synchronously by the Spark Streaming framework from batch to batch, and it modifies its data synchronously as part of batch processing. However, the Spark framework has another thread that *asynchronously* serializes the application's objects.

was (Author: michaeln_apache):
It is obvious that you don't understand the difference between design flaws and coding bugs; in particular, you have not been able to provide answers to these questions:

1. Why does Spark serialize the application's objects *asynchronously* in the first place, while the streaming application is running continuously from batch to batch?
2. If Spark needs to do this type of serialization at all, why does it not do it *synchronously* at the end of each batch?

Instead of blindly closing tickets, you need to either find the answers and post them here or let someone else who is capable address them.

Btw, your response to the ticket https://issues.apache.org/jira/browse/SPARK-21999, where you said "Your app is modifying a collection asynchronously w.r.t. Spark. Right", confirmed that you do not understand the issue. My app is *not* modifying a collection asynchronously w.r.t. Spark, yet you kept making the same invalid claim and kept closing a ticket that you do not understand. My Spark Streaming application is run synchronously by the Spark Streaming framework from batch to batch, and it modifies its data synchronously as part of batch processing. However, the Spark framework has another thread that *asynchronously* serializes the application's objects.
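For illustration only, here is a minimal stand-alone JVM sketch (written in Scala, with no Spark involved) of the failure mode named in the stack trace quoted in the issue description below: one thread serializes an object graph containing a java.util.ArrayList while another thread mutates that list, which can surface as a java.util.ConcurrentModificationException thrown from ArrayList.writeObject. The class and thread names are made up for the sketch; it does not claim to reproduce Spark's internal serialization or checkpointing code.

{code}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Stands in for an application object that gets serialized; the real object
// graph and the component doing the serialization are not shown in the ticket.
class AppState extends java.io.Serializable {
  val items = new java.util.ArrayList[Int]()
}

object SerializationRaceSketch {
  def main(args: Array[String]): Unit = {
    val state = new AppState

    // "Batch" thread: mutates the list, as the application's batch logic would.
    val batchThread = new Thread(new Runnable {
      override def run(): Unit = {
        var i = 0
        while (i < 1000000) {
          state.items.add(i)
          if (i % 100 == 0) state.items.clear()
          i += 1
        }
      }
    })

    // "Serializer" thread: serializes the same object graph concurrently,
    // standing in for an asynchronous serialization pass.
    val serializerThread = new Thread(new Runnable {
      override def run(): Unit = {
        try {
          var i = 0
          while (i < 10000) {
            val oos = new ObjectOutputStream(new ByteArrayOutputStream())
            oos.writeObject(state) // may throw ConcurrentModificationException
            oos.close()
            i += 1
          }
          println("no exception this run; the race is timing-dependent")
        } catch {
          case e: java.util.ConcurrentModificationException =>
            println(s"reproduced: $e")
          case e: Throwable =>
            println(s"failed differently (also timing-dependent): $e")
        }
      }
    })

    batchThread.start(); serializerThread.start()
    batchThread.join(); serializerThread.join()
  }
}
{code}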
> Design Issue of Spark Streaming that Causes Random Run-time Exception
> ---------------------------------------------------------------------
>
>              Key: SPARK-22163
>              URL: https://issues.apache.org/jira/browse/SPARK-22163
>          Project: Spark
>       Issue Type: Bug
>       Components: DStreams, Structured Streaming
> Affects Versions: 2.2.0
>      Environment: Spark Streaming
> Kafka
> Linux
>         Reporter: Michael N
>         Priority: Critical
>
> The application's objects can contain Lists and can be modified dynamically as well. However, the Spark Streaming framework asynchronously serializes the application's objects while the application runs. Therefore, it causes a random run-time exception on a List when the Spark Streaming framework happens to serialize the application's objects at the same time the application is modifying a List in one of its own objects.
> In fact, there are multiple reported bugs containing
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList.writeObject
> that are permutations of the same root cause. So the design issue of the Spark Streaming framework is that it does this serialization asynchronously. Instead, it should either
> 1. do this serialization synchronously, which is preferred because it eliminates the issue completely, or
> 2. allow each application to configure whether this serialization is done synchronously or asynchronously, depending on the nature of the application.
> Also, the Spark documentation should describe the conditions that trigger Spark to do this type of serialization asynchronously, so applications can work around them until a fix is provided.
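Until the behavior or documentation changes as requested above, here is a minimal sketch of one possible application-side mitigation, under the assumption that the affected field is a mutable java.util.ArrayList owned by the application (the class and method names here are hypothetical and not taken from the ticket): keep the per-batch state in an immutable collection behind a volatile reference and replace it wholesale, so a concurrent serialization pass always observes a complete snapshot rather than a list in mid-mutation.

{code}
// Hypothetical application-side state holder; not from the ticket.
// The point is that serialization of this object never walks a
// java.util.ArrayList while it is being structurally modified.
class BatchState extends java.io.Serializable {

  // Readers, including Java serialization, see either the previous or the
  // new Vector, never a partially updated structure.
  @volatile private var items: Vector[Int] = Vector.empty

  // Mutations replace the reference atomically; synchronized so that
  // concurrent adds from the application itself do not lose updates.
  def add(x: Int): Unit = synchronized { items = items :+ x }

  def clear(): Unit = synchronized { items = Vector.empty }

  // Stable snapshot for the rest of the batch logic.
  def snapshot: Vector[Int] = items
}
{code}

Whether such a change is acceptable depends on the application; it does not answer the design questions raised in the comment above.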