[jira] [Comment Edited] (SPARK-6847) Stack overflow on updateStateByKey which followed by a dstream with checkpoint set
[ https://issues.apache.org/jira/browse/SPARK-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15008908#comment-15008908 ] Yunjie Qiu edited comment on SPARK-6847 at 11/17/15 4:23 PM:
-

Hi Glyton Camilleri & Jack Hu,

We also ran into the same StackOverflowError in our application, where we wrote:

{code}
val dStream1 = context.union(kafkaStreams).updateStateByKey(updateFunc).checkpoint(Seconds(50))
{code}

I've read your comments. Do they mean I can get rid of this issue simply by adding an extra no-op map() step to dStream1? Or should I do something like this instead?

{code}
val workaroundStream = dStream1.map(...).checkpoint(Seconds(some_value_other_than_50))
{code}

I was confused about what certainConditionsAreMet refers to, and about what kind of content should go inside {{_.foreachPartition { ... }}}, so I am asking here for details.

Best Regards.
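For reference, here is a minimal sketch of the two workaround shapes being asked about above. It assumes the {{kafkaStreams}} and {{updateFunc}} names from the snippet, uses {{identity}} as a stand-in for the "meaningless" map step, and picks 100 as a hypothetical interval other than 50; none of this is a confirmed fix for SPARK-6847:

{code}
// Sketch only: kafkaStreams and updateFunc come from the comment above;
// this just spells out the two workaround shapes being asked about.
val stateStream = context.union(kafkaStreams).updateStateByKey(updateFunc)
stateStream.checkpoint(Seconds(50))

// Option 1: add an extra no-op map() step after the stateful stream,
// so no derived stream carries its own competing checkpoint interval.
val option1 = stateStream.map(identity)

// Option 2: checkpoint the derived stream at a different interval than
// the stateful stream (100 here is a hypothetical value other than 50).
val option2 = stateStream.map(identity).checkpoint(Seconds(100))

// An output action is still needed for the streams to be materialized.
option2.foreachRDD(rdd => rdd.foreach(_ => ()))
{code}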
> Stack overflow on updateStateByKey which followed by a dstream with
> checkpoint set
> --
>
> Key: SPARK-6847
> URL: https://issues.apache.org/jira/browse/SPARK-6847
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.3.0
> Reporter: Jack Hu
> Labels: StackOverflowError, Streaming
>
> The issue happens with the following sample code, which uses {{updateStateByKey}}
> followed by a {{map}} with a checkpoint interval of 10 seconds:
> {code}
> val sparkConf = new SparkConf().setAppName("test")
> val streamingContext = new StreamingContext(sparkConf, Seconds(10))
> streamingContext.checkpoint("""checkpoint""")
> val source = streamingContext.socketTextStream("localhost", )
> val updatedResult = source.map(
>   (1, _)).updateStateByKey(
>   (newlist: Seq[String], oldstate: Option[String]) =>
>     newlist.headOption.orElse(oldstate))
> updatedResult.map(_._2)
>   .checkpoint(Seconds(10))
>   .foreachRDD((rdd, t) => {
>     println("Deep: " + rdd.toDebugString.split("\n").length)
>     println(t.toString() + ": " + rdd.collect.length)
>   })
> streamingContext.start()
> streamingContext.awaitTermination()
> {code}
> From the output, we can see that the dependency chain keeps growing over
> time, the {{updateStateByKey}} state never gets checkpointed, and eventually
> the stack overflow happens.
> Note:
> * The RDD in {{updatedResult.map(_._2)}} does get checkpointed in this case,
> but the {{updateStateByKey}} state does not
> * If the {{checkpoint(Seconds(10))}} is removed from the map result
> ( {{updatedResult.map(_._2)}} ), the stack overflow does not happen
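A hedged restatement of those notes in code follows: a variant of the repro with the downstream {{checkpoint(Seconds(10))}} removed, which per the notes above avoids the overflow. Port 9999 is a placeholder for the value truncated in the archived description above; it is not from the original report.

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: the repro above minus the downstream checkpoint interval.
// Port 9999 is a placeholder for the truncated original value.
val sparkConf = new SparkConf().setAppName("test")
val streamingContext = new StreamingContext(sparkConf, Seconds(10))
streamingContext.checkpoint("checkpoint")

val source = streamingContext.socketTextStream("localhost", 9999)
val updatedResult = source
  .map((1, _))
  .updateStateByKey(
    (newlist: Seq[String], oldstate: Option[String]) =>
      newlist.headOption.orElse(oldstate))

updatedResult.map(_._2)
  // no extra .checkpoint(Seconds(10)) here: with it, only the mapped RDDs
  // were checkpointed while the state DStream's lineage kept growing
  .foreachRDD((rdd, t) => {
    // lineage depth should now stay bounded instead of growing each batch
    println("Depth: " + rdd.toDebugString.split("\n").length)
    println(t.toString() + ": " + rdd.collect.length)
  })

streamingContext.start()
streamingContext.awaitTermination()
{code}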