[jira] [Comment Edited] (SPARK-6847) Stack overflow on updateStateByKey which followed by a dstream with checkpoint set
[ https://issues.apache.org/jira/browse/SPARK-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15008908#comment-15008908 ] Yunjie Qiu edited comment on SPARK-6847 at 11/17/15 4:23 PM:
-

Hi Glyton Camilleri & Jack Hu,

We also ran into the same StackOverflowError in our application, where we wrote:

{code}
val dStream1 = context.union(kafkaStreams).updateStateByKey(updateFunc).checkpoint(Seconds(50))
{code}

I've read your comments. Do they mean I can get rid of this issue simply by adding an extra no-op map() step to dStream1? Or should I do something like this instead?

{code}
val workaroundStream = dStream1.map(...).checkpoint(Seconds(some_value_other_than_50))
{code}

I was confused about what certainConditionsAreMet refers to, and about what kind of content should go inside {{_.foreachPartition { ... }}}, so I am asking here for details.

Best Regards.
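For reference, here is a minimal sketch of the two workaround shapes being asked about above. It assumes the {{kafkaStreams}} and {{updateFunc}} names from the snippet, uses {{identity}} as a stand-in for the "meaningless" map step, and picks 100 as a hypothetical interval other than 50; none of this is a confirmed fix for SPARK-6847:

{code}
// Sketch only: kafkaStreams and updateFunc come from the comment above;
// this just spells out the two workaround shapes being asked about.
val stateStream = context.union(kafkaStreams).updateStateByKey(updateFunc)
stateStream.checkpoint(Seconds(50))

// Option 1: add an extra no-op map() step after the stateful stream,
// so no derived stream carries its own competing checkpoint interval.
val option1 = stateStream.map(identity)

// Option 2: checkpoint the derived stream at a different interval than
// the stateful stream (100 here is a hypothetical value other than 50).
val option2 = stateStream.map(identity).checkpoint(Seconds(100))

// An output action is still needed for the streams to be materialized.
option2.foreachRDD(rdd => rdd.foreach(_ => ()))
{code}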
> Stack overflow on updateStateByKey which followed by a dstream with
> checkpoint set
> --
>
> Key: SPARK-6847
> URL: https://issues.apache.org/jira/browse/SPARK-6847
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.3.0
> Reporter: Jack Hu
> Labels: StackOverflowError, Streaming
>
> The issue happens with the following sample code, which uses {{updateStateByKey}}
> followed by a {{map}} with a checkpoint interval of 10 seconds:
> {code}
> val sparkConf = new SparkConf().setAppName("test")
> val streamingContext = new StreamingContext(sparkConf, Seconds(10))
> streamingContext.checkpoint("""checkpoint""")
> val source = streamingContext.socketTextStream("localhost", )
> val updatedResult = source.map(
>   (1, _)).updateStateByKey(
>   (newlist: Seq[String], oldstate: Option[String]) =>
>     newlist.headOption.orElse(oldstate))
> updatedResult.map(_._2)
>   .checkpoint(Seconds(10))
>   .foreachRDD((rdd, t) => {
>     println("Deep: " + rdd.toDebugString.split("\n").length)
>     println(t.toString() + ": " + rdd.collect.length)
>   })
> streamingContext.start()
> streamingContext.awaitTermination()
> {code}
> From the output, we can see that the dependency chain keeps growing over
> time, the {{updateStateByKey}} state never gets checkpointed, and eventually
> the stack overflow happens.
> Note:
> * The RDD in {{updatedResult.map(_._2)}} does get checkpointed in this case,
> but the {{updateStateByKey}} state does not
> * If the {{checkpoint(Seconds(10))}} is removed from the map result
> ( {{updatedResult.map(_._2)}} ), the stack overflow does not happen
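A hedged restatement of those notes in code follows: a variant of the repro with the downstream {{checkpoint(Seconds(10))}} removed, which per the notes above avoids the overflow. Port 9999 is a placeholder for the value truncated in the archived description above; it is not from the original report.

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: the repro above minus the downstream checkpoint interval.
// Port 9999 is a placeholder for the truncated original value.
val sparkConf = new SparkConf().setAppName("test")
val streamingContext = new StreamingContext(sparkConf, Seconds(10))
streamingContext.checkpoint("checkpoint")

val source = streamingContext.socketTextStream("localhost", 9999)
val updatedResult = source
  .map((1, _))
  .updateStateByKey(
    (newlist: Seq[String], oldstate: Option[String]) =>
      newlist.headOption.orElse(oldstate))

updatedResult.map(_._2)
  // no extra .checkpoint(Seconds(10)) here: with it, only the mapped RDDs
  // were checkpointed while the state DStream's lineage kept growing
  .foreachRDD((rdd, t) => {
    // lineage depth should now stay bounded instead of growing each batch
    println("Depth: " + rdd.toDebugString.split("\n").length)
    println(t.toString() + ": " + rdd.collect.length)
  })

streamingContext.start()
streamingContext.awaitTermination()
{code}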