[jira] [Updated] (SPARK-20692) unknowing delay in event timeline

2017-05-10 Thread Zhiwen Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiwen Sun updated SPARK-20692:
---
Description: 
Spark Streaming job with a 1s batch interval.

The processing time of a micro batch suddenly became 4s, while it is usually 0.4s.

When we checked where the time was spent, we found an unknown delay in the job.

There is no executor computing or shuffle reading; there is an unexplained 4s blank
in the event timeline.

An event timeline snapshot is in the attachment.
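
One way to narrow down such a gap is to log per-batch delays with a StreamingListener. Below is a minimal sketch against the Spark 1.6 streaming API; the listener name and the wiring into the job's StreamingContext (here called ssc) are illustrative:

{code:title=BatchDelayLogger.scala|borderStyle=solid}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Logs how long each micro batch waited to be scheduled vs. how long it
// actually ran, which separates scheduler backlog from slow processing.
class BatchDelayLogger extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    println(s"batch=${info.batchTime} " +
      s"schedulingDelayMs=${info.schedulingDelay.getOrElse(-1L)} " +
      s"processingDelayMs=${info.processingDelay.getOrElse(-1L)} " +
      s"totalDelayMs=${info.totalDelay.getOrElse(-1L)}")
  }
}

// Attach before ssc.start():
// ssc.addStreamingListener(new BatchDelayLogger)
{code}

If the scheduling delay is near zero while the processing delay spikes, the gap is inside the job itself, which would match the blank region in the timeline.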


  was:
Spark Streaming job with a 1s batch interval.

The processing time of a micro batch suddenly became 4s, while it is usually 0.4s.

When we checked where the time was spent, we found an unknown delay in the job. There is
no executor computing or shuffle reading; about a 4s blank in the event timeline.

An event timeline snapshot is in the attachment.



> unknowing delay in event timeline
> -
>
> Key: SPARK-20692
> URL: https://issues.apache.org/jira/browse/SPARK-20692
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.6.2
> Environment: Spark 1.6.1 + kafka 0.8.2
>Reporter: Zhiwen Sun
> Attachments: screenshot-1.png
>
>
> Spark Streaming job with a 1s batch interval.
> The processing time of a micro batch suddenly became 4s, while it is usually 0.4s.
> When we checked where the time was spent, we found an unknown delay in the job.
> There is no executor computing or shuffle reading; there is an unexplained 4s blank
> in the event timeline.
> An event timeline snapshot is in the attachment.






[jira] [Updated] (SPARK-20692) unknowing delay in event timeline

2017-05-10 Thread Zhiwen Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiwen Sun updated SPARK-20692:
---
Description: 
Spark Streaming job with a 1s batch interval.

The processing time of a micro batch suddenly became 4s, while it is usually 0.4s.

When we checked where the time was spent, we found an unknown delay in the job. There is
no executor computing or shuffle reading; about a 4s blank in the event timeline.

An event timeline snapshot is in the attachment.


  was:
Spark Streaming job with a 1s batch interval.

The processing time of a micro batch suddenly became 4s, while it is usually 0.4s.

When we checked where the time was spent, we found an unknown delay in the job. There is
no executor computing or shuffle reading; about a 4s blank in the event timeline.



> unknowing delay in event timeline
> -
>
> Key: SPARK-20692
> URL: https://issues.apache.org/jira/browse/SPARK-20692
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.6.2
> Environment: Spark 1.6.1 + kafka 0.8.2
>Reporter: Zhiwen Sun
> Attachments: screenshot-1.png
>
>
> Spark Streaming job with a 1s batch interval.
> The processing time of a micro batch suddenly became 4s, while it is usually 0.4s.
> When we checked where the time was spent, we found an unknown delay in the job. There is
> no executor computing or shuffle reading; about a 4s blank in the event timeline.
> An event timeline snapshot is in the attachment.






[jira] [Updated] (SPARK-20692) unknowing delay in event timeline

2017-05-10 Thread Zhiwen Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiwen Sun updated SPARK-20692:
---
Attachment: screenshot-1.png

> unknowing delay in event timeline
> -
>
> Key: SPARK-20692
> URL: https://issues.apache.org/jira/browse/SPARK-20692
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.6.2
> Environment: Spark 1.6.1 + kafka 0.8.2
>Reporter: Zhiwen Sun
> Attachments: screenshot-1.png
>
>
> Spark Streaming job with a 1s batch interval.
> The processing time of a micro batch suddenly became 4s, while it is usually 0.4s.
> When we checked where the time was spent, we found an unknown delay in the job. There is
> no executor computing or shuffle reading; about a 4s blank in the event timeline.






[jira] [Created] (SPARK-20692) unknowing delay in event timeline

2017-05-10 Thread Zhiwen Sun (JIRA)
Zhiwen Sun created SPARK-20692:
--

 Summary: unknowing delay in event timeline
 Key: SPARK-20692
 URL: https://issues.apache.org/jira/browse/SPARK-20692
 Project: Spark
  Issue Type: Bug
  Components: DStreams
Affects Versions: 1.6.2
 Environment: Spark 1.6.1 + kafka 0.8.2
Reporter: Zhiwen Sun


Spark Streaming job with a 1s batch interval.

The processing time of a micro batch suddenly became 4s, while it is usually 0.4s.

When we checked where the time was spent, we found an unknown delay in the job. There is
no executor computing or shuffle reading; about a 4s blank in the event timeline.







[jira] [Commented] (SPARK-18525) Kafka DirectInputStream cannot be aware of new partition

2016-11-22 Thread Zhiwen Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15689015#comment-15689015
 ] 

Zhiwen Sun commented on SPARK-18525:


Hi Cody,

Thanks for your reply.

We are still using Kafka 0.8.2. Is there a way to solve this problem?
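
Not a fix, but one workaround that stays on the kafka-0-8 integration is to watch the topic's partition count from outside the stream and restart the StreamingContext gracefully when it grows, since the direct stream only ever reads the partitions it saw at creation time. A minimal sketch using the Kafka 0.8 SimpleConsumer metadata API; the broker host/port, timeouts, and client id are illustrative:

{code:title=PartitionWatcher.scala|borderStyle=solid}
import kafka.javaapi.TopicMetadataRequest
import kafka.javaapi.consumer.SimpleConsumer

import scala.collection.JavaConverters._

object PartitionWatcher {
  // Asks one broker for the topic metadata and returns the partition count.
  def partitionCount(topic: String): Int = {
    val consumer = new SimpleConsumer("dev-002", 9092, 10000, 64 * 1024, "partition-watcher")
    try {
      val response = consumer.send(new TopicMetadataRequest(List(topic).asJava))
      response.topicsMetadata.asScala.head.partitionsMetadata.size
    } finally {
      consumer.close()
    }
  }

  // Poll from a side thread; when the count grows, stop the context
  // gracefully and recreate the stream so it picks up all partitions:
  // if (partitionCount("test_use") > knownCount)
  //   ssc.stop(stopSparkContext = false, stopGracefully = true)
}
{code}

The later kafka-0-10 integration discovers new partitions by itself through the new consumer API, but that requires upgrading the brokers.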

> Kafka DirectInputStream cannot be aware of new partition
> 
>
> Key: SPARK-18525
> URL: https://issues.apache.org/jira/browse/SPARK-18525
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 2.0.2
>Reporter: Zhiwen Sun
>
> It seems that DirectKafkaInputStream does not support reading new partitions while
> Spark Streaming is running.
> Related Spark code:
> https://github.com/apache/spark/blob/v2.0.2/external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala#L101
> How to reproduce it:
> {code:title=KafkaDirectTest.scala|borderStyle=solid}
> import kafka.serializer.StringDecoder
>
> import org.apache.spark.SparkConf
> import org.apache.spark.streaming.{Seconds, StreamingContext}
> import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}
>
> object KafkaDirectTest {
>   def main(args: Array[String]) {
>     val conf = new SparkConf().setAppName("kafka direct test 5")
>     conf.setIfMissing("spark.master", "local[3]")
>     conf.set("spark.streaming.kafka.maxRatePerPartition", "10")
>     val ssc = new StreamingContext(conf, Seconds(1))
>
>     val topic = "test_use"
>     val groupId = "stream-test-0809"
>
>     val kafkaParams = Map(
>       "metadata.broker.list" -> "dev-002:9092,dev-004:9092",
>       "group.id" -> groupId
>     )
>
>     val lines = KafkaUtils.createDirectStream[String, String, StringDecoder,
>       StringDecoder](ssc, kafkaParams, Set(topic))
>
>     lines.foreachRDD { rdd =>
>       rdd.foreach { row =>
>         println(s"\n row: ${row} ")
>       }
>
>       // Partitions reported here never include ones added after start.
>       val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
>       offsetRanges.foreach { offset =>
>         println(s"\n- offset: ${offset.topic} ${offset.partition} " +
>           s"${offset.fromOffset} ${offset.untilOffset}")
>       }
>     }
>
>     ssc.start()
>     ssc.awaitTermination()
>   }
> }
> {code}
> 1. Start the job.
> 2. Add a new partition to the test_use topic.
> The job cannot read data from the new partition.






[jira] [Created] (SPARK-18525) Kafka DirectInputStream cannot be aware of new partition

2016-11-21 Thread Zhiwen Sun (JIRA)
Zhiwen Sun created SPARK-18525:
--

 Summary: Kafka DirectInputStream cannot be aware of new partition
 Key: SPARK-18525
 URL: https://issues.apache.org/jira/browse/SPARK-18525
 Project: Spark
  Issue Type: Improvement
  Components: Input/Output
Affects Versions: 2.0.2
Reporter: Zhiwen Sun


It seems that DirectKafkaInputStream does not support reading new partitions while
Spark Streaming is running.

Related Spark code:

https://github.com/apache/spark/blob/v2.0.2/external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala#L101


How to reproduce it:

{code:title=KafkaDirectTest.scala|borderStyle=solid}
import kafka.serializer.StringDecoder

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object KafkaDirectTest {

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("kafka direct test 5")
    conf.setIfMissing("spark.master", "local[3]")
    conf.set("spark.streaming.kafka.maxRatePerPartition", "10")
    val ssc = new StreamingContext(conf, Seconds(1))

    val topic = "test_use"
    val groupId = "stream-test-0809"

    val kafkaParams = Map(
      "metadata.broker.list" -> "dev-002:9092,dev-004:9092",
      "group.id" -> groupId
    )

    val lines = KafkaUtils.createDirectStream[String, String, StringDecoder,
      StringDecoder](ssc, kafkaParams, Set(topic))

    lines.foreachRDD { rdd =>
      rdd.foreach { row =>
        println(s"\n row: ${row} ")
      }

      // Partitions reported here never include ones added after start.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      offsetRanges.foreach { offset =>
        println(s"\n- offset: ${offset.topic} ${offset.partition} " +
          s"${offset.fromOffset} ${offset.untilOffset}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}

1. Start the job.
2. Add a new partition to the test_use topic.

The job cannot read data from the new partition.
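
If the stream is recreated when partitions change (for example with the watchdog sketch in the comment above), the starting offsets can be built from the broker's current partition list instead of being hard-coded. A sketch under the same assumptions (broker address and client id are illustrative; real code would look up earliest or committed offsets rather than 0L):

{code:title=OffsetBootstrap.scala|borderStyle=solid}
import kafka.common.TopicAndPartition
import kafka.javaapi.TopicMetadataRequest
import kafka.javaapi.consumer.SimpleConsumer

import scala.collection.JavaConverters._

object OffsetBootstrap {
  // Builds a starting offset for every partition the broker currently knows,
  // so a recreated stream also covers partitions added since the last run.
  def currentFromOffsets(topic: String): Map[TopicAndPartition, Long] = {
    val consumer = new SimpleConsumer("dev-002", 9092, 10000, 64 * 1024, "offset-bootstrap")
    try {
      val metadata = consumer.send(new TopicMetadataRequest(List(topic).asJava))
      metadata.topicsMetadata.asScala.head.partitionsMetadata.asScala
        .map(p => TopicAndPartition(topic, p.partitionId) -> 0L)
        .toMap
    } finally {
      consumer.close()
    }
  }
}

// Feed the result into the fromOffsets variant of createDirectStream:
// KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, String](
//   ssc, kafkaParams, OffsetBootstrap.currentFromOffsets("test_use"), mmd => mmd.message)
{code}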


