[jira] [Comment Edited] (SPARK-2201) Improve FlumeInputDStream's stability and make it scalable
[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054442#comment-14054442 ]

sunshangchun edited comment on SPARK-2201 at 7/8/14 11:12 AM:
--
I don't think it's a problem.
1. It's an external module and has no effect on the spark core module.
2. The spark core module already uses zookeeper to elect the primary master.
3. The change is backward compatible: setting a host and port for the flume receiver still works.
Thanks

was (Author: joyyoj):
I don't think it's a problem. It's an external module and has no effect on the spark core module. Again, the spark core module already uses zookeeper to elect the leader master. Thanks

> Improve FlumeInputDStream's stability and make it scalable
> --
>
> Key: SPARK-2201
> URL: https://issues.apache.org/jira/browse/SPARK-2201
> Project: Spark
> Issue Type: Improvement
> Reporter: sunshangchun
>
> Currently:
> FlumeUtils.createStream(ssc, "localhost", port);
> This means that only one flume receiver can work with FlumeInputDStream, so
> the solution is not scalable.
> I use zookeeper to solve this problem:
> Spark flume receivers register themselves to a zk path when started, and a
> flume agent gets the physical hosts and pushes events to them.
> Some work needs to be done here:
> 1. Receivers create tmp nodes in zk; listeners just watch those tmp nodes.
> 2. When spark FlumeReceivers start, they acquire a physical host
> (localhost's ip and an idle port) and register themselves to zookeeper.
> 3. A new flume sink. In its appendEvents method, it gets the physical hosts
> and pushes data to them in a round-robin manner.

-- This message was sent by Atlassian JIRA (v6.2#6252)
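The round-robin push in step 3 could be sketched roughly as below; the `RoundRobinRouter` class and its method names are hypothetical illustrations, not taken from the actual pull request:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of how the proposed flume sink might pick a receiver
// for each appendEvents batch; names are illustrative only.
class RoundRobinRouter {
    private final List<String> physicalHosts; // "ip:port" entries read from zk
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinRouter(List<String> physicalHosts) {
        this.physicalHosts = physicalHosts;
    }

    // Return the next host in round-robin order; AtomicInteger keeps the
    // cursor consistent across concurrent appendEvents calls.
    String nextHost() {
        int i = Math.floorMod(next.getAndIncrement(), physicalHosts.size());
        return physicalHosts.get(i);
    }
}
```

In the real proposal the host list would come from watching the zk path, so it can grow or shrink as receivers come and go.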
[jira] [Commented] (SPARK-2201) Improve FlumeInputDStream's stability and make it scalable
[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054442#comment-14054442 ]

sunshangchun commented on SPARK-2201:
--
I don't think it's a problem. It's an external module and has no effect on the spark core module. Again, the spark core module already uses zookeeper to elect the leader master. Thanks
[jira] [Commented] (SPARK-2201) Improve FlumeInputDStream's stability and make it scalable
[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053128#comment-14053128 ]

sunshangchun commented on SPARK-2201:
--
I've opened a pull request: https://github.com/apache/spark/pull/1310
[jira] [Updated] (SPARK-2201) Improve FlumeInputDStream's stability and make it scalable
[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sunshangchun updated SPARK-2201:

Description:
Currently:
FlumeUtils.createStream(ssc, "localhost", port);
This means that only one flume receiver can work with FlumeInputDStream, so the solution is not scalable.
I use zookeeper to solve this problem:
Spark flume receivers register themselves to a zk path when started, and a flume agent gets the physical hosts and pushes events to them.
Some work needs to be done here:
1. Receivers create tmp nodes in zk; listeners just watch those tmp nodes.
2. When spark FlumeReceivers start, they acquire a physical host (localhost's ip and an idle port) and register themselves to zookeeper.
3. A new flume sink. In its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner.

was:
Currently only one flume receiver can work with FlumeInputDStream, and I am willing to do some work to improve it. My ideas are as follows:
An ip and port pair denotes a physical host, and a logical host consists of one or more physical hosts.
In our case, spark flume receivers bind themselves to a logical host when started, and a flume agent gets the physical hosts and pushes events to them.
Two classes are introduced: LogicalHostRouter supplies a map between logical hosts and physical hosts, and LogicalHostRouterListener makes relation changes watchable.
Some work needs to be done here:
1. LogicalHostRouter and LogicalHostRouterListener can be implemented with zookeeper. When a physical host starts, it creates a tmp node in zk; listeners just watch those tmp nodes.
2. When spark FlumeReceivers start, they acquire a physical host (localhost's ip and an idle port) and register themselves to zookeeper.
3. A new flume sink. In its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner.
Is this a feasible plan? Thanks.
Summary: Improve FlumeInputDStream's stability and make it scalable (was: Improve FlumeInputDStream's stability)
[jira] [Created] (SPARK-2381) streaming receiver crashed, but seems nothing happened
sunshangchun created SPARK-2381:
---
Summary: streaming receiver crashed, but seems nothing happened
Key: SPARK-2381
URL: https://issues.apache.org/jira/browse/SPARK-2381
Project: Spark
Issue Type: Bug
Reporter: sunshangchun

When we submit a streaming job and the receivers don't start normally, the application should stop itself.
[jira] [Created] (SPARK-2379) stopReceiver in dead loop causes stack overflow exception
sunshangchun created SPARK-2379:
---
Summary: stopReceiver in dead loop causes stack overflow exception
Key: SPARK-2379
URL: https://issues.apache.org/jira/browse/SPARK-2379
Project: Spark
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: sunshangchun

In streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisor.scala, stop calls stopReceiver, and stopReceiver calls stop when an exception occurs; that makes a dead loop.
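One common way to break such a stop() -> stopReceiver() -> stop() cycle is a re-entrancy guard flag. The sketch below is illustrative only (a minimal standalone class, not the actual ReceiverSupervisor code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of breaking a mutual-recursion loop between stop()
// and stopReceiver() with a compare-and-set guard; names are hypothetical.
class Supervisor {
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    int stopCalls = 0; // exposed only so the sketch can be checked

    void stop() {
        // Only the first caller proceeds; re-entrant calls return at once,
        // so an exception inside stopReceiver can no longer recurse forever.
        if (!stopping.compareAndSet(false, true)) return;
        stopCalls++;
        stopReceiver();
    }

    void stopReceiver() {
        try {
            throw new RuntimeException("receiver failed to stop");
        } catch (RuntimeException e) {
            stop(); // re-entrant call is now a no-op instead of a dead loop
        }
    }
}
```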
[jira] [Updated] (SPARK-2201) Improve FlumeInputDStream's stability
[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sunshangchun updated SPARK-2201:

Summary: Improve FlumeInputDStream's stability (was: Improve FlumeInputDStream)
[jira] [Updated] (SPARK-2201) Improve FlumeInputDStream
[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sunshangchun updated SPARK-2201:

Description:
Currently only one flume receiver can work with FlumeInputDStream, and I am willing to do some work to improve it. My ideas are as follows:
An ip and port pair denotes a physical host, and a logical host consists of one or more physical hosts.
In our case, spark flume receivers bind themselves to a logical host when started, and a flume agent gets the physical hosts and pushes events to them.
Two classes are introduced: LogicalHostRouter supplies a map between logical hosts and physical hosts, and LogicalHostRouterListener makes relation changes watchable.
Some work needs to be done here:
1. LogicalHostRouter and LogicalHostRouterListener can be implemented with zookeeper. When a physical host starts, it creates a tmp node in zk; listeners just watch those tmp nodes.
2. When spark FlumeReceivers start, they acquire a physical host (localhost's ip and an idle port) and register themselves to zookeeper.
3. A new flume sink. In its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner.
Is this a feasible plan? Thanks.
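The LogicalHostRouter / LogicalHostRouterListener pair described above could be sketched as a simple in-memory map with change notifications. This is a hypothetical illustration only; the real proposal would back the map with zookeeper tmp (ephemeral) nodes and fire listeners from zk watches:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical in-memory sketch of the proposed LogicalHostRouter and its
// listener mechanism; class and method names are illustrative only.
class LogicalHostRouter {
    private final Map<String, List<String>> routes = new HashMap<>();
    private final List<BiConsumer<String, String>> listeners = new ArrayList<>();

    // A listener is notified whenever a physical host joins a logical host,
    // mirroring a zk watch firing on tmp-node creation.
    void addListener(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    // Called when a receiver registers its ip:port under a logical host.
    void register(String logicalHost, String physicalHost) {
        routes.computeIfAbsent(logicalHost, k -> new ArrayList<>()).add(physicalHost);
        for (BiConsumer<String, String> l : listeners) {
            l.accept(logicalHost, physicalHost);
        }
    }

    // The flume sink would consult this list when pushing events.
    List<String> physicalHosts(String logicalHost) {
        return routes.getOrDefault(logicalHost, List.of());
    }
}
```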
[jira] [Created] (SPARK-2201) Improve FlumeInputDStream
sunshangchun created SPARK-2201:
---
Summary: Improve FlumeInputDStream
Key: SPARK-2201
URL: https://issues.apache.org/jira/browse/SPARK-2201
Project: Spark
Issue Type: Improvement
Reporter: sunshangchun
[jira] [Commented] (SPARK-1998) SparkFlumeEvent with body bigger than 1020 bytes are not read properly
[ https://issues.apache.org/jira/browse/SPARK-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026053#comment-14026053 ]

sunshangchun commented on SPARK-1998:
--
I've opened a pull request here (https://github.com/apache/spark/pull/951). Can anyone review and resolve it?

> SparkFlumeEvent with body bigger than 1020 bytes are not read properly
> --
>
> Key: SPARK-1998
> URL: https://issues.apache.org/jira/browse/SPARK-1998
> Project: Spark
> Issue Type: Bug
> Reporter: sun.sam
> Attachments: patch.diff
>