[jira] [Comment Edited] (SPARK-2201) Improve FlumeInputDStream's stability and make it scalable

2014-07-08 Thread sunshangchun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054442#comment-14054442
 ] 

sunshangchun edited comment on SPARK-2201 at 7/8/14 11:12 AM:
--

I don't think it's a problem. 
1. It's an external module and has no effect on the Spark core module.
2. The Spark core module already uses ZooKeeper to elect the primary master.
3. The change is backward compatible: setting a host and port for the Flume 
receiver still works.

Thanks 


was (Author: joyyoj):
I don't think it's a problem. 
It's an external module and has no effect on the Spark core module.
Again, the Spark core module already uses ZooKeeper to elect the leader master.

Thanks 

> Improve FlumeInputDStream's stability and make it scalable
> --
>
> Key: SPARK-2201
> URL: https://issues.apache.org/jira/browse/SPARK-2201
> Project: Spark
>  Issue Type: Improvement
>Reporter: sunshangchun
>
> Currently:
> FlumeUtils.createStream(ssc, "localhost", port); 
> This means that only one Flume receiver can work with FlumeInputDStream, so 
> the solution is not scalable. 
> I use ZooKeeper to solve this problem.
> Spark Flume receivers register themselves under a ZooKeeper path when 
> started, and a Flume agent looks up the physical hosts and pushes events to them.
> Some work needs to be done here: 
> 1. Receivers create ephemeral (tmp) nodes in ZooKeeper; listeners just watch 
> those nodes.
> 2. When Spark FlumeReceivers start, they acquire a physical host 
> (the local IP and an idle port) and register themselves with ZooKeeper.
> 3. A new Flume sink: in its appendEvents method, it gets the physical hosts 
> and pushes data to them in a round-robin manner.
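The round-robin dispatch of step 3 can be sketched as follows. This is a minimal illustration, not the actual patch's API: in the proposal the endpoint list would come from watching the ZooKeeper nodes, while here plain "ip:port" strings are supplied directly, and the class name is hypothetical.

```java
import java.util.List;

// Hypothetical sketch of a sink-side router that cycles through the live
// receiver endpoints, giving each one an even share of the pushed events.
class RoundRobinRouter {
    private final List<String> hosts;  // "ip:port" endpoints of live receivers
    private int next = 0;

    RoundRobinRouter(List<String> hosts) {
        this.hosts = hosts;
    }

    // Pick the receiver for the next batch of events, cycling evenly.
    synchronized String nextHost() {
        String h = hosts.get(next % hosts.size());
        next++;
        return h;
    }
}
```

With two registered receivers, successive calls to `nextHost()` alternate between them, which is the load-spreading behavior the sink's appendEvents method would rely on.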



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (SPARK-2201) Improve FlumeInputDStream's stability and make it scalable

2014-07-07 Thread sunshangchun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054442#comment-14054442
 ] 

sunshangchun commented on SPARK-2201:
-

I don't think it's a problem. 
It's an external module and has no effect on the Spark core module.
Again, the Spark core module already uses ZooKeeper to elect the leader master.

Thanks 

> Improve FlumeInputDStream's stability and make it scalable
> --
>
> Key: SPARK-2201
> URL: https://issues.apache.org/jira/browse/SPARK-2201
> Project: Spark
>  Issue Type: Improvement
>Reporter: sunshangchun
>
> Currently:
> FlumeUtils.createStream(ssc, "localhost", port); 
> This means that only one Flume receiver can work with FlumeInputDStream, so 
> the solution is not scalable. 
> I use ZooKeeper to solve this problem.
> Spark Flume receivers register themselves under a ZooKeeper path when 
> started, and a Flume agent looks up the physical hosts and pushes events to them.
> Some work needs to be done here: 
> 1. Receivers create ephemeral (tmp) nodes in ZooKeeper; listeners just watch 
> those nodes.
> 2. When Spark FlumeReceivers start, they acquire a physical host 
> (the local IP and an idle port) and register themselves with ZooKeeper.
> 3. A new Flume sink: in its appendEvents method, it gets the physical hosts 
> and pushes data to them in a round-robin manner.





[jira] [Commented] (SPARK-2201) Improve FlumeInputDStream's stability and make it scalable

2014-07-06 Thread sunshangchun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053128#comment-14053128
 ] 

sunshangchun commented on SPARK-2201:
-

I've opened a pull request:
https://github.com/apache/spark/pull/1310

> Improve FlumeInputDStream's stability and make it scalable
> --
>
> Key: SPARK-2201
> URL: https://issues.apache.org/jira/browse/SPARK-2201
> Project: Spark
>  Issue Type: Improvement
>Reporter: sunshangchun
>
> Currently:
> FlumeUtils.createStream(ssc, "localhost", port); 
> This means that only one Flume receiver can work with FlumeInputDStream, so 
> the solution is not scalable. 
> I use ZooKeeper to solve this problem.
> Spark Flume receivers register themselves under a ZooKeeper path when 
> started, and a Flume agent looks up the physical hosts and pushes events to them.
> Some work needs to be done here: 
> 1. Receivers create ephemeral (tmp) nodes in ZooKeeper; listeners just watch 
> those nodes.
> 2. When Spark FlumeReceivers start, they acquire a physical host 
> (the local IP and an idle port) and register themselves with ZooKeeper.
> 3. A new Flume sink: in its appendEvents method, it gets the physical hosts 
> and pushes data to them in a round-robin manner.





[jira] [Updated] (SPARK-2201) Improve FlumeInputDStream's stability and make it scalable

2014-07-06 Thread sunshangchun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunshangchun updated SPARK-2201:


Description: 
Currently:
FlumeUtils.createStream(ssc, "localhost", port); 
This means that only one Flume receiver can work with FlumeInputDStream, so the 
solution is not scalable. 
I use ZooKeeper to solve this problem.

Spark Flume receivers register themselves under a ZooKeeper path when started, 
and a Flume agent looks up the physical hosts and pushes events to them.

Some work needs to be done here: 
1. Receivers create ephemeral (tmp) nodes in ZooKeeper; listeners just watch 
those nodes.
2. When Spark FlumeReceivers start, they acquire a physical host (the local 
IP and an idle port) and register themselves with ZooKeeper.
3. A new Flume sink: in its appendEvents method, it gets the physical hosts and 
pushes data to them in a round-robin manner.


  was:
Currently only one Flume receiver can work with FlumeInputDStream, and I am 
willing to do some work to improve it. My ideas are as follows: 

An IP and port denote a physical host, and a logical host consists of one or 
more physical hosts.

In our case, Spark Flume receivers bind themselves to a logical host when 
started, and a Flume agent gets the physical hosts and pushes events to them.
Two classes are introduced: LogicalHostRouter supplies a map between logical 
hosts and physical hosts, and LogicalHostRouterListener makes relation changes 
watchable.

Some work needs to be done here: 
1. LogicalHostRouter and LogicalHostRouterListener can be implemented with 
ZooKeeper: when a physical host starts, it creates an ephemeral (tmp) node in 
ZooKeeper, and listeners just watch those nodes.
2. When Spark FlumeReceivers start, they acquire a physical host (the local 
IP and an idle port) and register themselves with ZooKeeper.
3. A new Flume sink: in its appendEvents method, it gets the physical hosts and 
pushes data to them in a round-robin manner.

Is this a feasible plan? Thanks.


Summary: Improve FlumeInputDStream's stability and make it scalable  
(was: Improve FlumeInputDStream's stability)

> Improve FlumeInputDStream's stability and make it scalable
> --
>
> Key: SPARK-2201
> URL: https://issues.apache.org/jira/browse/SPARK-2201
> Project: Spark
>  Issue Type: Improvement
>Reporter: sunshangchun
>
> Currently:
> FlumeUtils.createStream(ssc, "localhost", port); 
> This means that only one Flume receiver can work with FlumeInputDStream, so 
> the solution is not scalable. 
> I use ZooKeeper to solve this problem.
> Spark Flume receivers register themselves under a ZooKeeper path when 
> started, and a Flume agent looks up the physical hosts and pushes events to them.
> Some work needs to be done here: 
> 1. Receivers create ephemeral (tmp) nodes in ZooKeeper; listeners just watch 
> those nodes.
> 2. When Spark FlumeReceivers start, they acquire a physical host 
> (the local IP and an idle port) and register themselves with ZooKeeper.
> 3. A new Flume sink: in its appendEvents method, it gets the physical hosts 
> and pushes data to them in a round-robin manner.





[jira] [Created] (SPARK-2381) streaming receiver crashed,but seems nothing happened

2014-07-06 Thread sunshangchun (JIRA)
sunshangchun created SPARK-2381:
---

 Summary: streaming receiver crashed,but seems nothing happened
 Key: SPARK-2381
 URL: https://issues.apache.org/jira/browse/SPARK-2381
 Project: Spark
  Issue Type: Bug
Reporter: sunshangchun


When we submit a streaming job and the receivers don't start normally, the 
application should stop itself. 





[jira] [Created] (SPARK-2379) stopReceive in dead loop, cause stackoverflow exception

2014-07-06 Thread sunshangchun (JIRA)
sunshangchun created SPARK-2379:
---

 Summary: stopReceive in dead loop, cause stackoverflow exception
 Key: SPARK-2379
 URL: https://issues.apache.org/jira/browse/SPARK-2379
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: sunshangchun


streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisor.scala
stop calls stopReceiver, and stopReceiver calls stop again if an exception 
occurs, which creates an infinite loop and eventually a stack overflow.
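A minimal sketch of the cycle described above, and one way to break it with a guard flag. The names loosely mirror ReceiverSupervisor but this is an illustration of the fix idea, not the actual Spark code.

```java
// Illustrative supervisor whose stop() and stopReceiver() call each other on
// the error path; a "stopping" flag breaks the mutual recursion.
class Supervisor {
    private volatile boolean stopping = false;
    int stopCalls = 0;

    void stop() {
        if (stopping) return;   // guard: without it, stop -> stopReceiver -> stop recurses forever
        stopping = true;
        stopCalls++;
        stopReceiver();
    }

    private void stopReceiver() {
        try {
            // Simulate the receiver failing to stop cleanly.
            throw new RuntimeException("receiver failed to stop");
        } catch (RuntimeException e) {
            stop();             // error path re-enters stop(); the guard returns immediately
        }
    }
}
```

Without the guard, each stopReceiver() failure would re-enter stop() unconditionally, and the stack would grow until it overflows.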







[jira] [Updated] (SPARK-2201) Improve FlumeInputDStream's stability

2014-06-19 Thread sunshangchun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunshangchun updated SPARK-2201:


Summary: Improve FlumeInputDStream's stability  (was: Improve 
FlumeInputDStream)

> Improve FlumeInputDStream's stability
> -
>
> Key: SPARK-2201
> URL: https://issues.apache.org/jira/browse/SPARK-2201
> Project: Spark
>  Issue Type: Improvement
>Reporter: sunshangchun
>
> Currently only one Flume receiver can work with FlumeInputDStream, and I am 
> willing to do some work to improve it. My ideas are as follows: 
> An IP and port denote a physical host, and a logical host consists of one or 
> more physical hosts.
> In our case, Spark Flume receivers bind themselves to a logical host when 
> started, and a Flume agent gets the physical hosts and pushes events to them.
> Two classes are introduced: LogicalHostRouter supplies a map between logical 
> hosts and physical hosts, and LogicalHostRouterListener makes relation changes 
> watchable.
> Some work needs to be done here: 
> 1. LogicalHostRouter and LogicalHostRouterListener can be implemented with 
> ZooKeeper: when a physical host starts, it creates an ephemeral (tmp) node in 
> ZooKeeper, and listeners just watch those nodes.
> 2. When Spark FlumeReceivers start, they acquire a physical host 
> (the local IP and an idle port) and register themselves with ZooKeeper.
> 3. A new Flume sink: in its appendEvents method, it gets the physical hosts 
> and pushes data to them in a round-robin manner.
> Is this a feasible plan? Thanks.





[jira] [Updated] (SPARK-2201) Improve FlumeInputDStream

2014-06-19 Thread sunshangchun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunshangchun updated SPARK-2201:


Description: 
Currently only one Flume receiver can work with FlumeInputDStream, and I am 
willing to do some work to improve it. My ideas are as follows: 

An IP and port denote a physical host, and a logical host consists of one or 
more physical hosts.

In our case, Spark Flume receivers bind themselves to a logical host when 
started, and a Flume agent gets the physical hosts and pushes events to them.
Two classes are introduced: LogicalHostRouter supplies a map between logical 
hosts and physical hosts, and LogicalHostRouterListener makes relation changes 
watchable.

Some work needs to be done here: 
1. LogicalHostRouter and LogicalHostRouterListener can be implemented with 
ZooKeeper: when a physical host starts, it creates an ephemeral (tmp) node in 
ZooKeeper, and listeners just watch those nodes.
2. When Spark FlumeReceivers start, they acquire a physical host (the local 
IP and an idle port) and register themselves with ZooKeeper.
3. A new Flume sink: in its appendEvents method, it gets the physical hosts and 
pushes data to them in a round-robin manner.

Is this a feasible plan? Thanks.
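The LogicalHostRouter/LogicalHostRouterListener idea can be sketched as an in-memory stand-in: one logical host name maps to several physical "ip:port" endpoints, and listeners are notified when the mapping changes. The proposal would back this with ZooKeeper watches; everything below beyond the two class names is a hypothetical illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Notified whenever the set of physical hosts behind a logical host changes,
// the role LogicalHostRouterListener plays in the proposal.
interface RouterListener {
    void onChange(String logicalHost, List<String> physicalHosts);
}

// Maps logical host names to their physical "ip:port" endpoints.
class LogicalHostRouter {
    private final Map<String, List<String>> table = new HashMap<>();
    private final List<RouterListener> listeners = new ArrayList<>();

    void addListener(RouterListener l) { listeners.add(l); }

    // A receiver registers its physical endpoint under a logical host name.
    synchronized void register(String logicalHost, String endpoint) {
        table.computeIfAbsent(logicalHost, k -> new ArrayList<>()).add(endpoint);
        List<String> hosts = List.copyOf(table.get(logicalHost));
        for (RouterListener l : listeners) l.onChange(logicalHost, hosts);
    }

    synchronized List<String> physicalHosts(String logicalHost) {
        return List.copyOf(table.getOrDefault(logicalHost, List.of()));
    }
}
```

With ZooKeeper behind it, register() would create an ephemeral node and onChange() would fire from a watch callback, so a receiver crash automatically removes its endpoint from the mapping.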


> Improve FlumeInputDStream
> -
>
> Key: SPARK-2201
> URL: https://issues.apache.org/jira/browse/SPARK-2201
> Project: Spark
>  Issue Type: Improvement
>Reporter: sunshangchun
>
> Currently only one Flume receiver can work with FlumeInputDStream, and I am 
> willing to do some work to improve it. My ideas are as follows: 
> An IP and port denote a physical host, and a logical host consists of one or 
> more physical hosts.
> In our case, Spark Flume receivers bind themselves to a logical host when 
> started, and a Flume agent gets the physical hosts and pushes events to them.
> Two classes are introduced: LogicalHostRouter supplies a map between logical 
> hosts and physical hosts, and LogicalHostRouterListener makes relation changes 
> watchable.
> Some work needs to be done here: 
> 1. LogicalHostRouter and LogicalHostRouterListener can be implemented with 
> ZooKeeper: when a physical host starts, it creates an ephemeral (tmp) node in 
> ZooKeeper, and listeners just watch those nodes.
> 2. When Spark FlumeReceivers start, they acquire a physical host 
> (the local IP and an idle port) and register themselves with ZooKeeper.
> 3. A new Flume sink: in its appendEvents method, it gets the physical hosts 
> and pushes data to them in a round-robin manner.
> Is this a feasible plan? Thanks.





[jira] [Created] (SPARK-2201) Improve FlumeInputDStream

2014-06-19 Thread sunshangchun (JIRA)
sunshangchun created SPARK-2201:
---

 Summary: Improve FlumeInputDStream
 Key: SPARK-2201
 URL: https://issues.apache.org/jira/browse/SPARK-2201
 Project: Spark
  Issue Type: Improvement
Reporter: sunshangchun








[jira] [Commented] (SPARK-1998) SparkFlumeEvent with body bigger than 1020 bytes are not read properly

2014-06-09 Thread sunshangchun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026053#comment-14026053
 ] 

sunshangchun commented on SPARK-1998:
-

I've opened a pull request here (https://github.com/apache/spark/pull/951).
Can anyone review and resolve it?


> SparkFlumeEvent with body bigger than 1020 bytes are not read properly
> --
>
> Key: SPARK-1998
> URL: https://issues.apache.org/jira/browse/SPARK-1998
> Project: Spark
>  Issue Type: Bug
>Reporter: sun.sam
> Attachments: patch.diff
>
>



