Re: How to use FlumeInputDStream in spark cluster?

2014-12-01 Thread Ping Tang
Thank you very much for your reply. I have a cluster of 8 nodes: m1, m2, m3 .. m8. m1 is configured as the Spark master node; the rest of the nodes are all worker nodes. I also configured m3 as the History Server, but the history server fails to start. I ran FlumeEventCount in m1 using the right
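A typical way to launch the bundled FlumeEventCount example against a standalone master is via spark-submit. A minimal sketch, assuming the default master port 7077 and the stock examples jar path (the jar name and Flume agent host/port here are placeholders):

```shell
# Run the Spark Streaming Flume example on the cluster.
# <flume-host> <flume-port> are where the Flume avro sink pushes events (placeholders).
spark-submit \
  --class org.apache.spark.examples.streaming.FlumeEventCount \
  --master spark://m1:7077 \
  spark-examples.jar \
  <flume-host> <flume-port>
```

Note that with FlumeInputDStream the receiver binds on one worker, so the Flume sink must point at whichever worker host actually runs the receiver, not at the master.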

Error stopping receiver in running Spark+Flume sample code FlumeEventCount.scala

2014-11-10 Thread Ping Tang
Hi, Can somebody help me to understand why this error occurred?
2014-11-10 00:17:44,512 INFO [Executor task launch worker-0] receiver.BlockGenerator (Logging.scala:logInfo(59)) - Started BlockGenerator
2014-11-10 00:17:44,513 INFO [Executor task launch worker-0]

Question regarding sorting and grouping

2014-11-05 Thread Ping Tang
Hi, I’m working on a use case using Spark Streaming. I need to process an RDD of strings so that they are grouped by IP and sorted by time. Could somebody tell me the right transformation? Input: 2014-10-23 08:18:38,904 [192.168.10.1] 2014-10-23 08:18:38,907 [192.168.10.1] ccc
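One way to get group-by-IP, sort-by-time semantics is to parse each line into a (ip, (timestamp, payload)) pair, group by key, and sort each group. A minimal sketch in Scala, assuming `lines` is an `RDD[String]` in the format shown above (the regex and field names are illustrative):

```scala
// Each line looks like: "2014-10-23 08:18:38,904 [192.168.10.1] <payload>"
val pattern = """^(\S+ \S+) \[([^\]]+)\]\s*(.*)$""".r

val byIp = lines
  .flatMap {
    case pattern(ts, ip, rest) => Some((ip, (ts, rest)))   // keyed by IP
    case _                     => None                     // drop malformed lines
  }
  .groupByKey()                                            // one entry per IP
  .mapValues(_.toSeq.sortBy(_._1))                         // sort records by timestamp
```

Sorting on the raw timestamp string works here because the `yyyy-MM-dd HH:mm:ss,SSS` format is lexicographically ordered; for a DStream the same transformations apply per batch via `transform` or the pair-DStream operations.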

Errors in Spark streaming application due to HDFS append

2014-11-05 Thread Ping Tang
Hi All, I’m trying to write streaming processed data to HDFS (Hadoop 2). The buffer is flushed and closed after each write. The following errors occurred when reopening the same file to append. I know for sure the error is caused by closing the file. Any idea? Here is the code to write HDFS
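Closing the file after every batch and then reopening it with append can trigger lease-recovery errors, because HDFS may not have released the lease on the just-closed file yet. A common workaround is to keep one output stream open and flush durably between batches instead of closing. A minimal sketch using the Hadoop 2 FileSystem API (the output path and `bytes` payload are placeholders):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val fs   = FileSystem.get(conf)
val path = new Path("/user/spark/out/events.log")   // illustrative path

// Open once: append if the file exists, otherwise create it.
val out = if (fs.exists(path)) fs.append(path) else fs.create(path)

// Per batch: write and flush to the datanodes WITHOUT closing,
// so the next batch reuses the same lease.
out.write(bytes)   // bytes: Array[Byte] produced by the batch (placeholder)
out.hflush()

// Close only once, on application shutdown:
// out.close()
```

If separate processes must write, an alternative is to roll to a new file per batch (e.g. a timestamp suffix) rather than appending to a single file.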