Printing the RDDs in SparkPageRank

2014-08-24 Thread Deep Pradhan
Hi, I was going through the SparkPageRank code and want to see the intermediate steps, i.e. the RDDs formed along the way. Here is a part of the code along with the lines that I added in order to print the RDDs. I want to print the *parts* in the code (denoted by the comment in bold
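The code in the message is elided, but the shape of the question can be sketched in plain Python: a minimal PageRank loop (hypothetical three-page link graph, not the poster's data) that prints the intermediate ranks each iteration, which is what collecting and printing an RDD per iteration would show in Spark.

```python
# Plain-Python sketch of the SparkPageRank iteration loop, printing the
# intermediate "ranks" state each round. The link graph here is a made-up
# example; in Spark the analogous step is collecting the ranks RDD each
# iteration and printing it.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = {page: 1.0 for page in links}

for i in range(10):
    # Each page splits its rank evenly among its outgoing links.
    contribs = {}
    for page, neighbors in links.items():
        share = ranks[page] / len(neighbors)
        for n in neighbors:
            contribs[n] = contribs.get(n, 0.0) + share
    # Standard damping: 0.15 base plus 0.85 of received contributions.
    ranks = {page: 0.15 + 0.85 * contribs.get(page, 0.0) for page in links}
    print(f"iteration {i}: {ranks}")
```

With the 0.15/0.85 damping above, the total rank mass stays at the number of pages, so the printed dictionaries converge quickly.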

Re: Spark Streaming with Flume event

2014-08-24 Thread Spidy
Anybody? An example of how to deserialize FlumeEvent data using Scala -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Flume-event-tp12569p12709.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

How to make Spark Streaming write its output so that Impala can read it?

2014-08-24 Thread rafeeq s
I have the following problem with the Spark Streaming API. I am currently streaming input data via Kafka to Spark Streaming, with which I plan to do some preprocessing of the data. Then, I'd like to save the data as Parquet files and query them with Impala. However, Spark is writing the data

Re: Spark SQL Parser error

2014-08-24 Thread S Malligarjunan
Hello Yin, An additional note: with ./bin/spark-shell --jars s3n:/mybucket/myudf.jar I got the following message in the console: Warning: skipped external jar...   Thanks and Regards, Sankar S.   On , S Malligarjunan smalligarju...@yahoo.com wrote: Hello Yin, I have tried to use sc.addJar and

Re: Printing the RDDs in SparkPageRank

2014-08-24 Thread Jörn Franke
Hi, What kind of error do you receive? Best regards, Jörn On 24 Aug 2014 08:29, Deep Pradhan pradhandeep1...@gmail.com wrote: Hi, I was going through the SparkPageRank code and want to see the intermediate steps, like the RDDs formed in the intermediate steps. Here is a part of the

Spark Streaming API and Performance Clarifications

2014-08-24 Thread didi
I am new to Spark Streaming and have some questions that I can't find any documentation to answer. I believe a lot of Spark users in general, and Spark Streaming users in particular, use it for analysis of events by computing large distributed aggregations. In case I have to digest

amp lab spark streaming twitter example

2014-08-24 Thread Forest D
Hi folks, I have been trying to run the AMPLab’s Twitter streaming example (http://ampcamp.berkeley.edu/big-data-mini-course/realtime-processing-with-spark-streaming.html) for the last 2 days. I have encountered the same error messages as shown below: 14/08/24 17:14:22 ERROR

Re: amp lab spark streaming twitter example

2014-08-24 Thread Jonathan Haddad
Could you be hitting this? https://issues.apache.org/jira/browse/SPARK-3178 On Sun, Aug 24, 2014 at 10:21 AM, Forest D dev24a...@gmail.com wrote: Hi folks, I have been trying to run the AMPLab’s twitter streaming example

Return multiple [K,V] pairs from a Java Function

2014-08-24 Thread Tom
Hi, I would like to create multiple key-value pairs, where all keys can still be reduced. For instance, I have the following 2 lines: A,B,C and B,D. I would like to return the following pairs for the first line: (A,B) (A,C) (B,A) (B,C) (C,A) (C,B), and for the second: (B,D) (D,B). After a reduce by key, I want to end

Re: Return multiple [K,V] pairs from a Java Function

2014-08-24 Thread Sean Owen
You are looking for the method flatMapToPair. It takes a PairFlatMapFunction, which is something that returns an Iterable of Tuple2 of K,V. You end up with a JavaPairRDD of K and V as desired. On Sun, Aug 24, 2014 at 9:15 PM, Tom thubregt...@gmail.com wrote: Hi, I would like to create multiple
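The logic Sean describes (flatMapToPair followed by a reduce by key) can be sketched in plain Python with Tom's example lines. The final reduction Tom wants is elided in his message, so counting values per key below is just an illustrative stand-in.

```python
from collections import defaultdict
from itertools import permutations

# The "flatMap" step Sean describes (flatMapToPair in the Java API):
# each input line yields every ordered pair of its tokens.
lines = [["A", "B", "C"], ["B", "D"]]
pairs = [p for tokens in lines for p in permutations(tokens, 2)]
# pairs: [('A','B'), ('A','C'), ('B','A'), ('B','C'), ('C','A'), ('C','B'),
#         ('B','D'), ('D','B')]

# A stand-in "reduceByKey" step: here we simply count values per key,
# since the reduction Tom actually wants is cut off in the message.
counts = defaultdict(int)
for k, v in pairs:
    counts[k] += 1
```

In the Java API, the first step is a `flatMapToPair` whose `PairFlatMapFunction` returns that iterable of tuples, and the second is `reduceByKey` on the resulting `JavaPairRDD`.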

Re: What about implementing various hypothesis test for LogisticRegression in MLlib

2014-08-24 Thread Xiangrui Meng
Thanks for the reference! Many tests are not designed for big data: http://magazine.amstat.org/blog/2010/09/01/statrevolution/ . So we need to understand which tests are proper. Feel free to create a JIRA and let's move our discussion there. -Xiangrui On Fri, Aug 22, 2014 at 8:44 PM, guxiaobo1982

pipe raw binary data

2014-08-24 Thread Emeric, Viel
Hello, I am trying to use the RDD pipe method to integrate Spark with external commands to be executed on each partition. My program roughly looks like: rdd.pipe(cmd1).pipe(cmd2) The output of cmd1 and input of cmd2 is raw binary data. However, the pipe method in RDD requires converting data
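Since RDD.pipe converts records to and from text lines, a common workaround is to drive the external command yourself from within each partition, passing raw bytes through the subprocess's stdin/stdout. A minimal sketch of that per-partition step, in plain Python (the `cat` command stands in for the poster's cmd1/cmd2):

```python
import subprocess

def pipe_binary(partition_bytes, cmd):
    """Feed raw bytes to an external command and return its raw stdout,
    with no line-based text conversion (unlike RDD.pipe)."""
    proc = subprocess.run(cmd, input=partition_bytes,
                          stdout=subprocess.PIPE, check=True)
    return proc.stdout

# Hypothetical binary payload with NUL bytes and embedded newlines intact.
data = b"\x00\x01binary\nchunk\x02"
out = pipe_binary(data, ["cat"])
```

In Spark, the equivalent would be calling a helper like this from inside `mapPartitions`, so each partition's bytes flow through the command untouched; chaining two commands is then two such calls back to back.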

Spark Stream + HDFS Append

2014-08-24 Thread Dean Chen
We are using HDFS for log storage where logs are flushed to HDFS every minute, with a new file created for each hour. We would like to consume these logs using Spark Streaming.  The docs state that new HDFS files will be picked up, but does Spark Streaming support HDFS appends? — Dean Chen

Re: Spark Stream + HDFS Append

2014-08-24 Thread Tobias Pfeiffer
Hi, On Mon, Aug 25, 2014 at 9:56 AM, Dean Chen deanch...@gmail.com wrote: We are using HDFS for log storage where logs are flushed to HDFS every minute, with a new file created for each hour. We would like to consume these logs using spark streaming. The docs state that new HDFS will be
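Spark Streaming's file-based sources pick up newly created files in the monitored directory, not new data appended to an existing file, so consuming hourly files that are appended to every minute needs some form of offset tracking. A plain-Python sketch of that idea (a hypothetical helper, not a Spark API):

```python
# Sketch: remember how far we've read in each log file, so data appended
# since the last poll (the HDFS-append pattern in this thread) can be
# consumed incrementally. This is a hypothetical helper illustrating what
# Spark's new-files-only file stream does NOT do for you.
offsets = {}

def read_new_bytes(path):
    """Return only the bytes written to `path` since the previous call."""
    pos = offsets.get(path, 0)
    with open(path, "rb") as f:
        f.seek(pos)
        chunk = f.read()
    offsets[path] = pos + len(chunk)
    return chunk
```

The common alternative, which fits the file-stream model directly, is to flush each minute's batch as a separate new file instead of appending.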

Re: multiple windows from the same DStream ?

2014-08-24 Thread Tobias Pfeiffer
Hi, computations are triggered by an output operation. No output operation, no computation. Therefore in your code example, On Thu, Aug 21, 2014 at 11:58 PM, Josh J joshjd...@gmail.com wrote: JavaPairReceiverInputDStream&lt;String, String&gt; messages =