Hi,
I was going through the SparkPageRank code and want to see the intermediate
steps, like the RDDs formed in the intermediate steps.
Here is a part of the code along with the lines that I added in order to
print the RDDs.
I want to print the *parts* in the code (denoted by the comment in bold).
Anybody? An example of how to deserialize FlumeEvent data using Scala?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Flume-event-tp12569p12709.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I have the following problem with Spark Streaming API. I am currently
streaming input data via KAFKA to Spark Streaming, with which I plan to do
some preprocessing for the data. Then, I'd like to save the data to Parquet
file system and query it with Impala.
However, Spark is writing the data
Hello Yin,
Additional note:
In ./bin/spark-shell --jars s3n:/mybucket/myudf.jar I got the following
message in console.
Warning: skipped external jar..
Thanks and Regards,
Sankar S.
On , S Malligarjunan smalligarju...@yahoo.com wrote:
Hello Yin,
I have tried to use sc.addJar and
Hi,
What kind of error do you receive?
Best regards,
Jörn
On 24 Aug 2014 08:29, Deep Pradhan pradhandeep1...@gmail.com wrote:
Hi,
I was going through the SparkPageRank code and want to see the
intermediate steps, like the RDDs formed in the intermediate steps.
Here is a part of the
I am new to Spark Streaming and have some issues which I can't find any
documentation to answer.
I believe a lot of Spark users in general and Spark Streaming in particular
use it for analysis of events by calculation of distributed large
aggregations.
In case I have to digest
Hi folks,
I have been trying to run the AMPLab’s twitter streaming example
(http://ampcamp.berkeley.edu/big-data-mini-course/realtime-processing-with-spark-streaming.html)
for the last 2 days. I have encountered the same error messages as shown below:
14/08/24 17:14:22 ERROR
Could you be hitting this? https://issues.apache.org/jira/browse/SPARK-3178
On Sun, Aug 24, 2014 at 10:21 AM, Forest D dev24a...@gmail.com wrote:
Hi folks,
I have been trying to run the AMPLab’s twitter streaming example
Hi,
I would like to create multiple key-value pairs, where all keys can still be
reduced. For instance, I have the following 2 lines:
A,B,C
B,D
I would like to return the following pairs for the first line:
A,B
A,C
B,A
B,C
C,A
C,B
And for the second
B,D
D,B
After a reduce by key, I want to end
You are looking for the method flatMapToPair. It takes a
PairFlatMapFunction, which is something that returns an Iterable of
Tuple2 of K,V. You end up with a JavaPairRDD of K and V as desired.
On Sun, Aug 24, 2014 at 9:15 PM, Tom thubregt...@gmail.com wrote:
Hi,
I would like to create multiple
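The flatMapToPair pattern described above can be sketched without Spark, using plain Java collections (an illustration of the idea only, not the Spark API; the class and method names here are mine):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PairExpansion {
    // Expand one CSV line into every ordered pair of distinct fields,
    // mirroring what a PairFlatMapFunction would emit per input record.
    static List<String[]> toPairs(String line) {
        String[] fields = line.split(",");
        List<String[]> pairs = new ArrayList<>();
        for (String a : fields) {
            for (String b : fields) {
                if (!a.equals(b)) {
                    pairs.add(new String[]{a, b});
                }
            }
        }
        return pairs;
    }

    // Count pairs per first element, standing in for reduceByKey
    // after the flatMap step.
    static Map<String, Integer> countByKey(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String[] p : toPairs(line)) {
                counts.merge(p[0], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // For "A,B,C" this prints the six pairs A,B A,C B,A B,C C,A C,B
        for (String[] p : toPairs("A,B,C")) {
            System.out.println(p[0] + "," + p[1]);
        }
    }
}
```

In Spark, flatMapToPair performs the same one-record-to-many-pairs expansion, and a subsequent reduceByKey merges the values of equal keys across partitions.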
Thanks for the reference! Many tests are not designed for big data:
http://magazine.amstat.org/blog/2010/09/01/statrevolution/ . So we
need to understand which tests are proper. Feel free to create a JIRA
and let's move our discussion there. -Xiangrui
On Fri, Aug 22, 2014 at 8:44 PM, guxiaobo1982
Hello,
I am trying to use the RDD pipe method to integrate Spark with external
commands to be executed on each partition. My program roughly looks like:
rdd.pipe(cmd1).pipe(cmd2)
The output of cmd1 and input of cmd2 is raw binary data.
However, the pipe method in RDD requires converting data
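Because RDD.pipe frames its input and output as newline-delimited strings, one workaround is to drive the external command yourself (e.g. inside mapPartitions) with raw streams. The byte-level plumbing looks roughly like this in plain Java, using `cat` as a stand-in for cmd1 (a sketch, no Spark involved):

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class BinaryPipe {
    // Send raw bytes to an external command's stdin and return its stdout
    // verbatim, with no string conversion or newline framing.
    static byte[] pipe(String[] cmd, byte[] input) throws Exception {
        Process p = new ProcessBuilder(cmd).start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(input);   // closing stdin signals end-of-input
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream stdout = p.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = stdout.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        p.waitFor();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // "cat" echoes its input, so arbitrary bytes survive the round trip.
        byte[] data = {0, 1, 2, (byte) 0xFF, '\n', 3};
        byte[] result = pipe(new String[]{"cat"}, data);
        System.out.println(java.util.Arrays.equals(data, result)); // prints "true"
    }
}
```

Note this simple version writes all input before reading any output, which is fine for small per-partition payloads but can deadlock if the command emits more output than the OS pipe buffer holds before stdin is drained; a production version would read and write on separate threads.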
We are using HDFS for log storage where logs are flushed to HDFS every minute,
with a new file created for each hour. We would like to consume these logs
using spark streaming.
The docs state that new HDFS files will be picked up, but does Spark Streaming
support HDFS appends?
—
Dean Chen
Hi,
On Mon, Aug 25, 2014 at 9:56 AM, Dean Chen deanch...@gmail.com wrote:
We are using HDFS for log storage where logs are flushed to HDFS every
minute, with a new file created for each hour. We would like to consume
these logs using spark streaming.
The docs state that new HDFS files will be
Hi,
computations are triggered by an output operation. No output operation, no
computation. Therefore in your code example,
On Thu, Aug 21, 2014 at 11:58 PM, Josh J joshjd...@gmail.com wrote:
JavaPairReceiverInputDStream&lt;String, String&gt; messages =
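The laziness described here can be illustrated without Spark at all: plain Java streams behave the same way, with the terminal operation playing the role of the output operation (an analogy only, not Spark code):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyDemo {
    static final AtomicInteger CALLS = new AtomicInteger();

    // Chaining map() runs nothing yet, just like chaining transformations
    // on a DStream/RDD without an output operation.
    static Stream<Integer> build() {
        return List.of(1, 2, 3).stream()
                .map(x -> { CALLS.incrementAndGet(); return x * 2; });
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = build();
        System.out.println(CALLS.get());             // prints 0: nothing computed yet

        // The terminal operation triggers the whole computation.
        int sum = pipeline.reduce(0, Integer::sum);
        System.out.println(sum + " " + CALLS.get()); // prints "12 3"
    }
}
```

In the same way, a Spark Streaming program with only transformations and no output operation (print, saveAsTextFiles, foreachRDD, ...) computes nothing per batch.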