Re: [X-post] Saving SparkSQL result RDD to Cassandra

2015-07-09 Thread Su She
will be registered as an output stream and therefore materialized. Change it to a map, foreach or some other form of transform. HTH -Todd On Thu, Jul 9, 2015 at 5:24 PM, Su She suhsheka...@gmail.com wrote: Hello All, I also posted this on the Spark/Datastax thread, but thought it was also
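The fix described above (replace print() with a real output action such as foreachRDD) can be sketched as follows, assuming the DataStax spark-cassandra-connector is on the classpath; the socket source, keyspace, table, and column names here are all hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector._ // adds saveToCassandra to RDDs

// Hypothetical setup: a 2-second batch interval and a local Cassandra node.
val conf = new SparkConf()
  .setAppName("StreamToCassandra")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val ssc = new StreamingContext(conf, Seconds(2))

val lines  = ssc.socketTextStream("localhost", 9999)
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

// print() is only an output op that materializes the stream to the console;
// foreachRDD lets us write each micro-batch to Cassandra instead.
counts.foreachRDD { rdd =>
  rdd.saveToCassandra("test_ks", "word_counts", SomeColumns("word", "count"))
}

ssc.start()
ssc.awaitTermination()
```

The keyspace and table ("test_ks", "word_counts") would need to exist in Cassandra before the job starts.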

[X-post] Saving SparkSQL result RDD to Cassandra

2015-07-09 Thread Su She
Hello All, I also posted this on the Spark/Datastax thread, but thought it was also 50% a spark question (or mostly a spark question). I was wondering what is the best practice to saving streaming Spark SQL (

Re: HDFS Rest Service not available

2015-06-02 Thread Su She
directory and typing sbin/stop-dfs.sh and then sbin/start-dfs.sh Thanks Best Regards On Tue, Jun 2, 2015 at 5:03 AM, Su She suhsheka...@gmail.com wrote: Hello All, A bit scared I did something stupid...I killed a few PIDs that were listening to ports 2183 (kafka), 4042 (spark app), some

HDFS Rest Service not available

2015-06-01 Thread Su She
Hello All, A bit scared I did something stupid...I killed a few PIDs that were listening to ports 2183 (Kafka) and 4042 (Spark app); some of the PIDs didn't even seem to stop, as they still show up when I do lsof -i:[port number]. I'm not sure if the problem started after or before I did

Trouble trying to run ./spark-ec2 script

2015-05-13 Thread Su She
I'm trying to set up my own cluster and am having trouble running this script: ./spark-ec2 --key-pair=xx --identity-file=xx.pem --region=us-west-2 --zone=us-west-2c --num-slaves=1 launch my-spark-cluster based off: https://spark.apache.org/docs/latest/ec2-scripts.html It just tries to open the

Re: Trouble trying to run ./spark-ec2 script

2015-05-13 Thread Su She
Hmm, just tried to run it again, but opened the script with Python; the cmd window seemed to pop up really quickly and then exit. On Wed, May 13, 2015 at 2:06 PM, Su She suhsheka...@gmail.com wrote: Hi Ted, Yes I do have Python 3.5 installed. I just ran py from the ec2 directory and it started up

Re: Getting error running MLlib example with new cluster

2015-05-11 Thread Su She
! On Mon, Apr 27, 2015 at 11:48 AM, Su She suhsheka...@gmail.com wrote: Hello Xiangrui, I am using this spark-submit command (as I do for all other jobs): /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/spark-submit --class MLlib --master local[2] --jars $(echo /home/ec2-user

Re: Getting error running MLlib example with new cluster

2015-04-27 Thread Su She
/learning-spark/target/simple-project-1.1.jar Thank you for the help! Best, Su On Mon, Apr 27, 2015 at 9:58 AM, Xiangrui Meng men...@gmail.com wrote: How did you run the example app? Did you use spark-submit? -Xiangrui On Thu, Apr 23, 2015 at 2:27 PM, Su She suhsheka...@gmail.com wrote: Sorry

Getting error running MLlib example with new cluster

2015-04-23 Thread Su She
I had asked this question before, but wanted to ask again as I think it is related to my pom file or project setup. I have been trying on/off for the past month to try to run this MLlib example:

Getting error running MLlib example with new cluster

2015-04-23 Thread Su She
Sorry, accidentally sent the last email before finishing. I had asked this question before, but wanted to ask again as I think it is now related to my pom file or project setup. Really appreciate the help! I have been trying on/off for the past month to try to run this MLlib example:

value reduceByKeyAndWindow is not a member of org.apache.spark.streaming.dstream.DStream[(String, Int)]

2015-04-07 Thread Su She
Hello Everyone, I am trying to implement this example (Spark Streaming with Twitter). https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala I am able to do: hashTags.print() to get a live stream of filtered hashtags,
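For reference, the compile error in the subject line usually means the implicit conversions that add pair operations (reduceByKey, reduceByKeyAndWindow, etc.) to a DStream[(String, Int)] are not in scope; in Spark before 1.3 they had to be imported explicitly. A minimal sketch, assuming a `hashTags` stream as in TwitterPopularTags:

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext._ // pair-DStream ops (needed before Spark 1.3)
import org.apache.spark.streaming.dstream.DStream

// Assumes hashTags: DStream[String] already exists, as in TwitterPopularTags.
// Counts occurrences of each tag over a sliding 60-second window.
def windowedCounts(hashTags: DStream[String]): DStream[(String, Int)] =
  hashTags
    .map(tag => (tag, 1))
    .reduceByKeyAndWindow(_ + _, Seconds(60))
```

On Spark 1.3+ the import is unnecessary because the conversions were made implicit on DStream itself.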

Re: MLlib Spam example gets stuck in Stage X

2015-03-30 Thread Su She
AM, Xiangrui Meng men...@gmail.com wrote: +Holden, Joseph It seems that there is something wrong with the sample data file: https://github.com/databricks/learning-spark/blob/master/files/ham.txt -Xiangrui On Fri, Mar 27, 2015 at 1:03 PM, Su She suhsheka...@gmail.com wrote: Hello

Re: MLlib Spam example gets stuck in Stage X

2015-03-20 Thread Su She
On Thu, Mar 19, 2015 at 1:15 PM, Su She suhsheka...@gmail.com wrote: Hi Akhil, 1) How could I see how much time it is spending on stage 1? Or what if, like above, it doesn't get past stage 1? 2) How could I check if it's GC time? And where would I increase the parallelism

MLlib Spam example gets stuck in Stage X

2015-03-19 Thread Su She
Hello Everyone, I am trying to run this MLlib example from Learning Spark: https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48 Things I'm doing differently: 1) Using spark shell instead of an application 2) instead of

Re: MLlib Spam example gets stuck in Stage X

2015-03-19 Thread Su She
time it spends on Stage 1. See if it's GC time; then try increasing the level of parallelism or repartitioning it, like sc.defaultParallelism * 3. Thanks Best Regards On Thu, Mar 19, 2015 at 12:15 PM, Su She suhsheka...@gmail.com wrote: Hello Everyone, I am trying to run this MLlib example

Re: Running Scala Word Count Using Maven

2015-03-16 Thread Su She
Hello, So actually solved the problem...see point 3. Here are a few approaches/errors I was getting: 1) mvn package exec:java -Dexec.mainClass=HelloWorld Error: java.lang.ClassNotFoundException: HelloWorld 2)

Re: What joda-time dependency does spark submit use/need?

2015-03-02 Thread Su She
these jars (joda-time-2.7.jar, joda-convert-1.7.jar) either as part of your build and assembly or via the --jars option to spark-submit. HTH. On Fri, Feb 27, 2015 at 2:48 PM, Su She suhsheka...@gmail.com wrote: Hello Everyone, I'm having some issues launching (non-spark) applications via

What joda-time dependency does spark submit use/need?

2015-02-27 Thread Su She
Hello Everyone, I'm having some issues launching (non-Spark) applications via the spark-submit commands. The common error I am getting is c/p below. I am able to submit a Spark Streaming/Kafka Spark application, but can't start a DynamoDB Java app. The common error is related to joda-time. 1) I

Re: Why are there different parts in my CSV?

2015-02-14 Thread Su She
http://stackoverflow.com/questions/23527941/how-to-write-to-csv-in-spark Just read this...seems like it should be easily readable. Thanks! On Sat, Feb 14, 2015 at 1:36 AM, Su She suhsheka...@gmail.com wrote: Thanks Akhil for the link. Is there a reason why there is a new directory created

Re: Why are there different parts in my CSV?

2015-02-14 Thread Su She
) Thanks Best Regards On Sat, Feb 14, 2015 at 2:18 AM, Su She suhsheka...@gmail.com wrote: Thanks Akhil for the suggestion, it is now only giving me one part - . Is there any way I can just create a file rather than a directory? It doesn't seem like there is just a saveAsTextFile

Re: Why are there different parts in my CSV?

2015-02-14 Thread Su She
with Spark. Here's an example for doing that: https://issues.apache.org/jira/browse/SPARK-944 Thanks Best Regards On Sat, Feb 14, 2015 at 2:55 PM, Su She suhsheka...@gmail.com wrote: Hello Akhil, thank you for your continued help! 1) So, if I can write it programmatically after every batch

Re: Why are there different parts in my CSV?

2015-02-14 Thread Su She
to the merged file /merged-output, true (to delete the original dir), null) Thanks Best Regards On Sat, Feb 14, 2015 at 2:18 AM, Su She suhsheka...@gmail.com wrote: Thanks Akhil for the suggestion, it is now only giving me one part - . Is there any way I can just create
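The copyMerge approach mentioned above can be sketched as follows. This is a sketch only: the HDFS paths are hypothetical, and note that FileUtil.copyMerge exists in Hadoop 1.x/2.x but was removed in Hadoop 3.0.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Concatenate every part file under the source directory into one
// destination file. The `true` flag deletes the source directory
// after the merge; the final `null` is the optional separator string.
val conf = new Configuration()
val fs   = FileSystem.get(conf)
FileUtil.copyMerge(
  fs, new Path("hdfs:///user/ec2-user/test.csv"),          // directory of part files
  fs, new Path("hdfs:///user/ec2-user/merged-output.csv"), // single merged file
  true,                                                    // delete source dir
  conf, null)
```

This can be run after each batch (or once a stream is stopped) to collapse the per-batch part files into one readable file.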

Re: Why are there different parts in my CSV?

2015-02-14 Thread Su She
at a directory of these files. On Sat, Feb 14, 2015 at 9:05 PM, Su She suhsheka...@gmail.com wrote: Thanks Sean and Akhil! I will take out the repartition(1). Please let me know if I understood this correctly, Spark Streaming writes data like this: foo-1001.csv/part-x, part-x foo

Re: Why are there different parts in my CSV?

2015-02-13 Thread Su She
a repartition before the saveAs* call: messages.repartition(1).saveAsHadoopFiles("hdfs://user/ec2-user/", "csv", String.class, String.class, (Class) TextOutputFormat.class); Thanks Best Regards On Fri, Feb 13, 2015 at 11:59 AM, Su She suhsheka...@gmail.com wrote: Hello Everyone, I am writing

Why are there different parts in my CSV?

2015-02-12 Thread Su She
Hello Everyone, I am writing simple word counts to hdfs using messages.saveAsHadoopFiles("hdfs://user/ec2-user/", "csv", String.class, String.class, (Class) TextOutputFormat.class); 1) However, every 2 seconds I'm getting a new *directory* that is titled as a csv. So I'll have test.csv, which will be a
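For context, this per-batch directory layout is how saveAsHadoopFiles behaves: each batch interval produces a directory named prefix-timestamp.suffix containing one part file per partition of that batch's RDD. A minimal Scala sketch of the repartition(1) workaround suggested in the reply above, assuming a hypothetical pair DStream named `messages` and an example HDFS path:

```scala
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.spark.streaming.StreamingContext._ // pair-DStream ops (needed before Spark 1.3)
import org.apache.spark.streaming.dstream.DStream

// Assumes messages: DStream[(String, String)]. Each batch still writes its
// own <prefix>-<timestamp>.csv directory, but repartition(1) collapses each
// batch's RDD to a single partition, so only one part file appears inside.
def save(messages: DStream[(String, String)]): Unit =
  messages
    .repartition(1)
    .saveAsHadoopFiles[TextOutputFormat[String, String]]("hdfs:///user/ec2-user/test", "csv")
```

Note that repartition(1) forces a shuffle to one partition per batch, which limits write parallelism; it is a convenience for small streams, not a general pattern.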

Re: Can spark job server be used to visualize streaming data?

2015-02-12 Thread Su She
it works. See here: https://github.com/andypetrella/spark-notebook From: Su She Date: Thursday, February 12, 2015 at 1:55 AM To: Felix C Cc: Kelvin Chu, user@spark.apache.org Subject: Re: Can spark job server be used to visualize streaming data? Hello Felix, I am already streaming

Re: Can spark job server be used to visualize streaming data?

2015-02-11 Thread Su She
...@hotmail.com wrote: Checkout https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html In there are links to how that is done. --- Original Message --- From: Kelvin Chu 2dot7kel...@gmail.com Sent: February 10, 2015 12:48 PM To: Su She suhsheka...@gmail.com

Re: Can spark job server be used to visualize streaming data?

2015-02-11 Thread Su She
--- Original Message --- From: Su She suhsheka...@gmail.com Sent: February 11, 2015 10:23 AM To: Felix C felixcheun...@hotmail.com Cc: Kelvin Chu 2dot7kel...@gmail.com, user@spark.apache.org Subject: Re: Can spark job server be used to visualize streaming data? Thank you Felix and Kelvin. I

Can spark job server be used to visualize streaming data?

2015-02-09 Thread Su She
Hello Everyone, I was reading this blog post: http://homes.esat.kuleuven.be/~bioiuser/blog/a-d3-visualisation-from-spark-as-a-service/ and was wondering if this approach can be taken to visualize streaming data...not just historical data? Thank you! -Suh

Best tools for visualizing Spark Streaming data?

2015-02-05 Thread Su She
Hello Everyone, I wanted to hear the community's thoughts on what (open-source) tools have been used to visualize data from Spark/Spark Streaming? I've taken a look at Zeppelin, but had some trouble working with it. Couple of questions: 1) I've looked at a couple blog posts and it seems like

Trouble deploying spark program because of soft link?

2015-01-30 Thread Su She
Hello Everyone, A bit confused on this one...I have set up the KafkaWordCount found here: https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java Everything runs fine when I run it using this on instance A:

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-27 Thread Su She
in that case) On 27 Jan 2015 07:25, Su She suhsheka...@gmail.com wrote: Hello Sean and Akhil, I shut down the services on Cloudera Manager. I shut them down in the appropriate order and then stopped all services of CM. I then shut down my instances. I then turned my instances back on, but I

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-26 Thread Su She
the steps correctly across your whole cluster. I'm not sure if the stock stop-all.sh script is supposed to work. Certainly, if you are using CM, by far the easiest is to start/stop all of these things in CM. On Wed, Jan 21, 2015 at 6:08 PM, Su She suhsheka...@gmail.com wrote: Hello Sean Akhil

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Su She
do) and then shut down the machines. You can execute the following command to disable safe mode: hdfs dfsadmin -safemode leave Thanks Best Regards On Sat, Jan 17, 2015 at 8:31 AM, Su She suhsheka...@gmail.com wrote: Hello Everyone, I am encountering trouble running Spark

HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-16 Thread Su She
Hello Everyone, I am encountering trouble running Spark applications when I shut down my EC2 instances. Everything else seems to work except Spark. When I try running a simple Spark application, like sc.parallelize() I get the message that hdfs name node is in safemode. Has anyone else had this

Re: Getting Output From a Cluster

2015-01-12 Thread Su She
Hello Everyone, Quick follow-up: is there any way I can append output to one file rather than create a new directory/file every X milliseconds? Thanks! Suhas Shekar University of California, Los Angeles B.A. Economics, Specialization in Computing 2014 On Thu, Jan 8, 2015 at 11:41 PM, Su She

Re: Getting Output From a Cluster

2015-01-12 Thread Su She
, then you can repartition the data to 1 before saving. Another way would be to use Hadoop's copy merge command/API (available from 2.0 versions). On 13 Jan 2015 01:08, Su She suhsheka...@gmail.com wrote: Hello Everyone, Quick followup, is there any way I can append output to one file rather

Re: Getting Output From a Cluster

2015-01-08 Thread Su She
yourStream.saveAsNewAPIHadoopFiles(hdfsUrl, "/output-location", Text.class, Text.class, outputFormatClass); Thanks Best Regards On Fri, Jan 9, 2015 at 10:22 AM, Su She suhsheka...@gmail.com wrote: Yes, I am calling the saveAsHadoopFiles on the DStream. However, when I call print on the DStream it works

Re: Getting Output From a Cluster

2015-01-08 Thread Su She
calling the saveAsText files on the DStream -- looks like it? Look at the section called Design Patterns for using foreachRDD in the link you sent -- you want to do dstream.foreachRDD(rdd => rdd.saveAs...) On Thu, Jan 8, 2015 at 5:20 PM, Su She suhsheka...@gmail.com wrote: Hello Everyone, Thanks
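The pattern being pointed at (the "Design Patterns for using foreachRDD" section of the Spark Streaming programming guide) looks roughly like the sketch below; the DStream name and HDFS path are hypothetical:

```scala
import org.apache.spark.streaming.dstream.DStream

// Assumes wordCounts: DStream[(String, Int)]. The save runs once per
// micro-batch; the per-batch timestamp in the path keeps successive
// batches from overwriting each other's output directory.
def saveCounts(wordCounts: DStream[(String, Int)]): Unit =
  wordCounts.foreachRDD { (rdd, time) =>
    rdd.saveAsTextFile(s"hdfs:///user/ec2-user/output-${time.milliseconds}")
  }
```

Calling saveAsTextFile on the RDD inside foreachRDD (rather than expecting a save method on the DStream itself) is the key point of the design pattern.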

Getting Output From a Cluster

2015-01-08 Thread Su She
Hello Everyone, Thanks in advance for the help! I successfully got my Kafka/Spark WordCount app to print locally. However, I want to run it on a cluster, which means that I will have to save it to HDFS if I want to be able to read the output. I am running Spark 1.1.0, which means according to