Broadcasting Large Objects Fails?

2015-02-22 Thread TJ Klein
Hi, I am trying to broadcast large objects (on the order of a couple of hundred MB). However, I keep getting errors when trying to do so: Traceback (most recent call last): File "/LORM_experiment.py", line 510, in <module> broadcast_gradient_function = sc.broadcast(gradient_function) File

Running Example Spark Program

2015-02-22 Thread Surendran Duraisamy
Hello All, I am new to Apache Spark and am trying to run JavaKMeans.java from the Spark examples on my Ubuntu system. I downloaded spark-1.2.1-bin-hadoop2.4.tgz (http://www.apache.org/dyn/closer.cgi/spark/spark-1.2.1/spark-1.2.1-bin-hadoop2.4.tgz) and started sbin/start-master.sh. After starting

Re: Running Example Spark Program

2015-02-22 Thread Jason Bell
If you would like a more detailed walkthrough, I wrote one recently: https://dataissexy.wordpress.com/2015/02/03/apache-spark-standalone-clusters-bigdata-hadoop-spark/ Regards Jason Bell On 22 Feb 2015 14:16, VISHNU SUBRAMANIAN johnfedrickena...@gmail.com wrote: Try restarting your Spark

RE: Spark SQL odbc on Windows

2015-02-22 Thread Ashic Mahtab
Hi Francisco, While I haven't tried this, have a look at the contents of start-thriftserver.sh - all it's doing is setting up a few variables and calling: /bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 and passing some additional parameters. Perhaps doing the

Re: Running Example Spark Program

2015-02-22 Thread VISHNU SUBRAMANIAN
Try restarting your Spark cluster: ./sbin/stop-all.sh ./sbin/start-all.sh Thanks, Vishnu On Sun, Feb 22, 2015 at 7:30 PM, Surendran Duraisamy 2013ht12...@wilp.bits-pilani.ac.in wrote: Hello All, I am new to Apache Spark, I am trying to run JavaKMeans.java from Spark Examples in my Ubuntu

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Tassilo Klein
Hi Akhil, thanks for your reply. I am using the latest version of Spark, 1.2.1 (I also tried the 1.3 developer branch). If I am not mistaken, TorrentBroadcast is the default there, isn't it? Thanks, Tassilo On Sun, Feb 22, 2015 at 10:59 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Did you try

Re: Running Example Spark Program

2015-02-22 Thread Surendran Duraisamy
Thank you Jason, I got the program working after setting SPARK_WORKER_CORES and SPARK_WORKER_MEMORY. While running the program from Eclipse, I got a strange ClassNotFoundException. In JavaKMeans.java, ParsePoint is a static inner class. When running the program I got a ClassNotFoundException for ParsePoint. I have
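
A common cause of that ClassNotFoundException when launching from an IDE is that the application jar (which contains the ParsePoint inner class) is never shipped to the executors. A minimal Scala sketch of one possible fix, assuming a standalone master; the master URL and jar path are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    // Ship the application jar (containing JavaKMeans and its ParsePoint
    // inner class) to the executors when running outside spark-submit.
    val conf = new SparkConf()
      .setAppName("JavaKMeans")
      .setMaster("spark://master-host:7077")          // hypothetical master URL
      .setJars(Seq("target/kmeans-example.jar"))      // hypothetical jar path
    val sc = new SparkContext(conf)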

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Hi Francisco, Out of curiosity - why ROLAP mode using multi-dimensional mode (vs tabular) from SSAS to Spark? As a past SSAS guy you've definitely piqued my interest. The one thing that you may run into is that the SQL generated by SSAS can be quite convoluted. When we were doing the same thing

Re: Spark performance tuning

2015-02-22 Thread Akhil Das
You can simply follow these guidelines: http://spark.apache.org/docs/1.2.0/tuning.html Thanks Best Regards On Sun, Feb 22, 2015 at 1:14 AM, java8964 java8...@hotmail.com wrote: Can someone share some ideas about how to tune the GC time? Thanks -- From: java8...@hotmail.com
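
For reference, a minimal Scala sketch of the GC-related knobs described on that tuning page; the JVM flags and fraction shown are hypothetical starting points, not recommendations:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("gc-tuning-example")
      // Print GC activity in the executor logs to see where the time goes.
      .set("spark.executor.extraJavaOptions",
           "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
      // Leave more of the heap to task execution if caching pressure causes long GC pauses.
      .set("spark.storage.memoryFraction", "0.4")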

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Akhil Das
Did you try with torrent broadcast factory? Thanks Best Regards On Sun, Feb 22, 2015 at 3:29 PM, TJ Klein tjkl...@gmail.com wrote: Hi, I am trying to broadcast large objects (order of a couple of 100 MBs). However, I keep getting errors when trying to do so: Traceback (most recent call

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Tassilo Klein
I see, thanks. Yes, I have already tried all sorts of changes to these parameters. Unfortunately, none of them seemed to have any impact. Thanks, Tassilo On Sun, Feb 22, 2015 at 1:24 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Yes it is, you have some more customizable options over here

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Back to thrift, there was an earlier thread on this topic at http://mail-archives.apache.org/mod_mbox/spark-user/201411.mbox/%3CCABPQxsvXA-ROPeXN=wjcev_n9gv-drqxujukbp_goutvnyx...@mail.gmail.com%3E that may be useful as well. On Sun Feb 22 2015 at 8:42:29 AM Denny Lee denny.g@gmail.com wrote:

Re: Broadcasting Large Objects Fails?

2015-02-22 Thread Akhil Das
Yes it is; you have some more customizable options here: http://spark.apache.org/docs/1.2.0/configuration.html#compression-and-serialization Thanks Best Regards On Sun, Feb 22, 2015 at 11:47 PM, Tassilo Klein tjkl...@gmail.com wrote: Hi Akhil, thanks for your reply. I am using the
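
A minimal Scala sketch of the broadcast-related options on that configuration page, assuming Spark 1.2.x property names; the values are hypothetical starting points:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("large-broadcast")
      .set("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
      .set("spark.broadcast.blockSize", "8192")   // block size in KB used by torrent broadcast
      .set("spark.broadcast.compress", "true")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")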

Spark SQL odbc on Windows

2015-02-22 Thread Francisco Orchard
Hello, I work at an MS consulting company and we are evaluating including Spark in our Big Data offering. We are particularly interested in testing Spark as a ROLAP engine for SSAS, but we cannot find a way to activate the ODBC server (Thrift) on a Windows cluster. There is no start-thriftserver.sh

Re: Missing shuffle files

2015-02-22 Thread Sameer Farooqui
Do you guys have dynamic allocation turned on for YARN? Anders, was Task 450 in your job acting like a Reducer and fetching the Map spill output data from a different node? If a Reducer task can't read the remote data it needs, that could cause the stage to fail. Sometimes this forces the

Re: How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread Tom Vacek
The SparkConf doesn't allow you to set arbitrary variables. You can use SparkContext's HadoopRDD and create a JobConf (with whatever variables you want), and then grab them out of the JobConf in your RecordReader. On Sun, Feb 22, 2015 at 4:28 PM, hnahak harihar1...@gmail.com wrote: Hi, I
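
A minimal Scala sketch of the approach Tom describes, assuming the old (mapred) InputFormat API; TextInputFormat stands in for the custom InputFormat, and the input path and variable are taken from the original post or hypothetical:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("jobconf-variables"))
    val jobConf = new JobConf(sc.hadoopConfiguration)
    jobConf.set("developer", "MyName")                          // the user variable to pass down
    FileInputFormat.setInputPaths(jobConf, "hdfs:///data/input") // hypothetical input path
    val rdd = sc.hadoopRDD(jobConf, classOf[TextInputFormat],
      classOf[LongWritable], classOf[Text])
    // Inside the custom RecordReader, read it back from the JobConf it receives:
    //   String developer = job.get("developer");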

Re: Posting to the list

2015-02-22 Thread hnahak
I'm also facing the same issue; this is the third time. Whenever I post anything, it is never accepted by the community, and at the same time I get a failure mail at my registered mail id. And when I click the subscribe to this mailing list link, I don't get any new subscription mail in my inbox. Please anyone

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2015-02-22 Thread Ted Yu
Haven't found the method in http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SchemaRDD The new DataFrame has this method: /** * Returns the content of the [[DataFrame]] as an [[RDD]] of [[Row]]s. * @group rdd */ def rdd: RDD[Row] = { FYI On Sun, Feb
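
A minimal Scala sketch of the DataFrame route Ted mentions (Spark 1.3 API); the table and case class are hypothetical, and in 1.2 a SchemaRDD can already be used as an RDD[Row] directly:

    import org.apache.spark.sql.{Row, SQLContext}

    case class Person(name: String, age: Int)

    val sqlContext = new SQLContext(sc)                         // assumes an existing SparkContext sc
    val df = sqlContext.sql("SELECT name, age FROM people")     // hypothetical registered table
    val people = df.rdd.map { case Row(name: String, age: Int) => Person(name, age) }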

How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread hnahak
Hi, I have written a custom InputFormat and RecordReader for Spark, and I need to use user variables from the Spark client program. I added them in SparkConf: val sparkConf = new SparkConf().setAppName(args(0)).set("developer", "MyName") *and in InputFormat class* protected boolean

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2015-02-22 Thread stephane.collot
Hi Michael, I think that the feature (converting a SchemaRDD to an RDD of a structured class) is now available. But I didn't understand from the PR how exactly to do this. Can you give an example or doc links? Best regards -- View this message in context:

Re: Posting to the list

2015-02-22 Thread Ted Yu
bq. I didn't get any new subscription mail in my inbox. Have you checked your Spam folder? Cheers On Sun, Feb 22, 2015 at 2:36 PM, hnahak harihar1...@gmail.com wrote: I'm also facing the same issue; this is the third time. Whenever I post anything, it is never accepted by the community, and at the same

How to integrate HBASE on Spark

2015-02-22 Thread sandeep vura
Hi, I have installed Spark on a 3-node cluster. Spark services are up and running, but I want to integrate HBase with Spark. Do I need to install HBase on the Hadoop cluster or the Spark cluster? Please let me know ASAP. Regards, Sandeep.v

Submitting jobs to Spark EC2 cluster remotely

2015-02-22 Thread olegshirokikh
I've set up the EC2 cluster with Spark. Everything works; all master/slaves are up and running. I'm trying to submit a sample job (SparkPi). When I ssh to the cluster and submit it from there, everything works fine. However, when the driver is created on a remote host (my laptop), it doesn't work. I've

Re: cannot run spark shell in yarn-client mode

2015-02-22 Thread quangnguyenbh
Has anyone fixed this error? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/cannot-run-spark-shell-in-yarn-client-mode-tp4013p21761.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to integrate HBASE on Spark

2015-02-22 Thread Akhil Das
If both clusters are on the same network, then I'd suggest installing it on the Hadoop cluster. If you install it on the Spark cluster itself, then HBase might take up a few CPU cycles and there's a chance for the job to lag. Thanks Best Regards On Mon, Feb 23, 2015 at 12:48
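
Wherever HBase ends up running, a minimal Scala sketch of reading an HBase table from Spark via TableInputFormat (assumes the HBase client jars are on the classpath; the table name is hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("hbase-read"))
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")      // hypothetical table name
    val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(rows.count())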

Re: Posting to the list

2015-02-22 Thread haihar nahak
I checked, but I didn't see any mail from the user list. Let me do it one more time. [image: Inline image 1] --Harihar On Mon, Feb 23, 2015 at 11:50 AM, Ted Yu yuzhih...@gmail.com wrote: bq. I didn't get any new subscription mail in my inbox. Have you checked your Spam folder? Cheers On

Re: Use Spark Streaming for Batch?

2015-02-22 Thread Tobias Pfeiffer
Hi, On Sat, Feb 21, 2015 at 1:05 AM, craigv craigvanderbo...@gmail.com wrote: /Might it be possible to perform large batch processing on HDFS time series data using Spark Streaming?/ 1. I understand that there is not currently an InputDStream that could do what's needed. I would have

Re: Any sample code for Kafka consumer

2015-02-22 Thread Tathagata Das
Spark Streaming already directly supports Kafka http://spark.apache.org/docs/latest/streaming-programming-guide.html#advanced-sources Is there any reason why that is not sufficient? TD On Sun, Feb 22, 2015 at 5:18 PM, mykidong mykid...@gmail.com wrote: In java, you can see this example:
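
A minimal Scala sketch of the built-in receiver TD points to (the spark-streaming-kafka artifact); the ZooKeeper quorum, consumer group, and topic name are hypothetical:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-consumer"), Seconds(10))
    // createStream yields (key, message) pairs; keep only the message payload.
    val lines = KafkaUtils.createStream(ssc, "zk-host:2181", "my-consumer-group",
      Map("my-topic" -> 1)).map(_._2)
    lines.print()
    ssc.start()
    ssc.awaitTermination()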

Re: Launching Spark cluster on EC2 with Ubuntu AMI

2015-02-22 Thread Ted Yu
bq. bash: git: command not found Looks like the AMI doesn't have git pre-installed. Cheers On Sun, Feb 22, 2015 at 4:29 PM, olegshirokikh o...@solver.com wrote: I'm trying to launch Spark cluster on AWS EC2 with custom AMI (Ubuntu) using the following: ./ec2/spark-ec2 --key-pair=***

Re: How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread haihar nahak
Thanks. I extracted the Hadoop configuration, set my arbitrary variable, and was able to read it inside the InputFormat from the JobContext configuration. On Mon, Feb 23, 2015 at 12:04 PM, Tom Vacek minnesota...@gmail.com wrote: The SparkConf doesn't allow you to set arbitrary variables. You can use

Re: How to send user variables from Spark client to custom InputFormat or RecordReader ?

2015-02-22 Thread hnahak
Instead of setting it in SparkConf, set it via SparkContext.hadoopConfiguration.set(key, value) and extract the same key from the JobContext. --Harihar -- View this message in context:
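
A minimal Scala sketch of that suggestion, reusing the key/value from the original post and assuming an existing SparkContext sc:

    sc.hadoopConfiguration.set("developer", "MyName")
    // In the custom InputFormat/RecordReader (Hadoop new API), read it back with:
    //   String developer = context.getConfiguration().get("developer");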

Launching Spark cluster on EC2 with Ubuntu AMI

2015-02-22 Thread olegshirokikh
I'm trying to launch Spark cluster on AWS EC2 with custom AMI (Ubuntu) using the following: ./ec2/spark-ec2 --key-pair=*** --identity-file='/home/***.pem' --region=us-west-2 --zone=us-west-2b --spark-version=1.2.1 --slaves=2 --instance-type=t2.micro --ami=ami-29ebb519 --user=ubuntu launch

Re: Any sample code for Kafka consumer

2015-02-22 Thread mykidong
In Java, you can see this example: https://github.com/mykidong/spark-kafka-simple-consumer-receiver - Kidong. -- Original Message -- From: icecreamlc [via Apache Spark User List] ml-node+s1001560n21746...@n3.nabble.com To: mykidong mykid...@gmail.com Sent: 2015-02-21 11:16:37 AM

Re: Use Spark Streaming for Batch?

2015-02-22 Thread Soumitra Kumar
See if https://issues.apache.org/jira/browse/SPARK-3660 helps you. My patch has been accepted, and this enhancement is scheduled for 1.3.0. It lets you specify an initialRDD for the updateStateByKey operation. Let me know if you need any information. On Sun, Feb 22, 2015 at 5:21 PM, Tobias Pfeiffer
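
A minimal Scala sketch of seeding updateStateByKey with an initial RDD, as enabled by SPARK-3660 (so it assumes Spark 1.3.0+); the streaming source, seed path, and state type are hypothetical:

    import org.apache.spark.{HashPartitioner, SparkConf}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("initial-state"), Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/initial-state")   // required for stateful operations

    // Seed state saved from a previous run, as tab-separated "key<TAB>count" lines (hypothetical path).
    val initialState: RDD[(String, Long)] = ssc.sparkContext
      .textFile("hdfs:///state/previous-counts")
      .map(_.split("\t")).map(a => (a(0), a(1).toLong))

    def updateCounts(newValues: Seq[Int], state: Option[Long]): Option[Long] =
      Some(state.getOrElse(0L) + newValues.sum)

    val words = ssc.socketTextStream("stream-host", 9999).flatMap(_.split(" ")).map((_, 1))
    val counts = words.updateStateByKey[Long](updateCounts _,
      new HashPartitioner(ssc.sparkContext.defaultParallelism), initialState)
    counts.print()
    ssc.start()
    ssc.awaitTermination()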