Re: Migrate Relational to Distributed

2015-05-23 Thread Dmitry Tolpeko
Hi Brant, let me partially answer your concerns: please follow a new open source project, PL/HQL (www.plhql.org), aimed at allowing you to reuse existing logic and leverage existing skills to some extent, so you do not need to rewrite everything in Scala/Java and can do this gradually. I hope it

Re: Is anyone using Amazon EC2?

2015-05-23 Thread Vadim Bichutskiy
Yes, we're running Spark on EC2. Will transition to EMR soon. -Vadim On Sat, May 23, 2015 at 2:22 PM, Johan Beisser j...@caustic.org wrote: Yes. We're looking at bootstrapping in EMR... On Sat, May 23, 2015 at 07:21 Joe Wass jw...@crossref.org wrote: I used Spark on EC2 a while ago

Re: Is anyone using Amazon EC2?

2015-05-23 Thread Joe Wass
Sorry guys, my email was sent before I finished writing it. Check my other message (with the same subject)! On 23 May 2015 at 20:25, Shafaq s.abdullah...@gmail.com wrote: Yes-Spark EC2 cluster . Looking into migrating to spark emr. Adding more ec2 is not possible afaik. On May 23, 2015 11:22

Re: spark.executor.extraClassPath - Values not picked up by executors

2015-05-23 Thread wesley.miao
In my experience, it is best not to put application-specific settings into spark-defaults.conf, which is applied to all applications. Instead, you can set them either programmatically, as you did below, or through spark-submit. Also, if you would still like to do it via spark-defaults.conf, you will have

Strange ClassNotFound exception

2015-05-23 Thread boci
Hi guys! I have a small Spark application. It queries some data from Postgres, enriches it and writes it to Elasticsearch. When I deployed it into the Spark container I got a very frustrating error: https://gist.github.com/b0c1/66527e00bada1e4c0dc3 Spark version: 1.3.1 Hadoop version: 2.6.0 Additional info:

Re: DataFrame groupBy vs RDD groupBy

2015-05-23 Thread ayan guha
Hi Michael, this is great info. I am currently using the repartitionAndSortWithinPartitions function to achieve the same. Is this the recommended way up to 1.3, or is there a better way? On 23 May 2015 07:38, Michael Armbrust mich...@databricks.com wrote: DataFrames have a lot more information about the data, so
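For context on the approach ayan describes (assuming the function meant is `repartitionAndSortWithinPartitions`): after that call, all records with the same key land in the same partition and are sorted by key, so equal keys sit next to each other. A `mapPartitions` function can then group them in one streaming pass instead of buffering the whole partition. Below is a minimal sketch of that per-partition grouping step on a plain Scala iterator; `SortedGrouping` and `groupSorted` are hypothetical names, not Spark APIs.

```scala
object SortedGrouping {
  // Groups consecutive records that share a key. Assumes the input is already
  // sorted by key (as each partition is after a sort-within-partitions step),
  // so a group is complete as soon as the next key differs.
  def groupSorted[K, V](sorted: Iterator[(K, V)]): Iterator[(K, List[V])] =
    new Iterator[(K, List[V])] {
      private val buffered = sorted.buffered

      def hasNext: Boolean = buffered.hasNext

      def next(): (K, List[V]) = {
        val key = buffered.head._1 // throws NoSuchElementException when exhausted
        val values = scala.collection.mutable.ListBuffer.empty[V]
        // Consume records until the key changes or the input ends.
        while (buffered.hasNext && buffered.head._1 == key) {
          values += buffered.next()._2
        }
        (key, values.toList)
      }
    }
}
```

Only one group is held in memory at a time, which is the main reason to sort within partitions rather than collect every value per key.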

Re: Strange ClassNotFound exception

2015-05-23 Thread Ted Yu
In my local Maven repo, I found: $ jar tvf /Users/tyu/.m2/repository//org/spark-project/akka/akka-actor_2.10/2.3.4-spark/akka-actor_2.10-2.3.4-spark.jar | grep SelectionPath 521 Mon Sep 29 12:05:36 PDT 2014 akka/actor/SelectionPathElement.class Is the above jar in your classpath? On Sat,
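Ted's `jar tvf ... | grep` check can also be done from code, for example to confirm which jar actually provides a class. Here is a minimal sketch using the JDK's `java.util.jar.JarFile`. `JarSearch` and `findEntries` are hypothetical helper names, and the match is a plain substring test on entry paths, roughly what `grep` does on the listing.

```scala
import java.io.File
import java.util.jar.JarFile
import scala.collection.mutable.ListBuffer

object JarSearch {
  // Returns the names of all entries in the jar at `jarPath` whose path
  // contains `fragment` (e.g. "SelectionPath").
  def findEntries(jarPath: String, fragment: String): List[String] = {
    val jar = new JarFile(new File(jarPath))
    try {
      val matches = ListBuffer.empty[String]
      val entries = jar.entries()
      while (entries.hasMoreElements) {
        val name = entries.nextElement().getName
        if (name.contains(fragment)) matches += name
      }
      matches.toList
    } finally jar.close() // always release the file handle
  }
}
```

Usage: `JarSearch.findEntries("/path/to/akka-actor_2.10-2.3.4-spark.jar", "SelectionPath")` should list `akka/actor/SelectionPathElement.class` if that jar is the one Ted found.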

SparkSQL can't read S3 path for hive external table

2015-05-23 Thread ogoh
Hello, I am using Spark 1.3 in AWS. SparkSQL can't recognize a Hive external table on S3. The following is the error message. I would appreciate any help. Thanks, Okehee -- 15/05/24 01:02:18 ERROR thriftserver.SparkSQLDriver: Failed in [select count(*) from api_search where pdate='2015-05-08']

Re: Re: Re: How to use spark to access HBase with Security enabled

2015-05-23 Thread donhoff_h
Hi, the exception is the same as before, as follows: 2015-05-23 18:01:40,943 ERROR [hconnection-0x14027b82-shared--pool1-t1] ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.

Re: [Streaming] Non-blocking recommendation in custom receiver documentation and KinesisReceiver's worker.run blocking call

2015-05-23 Thread Aniket Bhatnagar
Hi TD, unfortunately I am off for a week, so I won't be able to test this until next week. Will keep you posted. Aniket On Sat, May 23, 2015, 6:16 AM Tathagata Das t...@databricks.com wrote: Hey Aniket, I just checked in the fix in Spark master and branch-1.4. Could you download Spark and

split function on spark sql created rdd

2015-05-23 Thread kali.tumm...@gmail.com
Hi all, I am trying to do a word count over a number of tweets. My first step is to get the data from a table using Spark SQL and then run the split function on top of it to calculate the word count. Error: value split is not a member of org.apache.spark.sql.SchemaRDD Spark code that doesn't work to do word
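The error arises because `split` is a `String` method, while each element of a `SchemaRDD` is a `Row`. The usual fix is to map each row to its text column first (for example `rdd.map(_.getString(0))`, assuming the tweet text is the first selected column) and only then split. The counting logic is sketched below on a plain Scala `Seq[String]` standing in for that column; `WordCount` and `countWords` are hypothetical names.

```scala
object WordCount {
  // Splits each line on runs of whitespace and counts occurrences of each word.
  // `lines` stands in for the tweet text column extracted from the table.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))  // String.split, applied per line, not per Row
      .filter(_.nonEmpty)        // drop the empty token produced by leading spaces
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

On an RDD of strings the analogous chain would be `flatMap(_.split("\\s+"))`, then `map(w => (w, 1))`, then `reduceByKey(_ + _)`, which avoids collecting all occurrences of a word in one place.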

Is anyone using Amazon EC2?

2015-05-23 Thread Joe Wass
I used Spark on EC2 a while ago

Not able to run SparkPi locally

2015-05-23 Thread Sujit Pal
Hello all, this is probably me doing something obviously wrong; I would really appreciate some pointers on how to fix it. I installed spark-1.3.1-bin-hadoop2.6.tgz from the Spark download page (https://spark.apache.org/downloads.html) and just untarred it on a local drive. I am on Mac OSX

Re: Doubts about SparkSQL

2015-05-23 Thread Ram Sriharsha
Yes, it does. You can try out the following example (the People dataset that comes with Spark). There is an inner query that filters on age and an outer query that filters on name. The physical plan applies a single composite filter on name and age, as you can see below: sqlContext.sql(select *
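Conceptually, the plan Ram describes relies on the fact that two stacked filters select exactly the same rows as a single filter whose predicate is their conjunction, so both conditions can be checked in one pass. The plain-Scala sketch below illustrates that equivalence only; it is not Catalyst's implementation, and `Person`, the sample rows, and both predicates are hypothetical.

```scala
object FilterMerge {
  // Hypothetical record type mirroring the name/age columns of the People example.
  final case class Person(name: String, age: Int)

  // Inner query filters on age, outer query filters on name: two separate passes.
  def twoPasses(people: Seq[Person]): Seq[Person] =
    people.filter(_.age >= 13).filter(_.name.startsWith("J"))

  // The same two predicates merged into one composite filter: a single pass.
  def onePass(people: Seq[Person]): Seq[Person] =
    people.filter(p => p.age >= 13 && p.name.startsWith("J"))
}
```

Both functions return the same rows for any input; the merged form simply avoids materializing the intermediate result.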

Re: Not able to run SparkPi locally

2015-05-23 Thread Sujit Pal
Replying to my own email in case someone has the same or similar issue. On a hunch I ran this against my Linux (Ubuntu 14.04 with JDK 8) box. Not only did bin/run-example SparkPi run without any problems, it also provided a very helpful message in the output. 15/05/23 08:35:15 WARN Utils: Your

Doubts about SparkSQL

2015-05-23 Thread Renato Marroquín Mogrovejo
Hi all, I have some doubts about the latest SparkSQL. 1. In the paper about SparkSQL it is stated that "The physical planner also performs rule-based physical optimizations, such as pipelining projections or filters into one Spark map operation. ..." If dealing with a query of the form:

Re: Help reading Spark UI tea leaves..

2015-05-23 Thread Shay Seng
Thanks! I was getting a little confused by this partitioner business; I thought that by default a pair RDD would be partitioned by a HashPartitioner? Was this possibly the case in 0.9.3 but not in 1.x? In any case, I tried your suggestion and the shuffle was removed. Cheers. One small question
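On Shay's question: as far as I know, an RDD of pairs has no partitioner (`rdd.partitioner` is `None`) until it comes out of `partitionBy` or a shuffle operation such as `reduceByKey`. `HashPartitioner` is the default those shuffles use, not something every pair RDD carries. The key-to-partition rule it applies can be sketched as below; this is a re-implementation for illustration, not Spark's source, and `HashPartitioning.partitionFor` is a hypothetical name.

```scala
object HashPartitioning {
  // Null keys go to partition 0; any other key goes to the non-negative
  // remainder of its hashCode modulo the number of partitions.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw // keep negative hashes in range
    }
  }
}
```

Because the partition depends only on the key, two RDDs partitioned with the same number of hash partitions place equal keys in matching partitions, which is what lets a join or grouping skip the shuffle.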

Re: Is anyone using Amazon EC2?

2015-05-23 Thread Johan Beisser
Yes. We're looking at bootstrapping in EMR... On Sat, May 23, 2015 at 07:21 Joe Wass jw...@crossref.org wrote: I used Spark on EC2 a while ago

Re: Is anyone using Amazon EC2?

2015-05-23 Thread Shafaq
Yes, Spark EC2 cluster. Looking into migrating to Spark EMR. Adding more EC2 is not possible afaik. On May 23, 2015 11:22 AM, Johan Beisser j...@caustic.org wrote: Yes. We're looking at bootstrapping in EMR... On Sat, May 23, 2015 at 07:21 Joe Wass jw...@crossref.org wrote: I used Spark on

Re: Spark Streaming: all tasks running on one executor (Kinesis + Mongodb)

2015-05-23 Thread Mike Trienis
Yup, and since I have only one core per executor, that explains why only one executor was utilized. I'll need to investigate which EC2 instance type is going to be the best fit. Thanks Evo. On Fri, May 22, 2015 at 3:47 PM, Evo Eftimov evo.efti...@isecc.com wrote: A receiver occupies a cpu
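Evo's point, that each receiver runs as a long-lived task holding one core, turns into simple arithmetic when sizing instances; the Spark Streaming programming guide likewise advises allocating more cores than receivers, otherwise data is received but not processed. A back-of-the-envelope helper, with hypothetical names, that assumes exactly one core per receiver:

```scala
object ReceiverCores {
  // Cores left for batch processing after each receiver has taken one core.
  // Assumes one core per receiver and uniform executors.
  def processingCores(executors: Int, coresPerExecutor: Int, receivers: Int): Int =
    executors * coresPerExecutor - receivers

  // A streaming job can only make progress when this is positive.
  def canProcess(executors: Int, coresPerExecutor: Int, receivers: Int): Boolean =
    processingCores(executors, coresPerExecutor, receivers) > 0
}
```

For example, a single executor with one core and one Kinesis receiver leaves zero cores for processing, whereas four single-core executors leave three.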

Re: spark.executor.extraClassPath - Values not picked up by executors

2015-05-23 Thread Todd Nist
Hi Yana, yes, that was a typo in the email; the file name is correctly spark-defaults.conf. Thanks, though. So it appears to work if in the driver I specify it as part of the SparkConf: val conf = new SparkConf().setAppName(getClass.getSimpleName) .set("spark.executor.extraClassPath",

Re: SparkSQL failing while writing into S3 for 'insert into table'

2015-05-23 Thread Cheolsoo Park
It seems that it first writes the query results into a tmp dir and finally tries to rename them into the right folder, but the rename fails. This problem exists not only in SparkSQL but in any Hadoop tool (e.g. Hive, Pig) when used with S3. Usually, it is better to write task
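The write-to-temporary-then-rename pattern Cheolsoo describes looks like this on a local filesystem, shown for a single file for brevity. It is a minimal sketch using `java.nio.file`; `TempThenRename` and `commit` are hypothetical names. On a local or HDFS-like filesystem the final move is a cheap metadata operation, whereas S3 has no native rename, so Hadoop's S3 filesystems implement it as a copy followed by a delete, which is slow and can fail partway.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

object TempThenRename {
  // Writes `content` to a temporary file in the target's directory, then moves
  // it into place so readers never observe a half-written target.
  // Assumes `target` has a parent directory that already exists.
  def commit(target: Path, content: String): Unit = {
    val dir = target.toAbsolutePath.getParent
    val tmp = Files.createTempFile(dir, ".tmp-", ".part")
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
    // Atomic on a local filesystem; this is the step that becomes copy + delete on S3.
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE)
  }
}
```

Keeping the temporary file in the same directory as the target matters here: an atomic move is generally only possible within one filesystem.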