Hi Brant,
Let me partially answer your concerns: please follow a new open source
project PL/HQL (www.plhql.org) aimed at letting you reuse existing
logic and leverage existing skills to some extent, so you do not need to
rewrite everything in Scala/Java and can do this gradually. I hope it
Yes, we're running Spark on EC2. Will transition to EMR soon. -Vadim
On Sat, May 23, 2015 at 2:22 PM, Johan Beisser j...@caustic.org wrote:
Yes.
We're looking at bootstrapping in EMR...
On Sat, May 23, 2015 at 07:21 Joe Wass jw...@crossref.org wrote:
I used Spark on EC2 a while ago
Sorry guys, my email submitted before I finished writing it. Check my other
message (with the same subject)!
On 23 May 2015 at 20:25, Shafaq s.abdullah...@gmail.com wrote:
Yes, Spark EC2 cluster. Looking into migrating to Spark EMR.
Adding more EC2 instances is not possible, AFAIK.
On May 23, 2015 11:22 AM, Johan Beisser j...@caustic.org wrote:
My experience: don't put any application-specific settings into
spark-defaults.conf, which is applied to all applications.
Instead, you can either set them programmatically as you did below, or
pass them through spark-submit.
Also, if you still like to do it via spark-defaults.conf, you will have
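The advice above can be sketched as a spark-submit invocation; the class name, jar name, and settings below are hypothetical placeholders, just a minimal example of keeping per-application settings out of spark-defaults.conf:

```shell
# Per-application settings passed at launch instead of spark-defaults.conf
# (app class, jar, and values are made-up placeholders):
spark-submit \
  --class com.example.MyApp \
  --conf spark.executor.memory=4g \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  my-app.jar
```

Settings passed this way apply only to the submitted application, while anything in spark-defaults.conf is picked up by every application on that installation.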
Hi guys!
I have a small Spark application. It queries some data from Postgres,
enriches it, and writes to Elasticsearch. When I deployed it into the Spark
container I got a very frustrating error:
https://gist.github.com/b0c1/66527e00bada1e4c0dc3
Spark version: 1.3.1
Hadoop version: 2.6.0
Additional info:
Hi Michael
This is great info. I am currently using the repartitionAndSortWithinPartitions
function to achieve the same. Is this the recommended way until 1.3, or is
there a better way?
On 23 May 2015 07:38, Michael Armbrust mich...@databricks.com wrote:
DataFrames have a lot more information about the data, so
In my local maven repo, I found:
$ jar tvf /Users/tyu/.m2/repository//org/spark-project/akka/akka-actor_2.10/2.3.4-spark/akka-actor_2.10-2.3.4-spark.jar | grep SelectionPath
521 Mon Sep 29 12:05:36 PDT 2014 akka/actor/SelectionPathElement.class
Is the above jar in your classpath ?
On Sat,
Hello,
I am using Spark 1.3 in AWS.
SparkSQL can't recognize a Hive external table on S3.
The following is the error message.
I appreciate any help.
Thanks,
Okehee
--
15/05/24 01:02:18 ERROR thriftserver.SparkSQLDriver: Failed in [select
count(*) from api_search where pdate='2015-05-08']
Hi,
The exception is the same as before. Just like the following:
2015-05-23 18:01:40,943 ERROR [hconnection-0x14027b82-shared--pool1-t1]
ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is
missing or invalid credentials. Consider 'kinit'.
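Since the log itself suggests 'kinit', the usual first step is to check for, and if necessary obtain, a Kerberos ticket before launching the job. The principal below is a placeholder for your own:

```shell
# Inspect the ticket cache; errors out if there is no valid ticket
klist

# Obtain a ticket for the hypothetical principal (prompts for password)
kinit user@EXAMPLE.COM
```

If the job runs under a service account, a keytab (`kinit -kt /path/to.keytab principal`) is the usual non-interactive alternative.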
Hi TD
Unfortunately, I am off for a week so I won't be able to test this until
next week. Will keep you posted.
Aniket
On Sat, May 23, 2015, 6:16 AM Tathagata Das t...@databricks.com wrote:
Hey Aniket, I just checked in the fix in Spark master and branch-1.4.
Could you download Spark and
Hi All,
I am trying to do a word count on a number of tweets. My first step is to get
the data from a table using Spark SQL, and then run a split function on top of
it to calculate the word count.
Error: value split is not a member of org.apache.spark.sql.SchemaRdd
Spark Code that doesn't work to do word
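For what it's worth, `split` is not a member of SchemaRDD because its elements are Rows, not Strings; you would first map each Row to its text column and only then split. The split-and-count step itself can be shown on plain Scala collections as a minimal sketch (the tweet texts below are made up):

```scala
// Word count over a small in-memory sample of tweet texts.
// In Spark you would reach this point with something like
//   sqlContext.sql("select text from tweets").map(_.getString(0))
// and then apply the same flatMap/split logic to the resulting RDD.
val tweets = Seq("spark makes word count easy", "word count with spark")

val counts: Map[String, Int] =
  tweets
    .flatMap(_.split("\\s+"))   // tokenize each tweet on whitespace
    .groupBy(identity)          // group equal words together
    .mapValues(_.size)          // count occurrences per word
    .toMap

counts.toSeq.sortBy(-_._2).foreach(println)
```

The same flatMap/groupBy shape carries over to an RDD of strings, where `reduceByKey` replaces the groupBy/size step.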
Hello all,
This is probably me doing something obviously wrong, would really
appreciate some pointers on how to fix this.
I installed spark-1.3.1-bin-hadoop2.6.tgz from the Spark download page [
https://spark.apache.org/downloads.html] and just untarred it on a local
drive. I am on Mac OSX
Yes it does ... you can try out the following example (the People dataset
that comes with Spark). There is an inner query that filters on age and an
outer query that filters on name.
The physical plan applies a single composite filter on name and age as you
can see below
sqlContext.sql(select *
Replying to my own email in case someone has the same or similar issue.
On a hunch I ran this against my Linux (Ubuntu 14.04 with JDK 8) box. Not
only did bin/run-example SparkPi run without any problems, it also
provided a very helpful message in the output.
15/05/23 08:35:15 WARN Utils: Your
Hi all,
I have some doubts about the latest SparkSQL.
1. In the paper about SparkSQL it is stated that "The physical
planner also performs rule-based physical optimizations, such as pipelining
projections or filters into one Spark map operation." ...
If dealing with a query of the form:
Thanks!
I was getting a little confused by this partitioner business. I thought
that by default a pairRDD would be partitioned by a HashPartitioner? Was
this possibly the case in 0.9.3 but not in 1.x?
In any case, I tried your suggestion and the shuffle was removed. Cheers.
One small question
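On the default-partitioner point: a pair RDD only carries a partitioner after an operation such as partitionBy or reduceByKey sets one; plain maps drop it. The assignment rule a HashPartitioner-style scheme uses can be illustrated in plain Scala (a sketch mirroring Spark's non-negative-mod behavior, not Spark's actual class):

```scala
// How a HashPartitioner-style scheme maps keys to partition indices:
// key.hashCode modulo numPartitions, adjusted to be non-negative.
def partitionFor(key: Any, numPartitions: Int): Int = {
  val raw = key.hashCode % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

// Two RDDs co-partitioned with the same scheme and partition count can be
// joined without a shuffle, because equal keys always land in the same
// partition index.
val keys = Seq("a", "b", "c", "a")
keys.foreach(k => println(s"$k -> ${partitionFor(k, 4)}"))
```

This is why reusing one partitioner across both sides of a join removes the shuffle you observed.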
Yup, and since I have only one core per executor, that explains why only
one executor was utilized. I'll need to investigate which EC2 instance
type is going to be the best fit.
Thanks Evo.
On Fri, May 22, 2015 at 3:47 PM, Evo Eftimov evo.efti...@isecc.com wrote:
A receiver occupies a cpu
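Since a streaming receiver pins one core per executor, the launch-time sizing below is one hedged way to leave cores free for processing; the numbers are placeholders to adapt to the chosen instance type:

```shell
# Hypothetical sizing: with 4 cores per executor, each receiver occupies
# one core and the remaining 3 are available for task processing.
spark-submit \
  --num-executors 2 \
  --executor-cores 4 \
  my-streaming-app.jar
```

With one core per executor, by contrast, the receiver consumes the only core, which matches the single-executor utilization described above.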
Hi Yana,
Yes, typo in the email; the file name is correct, spark-defaults.conf. Thanks
though. So it appears to work if, in the driver, I specify it as part of
the SparkConf:
val conf = new SparkConf().setAppName(getClass.getSimpleName)
.set("spark.executor.extraClassPath",
It seems it generates query results into a tmp dir first, and then tries to
rename it into the right folder at the end, but it failed while renaming.
This problem exists not only in SparkSQL but also in other Hadoop tools (e.g.
Hive, Pig, etc.) when used with S3. Usually, it is better to write task
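One commonly suggested mitigation for the rename-on-S3 problem, offered here as an assumption to verify against your Spark and Hadoop versions (the version-2 commit algorithm requires Hadoop 2.7+), is to have the file output committer commit task output directly rather than via a final rename, configured in spark-defaults.conf:

```
# Hypothetical mitigation: commit task output directly instead of renaming
# from a temporary directory (requires Hadoop 2.7+; verify before relying
# on it, since S3's lack of atomic rename is the root cause).
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2
```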