Yes - that works great! Sorry for implying I couldn't. Was just more
flummoxed that I couldn't make the Scala call work on its own. Will
continue to debug ;-)
On Sun, Dec 14, 2014 at 11:39 Michael Armbrust mich...@databricks.com
wrote:
BTW, I cannot use SparkSQL case classes right now because my table has too many columns for one, so it's tabs.map(c => (c(167), c(110), c(200))) instead of tabs.map(c => (c._(167), c._(110), c._(200))).
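A sketch of the usual Spark 1.1 route around Scala 2.10's 22-field case-class limit, building Rows and applying a schema programmatically (column names are placeholders; tabs is the RDD from the message above):

import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)

// Describe only the columns being projected; names are hypothetical.
val schema = StructType(Seq(
  StructField("colA", StringType, nullable = true),
  StructField("colB", StringType, nullable = true),
  StructField("colC", StringType, nullable = true)))

// Rows have no 22-field cap, unlike tuples and case classes.
val rowRDD = tabs.map(c => Row(c(167), c(110), c(200)))
val wide = sqlContext.applySchema(rowRDD, schema)
wide.registerTempTable("wide")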
Hi Xiaoyong,
SparkSQL has already been released and has been part of the Spark code-base
since Spark 1.0. The latest stable release is Spark 1.1 (here's the Spark
SQL Programming Guide
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html) and we're
currently voting on Spark 1.2.
Yes, that is correct. A quick reference on this is the post
https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1
with the pertinent section being:
It is important to note that when you create Spark tables (for
Thanks Sandy!
On Mon, Dec 8, 2014 at 23:15 Sandy Ryza sandy.r...@cloudera.com wrote:
Another thing to be aware of is that YARN will round up containers to the
nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults
to 1024.
-Sandy
On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee
This is perhaps more of a YARN question than a Spark question, but I was just curious about how memory is allocated in YARN via the various configurations. For example, if I spin up my cluster with 4GB executor memory and a different number of executors, as noted below:
4GB executor-memory x 10 executors = 46GB
* executorMemory.
When you set executor memory, the YARN resource request is executorMemory + yarnOverhead.
- Arun
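Putting Sandy's and Arun's points together, the arithmetic goes roughly as below (a sketch, not Spark API; it assumes Spark 1.1's flat 384 MB default for spark.yarn.executor.memoryOverhead):

object YarnContainerMath {
  val minAllocationMb = 1024 // yarn.scheduler.minimum-allocation-mb default
  val overheadMb = 384       // spark.yarn.executor.memoryOverhead default

  // YARN rounds (executor memory + overhead) up to the nearest
  // multiple of the minimum allocation.
  def containerSizeMb(executorMemoryMb: Int): Int = {
    val requested = executorMemoryMb + overheadMb
    ((requested + minAllocationMb - 1) / minAllocationMb) * minAllocationMb
  }

  def main(args: Array[String]): Unit = {
    // 4 GB executors: 4096 + 384 = 4480 MB, rounded up to 5120 MB each,
    // so ten executors can reserve far more than the 40 GB you might expect.
    println(containerSizeMb(4096)) // 5120
  }
}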
My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps. When I ran this in standalone cluster mode, the query finished in 55s, but on YARN the query was still running 30 minutes later. Would the hard-coded sleeps potentially be in play here?
On Fri, Dec 5, 2014 at 11:23 Sandy
, and --num-executors
arguments? When running against a standalone cluster, by default Spark
will make use of all the cluster resources, but when running against YARN,
Spark defaults to a couple tiny executors.
-Sandy
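For reference, a spark-submit invocation along the lines Sandy describes might look like this (the class, jar, and resource values are illustrative, not from the thread):

./bin/spark-submit --class com.example.MyQuery \
  --master yarn-client \
  --num-executors 10 \
  --executor-memory 4g \
  --executor-cores 4 \
  my-query.jar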
Okay, my bad for not testing out the documented arguments - once I used the correct ones, the query completes in ~55s (I can probably make it faster). Thanks for the help, eh?!
On Fri Dec 05 2014 at 10:34:50 PM Denny Lee denny.g@gmail.com wrote:
Sorry for the delay in my response
To determine if this is a Windows vs. other configuration, can you just try
to call the Spark-class.cmd SparkSubmit without actually referencing the
Hadoop or Thrift server classes?
On Tue Nov 25 2014 at 5:42:09 PM Judy Nash judyn...@exchange.microsoft.com
wrote:
I traced the code and used
By any chance are you using Spark 1.0.2? registerTempTable was introduced in Spark 1.1; for Spark 1.0.2 it would be registerAsTable.
On Sun Nov 23 2014 at 10:59:48 AM riginos samarasrigi...@gmail.com wrote:
Hi guys,
I'm trying to work through the Spark SQL Programming Guide but after the:
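For context, the guide's registerTempTable example in Spark 1.1 goes roughly like this (a sketch following the published guide; people.txt ships with the Spark distribution):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD // implicit RDD -> SchemaRDD conversion

case class Person(name: String, age: Int)

val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
people.registerTempTable("people")  // Spark 1.1+
// people.registerAsTable("people") // the Spark 1.0.x spelling

val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)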
It sort of depends on your environment. If you are running in your local environment, I would just download the latest Spark 1.1 binaries and you'll be good to go. If it's a production environment, it sort of depends on how you are set up (e.g. AWS, Cloudera, etc.)
On Sun Nov 23 2014 at 11:27:49
extraction job against multiple data sources via Hadoop streaming.
Another good call-out about utilizing Scala within Spark is that most of the Spark code is written in Scala.
On Sat, Nov 22, 2014 at 08:12 Denny Lee denny.g@gmail.com wrote:
There are various scenarios where traditional Hadoop
When you are starting the thrift server service - are you connecting to it
locally or is this on a remote server when you use beeline and/or Tableau?
On Thu, Oct 30, 2014 at 8:00 AM, Bojan Kostic blood9ra...@gmail.com wrote:
I use beta driver SQL ODBC from Databricks.
QQ - did you download the Spark 1.1 binaries that include Hadoop?
Does this happen if you're using the Spark 1.1 binaries that do not include
the Hadoop jars?
On Wed, Oct 29, 2014 at 11:31 AM, Ron Ayoub ronalday...@live.com wrote:
Apparently Spark does require Hadoop even if you do not
--jars (ADD_JARS) is a special class loading for Spark, while --driver-class-path (SPARK_CLASSPATH) is captured by the startup scripts and appended to the classpath settings that are used to start the JVM running the driver.
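To make the distinction concrete, a hypothetical spark-shell launch using both (the jar paths are illustrative):

# --jars: shipped to executors and added to Spark's class loader
# --driver-class-path: prepended only to the driver JVM's classpath
./bin/spark-shell \
  --jars /opt/libs/my-udfs.jar \
  --driver-class-path /opt/libs/mysql-connector-java-5.1.34-bin.jar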
You can reference
https://www.concur.com/blog/en-us/connect-tableau-to-sparksql
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
Specified key was too long; max key length is 767 bytes
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Should I use HIVE 0.12.0 instead of HIVE 0.13.1?
Regards
Arthur
On 31 Aug, 2014, at 6:01 am, Denny Lee denny.g
This seems similar to a related Windows issue concerning Python, where pyspark couldn't find Python because the PYTHONSTARTUP environment variable wasn't set - by any chance could this be related?
On Wed, Sep 24, 2014 at 7:51 PM, christy 760948...@qq.com wrote:
Hi I have installed standalone on
The registered table is stored within the Spark context itself. To make the table available for the Thrift server to access, you can save the table from the sc into the Hive context, so that the Thrift server process can see the table. If you are using Derby as your metastore, then the
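A sketch of that approach (table and path names are hypothetical): register the data as usual, then persist it through the HiveContext so it lands in the metastore the Thrift server reads:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// A temp table is visible only inside this context...
val events = hiveContext.jsonFile("hdfs:///data/events.json")
events.registerTempTable("events_tmp")

// ...so persist it via the metastore for the Thrift server to see.
hiveContext.sql("CREATE TABLE events AS SELECT * FROM events_tmp")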
I'm not sure if I'm completely answering your question here, but I'm currently working (on OSX) with Hadoop 2.5, and I've used the Spark 1.1 build for Hadoop 2.4 without any issues.
On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote:
I see the binary packages include hadoop 1,
The registerTempTable you mentioned works on SQLContext instead of HiveContext.
Thanks,
Du
On 9/10/14, 1:21 PM, Denny Lee denny.g@gmail.com wrote:
Actually, when registering the table, it is only available within the sc
context you are running it in. For Spark 1.1, the method name is changed
Please correct me if I'm wrong, but I was under the impression, per the Maven repositories, that it was just to stay more in sync with the various versions of Hadoop. Looking at the latest documentation
(https://spark.apache.org/docs/latest/building-with-maven.html), there are
multiple Hadoop
Yes, at least for my query scenarios, I have been able to use Spark 1.1 with
Hadoop 2.4 against Hadoop 2.5. Note, Hadoop 2.5 is considered a relatively
minor release
(http://hadoop.apache.org/releases.html#11+August%2C+2014%3A+Release+2.5.0+available)
where Hadoop 2.4 and 2.3 were considered
When you re-ran sbt, did you clear out the packages first and ensure that the datanucleus jars were generated within lib_managed? I remember having to do that when I was testing out different configs.
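If it helps, the rebuild sequence I'd expect looks something like this (a sketch assuming the Spark 1.1 sbt build with the Hive profile):

sbt/sbt clean
sbt/sbt -Phive assembly
# the datanucleus jars should then appear under lib_managed:
ls lib_managed/jars | grep datanucleus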
On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101
alexandria.shea...@gmail.com wrote:
Could you provide some context about running this in yarn-cluster mode?
The Thrift server that's included within Spark 1.1 is based on Hive 0.12.
Hive has been able to work against YARN since Hive 0.10. So when you start
the thrift server, provided you copied the hive-site.xml over to the Spark
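A minimal sketch of that setup (assuming hive-site.xml lives in /etc/hive/conf and using Spark 1.1's yarn-client master):

cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/
./sbin/start-thriftserver.sh --master yarn-client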
Actually, when registering the table, it is only available within the sc context you are running it in. For Spark 1.1, the method name is changed to registerTempTable to better reflect that.
The Thrift server runs as a separate process, meaning that it cannot see any of the
This behavior is inherited from Hive, since the Spark SQL Thrift server is a variant of HiveServer2.
On Wed, Sep 3, 2014 at 10:47 PM, Denny Lee denny.g@gmail.com wrote:
When I start the thrift server (on Spark 1.1 RC4) via:
./sbin/start-thriftserver.sh --master spark://hostname:7077
When I start the thrift server (on Spark 1.1 RC4) via:
./sbin/start-thriftserver.sh --master spark://hostname:7077 --driver-class-path
$CLASSPATH
It appears that the thrift server is starting off of localhost as opposed to
hostname. I have set the spark-env.sh to use the hostname, modified the
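One thing worth trying (an assumption on my part, since the Spark SQL Thrift server is HiveServer2 under the hood): override HiveServer2's bind settings at launch, e.g.

./sbin/start-thriftserver.sh --master spark://hostname:7077 \
  --hiveconf hive.server2.thrift.bind.host=hostname \
  --hiveconf hive.server2.thrift.port=10000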
Oh, I forgot to add the managed libraries and the Hive libraries to the CLASSPATH. As soon as I did that, we were good to go.
On August 29, 2014 at 22:55:47, Denny Lee (denny.g@gmail.com) wrote:
My issue is similar to the issue as noted
http://mail-archives.apache.org/mod_mbox
Oh, you may actually be running into an issue with your MySQL setup; try running
alter database metastore_db character set latin1
so that Hive (and the Spark HiveContext) can execute properly against the metastore.
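For example, from the mysql client (metastore_db is the default Hive metastore database name; adjust the name and credentials to your setup):

mysql -u hive -p -e "ALTER DATABASE metastore_db CHARACTER SET latin1;"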
On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com
I'm currently using the Spark 1.1 branch and have been able to get the Thrift service up and running. The quick questions were whether I should be able to use the Thrift service to connect to SparkSQL-generated tables and/or Hive tables?
As well, by any chance do we have any documents that
Quick question - is there a handy sample / example of how to use the LDA algorithm within Spark MLlib?
Thanks!
Denny
Apologies, but we had restricted the slide downloads to Seattle Spark Meetup members only - we actually meant to share them with everyone. We have since fixed this and now you can download the slides. HTH!
On August 14, 2014 at 18:14:35, Denny Lee (denny.g@gmail.com) wrote
For those who were not able to attend the Seattle Spark Meetup - Spark at eBay - Troubleshooting the Everyday Issues, the slides have now been posted at:
http://files.meetup.com/12063092/SparkMeetupAugust2014Public.pdf.
Enjoy!
Denny
We're coming off a great Seattle Spark Meetup session with Evan Chan
(@evanfchan) Interactive OLAP Queries with @ApacheSpark and #Cassandra
(http://www.slideshare.net/EvanChan2/2014-07olapcassspark) at Whitepages.
Now, we're proud to announce that our next session is Spark at eBay -
=hdinsight
2) put this file into d:\winutil\bin
3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\")
after that, the test runs
Thank you,
Konstantin Kudryavtsev
On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote:
You don't actually need it per se - it's just that some
Thanks! Will take a look at this later today. HTH!
On Jul 3, 2014, at 11:09 AM, Kostiantyn Kudriavtsev
kudryavtsev.konstan...@gmail.com wrote:
Hi Denny,
just created https://issues.apache.org/jira/browse/SPARK-2356
On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote
By any chance do you have HDP 2.1 installed? you may need to install the utils
and update the env variables per
http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev
kudryavtsev.konstan...@gmail.com wrote:
issue.
On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev
kudryavtsev.konstan...@gmail.com wrote:
No, I don't.
Why do I need to have HDP installed? I don't use Hadoop at all, and I'd like to read data from the local filesystem.
On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com
For those who were not able to attend the last Seattle Spark Meetup, we had a
great session by Claudiu Barbura on xPatterns on Spark, Shark, Tachyon, and
Mesos - you can find the slides at:
http://www.slideshare.net/ClaudiuBarbura/seattle-spark-meetup-may-2014.
As well, check out the next
We’ve had some pretty awesome presentations at the Seattle Spark Meetup - here
are the links to the various slides:
Seattle Spark Meetup KickOff with DataBricks | Introduction to Spark with Matei
Zaharia and Pat McDonough
Learnings from Running Spark at Twitter sessions
Ben Hindman’s Mesos
You may also want to check out Paco Nathan's Introduction to Spark courses:
http://liber118.com/pxn/
On May 1, 2014, at 8:20 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
Hi Nicholas,
We provide hands-on training on Spark, as well as the associated ecosystem.
We gave it recently at a
I’ve been able to get CDH5 up and running on EC2 and according to Cloudera
Manager, Spark is running healthy.
But when I try to run spark-shell, I eventually get the error:
14/04/02 07:18:18 INFO client.AppClient$ClientActor: Connecting to master
spark://ip-172-xxx-xxx-xxx:7077...
14/04/02
If you have any questions on helping to get a Spark Meetup off the ground,
please do not hesitate to ping me (denny.g@gmail.com). I helped jump start
the one here in Seattle (and tangentially have been helping the Vancouver and
Denver ones as well). HTH!
On March 31, 2014 at 12:35:38