Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Denny Lee
If you have any questions on helping to get a Spark Meetup off the ground, please do not hesitate to ping me (denny.g@gmail.com).  I helped jump start the one here in Seattle (and tangentially have been helping the Vancouver and Denver ones as well).  HTH! On March 31, 2014 at 12:35:38

CDH5 Spark on EC2

2014-04-02 Thread Denny Lee
I’ve been able to get CDH5 up and running on EC2 and according to Cloudera Manager, Spark is running healthy. But when I try to run spark-shell, I eventually get the error: 14/04/02 07:18:18 INFO client.AppClient$ClientActor: Connecting to master  spark://ip-172-xxx-xxx-xxx:7077... 14/04/02

Re: Spark Training

2014-05-01 Thread Denny Lee
You may also want to check out Paco Nathan's Introduction to Spark courses: http://liber118.com/pxn/ On May 1, 2014, at 8:20 AM, Mayur Rustagi mayur.rust...@gmail.com wrote: Hi Nicholas, We provide training on spark, hands-on also associated ecosystem. We gave it recently at a

Seattle Spark Meetup Slides

2014-05-02 Thread Denny Lee
We’ve had some pretty awesome presentations at the Seattle Spark Meetup - here are the links to the various slides: Seattle Spark Meetup KickOff with DataBricks | Introduction to Spark with Matei Zaharia and Pat McDonough Learnings from Running Spark at Twitter sessions Ben Hindman’s Mesos

Seattle Spark Meetup: xPatterns Slides and @pacoid session next week!

2014-05-23 Thread Denny Lee
For those who were not able to attend the last Seattle Spark Meetup, we had a great session by Claudiu Barbura on xPatterns on Spark, Shark, Tachyon, and Mesos - you can find the slides at: http://www.slideshare.net/ClaudiuBarbura/seattle-spark-meetup-may-2014. As well, check out the next

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote:

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: No, I don’t why do I need to have HDP installed? I don’t use Hadoop at all and I’d like to read data from local filesystem On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
=hdinsight 2) put this file into d:\winutil\bin 3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\") after that test runs Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote: You don't actually need it per se - it's just that some
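
A minimal sketch of the workaround described in this thread, assuming winutils.exe has been placed in d:\winutil\bin as above; the property must be set before the SparkContext is created:

  // Point Hadoop at the directory containing bin\winutils.exe before
  // creating the SparkContext, so the Hadoop libraries can locate it.
  System.setProperty("hadoop.home.dir", "d:\\winutil\\")

  import org.apache.spark.{SparkConf, SparkContext}
  val sc = new SparkContext(
    new SparkConf().setMaster("local[2]").setAppName("winutils-test"))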

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
Thanks! will take a look at this later today. HTH! On Jul 3, 2014, at 11:09 AM, Kostiantyn Kudriavtsev kudryavtsev.konstan...@gmail.com wrote: Hi Denny, just created https://issues.apache.org/jira/browse/SPARK-2356 On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote

SeattleSparkMeetup: Spark at eBay - Troubleshooting the everyday issues

2014-07-18 Thread Denny Lee
We're coming off a great Seattle Spark Meetup session with Evan Chan (@evanfchan) Interactive OLAP Queries with @ApacheSpark and #Cassandra  (http://www.slideshare.net/EvanChan2/2014-07olapcassspark) at Whitepages.  Now, we're proud to announce that our next session is Spark at eBay -

Seattle Spark Meetup: Spark at eBay - Troubleshooting the everyday issues Slides

2014-08-14 Thread Denny Lee
For those who were not able to attend the Seattle Spark Meetup - Spark at eBay - Troubleshooting the Everyday Issues, the slides have now been posted at:  http://files.meetup.com/12063092/SparkMeetupAugust2014Public.pdf. Enjoy! Denny

Re: Seattle Spark Meetup: Spark at eBay - Troubleshooting the everyday issues Slides

2014-08-15 Thread Denny Lee
Apologies - we had restricted downloading of the slides to Seattle Spark Meetup members only, but we actually meant to share them with everyone. We have since fixed this and you can now download them.  HTH! On August 14, 2014 at 18:14:35, Denny Lee (denny.g@gmail.com) wrote

LDA example?

2014-08-21 Thread Denny Lee
Quick question - is there a handy sample / example of how to use the LDA algorithm within Spark MLlib?   Thanks! Denny

Spark / Thrift / ODBC connectivity

2014-08-28 Thread Denny Lee
I’m currently using the Spark 1.1 branch and have been able to get the Thrift service up and running.  The quick questions were whether I should be able to use the Thrift service to connect to SparkSQL generated tables and/or Hive tables?   As well, by any chance do we have any documents that

Re: SparkSQL HiveContext No Suitable Driver / Cannot Find Driver

2014-08-30 Thread Denny Lee
Oh, forgot to add the managed libraries and the Hive libraries within the CLASSPATH.  As soon as I did that, we’re good to go now. On August 29, 2014 at 22:55:47, Denny Lee (denny.g@gmail.com) wrote: My issue is similar to the issue as noted  http://mail-archives.apache.org/mod_mbox

Re: Spark Hive max key length is 767 bytes

2014-08-30 Thread Denny Lee
Oh, you may be running into an issue with your MySQL setup actually, try running alter database metastore_db character set latin1 so that way Hive (and the Spark HiveContext) can execute properly against the metastore. On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com

Starting Thriftserver via hostname on Spark 1.1 RC4?

2014-09-03 Thread Denny Lee
When I start the thrift server (on Spark 1.1 RC4) via: ./sbin/start-thriftserver.sh --master spark://hostname:7077 --driver-class-path $CLASSPATH It appears that the thrift server is starting off of localhost as opposed to hostname.  I have set the spark-env.sh to use the hostname, modified the

Re: Starting Thriftserver via hostname on Spark 1.1 RC4?

2014-09-04 Thread Denny Lee
your-port This behavior is inherited from Hive since Spark SQL Thrift server is a variant of HiveServer2. ​ On Wed, Sep 3, 2014 at 10:47 PM, Denny Lee denny.g@gmail.com wrote: When I start the thrift server (on Spark 1.1 RC4) via: ./sbin/start-thriftserver.sh --master spark://hostname:7077

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-10 Thread Denny Lee
Actually, when registering the table, it is only available within the sc context you are running it in. For Spark 1.1, the method name is changed to registerTempTable to better reflect that. The Thrift server runs as a different process, meaning that it cannot see any of the
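
A minimal sketch of that scoping behavior (Spark 1.1 API; the file path and table name are hypothetical):

  import org.apache.spark.sql.hive.HiveContext
  val hiveContext = new HiveContext(sc)
  val people = hiveContext.jsonFile("people.json")      // hypothetical input
  people.registerTempTable("people")                    // registered only in this context
  hiveContext.sql("SELECT name FROM people").collect()  // works within this process
  // A separately running process, such as the Thrift server, cannot see "people".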

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
I’m not sure if I’m completely answering your question here, but I’m currently working (on OSX) with Hadoop 2.5 and I used the Spark 1.1 binaries built for Hadoop 2.4 without any issues. On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote: I see the binary packages include hadoop 1,

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread Denny Lee
registerTempTable you mentioned works on SqlContext instead of HiveContext. Thanks, Du On 9/10/14, 1:21 PM, Denny Lee denny.g@gmail.com wrote: Actually, when registering the table, it is only available within the sc context you are running it in. For Spark 1.1, the method name is changed

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
Please correct me if I’m wrong but I was under the impression as per the maven repositories that it was just to stay more in sync with the various versions of Hadoop.  Looking at the latest documentation (https://spark.apache.org/docs/latest/building-with-maven.html), there are multiple Hadoop

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
Yes, at least for my query scenarios, I have been able to use Spark 1.1 with Hadoop 2.4 against Hadoop 2.5.  Note, Hadoop 2.5 is considered a relatively minor release (http://hadoop.apache.org/releases.html#11+August%2C+2014%3A+Release+2.5.0+available) where Hadoop 2.4 and 2.3 were considered

Re: Spark SQL JDBC

2014-09-11 Thread Denny Lee
When you re-ran sbt did you clear out the packages first and ensure that the datanucleus jars were generated within lib_managed? I remember having to do that when I was testing out different configs. On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101 alexandria.shea...@gmail.com wrote:

Re: Spark SQL Thrift JDBC server deployment for production

2014-09-11 Thread Denny Lee
Could you provide some context about running this in yarn-cluster mode? The Thrift server that's included within Spark 1.1 is based on Hive 0.12. Hive has been able to work against YARN since Hive 0.10. So when you start the thrift server, provided you copied the hive-site.xml over to the Spark

Re: SchemaRDD and RegisterAsTable

2014-09-17 Thread Denny Lee
The registered table is stored within the spark context itself.  To have the table available for the thrift server to get access to, you can save the sc table into the Hive context so that the Thrift server process can see the table.  If you are using derby as your metastore, then the
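
A minimal sketch of saving into the Hive metastore so the Thrift server process can see the table, assuming Spark 1.1's experimental SchemaRDD.saveAsTable (the file path and table name are hypothetical):

  import org.apache.spark.sql.hive.HiveContext
  val hc = new HiveContext(sc)
  val people = hc.jsonFile("people.json")  // hypothetical input
  people.saveAsTable("people_hive")        // persisted via the Hive metastore,
                                           // visible to other processes sharing it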

Re: What is a pre built package of Apache Spark

2014-09-24 Thread Denny Lee
This seems similar to a related Windows issue concerning python where pyspark couldn't find python because the PYTHONSTARTUP environment variable wasn't set - by any chance could this be related? On Wed, Sep 24, 2014 at 7:51 PM, christy 760948...@qq.com wrote: Hi I have installed standalone on

Re: Spark Hive max key length is 767 bytes

2014-09-25 Thread Denny Lee
by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) Should I use HIVE 0.12.0 instead of HIVE 0.13.1? Regards Arthur On 31 Aug, 2014, at 6:01 am, Denny Lee denny.g

Re: add external jars to spark-shell

2014-10-20 Thread Denny Lee
--jars (ADD_JARS) is a special class loading for Spark while --driver-class-path (SPARK_CLASSPATH) is captured by the startup scripts and appended to the classpath settings that are used to start the JVM running the driver. You can reference https://www.concur.com/blog/en-us/connect-tableau-to-sparksql
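
An illustrative invocation contrasting the two flags (jar paths are placeholders):

  # --jars ships the jar to executors and adds it to Spark's class loaders;
  # --driver-class-path is appended to the classpath of the driver JVM at launch.
  ./bin/spark-shell --master spark://servername:7077 \
    --jars /opt/jars/my-udfs.jar \
    --driver-class-path /opt/jars/mysql-connector-java-5.1.27.jar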

Re: winutils

2014-10-29 Thread Denny Lee
QQ - did you download the Spark 1.1 binaries that included the Hadoop one? Does this happen if you're using the Spark 1.1 binaries that do not include the Hadoop jars? On Wed, Oct 29, 2014 at 11:31 AM, Ron Ayoub ronalday...@live.com wrote: Apparently Spark does require Hadoop even if you do not

Re: Spark + Tableau

2014-10-30 Thread Denny Lee
When you are starting the thrift server service - are you connecting to it locally or is this on a remote server when you use beeline and/or Tableau? On Thu, Oct 30, 2014 at 8:00 AM, Bojan Kostic blood9ra...@gmail.com wrote: I use beta driver SQL ODBC from Databricks. -- View this message

Re: Spark or MR, Scala or Java?

2014-11-22 Thread Denny Lee
extraction job against multiple data sources via Hadoop streaming. Another good call out about utilizing Scala within Spark is that most of the Spark code is written in Scala. On Sat, Nov 22, 2014 at 08:12 Denny Lee denny.g@gmail.com wrote: There are various scenarios where traditional Hadoop

Re: Spark SQL Programming Guide - registerTempTable Error

2014-11-23 Thread Denny Lee
By any chance are you using Spark 1.0.2? registerTempTable was introduced from Spark 1.1+ while for Spark 1.0.2, it would be registerAsTable. On Sun Nov 23 2014 at 10:59:48 AM riginos samarasrigi...@gmail.com wrote: Hi guys , Im trying to do the Spark SQL Programming Guide but after the:
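
In code, the version difference amounts to (schemaRDD being any SchemaRDD):

  schemaRDD.registerTempTable("people")  // Spark 1.1+
  schemaRDD.registerAsTable("people")    // Spark 1.0.x equivalent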

Re: Spark SQL Programming Guide - registerTempTable Error

2014-11-23 Thread Denny Lee
It sort of depends on your environment. If you are running on your local environment, I would just download the latest Spark 1.1 binaries and you'll be good to go. If it's a production environment, it sort of depends on how you are setup (e.g. AWS, Cloudera, etc.) On Sun Nov 23 2014 at 11:27:49

Re: latest Spark 1.2 thrift server fail with NoClassDefFoundError on Guava

2014-11-25 Thread Denny Lee
To determine if this is a Windows vs. other configuration, can you just try to call the Spark-class.cmd SparkSubmit without actually referencing the Hadoop or Thrift server classes? On Tue Nov 25 2014 at 5:42:09 PM Judy Nash judyn...@exchange.microsoft.com wrote: I traced the code and used

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps. If I was running this on standalone cluster mode the query finished in 55s but on YARN, the query was still running 30min later. Would the hard coded sleeps potentially be in play here? On Fri, Dec 5, 2014 at 11:23 Sandy

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
, and --num-executors arguments? When running against a standalone cluster, by default Spark will make use of all the cluster resources, but when running against YARN, Spark defaults to a couple tiny executors. -Sandy On Fri, Dec 5, 2014 at 11:32 AM, Denny Lee denny.g@gmail.com wrote: My

Re: spark-submit on YARN is slow

2014-12-05 Thread Denny Lee
Okay, my bad for not testing out the documented arguments - once I use the correct ones, the query completes in ~55s (I can probably make it faster). Thanks for the help, eh?! On Fri Dec 05 2014 at 10:34:50 PM Denny Lee denny.g@gmail.com wrote: Sorry for the delay in my response

Spark on YARN memory utilization

2014-12-06 Thread Denny Lee
This is perhaps more of a YARN question than a Spark question but I was just curious as to how memory is allocated in YARN via the various configurations. For example, if I spin up my cluster with 4GB executor memory and a different number of executors as noted below: 4GB executor-memory x 10 executors = 46GB

Re: Spark on YARN memory utilization

2014-12-06 Thread Denny Lee
* executorMemory. When you set executor memory, the yarn resource request is executorMemory + yarnOverhead. - Arun On Sat, Dec 6, 2014 at 4:27 PM, Denny Lee denny.g@gmail.com wrote: This is perhaps more of a YARN question than a Spark question but i was just curious to how is memory allocated

Re: Spark on YARN memory utilization

2014-12-09 Thread Denny Lee
Thanks Sandy! On Mon, Dec 8, 2014 at 23:15 Sandy Ryza sandy.r...@cloudera.com wrote: Another thing to be aware of is that YARN will round up containers to the nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults to 1024. -Sandy On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee
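
Putting the two replies together, a back-of-the-envelope check of the original 4GB x 10 executor case (assuming the Spark 1.x default spark.yarn.executor.memoryOverhead of 384 MB):

  per-executor request = 4096 MB + 384 MB = 4480 MB
  10 executors x 4480 MB ~= 44.8 GB, plus one container for the application master,
  which lands close to the ~46 GB observed; YARN may additionally round each
  container up to a multiple of yarn.scheduler.minimum-allocation-mb.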

Re: Spark-SQL JDBC driver

2014-12-11 Thread Denny Lee
Yes, that is correct. A quick reference on this is the post https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1 with the pertinent section being: It is important to note that when you create Spark tables (for

Re: Spark SQL Roadmap?

2014-12-13 Thread Denny Lee
Hi Xiaoyong, SparkSQL has already been released and has been part of the Spark code-base since Spark 1.0. The latest stable release is Spark 1.1 (here's the Spark SQL Programming Guide http://spark.apache.org/docs/1.1.0/sql-programming-guide.html) and we're currently voting on Spark 1.2. Hive

Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
I have a large number of files within HDFS that I would like to do a group by statement a la val table = sc.textFile("hdfs://") val tabs = table.map(_.split("\t")) I'm trying to do something similar to tabs.map(c => (c._(167), c._(110), c._(200))) where I create a new RDD that only has but that isn't

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
looks like the way to go given the context. What's not working? Kr, Gerard On Dec 14, 2014 5:17 PM, Denny Lee denny.g@gmail.com wrote: I have a large of files within HDFS that I would like to do a group by statement ala val table = sc.textFile(hdfs://) val tabs = table.map(_.split

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
Yes - that works great! Sorry for implying I couldn't. Was just more flummoxed that I couldn't make the Scala call work on its own. Will continue to debug ;-) On Sun, Dec 14, 2014 at 11:39 Michael Armbrust mich...@databricks.com wrote: BTW, I cannot use SparkSQL / case right now because my table

Re: Limit the # of columns in Spark Scala

2014-12-14 Thread Denny Lee
tabs.map(c => (c(167), c(110), c(200))) instead of tabs.map(c => (c._(167), c._(110), c._(200))) On Sun, Dec 14, 2014 at 3:12 PM, Denny Lee denny.g@gmail.com wrote: Yes - that works great! Sorry for implying I couldn't. Was just more flummoxed that I couldn't make the Scala call work on its
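
The thread's fix, spelled out as a runnable sketch (the HDFS path is a placeholder; the column indices are the poster's):

  val table = sc.textFile("hdfs:///path/to/files")
  val tabs  = table.map(_.split("\t"))
  // Array access in Scala is c(i); c._(i) is not valid syntax.
  val slim  = tabs.map(c => (c(167), c(110), c(200)))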

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Denny Lee
I'm curious if you're seeing the same thing when using bdutil against GCS? I'm wondering if this may be an issue concerning the transfer rate of Spark -> Hadoop -> GCS Connector -> GCS. On Wed Dec 17 2014 at 10:09:17 PM Alessandro Baretta alexbare...@gmail.com wrote: All, I'm using the Spark

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Denny Lee
. See the following. alex@hadoop-m:~/split$ time bash -c gsutil ls gs://my-bucket/20141205/csv/*/*/* | wc -l 6860 real 0m6.971s user 0m1.052s sys 0m0.096s Alex On Wed, Dec 17, 2014 at 10:29 PM, Denny Lee denny.g@gmail.com wrote: I'm curious if you're seeing the same

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Denny Lee
to test this? But more importantly, what information would this give me? On Wed, Dec 17, 2014 at 10:46 PM, Denny Lee denny.g@gmail.com wrote: Oh, it makes sense of gsutil scans through this quickly, but I was wondering if running a Hadoop job / bdutil would result in just as fast scans

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Denny Lee
To clarify, there isn't a Hadoop 2.6 profile per se but you can build using -Dhadoop.version=2.4 which works with Hadoop 2.6. On Fri, Dec 19, 2014 at 12:55 Ted Yu yuzhih...@gmail.com wrote: You can use hadoop-2.4 profile and pass -Dhadoop.version=2.6.0 Cheers On Fri, Dec 19, 2014 at 12:51

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Denny Lee
Sorry Ted! I saw profile (-P) but missed the -D. My bad! On Fri, Dec 19, 2014 at 16:46 Ted Yu yuzhih...@gmail.com wrote: Here is the command I used: mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests FYI On Fri, Dec 19, 2014 at 4:35 PM, Denny

Re: S3 files , Spark job hungsup

2014-12-23 Thread Denny Lee
You should be able to kill the job using the webUI or via spark-class. More info can be found in the thread: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-kill-a-Spark-job-running-in-cluster-mode-td18583.html. HTH! On Tue, Dec 23, 2014 at 4:47 PM, durga durgak...@gmail.com wrote:

Spark 1.2 and Mesos 0.21.0 spark.executor.uri issue?

2014-12-30 Thread Denny Lee
I've been working with Spark 1.2 and Mesos 0.21.0 and while I have set the spark.executor.uri within spark-env.sh (and directly within bash as well), the Mesos slaves do not seem to be able to access the spark tgz file via HTTP or HDFS as per the message below. 14/12/30 15:57:35 INFO SparkILoop:

Re: Fail to launch spark-shell on windows 2008 R2

2015-02-03 Thread Denny Lee
Hi Ningjun, I have been working with Spark 1.2 on Windows 7 and Windows 2008 R2 (purely for development purposes). I had most recently installed them utilizing Java 1.8, Scala 2.10.4, and Spark 1.2 Precompiled for Hadoop 2.4+. A handy thread concerning the null\bin\winutils issue is addressed

Re: Tableau beta connector

2015-02-04 Thread Denny Lee
works. -- *From:* Denny Lee denny.g@gmail.com *Sent:* Thursday, February 5, 2015 12:20 PM *To:* İsmail Keskin; Ashutosh Trivedi (MT2013030) *Cc:* user@spark.apache.org *Subject:* Re: Tableau beta connector Some quick context behind how Tableau interacts

Re: Tableau beta connector

2015-02-05 Thread Denny Lee
and tableau can extract that RDD persisted on hive. Regards, Ashutosh -- *From:* Denny Lee denny.g@gmail.com *Sent:* Thursday, February 5, 2015 1:27 PM *To:* Ashutosh Trivedi (MT2013030); İsmail Keskin *Cc:* user@spark.apache.org *Subject:* Re: Tableau beta

Re: Spark (SQL) as OLAP engine

2015-02-03 Thread Denny Lee
A great presentation by Evan Chan on utilizing Cassandra as Jonathan noted is at: OLAP with Cassandra and Spark http://www.slideshare.net/EvanChan2/2014-07olapcassspark. On Tue Feb 03 2015 at 10:03:34 AM Jonathan Haddad j...@jonhaddad.com wrote: Write out the rdd to a cassandra table. The

Re: spark-shell can't import the default hive-site.xml options probably.

2015-02-01 Thread Denny Lee
, Denny Lee denny.g@gmail.com wrote: I may be missing something here, but typically the hive-site.xml configurations do not require you to place an "s" within the configuration value itself. Both the retry.delay and socket.timeout values are in seconds so you should only need to place the integer value

Re: spark-shell can't import the default hive-site.xml options probably.

2015-02-01 Thread Denny Lee
I may be missing something here, but typically the hive-site.xml configurations do not require you to place an "s" within the configuration value itself. Both the retry.delay and socket.timeout values are in seconds so you should only need to place the integer value. On Sun Feb
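
For illustration, a hive-site.xml entry following that advice (hive.metastore.client.socket.timeout is a real Hive property, but the value here is an example):

  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>60</value>  <!-- plain integer seconds; no "s" suffix -->
  </property>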

Re: Spark 1.3 SQL Programming Guide and sql._ / sql.types._

2015-02-20 Thread Denny Lee
. On Fri, Feb 20, 2015 at 9:55 AM, Denny Lee denny.g@gmail.com wrote: Quickly reviewing the latest SQL Programming Guide https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md (in github) I had a couple of quick questions: 1) Do we need to instantiate the SparkContext

Re: takeSample triggers 2 jobs

2015-03-06 Thread Denny Lee
Hi Rares, If you dig into the descriptions for the two jobs, it will probably return something like: Job ID: 1 org.apache.spark.rdd.RDD.takeSample(RDD.scala:447) $line41.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22) ... Job ID: 0

Re: Errors in SPARK

2015-03-24 Thread Denny Lee
* Cheers, Sandeep.v On Wed, Mar 25, 2015 at 11:10 AM, sandeep vura sandeepv...@gmail.com wrote: No I am just running ./spark-shell command in terminal I will try with above command On Wed, Mar 25, 2015 at 11:09 AM, Denny Lee denny.g@gmail.com wrote: Did you include the connection

Re: Errors in SPARK

2015-03-24 Thread Denny Lee
Did you include the connection to a MySQL connector jar so that way spark-shell / hive can connect to the metastore? For example, when I run my spark-shell instance in standalone mode, I use: ./spark-shell --master spark://servername:7077 --driver-class-path /lib/mysql-connector-java-5.1.27.jar

Re: Total size of serialized results is bigger than spark.driver.maxResultSize

2015-03-25 Thread Denny Lee
As you noted, you can change the spark.driver.maxResultSize value in your Spark Configurations (https://spark.apache.org/docs/1.2.0/configuration.html). Please reference the Spark Properties section noting that you can modify these properties via the spark-defaults.conf or via SparkConf(). HTH!
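
For example, a minimal sketch of the programmatic route (the 2g value is an arbitrary example):

  import org.apache.spark.{SparkConf, SparkContext}
  // Equivalent to adding "spark.driver.maxResultSize 2g" to spark-defaults.conf.
  val conf = new SparkConf().set("spark.driver.maxResultSize", "2g")
  val sc = new SparkContext(conf)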

Re: Anyone has some simple example with spark-sql with spark 1.3

2015-03-30 Thread Denny Lee
Hi Vincent, This may be a case that you're missing a semi-colon after your CREATE TEMPORARY TABLE statement. I ran your original statement (missing the semi-colon) and got the same error as you did. As soon as I added it in, I was good to go again: CREATE TEMPORARY TABLE jsonTable USING
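
A complete statement of that shape, with the closing semi-colon the thread hinged on (the path is a placeholder):

  CREATE TEMPORARY TABLE jsonTable
  USING org.apache.spark.sql.json
  OPTIONS (path "/path/to/people.json");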

Re: Creating Partitioned Parquet Tables via SparkSQL

2015-04-01 Thread Denny Lee
Thanks Felix :) On Wed, Apr 1, 2015 at 00:08 Felix Cheung felixcheun...@hotmail.com wrote: This is tracked by these JIRAs.. https://issues.apache.org/jira/browse/SPARK-5947 https://issues.apache.org/jira/browse/SPARK-5948 -- From: denny.g@gmail.com Date:

Re: spark-sql throws org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException

2015-03-26 Thread Denny Lee
If you're not using MySQL as your metastore for Hive, out of curiosity what are you using? The error you are seeing is common when the correct driver for Spark to connect to the Hive metastore isn't there. As well, I noticed that you're using

Re: Handling Big data for interactive BI tools

2015-03-26 Thread Denny Lee
BTW, a tool that I have been using to help do the preaggregation of data using hyperloglog in combination with Spark is atscale (http://atscale.com/). It builds the aggregations and makes use of the speed of SparkSQL - all within the context of a model that is accessible by Tableau or Qlik. On

Re: Spark sql thrift server slower than hive

2015-03-22 Thread Denny Lee
How are you running your spark instance out of curiosity? Via YARN or standalone mode? When connecting Spark thriftserver to the Spark service, have you allocated enough memory and CPU when executing with spark? On Sun, Mar 22, 2015 at 3:39 AM fanooos dev.fano...@gmail.com wrote: We have

Re: spark master shut down suddenly

2015-03-04 Thread Denny Lee
It depends on your setup but one of the locations is /var/log/mesos On Wed, Mar 4, 2015 at 19:11 lisendong lisend...@163.com wrote: I’m sorry, but how do I look at the mesos logs? Where are they? On March 4, 2015 at 6:06 PM, Akhil Das ak...@sigmoidanalytics.com wrote: You can check in the mesos logs

Re: Spark SQL odbc on Windows

2015-02-23 Thread Denny Lee
paper!! We were already using it as a guideline for our tests. Best regards, Francisco -- From: Denny Lee denny.g@gmail.com Sent: 22/02/2015 17:56 To: Ashic Mahtab as...@live.com; Francisco Orchard forch...@gmail.com; Apache Spark user@spark.apache.org

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Hi Francisco, Out of curiosity - why ROLAP mode using multi-dimensional mode (vs tabular) from SSAS to Spark? As a past SSAS guy you've definitely piqued my interest. The one thing that you may run into is that the SQL generated by SSAS can be quite convoluted. When we were doing the same thing

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Back to thrift, there was an earlier thread on this topic at http://mail-archives.apache.org/mod_mbox/spark-user/201411.mbox/%3CCABPQxsvXA-ROPeXN=wjcev_n9gv-drqxujukbp_goutvnyx...@mail.gmail.com%3E that may be useful as well. On Sun Feb 22 2015 at 8:42:29 AM Denny Lee denny.g@gmail.com wrote

Re: Use case for data in SQL Server

2015-02-24 Thread Denny Lee
Hi Suhel, My team is currently working with a lot of SQL Server databases as one of our many data sources and ultimately we pull the data into HDFS from SQL Server. As we had a lot of SQL databases to hit, we used the jTDS driver and SQOOP to extract the data out of SQL Server and into HDFS

Spark 1.3 SQL Programming Guide and sql._ / sql.types._

2015-02-20 Thread Denny Lee
Quickly reviewing the latest SQL Programming Guide https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md (in github) I had a couple of quick questions: 1) Do we need to instantiate the SparkContext as per // sc is an existing SparkContext. val sqlContext = new

Re: Unable to run hive queries inside spark

2015-02-24 Thread Denny Lee
The error message you have is: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/src is not a directory or unable to create one) Could you verify that you (the user you are running under) has the rights to create

Re: Unable to run hive queries inside spark

2015-02-24 Thread Denny Lee
<description>location of default database for the warehouse</description> </property> Do I need to do anything explicitly other than placing hive-site.xml in the Spark conf directory? Thanks !! On Wed, Feb 25, 2015 at 11:42 AM, Denny Lee denny.g@gmail.com wrote: The error message

Re: How to start spark-shell with YARN?

2015-02-24 Thread Denny Lee
It may have to do with the akka heartbeat interval per SPARK-3923 - https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-3923 ? On Tue, Feb 24, 2015 at 16:40 Xi Shen davidshe...@gmail.com wrote: Hi Sean, I launched the spark-shell on the same machine as I started YARN service. I

Re: Hive Table not from from Spark SQL

2015-03-27 Thread Denny Lee
Upon reviewing your other thread, could you confirm that your Hive metastore that you can connect to via Hive is a MySQL database? And to also confirm, when you're running spark-shell and doing a show tables statement, you're getting the same error? On Fri, Mar 27, 2015 at 6:08 AM ÐΞ€ρ@Ҝ (๏̯͡๏)

Re: Standalone Scheduler VS YARN Performance

2015-03-24 Thread Denny Lee
By any chance does this thread address look similar: http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html ? On Tue, Mar 24, 2015 at 5:23 AM Harut Martirosyan harut.martiros...@gmail.com wrote: What is performance overhead caused by YARN,

Re: Hadoop 2.5 not listed in Spark 1.4 build page

2015-03-24 Thread Denny Lee
Hadoop 2.5 would be referenced via -Dhadoop.version=2.5 using the profile -Phadoop-2.4. Please note earlier in the link the section: # Apache Hadoop 2.4.X or 2.5.X mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean package Versions of Hadoop after 2.5.X may or may not work with the

Re: [SparkSQL] How to calculate stddev on a DataFrame?

2015-03-25 Thread Denny Lee
Perhaps this email reference may be able to help from a DataFrame perspective: http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201503.mbox/%3CCALte62ztepahF=5hk9rcfbnyk4z43wkcq4fkdcbwmgf_3_o...@mail.gmail.com%3E On Wed, Mar 25, 2015 at 7:29 PM Haopu Wang hw...@qilinsoft.com wrote:

Re: Use pig load function in spark

2015-03-23 Thread Denny Lee
You may be able to utilize Spork (Pig on Apache Spark) as a mechanism to do this: https://github.com/sigmoidanalytics/spork On Mon, Mar 23, 2015 at 2:29 AM Dai, Kevin yun...@ebay.com wrote: Hi, all Can spark use pig’s load function to load data? Best Regards, Kevin.

Re: Using a different spark jars than the one on the cluster

2015-03-23 Thread Denny Lee
+1 - I currently am doing what Marcelo is suggesting as I have a CDH 5.2 cluster (with Spark 1.1) and I'm also running Spark 1.3.0+ side-by-side in my cluster. On Wed, Mar 18, 2015 at 1:23 PM Marcelo Vanzin van...@cloudera.com wrote: Since you're using YARN, you should be able to download a

Re: Should I do spark-sql query on HDFS or hive?

2015-03-23 Thread Denny Lee
From the standpoint of Spark SQL accessing the files - when it is hitting Hive, it is in effect hitting HDFS as well. Hive provides a great framework where the table structure is already well defined. But underneath it, Hive is just accessing files from HDFS so you are hitting HDFS either way.

ArrayBuffer within a DataFrame

2015-04-02 Thread Denny Lee
Quick question - the output of a dataframe is in the format of: [2015-04, ArrayBuffer(A, B, C, D)] and I'd like to return it as: 2015-04, A 2015-04, B 2015-04, C 2015-04, D What's the best way to do this? Thanks in advance!

Re: ArrayBuffer within a DataFrame

2015-04-02 Thread Denny Lee
, Apr 2, 2015 at 7:10 PM, Denny Lee denny.g@gmail.com wrote: Quick question - the output of a dataframe is in the format of: [2015-04, ArrayBuffer(A, B, C, D)] and I'd like to return it as: 2015-04, A 2015-04, B 2015-04, C 2015-04, D What's the best way to do this? Thanks

Re: Start ThriftServer Error

2015-04-22 Thread Denny Lee
You may need to specify the hive port itself. For example, my own Thrift start command is in the form: ./sbin/start-thriftserver.sh --master spark://$myserver:7077 --driver-class-path $CLASSPATH --hiveconf hive.server2.thrift.bind.host $myserver --hiveconf hive.server2.thrift.port 1 HTH!

Re: Skipped Jobs

2015-04-19 Thread Denny Lee
Thanks for the correction Mark :) On Sun, Apr 19, 2015 at 3:45 PM Mark Hamstra m...@clearstorydata.com wrote: Almost. Jobs don't get skipped. Stages and Tasks do if the needed results are already available. On Sun, Apr 19, 2015 at 3:18 PM, Denny Lee denny.g@gmail.com wrote: The job

Re: Spark Cluster Setup

2015-04-27 Thread Denny Lee
Similar to what Dean called out, we built Puppet manifests so we could do the automation - it's a bit of work to set up, but well worth the effort. On Fri, Apr 24, 2015 at 11:27 AM Dean Wampler deanwamp...@gmail.com wrote: It's mostly manual. You could try automating with something like Chef, of

Re: how to delete data from table in sparksql

2015-05-14 Thread Denny Lee
Delete from table is available as part of Hive 0.14 (reference: Apache Hive Language Manual DML - Delete https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Delete) while Spark 1.3 defaults to Hive 0.13. Perhaps rebuild Spark with Hive 0.14 or generate a new

Hive Skew flag?

2015-05-15 Thread Denny Lee
Just wondering if we have any timeline on when the hive skew flag will be included within SparkSQL? Thanks! Denny

Re: Microsoft SQL jdbc support from spark sql

2015-04-16 Thread Denny Lee
Bummer - out of curiosity, if you were to use the classpath.first or perhaps copy the jar to the slaves could that actually do the trick? The latter isn't really all that efficient but just curious if that could do the trick. On Thu, Apr 16, 2015 at 7:14 AM ARose ashley.r...@telarix.com wrote:

Re: Which version of Hive QL is Spark 1.3.0 using?

2015-04-17 Thread Denny Lee
Support for subqueries in predicates hasn't been resolved yet - please refer to SPARK-4226. BTW, Spark 1.3 by default binds to Hive 0.13.1. On Fri, Apr 17, 2015 at 09:18 ARose ashley.r...@telarix.com wrote: So I'm trying to store the results of a query into a DataFrame, but I get the
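
A sketch of the usual workaround until SPARK-4226 lands - rewriting the predicate subquery as a LEFT SEMI JOIN, which the Hive 0.13.1 dialect accepts (table and column names are hypothetical):

  -- Not yet supported (SPARK-4226):
  SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers);
  -- Equivalent LEFT SEMI JOIN:
  SELECT o.* FROM orders o LEFT SEMI JOIN customers c ON (o.customer_id = c.id);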

Re: Converting Date pattern in scala code

2015-04-14 Thread Denny Lee
If you're doing it in Scala per se - then you can probably just reference JodaTime or the Java Date / Time classes. If you are using SparkSQL, then you can use the various Hive date functions for conversion. On Tue, Apr 14, 2015 at 11:04 AM BASAK, ANANDA ab9...@att.com wrote: I need some help to convert
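
A minimal sketch of the plain-Java route (the input and output patterns are examples, not from the original thread):

  import java.text.SimpleDateFormat
  val in  = new SimpleDateFormat("yyyy-MM-dd")
  val out = new SimpleDateFormat("MM/dd/yyyy")
  out.format(in.parse("2015-04-14"))  // returns "04/14/2015"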

Re: Microsoft SQL jdbc support from spark sql

2015-04-07 Thread Denny Lee
At this time, the JDBC Data source is not extensible so it cannot support SQL Server. There were some thoughts - credit to Cheng Lian for this - about making the JDBC data source extensible for third party support, possibly via Slick. On Mon, Apr 6, 2015 at 10:41 PM bipin bipin@gmail.com

Re: Microsoft SQL jdbc support from spark sql

2015-04-07 Thread Denny Lee
That's correct - at this time MS SQL Server is not supported through the JDBC data source. In my environment, we've been using Hadoop streaming to extract out data from multiple SQL Servers, pushing the data into HDFS, creating the Hive tables and/or converting them into Parquet, and

Re: ArrayBuffer within a DataFrame

2015-04-03 Thread Denny Lee
something like this would work. You might need to play with the type. df.explode("arrayBufferColumn") { x => x } On Fri, Apr 3, 2015 at 6:43 AM, Denny Lee denny.g@gmail.com wrote: Thanks Dean - fun hack :) On Fri, Apr 3, 2015 at 6:11 AM Dean Wampler deanwamp...@gmail.com wrote: A hack
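
Spelling out that suggestion for the [month, ArrayBuffer(...)] shape from the start of the thread (column names are hypothetical; Spark 1.3 DataFrame API):

  // Each element of the array column becomes its own row next to "month".
  val exploded = df.explode("items", "item") { items: Seq[String] => items }
  // [2015-04, ArrayBuffer(A, B, C, D)] -> (2015-04, A), (2015-04, B), ...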

Re: Which Hive version should be used for Spark 1.3

2015-04-09 Thread Denny Lee
By default Spark 1.3 has bindings to Hive 0.13.1 though you can bind it to Hive 0.12 if you specify it in the profile when building Spark as per https://spark.apache.org/docs/1.3.0/building-spark.html. If you are downloading a pre-built version of Spark 1.3 - then by default, it is set to Hive
