If you have any questions on helping to get a Spark Meetup off the ground,
please do not hesitate to ping me (denny.g@gmail.com). I helped jump start
the one here in Seattle (and tangentially have been helping the Vancouver and
Denver ones as well). HTH!
On March 31, 2014 at 12:35:38
I’ve been able to get CDH5 up and running on EC2 and according to Cloudera
Manager, Spark is running healthy.
But when I try to run spark-shell, I eventually get the error:
14/04/02 07:18:18 INFO client.AppClient$ClientActor: Connecting to master
spark://ip-172-xxx-xxx-xxx:7077...
14/04/02
You may also want to check out Paco Nathan's Introduction to Spark courses:
http://liber118.com/pxn/
On May 1, 2014, at 8:20 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
Hi Nicholas,
We provide hands-on training on Spark and the associated ecosystem.
We gave it recently at a
We’ve had some pretty awesome presentations at the Seattle Spark Meetup - here
are the links to the various slides:
Seattle Spark Meetup KickOff with DataBricks | Introduction to Spark with Matei
Zaharia and Pat McDonough
Learnings from Running Spark at Twitter sessions
Ben Hindman’s Mesos
For those who were not able to attend the last Seattle Spark Meetup, we had a
great session by Claudiu Barbura on xPatterns on Spark, Shark, Tachyon, and
Mesos - you can find the slides at:
http://www.slideshare.net/ClaudiuBarbura/seattle-spark-meetup-may-2014.
As well, check out the next
By any chance do you have HDP 2.1 installed? You may need to install the utils
and update the env variables per
http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev
kudryavtsev.konstan...@gmail.com wrote:
issue.
On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev
kudryavtsev.konstan...@gmail.com wrote:
No, I don’t
Why do I need to have HDP installed? I don’t use Hadoop at all and I’d
like to read data from the local filesystem
On Jul 2, 2014, at 9:10 PM, Denny Lee denny.g@gmail.com
=hdinsight
2) put this file into d:\winutil\bin
3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\")
after that the test runs
Thank you,
Konstantin Kudryavtsev
On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee denny.g@gmail.com wrote:
You don't actually need it per se - it's just that some
Thanks! will take a look at this later today. HTH!
On Jul 3, 2014, at 11:09 AM, Kostiantyn Kudriavtsev
kudryavtsev.konstan...@gmail.com wrote:
Hi Denny,
just created https://issues.apache.org/jira/browse/SPARK-2356
On Jul 3, 2014, at 7:06 PM, Denny Lee denny.g@gmail.com wrote
We're coming off a great Seattle Spark Meetup session with Evan Chan
(@evanfchan) Interactive OLAP Queries with @ApacheSpark and #Cassandra
(http://www.slideshare.net/EvanChan2/2014-07olapcassspark) at Whitepages.
Now, we're proud to announce that our next session is Spark at eBay -
For those who were not able to attend the Seattle Spark Meetup - Spark at eBay
- Troubleshooting the Everyday Issues, the slides have now been posted at:
http://files.meetup.com/12063092/SparkMeetupAugust2014Public.pdf.
Enjoy!
Denny
Apologies, but we had restricted the slide downloads to Seattle Spark Meetup
members only - we actually meant to share them with everyone. We have since
fixed this and you can now download them. HTH!
On August 14, 2014 at 18:14:35, Denny Lee (denny.g@gmail.com) wrote
Quick question - is there a handy sample / example of how to use the LDA
algorithm within Spark MLlib?
Thanks!
Denny
I’m currently using the Spark 1.1 branch and have been able to get the Thrift
service up and running. The quick questions were whether I should be able to
use the Thrift service to connect to SparkSQL generated tables and/or Hive tables?
As well, by any chance do we have any documents that
Oh, forgot to add the managed libraries and the Hive libraries within the
CLASSPATH. As soon as I did that, we’re good to go now.
On August 29, 2014 at 22:55:47, Denny Lee (denny.g@gmail.com) wrote:
My issue is similar to the issue as noted
http://mail-archives.apache.org/mod_mbox
Oh, you may actually be running into an issue with your MySQL setup - try running:
alter database metastore_db character set latin1;
so that Hive (and the Spark HiveContext) can execute properly against the
metastore.
On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com
When I start the thrift server (on Spark 1.1 RC4) via:
./sbin/start-thriftserver.sh --master spark://hostname:7077 --driver-class-path
$CLASSPATH
It appears that the thrift server is starting off of localhost as opposed to
hostname. I have set the spark-env.sh to use the hostname, modified the
your-port
This behavior is inherited from Hive since Spark SQL Thrift server is a variant
of HiveServer2.
On Wed, Sep 3, 2014 at 10:47 PM, Denny Lee denny.g@gmail.com wrote:
When I start the thrift server (on Spark 1.1 RC4) via:
./sbin/start-thriftserver.sh --master spark://hostname:7077
Actually, when registering the table, it is only available within the sc
context you are running it in. For Spark 1.1, the method name was changed to
registerTempTable to better reflect that.
The Thrift server runs as a separate process, meaning that it cannot see any
of the
I’m not sure if I’m completely answering your question here, but I’m currently
working (on OSX) with Hadoop 2.5 and I used Spark 1.1 built for Hadoop 2.4
without any issues.
On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote:
I see the binary packages include hadoop 1,
registerTempTable you mentioned works on SqlContext instead of HiveContext.
Thanks,
Du
On 9/10/14, 1:21 PM, Denny Lee denny.g@gmail.com wrote:
Actually, when registering the table, it is only available within the sc
context you are running it in. For Spark 1.1, the method name is changed
Please correct me if I’m wrong, but I was under the impression per the Maven
repositories that it was just to stay more in sync with the various versions of
Hadoop. Looking at the latest documentation
(https://spark.apache.org/docs/latest/building-with-maven.html), there are
multiple Hadoop
Yes, at least for my query scenarios, I have been able to use Spark 1.1 built
for Hadoop 2.4 against Hadoop 2.5. Note, Hadoop 2.5 is considered a relatively
minor release
(http://hadoop.apache.org/releases.html#11+August%2C+2014%3A+Release+2.5.0+available)
where Hadoop 2.4 and 2.3 were considered
When you re-ran sbt, did you clear out the packages first and ensure that
the datanucleus jars were generated within lib_managed? I remember
having to do that when I was testing out different configs.
On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101
alexandria.shea...@gmail.com wrote:
Could you provide some context about running this in yarn-cluster mode?
The Thrift server that's included within Spark 1.1 is based on Hive 0.12.
Hive has been able to work against YARN since Hive 0.10. So when you start
the thrift server, provided you copied the hive-site.xml over to the Spark
The registered table is stored within the Spark context itself. To have the
table available for the Thrift server to access, you can save the sc
table into the Hive context so that the Thrift server process can see the
table. If you are using derby as your metastore, then the
This seems similar to a related Windows issue concerning Python where
pyspark couldn't find Python because the PYTHONSTARTUP environment variable
wasn't set - by any chance could this be related?
On Wed, Sep 24, 2014 at 7:51 PM, christy 760948...@qq.com wrote:
Hi I have installed standalone on
by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
Specified key was too long; max key length is 767 bytes
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Should I use HIVE 0.12.0 instead of HIVE 0.13.1?
Regards
Arthur
On 31 Aug, 2014, at 6:01 am, Denny Lee denny.g
--jars (ADD_JARS) is a special class-loading mechanism for Spark, while
--driver-class-path (SPARK_CLASSPATH) is captured by the startup scripts and
appended to the classpath settings used to start the JVM running the
driver
You can reference
https://www.concur.com/blog/en-us/connect-tableau-to-sparksql
QQ - did you download the Spark 1.1 binaries that include the Hadoop jars?
Does this happen if you're using the Spark 1.1 binaries that do not include
the Hadoop jars?
On Wed, Oct 29, 2014 at 11:31 AM, Ron Ayoub ronalday...@live.com wrote:
Apparently Spark does require Hadoop even if you do not
When you are starting the thrift server service - are you connecting to it
locally or is this on a remote server when you use beeline and/or Tableau?
On Thu, Oct 30, 2014 at 8:00 AM, Bojan Kostic blood9ra...@gmail.com wrote:
I use beta driver SQL ODBC from Databricks.
extraction job against multiple data sources via Hadoop streaming.
Another good call out for utilizing Scala within Spark is that most of the
Spark code is written in Scala.
On Sat, Nov 22, 2014 at 08:12 Denny Lee denny.g@gmail.com wrote:
There are various scenarios where traditional Hadoop
By any chance are you using Spark 1.0.2? registerTempTable was introduced
from Spark 1.1+ while for Spark 1.0.2, it would be registerAsTable.
On Sun Nov 23 2014 at 10:59:48 AM riginos samarasrigi...@gmail.com wrote:
Hi guys,
I'm trying to work through the Spark SQL Programming Guide, but after the:
It sort of depends on your environment. If you are running on your local
environment, I would just download the latest Spark 1.1 binaries and you'll
be good to go. If it's a production environment, it sort of depends on how
you are set up (e.g. AWS, Cloudera, etc.)
On Sun Nov 23 2014 at 11:27:49
To determine if this is a Windows vs. other configuration, can you just try
to call the Spark-class.cmd SparkSubmit without actually referencing the
Hadoop or Thrift server classes?
On Tue Nov 25 2014 at 5:42:09 PM Judy Nash judyn...@exchange.microsoft.com
wrote:
I traced the code and used
My submissions of Spark on YARN (CDH 5.2) resulted in a few thousand steps.
If I was running this on standalone cluster mode the query finished in 55s
but on YARN, the query was still running 30min later. Would the hard coded
sleeps potentially be in play here?
On Fri, Dec 5, 2014 at 11:23 Sandy
, and --num-executors
arguments? When running against a standalone cluster, by default Spark
will make use of all the cluster resources, but when running against YARN,
Spark defaults to a couple tiny executors.
-Sandy
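A sketch of what Sandy's suggestion can look like on the command line (the class name, jar, and sizes below are placeholders for illustration, not values from this thread):

```shell
# Hypothetical spark-submit against YARN; without these flags, Spark on
# YARN at the time defaulted to a couple of small executors.
./bin/spark-submit \
  --master yarn-client \
  --num-executors 10 \
  --executor-memory 4g \
  --executor-cores 2 \
  --class com.example.MyJob myjob.jar
```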
On Fri, Dec 5, 2014 at 11:32 AM, Denny Lee denny.g@gmail.com
wrote:
My
Okay, my bad for not testing out the documented arguments - once I used the
correct ones, the query completed in ~55s (I can probably make it
faster). Thanks for the help, eh?!
On Fri Dec 05 2014 at 10:34:50 PM Denny Lee denny.g@gmail.com wrote:
Sorry for the delay in my response
This is perhaps more of a YARN question than a Spark question, but I was
just curious as to how memory is allocated in YARN via the various
configurations. For example, if I spin up my cluster with 4GB with a
different number of executors as noted below
4GB executor-memory x 10 executors = 46GB
* executorMemory.
When you set executor memory, the YARN resource request is executorMemory
+ yarnOverhead.
- Arun
On Sat, Dec 6, 2014 at 4:27 PM, Denny Lee denny.g@gmail.com wrote:
This is perhaps more of a YARN question than a Spark question but i was
just curious to how is memory allocated
Thanks Sandy!
On Mon, Dec 8, 2014 at 23:15 Sandy Ryza sandy.r...@cloudera.com wrote:
Another thing to be aware of is that YARN will round up containers to the
nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults
to 1024.
-Sandy
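As a rough illustration of the arithmetic Arun and Sandy describe, here is a small Python sketch. The 7% overhead fraction and 384MB floor are the commonly cited defaults of that era and are assumptions here; the exact values vary by Spark version.

```python
import math

def yarn_container_mb(executor_mb, min_alloc_mb=1024,
                      overhead_fraction=0.07, overhead_floor_mb=384):
    """Approximate the YARN container size for a Spark executor:
    executor memory + overhead, rounded up to the nearest
    yarn.scheduler.minimum-allocation-mb increment."""
    overhead = max(overhead_floor_mb, int(executor_mb * overhead_fraction))
    requested = executor_mb + overhead
    return int(math.ceil(requested / float(min_alloc_mb)) * min_alloc_mb)

# A 4GB executor request actually occupies a 5GB container:
print(yarn_container_mb(4096))  # 4096 + 384 = 4480, rounded up to 5120
```

This is why the total cluster memory used can come out noticeably higher than executor-memory times the executor count.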
On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee
Yes, that is correct. A quick reference on this is the post
https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1
with the pertinent section being:
It is important to note that when you create Spark tables (for
Hi Xiaoyong,
SparkSQL has already been released and has been part of the Spark code-base
since Spark 1.0. The latest stable release is Spark 1.1 (here's the Spark
SQL Programming Guide
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html) and we're
currently voting on Spark 1.2.
Hive
I have a large number of files within HDFS that I would like to do a group by
statement ala
val table = sc.textFile("hdfs://")
val tabs = table.map(_.split("\t"))
I'm trying to do something similar to
tabs.map(c => (c._(167), c._(110), c._(200)))
where I create a new RDD that only has
but that isn't
looks like
the way to go given the context. What's not working?
Kr, Gerard
On Dec 14, 2014 5:17 PM, Denny Lee denny.g@gmail.com wrote:
I have a large number of files within HDFS that I would like to do a group by
statement ala
val table = sc.textFile("hdfs://")
val tabs = table.map(_.split
Yes - that works great! Sorry for implying I couldn't. Was just more
flummoxed that I couldn't make the Scala call work on its own. Will
continue to debug ;-)
On Sun, Dec 14, 2014 at 11:39 Michael Armbrust mich...@databricks.com
wrote:
BTW, I cannot use SparkSQL / case right now because my table
tabs.map(c => (c(167), c(110), c(200))) instead of tabs.map(c => (c._(167),
c._(110), c._(200)))
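In plain-Python terms (outside Spark), the fix amounts to indexing the split array directly. A minimal sketch of the intended projection, with small indices standing in for the 167/110/200 used in the thread:

```python
# Split tab-delimited lines and project a subset of columns by position,
# mirroring tabs.map(c => (c(0), c(2), c(3))) in Spark/Scala.
lines = ["a\tb\tc\td", "e\tf\tg\th"]
tabs = [line.split("\t") for line in lines]
projected = [(c[0], c[2], c[3]) for c in tabs]
print(projected)  # [('a', 'c', 'd'), ('e', 'g', 'h')]
```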
On Sun, Dec 14, 2014 at 3:12 PM, Denny Lee denny.g@gmail.com wrote:
Yes - that works great! Sorry for implying I couldn't. Was just more
flummoxed that I couldn't make the Scala call work on its
I'm curious if you're seeing the same thing when using bdutil against GCS?
I'm wondering if this may be an issue concerning the transfer rate of Spark
-> Hadoop -> GCS Connector -> GCS.
On Wed Dec 17 2014 at 10:09:17 PM Alessandro Baretta alexbare...@gmail.com
wrote:
All,
I'm using the Spark
. See the
following.
alex@hadoop-m:~/split$ time bash -c "gsutil ls gs://my-bucket/20141205/csv/*/*/* | wc -l"
6860
real    0m6.971s
user    0m1.052s
sys     0m0.096s
Alex
On Wed, Dec 17, 2014 at 10:29 PM, Denny Lee denny.g@gmail.com wrote:
I'm curious if you're seeing the same
to test this? But more importantly, what
information would this give me?
On Wed, Dec 17, 2014 at 10:46 PM, Denny Lee denny.g@gmail.com wrote:
Oh, it makes sense that gsutil scans through this quickly, but I was
wondering if running a Hadoop job / bdutil would result in just as fast
scans
To clarify, there isn't a Hadoop 2.6 profile per se, but you can build using
the -Phadoop-2.4 profile, which works with Hadoop 2.6.
On Fri, Dec 19, 2014 at 12:55 Ted Yu yuzhih...@gmail.com wrote:
You can use hadoop-2.4 profile and pass -Dhadoop.version=2.6.0
Cheers
On Fri, Dec 19, 2014 at 12:51
Sorry Ted! I saw profile (-P) but missed the -D. My bad!
On Fri, Dec 19, 2014 at 16:46 Ted Yu yuzhih...@gmail.com wrote:
Here is the command I used:
mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive -DskipTests
FYI
On Fri, Dec 19, 2014 at 4:35 PM, Denny
You should be able to kill the job using the webUI or via spark-class.
More info can be found in the thread:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-kill-a-Spark-job-running-in-cluster-mode-td18583.html.
HTH!
On Tue, Dec 23, 2014 at 4:47 PM, durga durgak...@gmail.com wrote:
I've been working with Spark 1.2 and Mesos 0.21.0 and while I have set the
spark.executor.uri within spark-env.sh (and directly within bash as well),
the Mesos slaves do not seem to be able to access the spark tgz file via
HTTP or HDFS as per the message below.
14/12/30 15:57:35 INFO SparkILoop:
Hi Ningjun,
I have been working with Spark 1.2 on Windows 7 and Windows 2008 R2 (purely
for development purposes). I had most recently installed them utilizing
Java 1.8, Scala 2.10.4, and Spark 1.2 Precompiled for Hadoop 2.4+. A handy
thread concerning the null\bin\winutils issue is addressed
works.
--
*From:* Denny Lee denny.g@gmail.com
*Sent:* Thursday, February 5, 2015 12:20 PM
*To:* İsmail Keskin; Ashutosh Trivedi (MT2013030)
*Cc:* user@spark.apache.org
*Subject:* Re: Tableau beta connector
Some quick context behind how Tableau interacts
and Tableau can extract that RDD persisted in Hive.
Regards,
Ashutosh
--
*From:* Denny Lee denny.g@gmail.com
*Sent:* Thursday, February 5, 2015 1:27 PM
*To:* Ashutosh Trivedi (MT2013030); İsmail Keskin
*Cc:* user@spark.apache.org
*Subject:* Re: Tableau beta
A great presentation by Evan Chan on utilizing Cassandra as Jonathan noted
is at: OLAP with Cassandra and Spark
http://www.slideshare.net/EvanChan2/2014-07olapcassspark.
On Tue Feb 03 2015 at 10:03:34 AM Jonathan Haddad j...@jonhaddad.com wrote:
Write out the rdd to a cassandra table. The
, Denny Lee denny.g@gmail.com wrote:
I may be missing something here, but typically the hive-site.xml
configurations do not require you to place an 's' within the configuration
itself. Both the retry.delay and socket.timeout values are in seconds, so
you should only need to place the integer value
I may be missing something here, but typically the hive-site.xml
configurations do not require you to place an 's' within the configuration
itself. Both the retry.delay and socket.timeout values are in seconds, so
you should only need to place the integer value (which is in seconds).
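For example, such a hive-site.xml entry would carry a bare integer. The property name shown is the standard Hive metastore socket timeout setting, assumed here rather than quoted from the thread:

```xml
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>60</value>
</property>
```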
On Sun Feb
.
On Fri, Feb 20, 2015 at 9:55 AM, Denny Lee denny.g@gmail.com wrote:
Quickly reviewing the latest SQL Programming Guide
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
(in github) I had a couple of quick questions:
1) Do we need to instantiate the SparkContext
Hi Rares,
If you dig into the descriptions for the two jobs, it will probably return
something like:
Job ID: 1
org.apache.spark.rdd.RDD.takeSample(RDD.scala:447)
$line41.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.init(console:22)
...
Job ID: 0
*
Cheers,
Sandeep.v
On Wed, Mar 25, 2015 at 11:10 AM, sandeep vura sandeepv...@gmail.com
wrote:
No, I am just running the ./spark-shell command in a terminal. I will try
with the above command
On Wed, Mar 25, 2015 at 11:09 AM, Denny Lee denny.g@gmail.com
wrote:
Did you include the connection
Did you include the connection to a MySQL connector jar so that way
spark-shell / hive can connect to the metastore?
For example, when I run my spark-shell instance in standalone mode, I use:
./spark-shell --master spark://servername:7077 --driver-class-path
/lib/mysql-connector-java-5.1.27.jar
As you noted, you can change the spark.driver.maxResultSize value in your
Spark Configurations (https://spark.apache.org/docs/1.2.0/configuration.html).
Please reference the Spark Properties section noting that you can modify
these properties via the spark-defaults.conf or via SparkConf().
HTH!
Hi Vincent,
This may be a case that you're missing a semi-colon after your CREATE
TEMPORARY TABLE statement. I ran your original statement (missing the
semi-colon) and got the same error as you did. As soon as I added it in, I
was good to go again:
CREATE TEMPORARY TABLE jsonTable
USING
Thanks Felix :)
On Wed, Apr 1, 2015 at 00:08 Felix Cheung felixcheun...@hotmail.com wrote:
This is tracked by these JIRAs..
https://issues.apache.org/jira/browse/SPARK-5947
https://issues.apache.org/jira/browse/SPARK-5948
--
From: denny.g@gmail.com
Date:
If you're not using MySQL as your metastore for Hive, out of curiosity what
are you using?
The error you are seeing is common when the correct driver to allow Spark to
connect to the Hive metastore isn't there.
As well, I noticed that you're using
BTW, a tool that I have been using to help do the preaggregation of data
using hyperloglog in combination with Spark is atscale (http://atscale.com/).
It builds the aggregations and makes use of the speed of SparkSQL - all
within the context of a model that is accessible by Tableau or Qlik.
On
How are you running your spark instance out of curiosity? Via YARN or
standalone mode? When connecting Spark thriftserver to the Spark service,
have you allocated enough memory and CPU when executing with spark?
On Sun, Mar 22, 2015 at 3:39 AM fanooos dev.fano...@gmail.com wrote:
We have
It depends on your setup but one of the locations is /var/log/mesos
On Wed, Mar 4, 2015 at 19:11 lisendong lisend...@163.com wrote:
I’m sorry, but how do I look at the mesos logs?
Where are they?
On March 4, 2015, at 6:06 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
You can check in the mesos logs
paper!! We were already
using it as a guideline for our tests.
Best regards,
Francisco
--
From: Denny Lee denny.g@gmail.com
Sent: 22/02/2015 17:56
To: Ashic Mahtab as...@live.com; Francisco Orchard forch...@gmail.com;
Apache Spark user@spark.apache.org
Hi Francisco,
Out of curiosity - why ROLAP mode using multi-dimensional mode (vs tabular)
from SSAS to Spark? As a past SSAS guy you've definitely piqued my
interest.
The one thing that you may run into is that the SQL generated by SSAS can
be quite convoluted. When we were doing the same thing
Back to thrift, there was an earlier thread on this topic at
http://mail-archives.apache.org/mod_mbox/spark-user/201411.mbox/%3CCABPQxsvXA-ROPeXN=wjcev_n9gv-drqxujukbp_goutvnyx...@mail.gmail.com%3E
that may be useful as well.
On Sun Feb 22 2015 at 8:42:29 AM Denny Lee denny.g@gmail.com wrote
Hi Suhel,
My team is currently working with a lot of SQL Server databases as one of
our many data sources and ultimately we pull the data into HDFS from SQL
Server. As we had a lot of SQL databases to hit, we used the jTDS driver
and SQOOP to extract the data out of SQL Server and into HDFS
Quickly reviewing the latest SQL Programming Guide
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
(in github) I had a couple of quick questions:
1) Do we need to instantiate the SparkContext as per
// sc is an existing SparkContext.
val sqlContext = new
The error message you have is:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:file:/user/hive/warehouse/src is not a directory or
unable to create one)
Could you verify that you (the user you are running under) have the rights
to create
<description>location of default database for the warehouse</description>
</property>
Do I need to do anything explicitly other than placing hive-site.xml in
the Spark conf directory?
Thanks !!
On Wed, Feb 25, 2015 at 11:42 AM, Denny Lee denny.g@gmail.com wrote:
The error message
It may have to do with the akka heartbeat interval per SPARK-3923 -
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-3923 ?
On Tue, Feb 24, 2015 at 16:40 Xi Shen davidshe...@gmail.com wrote:
Hi Sean,
I launched the spark-shell on the same machine as I started YARN service.
I
Upon reviewing your other thread, could you confirm that your Hive
metastore that you can connect to via Hive is a MySQL database? And to
also confirm, when you're running spark-shell and doing a show tables
statement, you're getting the same error?
On Fri, Mar 27, 2015 at 6:08 AM ÐΞ€ρ@Ҝ (๏̯͡๏)
By any chance does this thread address look similar:
http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html
?
On Tue, Mar 24, 2015 at 5:23 AM Harut Martirosyan
harut.martiros...@gmail.com wrote:
What is performance overhead caused by YARN,
Hadoop 2.5 would be referenced via -Dhadoop.version=2.5 using the profile
-Phadoop-2.4
Please note earlier in the link the section:
# Apache Hadoop 2.4.X or 2.5.X
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean package
Versions of Hadoop after 2.5.X may or may not work with the
Perhaps this email reference may be able to help from a DataFrame
perspective:
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201503.mbox/%3CCALte62ztepahF=5hk9rcfbnyk4z43wkcq4fkdcbwmgf_3_o...@mail.gmail.com%3E
On Wed, Mar 25, 2015 at 7:29 PM Haopu Wang hw...@qilinsoft.com wrote:
You may be able to utilize Spork (Pig on Apache Spark) as a mechanism to do
this: https://github.com/sigmoidanalytics/spork
On Mon, Mar 23, 2015 at 2:29 AM Dai, Kevin yun...@ebay.com wrote:
Hi, all
Can spark use pig’s load function to load data?
Best Regards,
Kevin.
+1 - I currently am doing what Marcelo is suggesting as I have a CDH 5.2
cluster (with Spark 1.1) and I'm also running Spark 1.3.0+ side-by-side in
my cluster.
On Wed, Mar 18, 2015 at 1:23 PM Marcelo Vanzin van...@cloudera.com wrote:
Since you're using YARN, you should be able to download a
From the standpoint of Spark SQL accessing the files - when it is hitting
Hive, it is in effect hitting HDFS as well. Hive provides a great
framework where the table structure is already well defined. But
underneath it, Hive is just accessing files from HDFS, so you are hitting
HDFS either way.
Quick question - the output of a dataframe is in the format of:
[2015-04, ArrayBuffer(A, B, C, D)]
and I'd like to return it as:
2015-04, A
2015-04, B
2015-04, C
2015-04, D
What's the best way to do this?
Thanks in advance!
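In plain-Python terms (outside Spark), the reshaping being asked for is a flatten/explode of each (month, list) row into one row per element; a minimal sketch:

```python
# Each (key, list) row becomes len(list) rows of (key, item) -
# the flatMap/explode pattern.
rows = [("2015-04", ["A", "B", "C", "D"])]
flattened = [(month, item) for month, items in rows for item in items]
print(flattened)
# [('2015-04', 'A'), ('2015-04', 'B'), ('2015-04', 'C'), ('2015-04', 'D')]
```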
, Apr 2, 2015 at 7:10 PM, Denny Lee denny.g@gmail.com wrote:
Quick question - the output of a dataframe is in the format of:
[2015-04, ArrayBuffer(A, B, C, D)]
and I'd like to return it as:
2015-04, A
2015-04, B
2015-04, C
2015-04, D
What's the best way to do this?
Thanks
You may need to specify the hive port itself. For example, my own Thrift
start command is in the form:
./sbin/start-thriftserver.sh --master spark://$myserver:7077
--driver-class-path $CLASSPATH --hiveconf hive.server2.thrift.bind.host
$myserver --hiveconf hive.server2.thrift.port 1
HTH!
Thanks for the correction Mark :)
On Sun, Apr 19, 2015 at 3:45 PM Mark Hamstra m...@clearstorydata.com
wrote:
Almost. Jobs don't get skipped. Stages and Tasks do if the needed
results are already available.
On Sun, Apr 19, 2015 at 3:18 PM, Denny Lee denny.g@gmail.com wrote:
The job
Similar to what Dean called out, we built Puppet manifests so we could do
the automation - it's a bit of work to set up, but well worth the effort.
On Fri, Apr 24, 2015 at 11:27 AM Dean Wampler deanwamp...@gmail.com wrote:
It's mostly manual. You could try automating with something like Chef, of
Delete from table is available as part of Hive 0.14 (reference: Apache Hive
Language Manual DML - Delete
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Delete)
while Spark 1.3 defaults to Hive 0.13. Perhaps rebuild Spark with Hive
0.14 or generate a new
Just wondering if we have any timeline on when the hive skew flag will be
included within SparkSQL?
Thanks!
Denny
Bummer - out of curiosity, if you were to use classpath.first or
perhaps copy the jar to the slaves, could that actually do the trick? The
latter isn't really all that efficient, but just curious if that could
work.
On Thu, Apr 16, 2015 at 7:14 AM ARose ashley.r...@telarix.com wrote:
Support for sub queries in predicates hasn't been resolved yet - please
refer to SPARK-4226
BTW, Spark 1.3 default bindings to Hive 0.13.1
On Fri, Apr 17, 2015 at 09:18 ARose ashley.r...@telarix.com wrote:
So I'm trying to store the results of a query into a DataFrame, but I get
the
If you're doing this in Scala per se, then you can probably just reference
JodaTime or the Java Date / Time classes. If you are using SparkSQL, then
you can use the various Hive date functions for conversion.
On Tue, Apr 14, 2015 at 11:04 AM BASAK, ANANDA ab9...@att.com wrote:
I need some help to convert
At this time, the JDBC data source is not extensible, so it cannot support
SQL Server. There were some thoughts - credit to Cheng Lian for this -
about making the JDBC data source extensible for third-party support,
possibly via slick.
On Mon, Apr 6, 2015 at 10:41 PM bipin bipin@gmail.com
That's correct - at this time MS SQL Server is not supported through the
JDBC data source. In my environment, we've been using Hadoop
streaming to extract data from multiple SQL Servers, pushing the data
into HDFS, creating the Hive tables and/or converting them into Parquet,
and
something like this would work. You might need to play with the
type.
df.explode(arrayBufferColumn) { x => x }
On Fri, Apr 3, 2015 at 6:43 AM, Denny Lee denny.g@gmail.com wrote:
Thanks Dean - fun hack :)
On Fri, Apr 3, 2015 at 6:11 AM Dean Wampler deanwamp...@gmail.com
wrote:
A hack
By default Spark 1.3 has bindings to Hive 0.13.1 though you can bind it to
Hive 0.12 if you specify it in the profile when building Spark as per
https://spark.apache.org/docs/1.3.0/building-spark.html.
If you are downloading a pre built version of Spark 1.3 - then by default,
it is set to Hive