Hi all,
The co-founder of Databricks just demoed that we can stream tweets with
structured streaming: https://youtu.be/9xSz0ppBtFg?t=16m42s, but he didn't
show how he did it. Does anyone know how to provide credentials to
structured streaming?
Looks like an equal sign is missing between partitions and 200.
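For reference, the corrected statement would be:

sqlContext.sql("SET spark.sql.shuffle.partitions=200")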
On Sat, May 21, 2016 at 8:31 PM, SRK wrote:
> Hi,
>
> How to set the degree of parallelism in Spark SQL? I am using the following
> but it somehow seems to allocate only two executors at a time.
>
>
Hi,
How to set the degree of parallelism in Spark SQL? I am using the following
but it somehow seems to allocate only two executors at a time.
sqlContext.sql(" set spark.sql.shuffle.partitions 200 ")
Thanks,
Swetha
Thank you, Amit! I was looking for this kind of information.
I have not fully read your paper yet, but I see in it a TODO with basically
the same question(s) [1]; maybe someone from the Spark team (including
Databricks) will be so kind as to send some feedback.
Best,
Ovidiu
[1] Integrate “Structured
I was able to verify that similar exceptions occur in Spark 2.0.0-preview.
I have created this JIRA: https://issues.apache.org/jira/browse/SPARK-15467
You mentioned using beans instead of case classes, do you have an example
(or test case) that I can see?
-Don
On Fri, May 20, 2016 at 3:49 PM,
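In the meantime, here is a minimal sketch of what the bean-based approach can
look like (my own illustration, not an actual test case; the Person class and
its fields are invented):

import org.apache.spark.sql.{Encoders, SparkSession}
import scala.beans.BeanProperty

// A JavaBean-style class: public no-arg constructor plus getters/setters,
// which Encoders.bean can reflect over (unlike a case class, whose encoder
// is derived implicitly).
class Person extends Serializable {
  @BeanProperty var name: String = _
  @BeanProperty var age: Int = _
}

val spark = SparkSession.builder().appName("bean-encoder-sketch").getOrCreate()
val p = new Person
p.setName("alice")
p.setAge(30)
val ds = spark.createDataset(Seq(p))(Encoders.bean(classOf[Person]))
ds.show()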
Hi,
I usually run Hive 2 on the Spark 1.3.1 engine (as opposed to using the
default MR or Tez). I tried to make Hive 2 work with Tez 0.8.2, but that did
not do much.
Anyway I will try to make it work.
Today I compiled Spark 1.6.1 from source excluding the Hadoop libraries. I
did this one before for
It seems I forgot to add the link to the “Technical Vision” paper, so here it
is:
https://docs.google.com/document/d/1y4qlQinjjrusGWlgq-mYmbxRW2z7-_X5Xax-GG0YsC0/edit?usp=sharing
From: "Sela, Amit" >
Date: Saturday, May 21, 2016 at 11:52 PM
To:
This is a “Technical Vision” paper for the Spark runner, which provides general
guidelines to the future development of Spark’s Beam support as part of the
Apache Beam (incubating) project.
This is our JIRA -
Hi, I have a DataFrame with heavily skewed data on the order of terabytes,
and I am doing a groupBy on 8 fields, which I unfortunately can't avoid. I am
looking to optimize this. I have found that Hive has:
set hive.groupby.skewindata=true;
I don't use Hive; I have a Spark DataFrame. Can we achieve the above in
Spark? Please guide.
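I don't think there is a direct DataFrame equivalent of
hive.groupby.skewindata, but a common workaround is a two-stage aggregation
with a random salt. A minimal sketch, assuming a DataFrame df with a skewed
key column "k" and a numeric column "v" (the names are made up):

import org.apache.spark.sql.functions._

// Stage 1: add a random salt so hot keys are spread across partitions,
// then pre-aggregate per (key, salt).
val salted = df.withColumn("salt", (rand() * 32).cast("int"))
val partial = salted.groupBy(col("k"), col("salt"))
  .agg(sum(col("v")).as("partial_sum"))

// Stage 2: drop the salt and combine the partial aggregates per key.
val result = partial.groupBy(col("k"))
  .agg(sum(col("partial_sum")).as("total"))

Note that this rewrite only works cleanly for algebraic aggregates such as
sum, count, min, and max.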
https://issues.apache.org/jira/browse/SPARK-15078 was just a bunch of test
harness changes and added no new functionality. To reduce confusion, I just
backported it into branch-2.0, so SPARK-15078 is now in 2.0 too.
Can you paste a query you were testing?
On Sat, May 21, 2016 at 10:49 AM, Kamalesh Nair
Hi experts,
I'm using Apache Spark Streaming 1.6.1 to write a Java application that
joins two Key/Value data streams and writes the output to HDFS. The two data
streams contain K/V strings and are periodically ingested in Spark from HDFS
by using textFileStream().
The two data streams aren't
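The question mentions Java, but for illustration here is a minimal Scala
sketch of the same pattern (the paths, batch interval, and line format are
assumptions):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("kv-stream-join")
val ssc = new StreamingContext(conf, Seconds(60))

// Assume each line is "key,value".
def toPair(line: String): (String, String) = {
  val Array(k, v) = line.split(",", 2)
  (k, v)
}

// textFileStream picks up new files moved into these directories.
val left = ssc.textFileStream("hdfs:///input/left").map(toPair)
val right = ssc.textFileStream("hdfs:///input/right").map(toPair)

// Per-batch join on the key, written back out to HDFS.
left.join(right)
  .map { case (k, (v1, v2)) => s"$k,$v1,$v2" }
  .saveAsTextFiles("hdfs:///output/joined")

ssc.start()
ssc.awaitTermination()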
Hi,
From the Spark 2.0 release webinar, what I understood is that the newer
version has significantly expanded the SQL capabilities of Spark, with the
introduction of a new ANSI SQL parser and support for subqueries. It also
says Spark 2.0 can run all 99 TPC-DS queries, which require many of
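As a small illustration of the new subquery support (the table and column
names are hypothetical):

// Spark 2.0's parser accepts, e.g., an uncorrelated scalar subquery:
spark.sql("SELECT name FROM people WHERE age > (SELECT avg(age) FROM people)").show()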
I got my answer.
The way to access S3 has changed.
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", accessKey)
hadoopConf.set("fs.s3a.secret.key", secretKey)
val lines = ssc.textFileStream("s3a://amg-events-out/")
This worked.
Cheers,
Ben
> On May 21, 2016, at
Thanks Ted, I know how to do it in spark-shell; can we set the same in the
spark-sql shell?
If I don't set a Hive context, from my understanding Spark uses its own SQL
and date functions, right? Like, for example, interval?
Thanks
Sri
Sent from my iPhone
> On 21 May 2016, at 08:19, Ted Yu
In spark-shell:
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
scala> var hc: HiveContext = new HiveContext(sc)
FYI
On Sat, May 21, 2016 at 8:11 AM, Sri wrote:
> Hi ,
>
> You mean hive-site.xml file right ?,I did
Hi,
You mean the hive-site.xml file, right? I did place hive-site.xml in the
Spark conf directory, but I'm not sure how certain Spark date functions like
interval are still working.
Hive 0.14 doesn't have an interval function, so how is Spark managing to do
that?
Does Spark have its own date functions? I am using
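If it helps, Spark's own parser has handled interval literals since around
1.5, independently of the Hive version on the classpath; an illustrative
one-liner (not from the original thread):

// Uses Spark's native interval support; no Hive interval function needed:
sqlContext.sql("SELECT current_timestamp() + INTERVAL 1 DAY AS tomorrow").show()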
Not that I can share, unfortunately. It is on my backlog to create a
repository with examples, but I am currently a bit overloaded, so don't
hold your breath. :-/
If you want to be notified when it happens, please follow me on Twitter or
Google+. See web site below for links.
Regards,
Lars
Ted,
I only see 1 jets3t-0.9.0 jar in the classpath after running this to list the
jars.
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/jars/jets3t-0.9.0.jar
I don’t know what else
3. And does the same behavior apply to streaming applications also?
On Sat, May 21, 2016 at 7:44 PM, Shushant Arora
wrote:
> And will it allocate rest executors when other containers get freed which
> were occupied by other hadoop jobs/spark applications?
>
> And is
And will it allocate the rest of the executors when other containers,
currently occupied by other Hadoop jobs/Spark applications, get freed?
And is there any minimum (% of executors demanded vs. available) that it
waits for to be freed, or does it just start with even 1?
Thanks!
On Thu, Apr 21, 2016 at 8:39 PM,
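Not an authoritative answer, but these are roughly the knobs that govern this
behavior; a sketch with assumed values:

import org.apache.spark.SparkConf

// Illustrative settings only (values are assumptions, not recommendations).
val conf = new SparkConf()
  // With static allocation, the fraction of requested executors that must
  // register before task scheduling begins (YARN default: 0.8).
  .set("spark.scheduler.minRegisteredResourcesRatio", "0.8")
  // With dynamic allocation, Spark instead scales between the bounds below
  // as YARN containers are freed by other applications.
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "50")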
Maybe more than one version of jets3t-xx.jar was on the classpath.
FYI
On Fri, May 20, 2016 at 8:31 PM, Benjamin Kim wrote:
> I am trying to stream files from an S3 bucket using CDH 5.7.0’s version of
> Spark 1.6.0. It seems not to work. I keep getting this error.
>
>
What is the motivation for using such an old version of Hive? This will lead
to lower performance and other risks.
> On 21 May 2016, at 01:57, "kali.tumm...@gmail.com"
> wrote:
>
> Hi All ,
>
> Is there a way to ask spark and spark-sql to use Hive 0.14 version instead
>
So you want to use Hive version 0.14 when using Spark 1.6?
Go to the directory $SPARK_HOME/conf and create a softlink to the
hive-site.xml file:
cd $SPARK_HOME
hduser@rhes564: /usr/lib/spark-1.6.1-bin-hadoop2.6> cd conf
hduser@rhes564: /usr/lib/spark-1.6.1-bin-hadoop2.6/conf> ls -ltr
lrwxrwxrwx 1
Hi, I have a Spark job which does a groupBy, and I can't avoid it because of
my use case. I have a large dataset, around 1 TB, which I need to
process/update in a DataFrame. Now my job shuffles huge amounts of data and
slows down because of the shuffling and groupBy. One reason I see is that my
data is skewed; some of my group