Hi All,
I am currently on Spark 1.6 and I was doing a sql join on two tables that
are over 100 million rows each, and I noticed that it was spawning 3+ tasks
(this is the progress meter that we are seeing show up). We tried
coalesce, repartition and spark.sql.shuffle.partitions to drop the number of
Also, the inner query produced the same results:
sqlContext.sql("SELECT s.date AS edate , s.account AS s_acc , d.account AS
d_acc , s.ad as s_ad , d.ad as d_ad , s.spend AS s_spend ,
d.spend_in_dollar AS d_spend FROM swig_pin_promo_lt s INNER JOIN
dps_pin_promo_lt d ON (s.date
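A minimal sketch of the tuning I mean, assuming a Spark 1.6 SQLContext named sqlContext (joinQuery stands in for the full INNER JOIN statement; the values are only illustrative):

  // default is 200; lower it to get fewer tasks out of the join's shuffle
  sqlContext.setConf("spark.sql.shuffle.partitions", "64")
  val joined = sqlContext.sql(joinQuery)
  // merge output partitions after the join without another full shuffle
  val compacted = joined.coalesce(50)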
Hi All,
I am running into a weird result with Spark SQL outer joins. The results
for all of them seem to be the same, which does not make sense given the
data. Here are the queries that I am running along with the results:
sqlContext.sql("SELECT s.date AS edate , s.account AS s_acc , d.account AS
Hi All,
I am currently trying to debug a spark application written in scala. I have
a main method:
def main(args: Array[String]) {
...
SocialUtil.triggerAndWait(triggerUrl)
...
The SocialUtil object is included in a separate jar. I launched the
spark-submit command using --jars
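A sketch of the kind of invocation I mean (paths, class name, and master are made up):

  spark-submit --master yarn-client --class com.example.MainApp \
    --jars /path/to/social-util.jar /path/to/my-app.jar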
Hi All,
I was wondering how RDD transformations work on SchemaRDDs. Is there a way
to force the RDD transform to keep the SchemaRDD type, or do I need to
recreate the SchemaRDD by applying the applySchema method?
Currently what I have is an array of SchemaRDDs and I just want to do a
union
Looks like if I use unionAll this works.
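A minimal sketch of that, assuming an array of SchemaRDDs named schemaRdds:

  // unionAll keeps the SchemaRDD type and schema; a plain RDD union would lose them
  val combined = schemaRdds.reduce(_ unionAll _)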
Hi All,
I wrote out a complex parquet file from spark sql and now I am trying to put
a hive table on top. I am running into issues with creating the hive table
itself. Here is the json that I wrote out to parquet using spark sql:
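As a rough sketch of the direction (the column names, types, and path below are entirely hypothetical, since they depend on the actual JSON), an external Hive table can be declared over an existing Parquet directory from the HiveContext:

  hiveContext.sql("""
    CREATE EXTERNAL TABLE events_parquet (
      id STRING,
      tags ARRAY<STRING>,
      payload STRUCT<name:STRING, value:DOUBLE>
    )
    STORED AS PARQUET
    LOCATION '/path/to/parquet/output'
  """)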
Hi All,
I was noodling around with loading a json file into spark sql's hive
context and I noticed that I get the following message after loading the
json file:
PhysicalRDD [_corrupt_record#0], MappedRDD[5] at map at JsonRDD.scala:47
I am using the HiveContext to load in the json file
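For anyone hitting the same thing: a schema of just _corrupt_record usually means none of the records parsed, since jsonFile treats each line as a separate JSON object (a pretty-printed, multi-line file will not parse). A minimal sketch with a hypothetical path:

  val events = hiveContext.jsonFile("/data/events.json")  // expects one JSON object per line
  events.printSchema()  // only _corrupt_record here means parsing failed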
Hi All,
I am currently trying to write out a SchemaRDD to Avro. I noticed that there
is a databricks spark-avro library and I have included that in my
dependencies, but it looks like I am not able to access the AvroSaver
object. On compilation of the job I get this:
error: not found: value
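A sketch of what I would expect to work, assuming the spark-avro jar is on the compile classpath (the exact API may differ between spark-avro versions):

  import com.databricks.spark.avro.AvroSaver

  AvroSaver.save(mySchemaRdd, "/output/path")  // mySchemaRdd: the SchemaRDD to write out

A "not found: value AvroSaver" error at compile time usually means the import is missing or the library is only on the runtime classpath rather than the compile one.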
Hi All,
I am currently trying to write a very wide file into parquet using spark
sql. I have records with 100K columns that I am trying to write out, but of
course I am running into space issues (out of memory - heap space). I was
wondering if there are any tweaks or workarounds for this.
I am
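Two knobs that seem worth trying for a very wide schema: raising the JVM heap via --driver-memory/--executor-memory on spark-submit, and shrinking the Parquet writer buffers, which are held per column (values below are only illustrative):

  // smaller row groups and pages mean less memory held per column while writing
  sc.hadoopConfiguration.setInt("parquet.block.size", 32 * 1024 * 1024)
  sc.hadoopConfiguration.setInt("parquet.page.size", 64 * 1024)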
Hi All,
I am trying to create a class that wraps functionality that I need; some
of these functions require access to the SparkContext, which I would like to
pass in. I know that the SparkContext is not serializable, and I am not
planning on passing it to worker nodes or anything, I just want
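A minimal sketch of the pattern, assuming the wrapper is only ever used on the driver (the class and method names are made up):

  class SparkHelper(@transient val sc: SparkContext) extends Serializable {
    // only call sc on the driver; closures shipped to executors must not capture it
    def lineCount(path: String): Long = sc.textFile(path).count()
  }

The @transient keeps the context out of the picture if the wrapper itself ever gets serialized.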
Hi All,
I am currently having a problem with the maven dependencies for version
1.2.0 of spark-core and spark-hive. Here are my dependencies:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.2.0</version>
</dependency>
<dependency>
Hi All,
I am currently having issues reading in a json file using spark sql's api.
Here is what the json file looks like:
{
namespace: spacey,
name: namer,
type: record,
fields: [
{name:f1,type:[null,string]},
{name:f2,type:[null,string]},
{name:f3,type:[null,string]},
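If the file is a single multi-line document like the above rather than one JSON object per line (which is what jsonFile expects), one workaround is to read each file whole and hand it to jsonRDD; a rough sketch with a hypothetical path:

  val whole = sc.wholeTextFiles("/data/schema.json").map(_._2)
  val parsed = sqlContext.jsonRDD(whole)
  parsed.printSchema()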
Hi All,
I am currently trying to build out a spark job that would basically convert
a csv file into parquet. From what I have seen it looks like spark sql is
the way to go, and how I would go about this would be to load the csv file
into an RDD and convert it into a SchemaRDD by injecting in
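A minimal sketch of that flow with the 1.2-era API, using a made-up case class and paths:

  import sqlContext.createSchemaRDD  // implicit conversion from an RDD of case classes to a SchemaRDD

  case class Record(id: String, amount: Double)

  sc.textFile("/data/input.csv")
    .map(_.split(","))
    .map(f => Record(f(0), f(1).toDouble))
    .saveAsParquetFile("/data/output.parquet")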
Hi,
I have been using spark streaming in standalone mode and now I want to
migrate to spark running on yarn, but I am not sure how you would go about
designating a specific node in the cluster to act as an avro listener, since
I am using the flume-based push approach with spark.
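For context, with the push model the receiver binds to a fixed host and port, so Flume's Avro sink has to point at a machine where an executor can be scheduled; a minimal sketch (host and port are placeholders):

  import org.apache.spark.streaming.flume.FlumeUtils

  val flumeStream = FlumeUtils.createStream(ssc, "worker-node-01", 41414)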
I am currently writing an application that uses spark streaming. What I am
trying to do is basically read in a few files (I do this by using the spark
context textFile) and then process those files inside an action that I apply
to a streaming RDD. Here is the main code below:
def main(args:
I am still fairly new to spark and spark streaming. I have been struggling
with how to properly debug spark streaming and I was wondering what is the
best approach. I have been basically putting println statements everywhere,
but sometimes they show up when I run the job and sometimes they
Ok cool. So in that case the only way I could think of doing this would be
calling the toArray method on those RDDs which would return Array[String]
and store them as broadcast variables. I read about the broadcast
variables, but it is still fuzzy to me. I assume that since broadcast
variables are
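A rough sketch of that approach (collect is the current name for toArray, which has been deprecated):

  val lines: Array[String] = smallRdd.collect()  // pulls the whole RDD back to the driver, so it must be small
  val linesBc = sc.broadcast(lines)              // executors read it through linesBc.value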
Hi All,
I am trying to submit a spark job that I have built in maven using the
following command:
/usr/bin/spark-submit --deploy-mode client --class com.spark.TheMain
--master local[1] /home/cloudera/myjar.jar 100
But I seem to be getting the following error:
Exception in thread main
I have been trying to understand how spark streaming and hbase connect, but
have not been successful. What I am trying to do is given a spark stream,
process that stream and store the results in an hbase table. So far this is
what I have:
import org.apache.spark.SparkConf
import
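A sketch of the shape I have in mind, using the 0.98-era HBase client API; the table name, column family, and DStream element type are all made up:

  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.{HTable, Put}
  import org.apache.hadoop.hbase.util.Bytes

  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // create the connection on the executor side; it is not serializable
      val table = new HTable(HBaseConfiguration.create(), "results")
      records.foreach { case (rowKey, value) =>
        val put = new Put(Bytes.toBytes(rowKey))
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
        table.put(put)
      }
      table.close()
    }
  }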
in the classpath?
Do you observe any exception from the code below or in the region server log?
Which HBase release are you using?
On Wed, Sep 3, 2014 at 2:05 PM, kpeng1 wrote:
I have been trying to understand how spark streaming
It looks like the issue I had is that I didn't pull the htrace-core jar into
the spark classpath.