Hey,
Spark Submit adds Maven Central and the Spark Bintray repository to the ChainResolver before it
adds any external resolvers.
https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L821
When running on a cluster without internet access, this means the
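For reference, Ivy's ChainResolver consults its resolvers in the order they were added, so the ordering described above looks roughly like the sketch below (not Spark's exact code; the Bintray URL is an assumption):

  import org.apache.ivy.plugins.resolver.{ChainResolver, IBiblioResolver}

  // Sketch only: a chain whose first two resolvers are the public repositories.
  val chain = new ChainResolver
  chain.setName("spark-list")

  val central = new IBiblioResolver      // Maven Central (IBiblioResolver's default root)
  central.setM2compatible(true)
  central.setUsepoms(true)
  central.setName("central")
  chain.add(central)

  val spBintray = new IBiblioResolver    // spark-packages repo; URL assumed
  spBintray.setM2compatible(true)
  spBintray.setUsepoms(true)
  spBintray.setRoot("https://dl.bintray.com/spark-packages/maven")
  spBintray.setName("spark-packages")
  chain.add(spBintray)

  // Anything passed via --repositories is appended after these two, so an
  // air-gapped cluster still attempts the public repositories first.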
Hi guys,
Running with a Parquet-backed table in Hive, 'dim_promo_date_curr_p', which has
the following data:
scala> sqlContext.sql("select * from pz.dim_promo_date_curr_p").show(3)
15/06/18 00:53:21 INFO ParseDriver: Parsing command: select * from
pz.dim_promo_date_curr_p
15/06/18 00:53:21 INFO
filed https://issues.apache.org/jira/browse/SPARK-8406 to track this. Will
deliver a fix ASAP and this will be included in 1.4.1.
Best,
Cheng
On 6/16/15 12:30 AM, Nathan McCarthy wrote:
Hi all,
Looks like data frame parquet writing is very broken in Spark 1.4.0. We had no
problems with Spark 1.3.
When trying to save a data frame with 569610608 rows:
dfc.write.format("parquet").save("/data/map_parquet_file")
We get random results between runs. Caching the data frame in memory
and
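For what it's worth, a quick sanity check along these lines (reusing the path above; the read-back step is an assumption, not from the original mail) can confirm whether the written file matches the source:

  // Hedged sketch: write the DataFrame, read it back, and compare row counts.
  val expected = dfc.count()
  dfc.write.format("parquet").save("/data/map_parquet_file")
  val actual = sqlContext.read.parquet("/data/map_parquet_file").count()
  println(s"expected $expected rows, read back $actual")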
full command you ran Spark with?
On Wed, Apr 15, 2015 at 11:27 AM, Nathan McCarthy
nathan.mccar...@quantium.com.au wrote:
Just an update, tried with the old JdbcRDD and that worked fine.
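For comparison, the older JdbcRDD API referred to here looks roughly like this (connection URL, query, and bounds are placeholders, not from the original mail):

  import java.sql.{DriverManager, ResultSet}
  import org.apache.spark.rdd.JdbcRDD

  // Sketch of the old JdbcRDD API; the SQL must contain two '?' placeholders
  // that Spark binds to the per-partition lower/upper bounds.
  val rows = new JdbcRDD(
    sc,
    () => DriverManager.getConnection("jdbc:postgresql://dbhost:5432/warehouse"), // placeholder URL
    "SELECT id, name FROM some_table WHERE id >= ? AND id <= ?",                  // placeholder query
    1L, 1000000L, 10,                                                             // bounds and partition count
    (rs: ResultSet) => (rs.getLong(1), rs.getString(2)))
  rows.take(3)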
From: Nathan
nathan.mccar...@quantium.com.au
Tried with the 1.3.0 release (built myself) and the most recent 1.3.1 snapshot off
the 1.3 branch.
Haven't tried with 1.4/master.
From: Wang, Daoyuan [daoyuan.w...@intel.com]
Sent: Wednesday, April 15, 2015 5:22 PM
To: Nathan McCarthy; user@spark.apache.org
Subject: RE: SparkSQL JDBC Datasources API when running on YARN - Spark 1.3.0
Can you provide your Spark version?
Thanks,
Daoyuan
From: Nathan McCarthy [mailto:nathan.mccar...@quantium.com.au]
Sent
Hi guys,
Trying to use a Spark SQL context's .load("jdbc", …) method to create a DF from
a JDBC data source. All seems to work well locally (master = local[*]); however,
as soon as we try to run on YARN we have problems.
We seem to be running into problems with the class path and loading up the
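Concretely, the call is along these lines; a minimal sketch of the 1.3 API (URL, table name, and driver class are placeholder assumptions, not our actual settings):

  // Sketch of the Spark 1.3 jdbc data source API; connection details are hypothetical.
  val df = sqlContext.load("jdbc", Map(
    "url"     -> "jdbc:postgresql://dbhost:5432/warehouse",
    "dbtable" -> "some_schema.some_table",
    "driver"  -> "org.postgresql.Driver"))
  df.show(3)
  // On YARN the JDBC driver jar generally has to be visible to both the driver and
  // the executors, e.g. via --jars or spark.driver/executor.extraClassPath.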
Just an update, tried with the old JdbcRDD and that worked fine.
From: Nathan
nathan.mccar...@quantium.com.au
Date: Wednesday, 15 April 2015 1:57 pm
To: user@spark.apache.org
Subject: Re: SparkSQL schemaRDD MapPartitions calls - performance issues -
columnar formats?
On 1/11/15 1:40 PM, Nathan McCarthy wrote:
Thanks Cheng & Michael! Makes sense. Appreciate the tips!
Idiomatic Scala isn't performant. I'll
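Roughly, the tip amounts to something like the following inside mapPartitions (the schemaRdd and column types are assumptions, based on the original question further down):

  // Sketch: an imperative while loop over the partition iterator avoids building
  // intermediate collections; column access by ordinal avoids extra boxing.
  val positives = schemaRdd.mapPartitions { rows =>
    var count = 0L
    while (rows.hasNext) {
      val row = rows.next()
      if (row.getInt(1) > 0) count += 1   // hypothetical predicate on column 1
    }
    Iterator(count)
  }
  positives.reduce(_ + _)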
are inlined
below.
Cheng
On 1/7/15 11:53 AM, Nathan McCarthy wrote:
Hi,
I'm trying to use a combination of SparkSQL and 'normal' Spark/Scala via
rdd.mapPartitions(…). Using the latest release, 1.2.0.
Simple example: load up some sample data from Parquet on HDFS (about 380m rows,
10 columns) on a 7 node
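The setup being described is roughly the following (the path and the per-partition work are assumptions):

  // Sketch: in 1.2 a SchemaRDD is an RDD[Row], so mapPartitions works on it directly.
  val schemaRdd = sqlContext.parquetFile("/data/sample_parquet")  // placeholder path
  val perPartitionCounts = schemaRdd.mapPartitions { rows =>
    Iterator(rows.size)                                           // trivial per-partition work
  }
  perPartitionCounts.collect()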
performance on MapPartitions on SchemaRDDs? Is there some unwrapping
going on in the background that Catalyst does in a smart way that I'm missing?
Cheers,
~N
Nathan McCarthy
QUANTIUM
Level 25, 8 Chifley, 8-12 Chifley Square
Sydney NSW 2000
T: +61 2 8224 8922
F: +61 2 9292 6444
W: www.quantium.com.au