Hi,
try setting this configuration before writing:
val sc = new SparkContext(conf)
sc.hadoopConfiguration.setBoolean("parquet.enable.summary-metadata", false)
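Since the original DataFrame is a PySpark one, here is a sketch of the equivalent from Python. It goes through `sc._jsc`, which is an internal (underscore-prefixed) attribute of the PySpark SparkContext, so treat this as an assumption about the current PySpark internals rather than a stable API:

```python
# Sketch (PySpark): disable Parquet summary-metadata generation via the
# underlying Hadoop configuration. Assumes an existing SparkContext `sc`;
# `_jsc` is PySpark-internal, not a public API.
sc._jsc.hadoopConfiguration().set("parquet.enable.summary-metadata", "false")
```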
Regards,
Sai Ganesh
On Thu, Sep 15, 2016 at 11:34 PM, gaurav24 [via Apache Spark User List] <
ml-node+s1001560n27738...@n3.nabble.com> wrote:
> Hi Rok,
>
> facing
How are you writing it out? Can you post some code?
Regards
Sab
On 14-Nov-2015 5:21 am, "Rok Roskar" wrote:
> I'm not sure what you mean? I didn't do anything specifically to partition
> the columns
> On Nov 14, 2015 00:38, "Davies Liu" wrote:
Have you used any partitioned columns when writing as JSON or Parquet?
On Fri, Nov 6, 2015 at 6:53 AM, Rok Roskar wrote:
> yes I was expecting that too because of all the metadata generation and
> compression. But I have not seen performance this bad for other parquet
> files
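For context, "partitioned columns" refers to a write like the one sketched below, where each distinct value of the partition column becomes its own output directory. The column name "date" is made up for illustration; a high-cardinality partition column can slow a write down dramatically because of the many small files it produces:

```python
# Sketch of a partitioned Parquet write (column name "date" is
# hypothetical). Each distinct value of "date" becomes a separate
# directory under the output path, e.g. out.parquet/date=2015-11-05/.
df.write.partitionBy("date").parquet("hdfs:///path/out.parquet")
```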
Do you have partitioned columns?
On Thu, Nov 5, 2015 at 2:08 AM, Rok Roskar wrote:
> I'm writing a ~100 Gb pyspark DataFrame with a few hundred partitions into a
> parquet file on HDFS. I've got a few hundred nodes in the cluster, so for
> the size of file this is way
I'd expect writing Parquet files to be slower than writing JSON files, since
Parquet involves more complicated encoders, but maybe not this slow.
Would you mind trying to profile one Spark executor with a tool like YJP
to see where the hotspot is?
Cheng
On 11/6/15 7:34 AM, rok wrote:
Apologies if
Are you using any compression? Maybe some codec is enabled by default in your
Hadoop environment?
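One way to rule out a surprising cluster default is to set the codec explicitly before writing. A sketch for the SQLContext-era API (Spark 1.5.x, matching this thread); "snappy" is just an example choice, and "uncompressed" can be used to test with compression off entirely:

```python
# Sketch: pin the Parquet compression codec explicitly instead of relying
# on whatever the Hadoop environment defaults to. Valid values include
# "uncompressed", "snappy", "gzip", and "lzo".
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
df.write.parquet("hdfs:///path/out.parquet")
```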
> On 06 Nov 2015, at 00:34, rok wrote:
>
> Apologies if this appears a second time!
>
> I'm writing a ~100 Gb pyspark DataFrame with a few hundred partitions into a
>
yes I was expecting that too because of all the metadata generation and
compression. But I have not seen performance this bad for other parquet
files I’ve written and was wondering if there could be something obvious
(and wrong) about how I’ve specified the schema etc. It’s a very
simple