From: tridib [mailto:tridib.sama...@live.com]
Sent: Tuesday, November 25, 2014 9:54 AM
To: u...@spark.incubator.apache.org
Subject: Control number of parquet generated from JavaSchemaRDD
Hello,
I am reading around 1000 input files from disk in an RDD and generating
parquet. It always produces
().setInt(parquet.block.size, MB_128);
No luck.
Is there a way to control the size/number of parquet files generated?
Thanks
Tridib
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Control-number-of-parquet-generated-from-JavaSchemaRDD-tp19717.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
());
JavaSchemaRDD claimSchemaRdd = sqlCtx.applySchema(claimRdd,
Claim.class);
claimSchemaRdd.coalesce(1);
claimSchemaRdd.saveAsParquetFile(parquetPath);
}
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Control-number-of-parquet-generated-from
repartition(1) too.
claimSchemaRdd.saveAsParquetFile(parquetPath);
}
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Control-number-of-parquet-generated-from-JavaSchemaRDD-tp19717p19776.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Ohh...how can I miss that. :(. Thanks!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Control-number-of-parquet-generated-from-JavaSchemaRDD-tp19717p19788.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
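
A note on why the snippet quoted earlier in the thread still wrote many files: RDD transformations such as coalesce and repartition do not change the RDD they are called on. They return a new RDD, and Spark writes one part file per partition of whatever RDD is actually saved. In the quoted code the result of coalesce(1) is thrown away and the original claimSchemaRdd is saved, so the partition count never changes. The sketch below models that behaviour in plain Java so it runs without Spark. PartitionedData, fromFiles and outputFileCount are hypothetical names invented for this illustration and are not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-in for an immutable, partitioned dataset such as an RDD.
 * This is NOT Spark code; it only models "one output file per partition".
 */
final class PartitionedData {
    private final List<List<String>> partitions;

    PartitionedData(List<List<String>> partitions) {
        // Defensive copy so an instance never changes after construction.
        List<List<String>> copy = new ArrayList<>();
        for (List<String> p : partitions) {
            copy.add(Collections.unmodifiableList(new ArrayList<>(p)));
        }
        this.partitions = Collections.unmodifiableList(copy);
    }

    /** Builds a dataset with one single-row partition per (made-up) input file. */
    static PartitionedData fromFiles(int fileCount) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            List<String> rows = new ArrayList<>();
            rows.add("row-from-file-" + i);
            parts.add(rows);
        }
        return new PartitionedData(parts);
    }

    /** Returns a NEW dataset with at most n partitions; this one is untouched. */
    PartitionedData coalesce(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int target = Math.min(n, partitions.size());
        List<List<String>> merged = new ArrayList<>();
        for (int i = 0; i < target; i++) {
            merged.add(new ArrayList<>());
        }
        // Fold the existing partitions into the smaller set, round-robin.
        for (int i = 0; i < partitions.size(); i++) {
            merged.get(i % target).addAll(partitions.get(i));
        }
        return new PartitionedData(merged);
    }

    /** Number of files a save would write in this model: one per partition. */
    int outputFileCount() {
        return partitions.size();
    }
}

class CoalesceDemo {
    public static void main(String[] args) {
        PartitionedData claims = PartitionedData.fromFiles(1000);

        claims.coalesce(1);                                // result discarded
        System.out.println(claims.outputFileCount());      // prints 1000

        PartitionedData single = claims.coalesce(1);       // keep the result
        System.out.println(single.outputFileCount());      // prints 1
    }
}
```

Applied to the code in this thread, that means calling saveAsParquetFile(parquetPath) on the RDD returned by coalesce or repartition rather than on claimSchemaRdd itself. The exact overloads available on JavaSchemaRDD depend on the Spark version, so check the API docs for the release in use.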
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org