Re: Why does Spark 2.0 change number of partitions when reading a parquet file?

2016-12-22 Thread Daniel Siegmann
Spark 2.0.0 introduced "Automatic file coalescing for native data sources" ( http://spark.apache.org/releases/spark-release-2-0-0.html#performance-and-runtime). Perhaps that is the cause? I'm not sure if this feature is mentioned anywhere in the documentation or if there's any way to disable it.
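If that coalescing feature is indeed the cause, a minimal sketch of the knobs that influence it, assuming the packing is driven by spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes (both present in Spark 2.0; the path and values below are illustrative, not recommendations):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("parquet-partition-demo")
      // Target max bytes per input partition; smaller => more partitions.
      .config("spark.sql.files.maxPartitionBytes", 16L * 1024 * 1024) // 16 MB
      // Estimated cost (in bytes) of opening a file; higher => small files
      // are packed less aggressively into shared partitions.
      .config("spark.sql.files.openCostInBytes", 8L * 1024 * 1024)    // 8 MB
      .getOrCreate()

    val df = spark.read.parquet("/path/to/table.parquet")
    println(df.rdd.getNumPartitions) // observe how the settings change this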

Why does Spark 2.0 change number of partitions when reading a parquet file?

2016-12-22 Thread Kristina Rogale Plazonic
Hi, I write a randomly generated 30,000-row dataframe to parquet. I verify that it has 200 partitions (both in Spark and by inspecting the parquet file in HDFS). When I read it back in, it has 23 partitions?! Is there some optimization going on? (This doesn't happen in Spark 1.5.) How can I force …
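If the goal is simply a fixed partition count after the read, an explicit repartition restores it regardless of how the reader split the files. A minimal sketch, assuming a Spark 2.0 SparkSession named spark and a placeholder path:

    val df = spark.read.parquet("/user/me/random_30k.parquet")
    println(df.rdd.getNumPartitions)    // e.g. 23 after the coalesced read
    val df200 = df.repartition(200)     // force 200 partitions (incurs a shuffle)
    println(df200.rdd.getNumPartitions) // 200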

Re: reading the parquet file

2016-03-09 Thread Xinh Huynh
You might want to avoid that unionAll(), which seems to be repeated over 1,000 times. Could you do a collect() in each iteration and accumulate your results in a local Array instead of a DataFrame? How many rows are returned in "temp1"? Xinh
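A minimal Scala sketch of this suggestion; keys and queryForKey below are hypothetical stand-ins for the poster's loop values and per-iteration query, which do not appear in the archived message:

    import org.apache.spark.sql.{DataFrame, Row}
    import scala.collection.mutable.ArrayBuffer

    // Hypothetical placeholders for the poster's actual loop and query.
    val keys: Seq[Int] = 0 until 1000
    def queryForKey(k: Int): DataFrame =
      sqlContext.sql(s"SELECT * FROM t WHERE key = $k")

    val results = ArrayBuffer[Row]()
    for (k <- keys) {
      results ++= queryForKey(k).collect() // small result, brought to the driver
    }
    // results is an ordinary local buffer; no 1000-deep unionAll plan is built.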

reading the parquet file

2016-03-08 Thread Angel Angel
Hello Sir/Madam, I am writing a Spark application in Spark 1.4.0. I have one text file with a size of 8 GB. I save that file in parquet format: val df2 = sc.textFile("/root/Desktop/database_200/database_200.txt").map(_.split(",")).map(p => Table(p(0), p(1).trim.toInt, p(2).trim.toInt, …
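A minimal sketch of the write path the message describes, using the Spark 1.4-era API. The Table case class is cut off in the archive, so only the three columns actually shown are reproduced; the field names are placeholders:

    import org.apache.spark.sql.SQLContext

    case class Table(col0: String, col1: Int, col2: Int) // names are placeholders

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df2 = sc.textFile("/root/Desktop/database_200/database_200.txt")
      .map(_.split(","))
      .map(p => Table(p(0), p(1).trim.toInt, p(2).trim.toInt))
      .toDF()

    df2.write.parquet("/root/Desktop/database_200/database_200.parquet")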

Re: reading the parquet file in spark sql

2016-03-07 Thread Manoj Awasthi
From the parquet file content (directory listing), it doesn't look like the parquet write was successful or complete.
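One quick way to check that hypothesis: a completed Spark/Hadoop write normally leaves a _SUCCESS marker in the output directory. A minimal sketch, with a placeholder path:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(sc.hadoopConfiguration)
    val ok = fs.exists(new Path("/path/to/output.parquet/_SUCCESS"))
    println(if (ok) "write completed" else "write incomplete or failed")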

reading the parquet file in spark sql

2016-03-06 Thread Angel Angel
Hello Sir/Madam, I am running one Spark application with 3 slaves and one master. I am writing my information using the parquet format, but when I try to read it I get an error. Please help me resolve this problem. Code: val sqlContext = new org.apache.spark.sql.SQLContext(sc) …
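For reference, the usual read path in this Spark version looks like the sketch below; the path is a placeholder matching the write example above:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val df = sqlContext.read.parquet("/root/Desktop/database_200/database_200.parquet")
    df.show(5)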

reading multiple parquet file using spark sql

2015-09-01 Thread Hafiz Mujadid
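On the question in the subject line: read.parquet accepts multiple paths (and globs) in a single call. A minimal sketch with placeholder paths, assuming the Spark 1.x SQLContext API:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    // Explicit list of files or directories:
    val df1 = sqlContext.read.parquet("/data/part1.parquet", "/data/part2.parquet")

    // Or a glob over many files:
    val df2 = sqlContext.read.parquet("/data/2015-09-*/events.parquet")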