Spark 2.0.0 introduced "Automatic file coalescing for native data sources" (
http://spark.apache.org/releases/spark-release-2-0-0.html#performance-and-runtime).
Perhaps that is the cause?
I'm not sure if this feature is mentioned anywhere in the documentation or
if there's any way to disable it.
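If the coalescing is the cause, it should be tunable: as far as I know, the packing of small files into read partitions is governed by the `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` settings introduced in 2.0, and an explicit repartition always restores the old count. A sketch (assuming a 2.x shell where `spark` is the SparkSession):

```scala
// Defaults shown; raising openCostInBytes makes each file "cost" more,
// so fewer files get packed into one read partition.
spark.conf.set("spark.sql.files.maxPartitionBytes", 134217728) // 128 MB
spark.conf.set("spark.sql.files.openCostInBytes", 4194304)     // 4 MB

// Or simply force the partition count back after reading:
val df = spark.read.parquet("/path/to/data").repartition(200)
```

I haven't verified which of these matches the behavior you're seeing, so treat the config names as a starting point rather than a definitive answer.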
Hi,
I write a randomly generated 30,000-row dataframe to parquet. I verify that
it has 200 partitions (both in Spark and inspecting the parquet file in
hdfs).
When I read it back in, it has 23 partitions?! Is there some optimization
going on? (This doesn't happen in Spark 1.5)
*How can I force
You might want to avoid that unionAll(), which seems to be repeated over
1000 times. Could you do a collect() in each iteration, and collect your
results in a local Array instead of a DataFrame? How many rows are returned
in "temp1"?
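Something like this is what I mean (here `runQuery` is a hypothetical stand-in for your per-iteration lookup plus `collect()`); appending to a local buffer avoids building a plan with a thousand nested unionAll() nodes that has to be re-analyzed on every iteration:

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-in for temp1.collect() in each loop iteration -- replace with
// your actual query; the body here is only illustrative.
def runQuery(i: Int): Array[Int] = Array(i, i * 2)

val results = ArrayBuffer[Int]()
for (i <- 0 until 1000) {
  results ++= runQuery(i) // local append instead of df = df.unionAll(temp1)
}
```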
Xinh
On Tue, Mar 8, 2016 at 10:00 PM, Angel Angel
Hello Sir/Madam,
I am writing a Spark application in Spark 1.4.0.
I have one text file with a size of 8 GB.
I save that file in Parquet format:
val df2 =
sc.textFile("/root/Desktop/database_200/database_200.txt").map(_.split(",")).map(p
=> Table(p(0),p(1).trim.toInt, p(2).trim.toInt,
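(The snippet above is cut off in the archive.) For reference, the split-and-convert step on its own looks something like the following; the field names and the three-column schema are assumptions, not necessarily the poster's actual `Table` definition:

```scala
// Hypothetical schema matching the visible part of the snippet.
case class Table(key: String, col1: Int, col2: Int)

def parseLine(line: String): Table = {
  val p = line.split(",")
  Table(p(0), p(1).trim.toInt, p(2).trim.toInt)
}

val row = parseLine("abc, 1, 2")
// In Spark this would be applied per line, e.g.:
//   sc.textFile(path).map(parseLine).toDF().write.parquet(outPath)
```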
From the parquet file content (dir content) it doesn't look like the
parquet write was successful or complete.
On Mon, Mar 7, 2016 at 11:17 AM, Angel Angel
wrote:
Hello Sir/Madam,
I am running one spark application having 3 slaves and one master.
I am writing my information using the Parquet format,
but when I try to read it I get an error.
Please help me resolve this problem.
Code:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
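For Spark 1.4, the read-back side would look like the sketch below (the path is a placeholder, not the poster's actual output directory). If the earlier write never completed, `read.parquet` will fail with a "not a Parquet file" style error, so checking the output directory for the `_SUCCESS` marker file first is a quick sanity test:

```scala
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Placeholder path -- substitute the directory the DataFrame was written to.
val df = sqlContext.read.parquet("/root/Desktop/database_200/parquet_out")
df.printSchema()
println(df.count())
```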
Sent from the Apache Spark User List mailing list archive at Nabble.com.