Re: How to control the number of files for dynamic partition in Spark SQL?

2016-02-01 Thread Benyi Wang
Thanks Deenar, both methods work. I actually tried the second method in spark-shell earlier, but it didn't work at the time. The reason might be that I registered the DataFrame eventwk as a temporary table, repartitioned it, and then registered the table again. Unfortunately, I could not reproduce the problem. Thanks

Re: How to control the number of files for dynamic partition in Spark SQL?

2016-01-30 Thread Deenar Toraskar
The following should work, as long as your tables are created using Spark SQL:

    event_wk.repartition(2)
      .write
      .partitionBy("eventDate")
      .format("parquet")
      .insertInto("event")

If you want to stick to using "insert overwrite" for Hive compatibility, then you can repartition twice, instead of setting
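A minimal sketch of the second approach (repartition first, then run a Hive-compatible "insert overwrite" against the re-registered temp table). This assumes the DataFrame `eventwk` and the `event` table from the thread, a `HiveContext` named `sqlContext`, and Spark 1.x APIs (`registerTempTable`); it is meant to run inside spark-shell, not as a standalone program:

```scala
// Shrink the DataFrame to 2 partitions so the insert writes
// at most 2 files per dynamic partition, then re-register it
// so the SQL below reads the repartitioned data.
val repartitioned = eventwk.repartition(2)
repartitioned.registerTempTable("event_wk")

sqlContext.sql(
  """insert overwrite table event
    |partition(eventDate)
    |select user, detail, eventDate
    |from event_wk
  """.stripMargin)
```

The `repartition(2)` call forces a shuffle down to 2 tasks, so the writer stage produces 2 output files per partition value instead of one file per shuffle task.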

How to control the number of files for dynamic partition in Spark SQL?

2016-01-29 Thread Benyi Wang
I want to insert into a partitioned table using dynamic partitioning, but I don't want 200 files per partition, because the files will be small in my case.

    sqlContext.sql(
      """
        |insert overwrite table event
        |partition(eventDate)
        |select
        |  user,
        |  detail,
        |  eventDate
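The 200 files likely come from Spark SQL's default shuffle parallelism: `spark.sql.shuffle.partitions` defaults to 200, so any shuffle feeding the insert runs 200 tasks and writes up to 200 files per partition. A commonly used alternative to repartitioning, sketched here with an illustrative value of 2 and the `event_wk` temp table from this thread, is to lower that setting for the session before running the insert (it only helps if the query actually shuffles, e.g. via a join or aggregation):

```sql
-- Lower the shuffle parallelism so the dynamic-partition insert
-- writes fewer, larger files. 2 is illustrative; size it to your data.
SET spark.sql.shuffle.partitions=2;

insert overwrite table event
partition(eventDate)
select user, detail, eventDate
from event_wk;
```

After the insert, the setting can be restored to its previous value so later queries keep their normal parallelism.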