Re: Controlling Number of small files while inserting into Hive table

2017-07-06 Thread Arpan Rajani
Thank you, Saurabh. For the use case of reloading the entire table, we can use this.
Regards,
Arpan

On 25 Jun 2017 10:10 pm, "Db-Blog" wrote:
> Hi Arpan,
> Include the partition column in the DISTRIBUTE BY clause of the DML; it will generate only one file per day. Hope this will
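Arpan's remark about reloading the entire table, together with the PS about backdated processing in Db-Blog's reply, comes down to append vs. overwrite semantics. INSERT INTO adds new files next to whatever a partition already holds, while INSERT OVERWRITE replaces the contents of every partition it writes to. The toy Python model below is a sketch under stated assumptions, not Hive code. A table is reduced to a dict of partition value -> file count, and each DISTRIBUTE BY load is assumed to write one file per partition. The helpers `insert_into` and `insert_overwrite` are hypothetical names.

```python
def insert_into(table, new_files):
    """Model of INSERT INTO: new files land next to the partition's existing files."""
    for part, n in new_files.items():
        table[part] = table.get(part, 0) + n


def insert_overwrite(table, new_files):
    """Model of INSERT OVERWRITE: each partition written to has its files replaced.
    Partitions that receive no rows are left untouched."""
    for part, n in new_files.items():
        table[part] = n


# Toy table: partition value -> number of data files in that partition.
table = {}

# Nightly loads with DISTRIBUTE BY on the partition column: assumed 1 file each.
for day in ("2017-06-20", "2017-06-21", "2017-06-22"):
    insert_into(table, {day: 1})

# A backdated run for 2017-06-20 appends a second file to that partition.
insert_into(table, {"2017-06-20": 1})
print(table)  # {'2017-06-20': 2, '2017-06-21': 1, '2017-06-22': 1}

# A full reload (INSERT OVERWRITE ... DISTRIBUTE BY) rewrites every partition.
insert_overwrite(table, {d: 1 for d in table})
print(table)  # {'2017-06-20': 1, '2017-06-21': 1, '2017-06-22': 1}
```

Under this model, a DISTRIBUTE BY overwrite of the whole table always ends with one file per partition. Repeated INSERT INTO runs against the same partition keep accumulating files until such a reload happens.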

Re: Controlling Number of small files while inserting into Hive table

2017-06-26 Thread Lefty Leverenz
Saquib Khan, to unsubscribe you need to send a message to user-unsubscr...@hive.apache.org as described here: Mailing Lists.
Thanks.
-- Lefty

On Sun, Jun 25, 2017 at 7:14 PM, saquib khan wrote:
> Please remove me from the user

Re: Controlling Number of small files while inserting into Hive table

2017-06-25 Thread saquib khan
Please remove me from the user list.

On Sun, Jun 25, 2017 at 5:10 PM Db-Blog wrote:
> Hi Arpan,
> Include the partition column in the DISTRIBUTE BY clause of the DML; it will generate only one file per day. Hope this will resolve the issue.
>
> "insert into target_table

Re: Controlling Number of small files while inserting into Hive table

2017-06-25 Thread Db-Blog
Hi Arpan,
Include the partition column in the DISTRIBUTE BY clause of the DML; it will generate only one file per day. Hope this will resolve the issue.

"insert into target_table select a, b, c from x where ... distribute by (date)"

PS: Backdated processing will generate additional file(s).
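Why DISTRIBUTE BY on the partition column gives one file per day can be illustrated with a toy Python model. This is a sketch under stated assumptions, not how Hive actually plans the job. Without DISTRIBUTE BY, rows are assumed to be spread round-robin over `num_tasks` writer tasks. With it, each row is routed to task `crc32(partition) % num_tasks`; Hive uses its own hash function, and `crc32` is only a deterministic stand-in. Each task writes one file per partition value it receives. `files_per_partition` is a hypothetical helper.

```python
from collections import defaultdict
from zlib import crc32


def files_per_partition(rows, num_tasks, distribute_by_partition):
    """Toy model: count output files per partition for a single INSERT.

    rows: list of (partition_value, payload) tuples.
    Assumption: without DISTRIBUTE BY, rows go round-robin to tasks;
    with it, a row goes to task crc32(partition_value) % num_tasks.
    Each task writes one file for every partition value it receives.
    """
    written = set()  # distinct (task_id, partition) pairs, one file each
    for i, (part, _payload) in enumerate(rows):
        if distribute_by_partition:
            # Same partition value -> same hash -> same task, every time.
            task = crc32(part.encode("utf-8")) % num_tasks
        else:
            task = i % num_tasks
        written.add((task, part))

    counts = defaultdict(int)
    for _task, part in written:
        counts[part] += 1
    return dict(counts)


rows = [("2017-06-22", n) for n in range(100)]
print(files_per_partition(rows, 8, False))  # {'2017-06-22': 8}
print(files_per_partition(rows, 8, True))   # {'2017-06-22': 1}
```

With 100 rows for a single day and 8 tasks, the round-robin case leaves 8 small files in that partition, while the distributed case leaves 1. The trade-off is that a single reducer then handles a whole day's data. That is fine for the small nightly loads discussed in this thread, but it can become a bottleneck for a large or skewed partition.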

Controlling Number of small files while inserting into Hive table

2017-06-22 Thread Arpan Rajani
Hello everyone, I am sure many of you have faced a similar issue. We run "insert into target_table select a, b, c from x where .." kind of queries for a nightly load. Each insert goes into a new partition of target_table. Now the concern is: *these inserts load hardly any data* (I would