Re: HCatInputFormat combine splits

2015-05-14 Thread Pradeep Gollakota
Setting the following property has had no effect.

mapreduce.input.fileinputformat.split.maxsize = 67108864

I'm still getting 1 Mapper per file.
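
One possible explanation, offered as an assumption rather than a confirmed diagnosis for HCatInputFormat: if the underlying storage format computes splits the way Hadoop's FileInputFormat does, the split size is max(minSize, min(maxSize, blockSize)), and a split never spans more than one file. Under that rule a max split size can only cut large files into more pieces; it cannot merge small files. The sketch below reproduces the arithmetic in plain Java, without Hadoop on the classpath. The 128 MB block size, the 10 MB file size, and the class and method names are hypothetical, and the per-file count is a simplification that ignores Hadoop's slop factor for the last split.

```java
import java.util.Arrays;

// Illustrative only: mirrors the FileInputFormat split-size rule, not HCatalog's code.
class SplitSizeSketch {

    // Same formula as FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Simplified per-file count: every file yields at least one split,
    // and larger files are cut into ceil(length / splitSize) pieces.
    static long splitsForFile(long fileLength, long splitSize) {
        return Math.max(1, (fileLength + splitSize - 1) / splitSize);
    }

    // Splits are counted file by file because a split never crosses a file boundary.
    static long totalSplits(long[] fileLengths, long splitSize) {
        long total = 0;
        for (long length : fileLengths) {
            total += splitsForFile(length, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed HDFS block size
        long minSize = 1L;                   // assumed minimum split size
        long maxSize = 67108864L;            // the value from this message (64 MB)

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);

        long[] smallFiles = new long[1000];  // hypothetical: 1000 files of 10 MB each
        Arrays.fill(smallFiles, 10L * 1024 * 1024);

        System.out.println("split size = " + splitSize);
        System.out.println("splits     = " + totalSplits(smallFiles, splitSize));
    }
}
```

With these assumed numbers, every 10 MB file is smaller than the 64 MB split size, so the sketch still reports one split (and hence one mapper) per file, which matches the behaviour described above.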

On Thu, May 14, 2015 at 10:27 AM, Ankit Bhatnagar ank...@yahoo-inc.com
wrote:

 you can explicitly set the split size



   On Wednesday, May 13, 2015 11:37 PM, Pradeep Gollakota 
 pradeep...@gmail.com wrote:


 Hi All,

 I'm writing an MR job to read data using HCatInputFormat... however, the
 job is generating too many splits. I don't have this problem when running
 queries in Hive since it combines splits by default.

 Is there an equivalent in MR so that I'm not generating thousands of
 mappers?

 Thanks,
 Pradeep





Re: HCatInputFormat combine splits

2015-05-14 Thread Ankit Bhatnagar
try these:

mapred.max.split.size=
mapred.min.split.size=
mapreduce.input.fileinputformat.split.maxsize=
mapreduce.input.fileinputformat.split.minsize=



 On Thursday, May 14, 2015 11:04 AM, Pradeep Gollakota 
pradeep...@gmail.com wrote:
   

 Setting the following property has had no effect.

 mapreduce.input.fileinputformat.split.maxsize = 67108864

 I'm still getting 1 Mapper per file.

 On Thu, May 14, 2015 at 10:27 AM, Ankit Bhatnagar ank...@yahoo-inc.com wrote:

you can explicitly set the split size 


 On Wednesday, May 13, 2015 11:37 PM, Pradeep Gollakota 
pradeep...@gmail.com wrote:
   

 Hi All,

 I'm writing an MR job to read data using HCatInputFormat... however, the
 job is generating too many splits. I don't have this problem when running
 queries in Hive since it combines splits by default.

 Is there an equivalent in MR so that I'm not generating thousands of
 mappers?

 Thanks,
 Pradeep
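
The combining behaviour asked about in this thread can be pictured as packing several small files into one split until a size cap is reached. For plain file input, Hadoop provides CombineFileInputFormat for this purpose; whether it can be wired into HCatInputFormat is not established in the thread. The sketch below is a plain-Java illustration of the packing idea only. It is not Hadoop's algorithm (which also takes node and rack locality into account), and the class name, method name, and file sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: greedy packing of file lengths into combined splits.
class CombineSplitsSketch {

    // Groups files in order; starts a new group when adding the next file
    // would push the current group past maxSplitSize.
    static List<List<Long>> combine(long[] fileLengths, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;

        for (long length : fileLengths) {
            if (!current.isEmpty() && currentBytes + length > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A file larger than the cap still gets a group of its own.
            current.add(length);
            currentBytes += length;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] smallFiles = new long[1000];  // hypothetical: 1000 files of 10 MB each
        java.util.Arrays.fill(smallFiles, 10L * 1024 * 1024);

        List<List<Long>> splits = combine(smallFiles, 67108864L); // 64 MB cap
        System.out.println("combined splits = " + splits.size());
    }
}
```

With a 64 MB cap, six 10 MB files fit in each group, so the 1000 hypothetical files collapse into 167 combined splits instead of 1000.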