Re: How to merge small files

2010-08-10 Thread lei liu
Thank you for your reply.

Could you tell me why it is slower when both parameters are true, and how
much slower it is?



Re: How to merge small files

2010-08-10 Thread Edward Capriolo
How slow it is depends on how much data you have. We cannot
answer questions like that; try it both ways and find out for
yourself.

Edward
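Following the suggestion to try it both ways, here is a minimal sketch of a timing harness that runs the same statement once with both merge settings off and once with both on. It assumes an already-open Hive JDBC connection (a java.sql.Statement can be adapted with stmt::execute). SqlRunner, MergeBenchmark, the table src and the query text are hypothetical, and a single run per setting only gives a rough comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over "execute one SQL string".
// A java.sql.Statement can be adapted with stmt::execute.
interface SqlRunner {
    void run(String sql) throws Exception;
}

class MergeBenchmark {
    // Sets both merge flags to `merge`, then runs `query` and returns
    // the wall-clock time of the query alone, in milliseconds.
    static long timeWithMerge(SqlRunner runner, String query, boolean merge) throws Exception {
        runner.run("set hive.merge.mapfiles=" + merge);
        runner.run("set hive.merge.mapredfiles=" + merge);
        long start = System.nanoTime();
        runner.run(query); // with merging on, this includes any extra merge job
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in runner that only records the SQL, so the sketch runs without a server.
        // Against a real Hive server, pass stmt::execute instead.
        List<String> executed = new ArrayList<>();
        SqlRunner recorder = executed::add;

        // Hypothetical query; replace with the real INSERT OVERWRITE statement.
        String query = "INSERT OVERWRITE TABLE tablename1 SELECT * FROM src";

        long off = timeWithMerge(recorder, query, false);
        long on = timeWithMerge(recorder, query, true);
        System.out.println("merge off: " + off + " ms, merge on: " + on + " ms");
        executed.forEach(System.out::println);
    }
}
```

Running each configuration several times on realistic data, and comparing the total job time, gives a more trustworthy answer than one pair of runs.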


RE: How to merge small files

2010-08-09 Thread Namit Jain
That's right



Re: How to merge small files

2010-08-09 Thread lei liu
Could you tell me whether the query is slower if both parameters are
true?


RE: How to merge small files

2010-08-09 Thread Namit Jain
Yes, it will try to run another map-reduce job to merge the files
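To see why the extra job costs time, here is an illustrative model (not Hive's actual merge code) of what a merge step has to do: read every non-empty output file and rewrite the data as a few larger files. The greedy packing rule, the 100-byte target size and the file sizes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeModel {
    // Greedily packs file sizes (bytes) into groups whose total stays at or
    // below targetBytes; each group stands for one merged output file.
    // A single file larger than the target gets a group of its own.
    // Zero-size files contribute no data and are skipped.
    static List<List<Long>> plan(List<Long> sizes, long targetBytes) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : sizes) {
            if (size == 0) continue;
            if (!current.isEmpty() && currentBytes + size > targetBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
    }

    // Bytes the extra job must read and write again: all of the data.
    static long bytesRewritten(List<Long> sizes) {
        long total = 0;
        for (long s : sizes) total += s;
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical output of one INSERT OVERWRITE: many empty files, a few small ones.
        List<Long> sizes = Arrays.asList(0L, 0L, 40L, 0L, 70L, 30L, 0L, 90L);
        List<List<Long>> groups = plan(sizes, 100L);
        System.out.println(sizes.size() + " files -> " + groups.size() + " merged files: " + groups);
        System.out.println("bytes re-read and re-written: " + bytesRewritten(sizes));
    }
}
```

In this model the cost of merging grows with the total amount of output data, which is one way to read the point that how much slower it is depends on how much data you have.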


RE: How to merge small files

2010-08-09 Thread Bakshi, Ankita

Hi,

Sorry to hijack this thread, but I am curious whether there is any other built-in
option to merge the files in a directory before loading the data into a table.

I have a directory in the local file system that contains many small files. I
want to load them into a single Hive table. I am wondering what the best
approach to this problem would be.

Thanks,
Ankita



The information contained in this email message and its attachments is intended 
only for the private and confidential use of the recipient(s) named above, 
unless the sender expressly agrees otherwise. Transmission of email over the 
Internet is not a secure communications medium. If you are requesting or have 
requested the transmittal of personal data, as defined in applicable privacy 
laws by means of email or in an attachment to email, you must select a more 
secure alternate means of transmittal that supports your obligations to protect 
such personal data. If the reader of this message is not the intended recipient 
and/or you have received this email in error, you must take no action based on 
the information in this email and you are hereby notified that any 
dissemination, misuse or copying or disclosure of this communication is 
strictly prohibited. If you have received this communication in error, please 
notify us immediately by email and delete the original message. 
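One approach to the question above is to concatenate the small local files into a single file before loading it. A minimal sketch, assuming plain-text files with one row per line; LocalConcat and the paths are hypothetical, and the output file should live outside the source directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LocalConcat {
    // Appends every non-empty regular file in `dir` (sorted by name) to `target`
    // and returns how many files were merged. A newline is added after any file
    // that does not end with one, so rows from adjacent files are not glued together.
    // `target` should be outside `dir`.
    static int concat(Path dir, Path target) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && Files.size(p) > 0) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files);

        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : files) {
                byte[] data = Files.readAllBytes(p); // acceptable because the files are small
                out.write(data);
                if (data[data.length - 1] != '\n') {
                    out.write('\n');
                }
            }
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LocalConcat <source-dir> <output-file>
        int n = concat(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("merged " + n + " files into " + args[1]);
    }
}
```

The resulting single file can then be loaded with LOAD DATA LOCAL INPATH '<file>' INTO TABLE <table>.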



Re: How to merge small files

2010-08-09 Thread Edward Capriolo
Lei,

Are you still using Hive 0.4.1, or have you upgraded? The merge options
mentioned above were probably not present until 0.5.0.

Edward

On Mon, Aug 9, 2010 at 9:59 PM, Todd Lee ronnietodd...@gmail.com wrote:
 As long as the files are inside the same directory, Hive will treat them as a
 table.


 Todd
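Building on this, Hive's LOAD DATA statement accepts a directory path, in which case all files in that directory are loaded into the table. A minimal sketch of issuing it over JDBC; DirectoryLoad, the directory and the table name are hypothetical. This does not merge anything: each small file still ends up as a separate file under the table's directory. With LOCAL, the path may be resolved on the machine running the Hive service rather than on the JDBC client.

```java
import java.sql.SQLException;
import java.sql.Statement;

class DirectoryLoad {
    // Builds a Hive LOAD DATA statement for a whole local directory.
    // Quotes in the path are not escaped; the path is assumed to be trusted.
    static String loadDirectorySql(String localDir, String table) {
        return "LOAD DATA LOCAL INPATH '" + localDir + "' INTO TABLE " + table;
    }

    // Executes the statement over an already-open Hive JDBC Statement.
    static void loadDirectory(Statement stmt, String localDir, String table) throws SQLException {
        stmt.execute(loadDirectorySql(localDir, table));
    }

    public static void main(String[] args) {
        // Hypothetical directory and table names; prints the SQL without connecting.
        System.out.println(loadDirectorySql("/data/small_files", "my_table"));
    }
}
```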


Re: How to merge small files

2010-08-08 Thread lei liu
Thank you for your reply.

Do you mean I should execute the statements below:

statement.execute("set hive.merge.mapfiles=true");
statement.execute("set hive.merge.mapredfiles=true");

Both parameters should be true, right?


RE: How to merge small files

2010-08-06 Thread Namit Jain
HIVEMERGEMAPFILES("hive.merge.mapfiles", true),
HIVEMERGEMAPREDFILES("hive.merge.mapredfiles", false),


Set the above parameters to true before your query.
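The two entries above pair each setting with its default. A hypothetical mirror of just these two entries (MergeVar is not Hive's own class) makes the defaults explicit and generates the SET statements that enable both before a query: with these defaults, small files from map-only jobs are merged, but files from map-reduce jobs are not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the two configuration entries quoted above
// (setting name and default); not Hive's own class.
enum MergeVar {
    HIVEMERGEMAPFILES("hive.merge.mapfiles", true),        // merge small files after a map-only job
    HIVEMERGEMAPREDFILES("hive.merge.mapredfiles", false); // merge small files after a map-reduce job

    final String varname;
    final boolean defaultValue;

    MergeVar(String varname, boolean defaultValue) {
        this.varname = varname;
        this.defaultValue = defaultValue;
    }

    // SET statements that turn both settings on, to be executed before the query.
    static List<String> enableAllStatements() {
        List<String> stmts = new ArrayList<>();
        for (MergeVar v : values()) {
            stmts.add("set " + v.varname + "=true");
        }
        return stmts;
    }
}

class MergeSettings {
    public static void main(String[] args) {
        for (MergeVar v : MergeVar.values()) {
            System.out.println(v.varname + " defaults to " + v.defaultValue);
        }
        // With JDBC, each line would be passed to statement.execute(...).
        MergeVar.enableAllStatements().forEach(System.out::println);
    }
}
```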




From: lei liu [liulei...@gmail.com]
Sent: Thursday, August 05, 2010 8:47 PM
To: hive-user@hadoop.apache.org
Subject: How to merge small files

When I run the following SQL: INSERT OVERWRITE TABLE tablename1 select_statement1 FROM
from_statement, many zero-size files are written to Hadoop.

How can I merge these small files?

Thanks,



LiuLei