Re: Loading a flat file + one additional field to a Hive table
From: Sanjay Subramanian
In reply to: Raj Hadoop, Friday, July 5, 2013 3:27 PM

Assuming you can reproduce the problem with, say, 100K records in the log file versus the records in Hive, this is how I would start debugging.

Export the table and sort it:

    hive -e "select * from my_hive_table" | sort > hive.out.check1.sorted

Say your original log file is log.original.txt. Sort it and diff the two:

    sort log.original.txt > log.original.txt.sorted
    diff log.original.txt.sorted hive.out.check1.sorted

See what you get, and analyze why the extra records came in.

My guess: in your Hive metastore there may be another PARTITION pointing to a data location that contains data from some previous logs.

sanjay
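The sort-and-diff check above can be tried on a small scale first. The sketch below is only an illustration: the sample records and the temporary directory are made up, and it uses `comm` rather than `diff` to list just the lines that appear in the Hive export but not in the log. It assumes the export has already been converted to the same delimiter as the log, since `hive -e` prints tab-separated columns by default.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the real log and the Hive export.
workdir=$(mktemp -d)
printf 'a,1\nb,2\nc,3\n' > "$workdir/log.original.txt"
printf 'c,3\na,1\nz,9\nb,2\n' > "$workdir/hive.out.check1"

# Sort both sides with the same collation so comm sees a consistent order.
LC_ALL=C sort "$workdir/log.original.txt" > "$workdir/log.original.txt.sorted"
LC_ALL=C sort "$workdir/hive.out.check1" > "$workdir/hive.out.check1.sorted"

# comm -13 hides lines unique to the first file and lines common to both,
# leaving only the lines that exist in the Hive export but not in the log.
extra=$(LC_ALL=C comm -13 "$workdir/log.original.txt.sorted" "$workdir/hive.out.check1.sorted")
echo "Records only in Hive:"
echo "$extra"
```

Because `comm` walks the two sorted files in step, a record that was loaded twice also shows up in this list, which helps separate double loads from data coming from an unexpected location.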
Re: Loading a flat file + one additional field to a Hive table
From: Raj Hadoop
Date: Friday, July 5, 2013 3:27 PM
In reply to: Sanjay Subramanian, Saturday, July 6, 2013 4:32 AM

Thanks Sanjay. I will look into this.

Also, one more question. When I load a log file into Hive and compare the counts like this:

    select count(*) from <>

versus

    wc -l <>

I see a few hundred more records in <>. How should I debug it? Any tips, please.
Re: Loading a flat file + one additional field to a Hive table
From: Sanjay Subramanian
Sent: Saturday, July 6, 2013 4:32 AM
In reply to: manishbh...@rocketmail.com, Friday, July 5, 2013 10:39 AM

How about this? Assume you have a log file called oompaloompa.log:

    TIMESTAMP=$(date +%Y_%m_%d_T%H_%M_%S)
    mv oompaloompa.log oompaloompa.log.${TIMESTAMP}
    cat oompaloompa.log.${TIMESTAMP} | hdfs dfs -put - /user/sasubramanian/oompaloompa.log.${TIMESTAMP}

This puts the file directly on HDFS, and you can put it in the LOCATION specified by your Hive table definition.

sanjay
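If the date has to end up as the first column (D1) rather than in the file name, one option is to prefix every record before the upload. This is a sketch under assumptions not stated in the thread: the fields are comma-separated, the date is written as year-month-day, and the sample records and file names are placeholders.

```shell
#!/bin/sh
# Hypothetical input file with fields C1..C4; the comma delimiter is an assumption.
workdir=$(mktemp -d)
printf '10,20,30,40\n11,21,31,41\n' > "$workdir/oompaloompa.log"

# Today's date, used as the value for the extra leading column (D1).
today=$(date +%Y-%m-%d)

# Prefix each line with the date and the delimiter; the rest of the line is left untouched.
awk -v d="$today" -v sep="," '{ print d sep $0 }' "$workdir/oompaloompa.log" > "$workdir/oompaloompa.stamped.log"

cat "$workdir/oompaloompa.stamped.log"

# The stamped file could then be uploaded to the table's LOCATION, for example:
#   hdfs dfs -put "$workdir/oompaloompa.stamped.log" /user/sasubramanian/
```

Stamping the records outside Hive keeps the load a plain file copy, at the cost of rewriting the file once before it is uploaded.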
Re: Loading a flat file + one additional field to a Hive table
From: manishbh...@rocketmail.com
Date: Friday, July 5, 2013 10:39 AM
In reply to: Raj Hadoop, Fri, Jul 5, 2013 10:30 PM

Raj,

You should dump the data into a temp table first and then move it into the final table with a select query:

    select date(), c1, c2 from the temp table

Reason: we should avoid custom operations in the load unless they are necessary.
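The temp-table approach described above could look roughly like the following. The shell wrapper only writes the HiveQL to a file and does not run it. The table name `t1_staging`, the temporary script file, and the `yyyy-MM-dd` date format are assumptions, and `from_unixtime(unix_timestamp(), ...)` is used here as one way to produce the current date as a string.

```shell
#!/bin/sh
# Write a HiveQL script for the two-step load; nothing here is executed against Hive.
hql_file=$(mktemp)

cat > "$hql_file" <<'EOF'
-- Step 1: load the raw file into a staging table with columns C1..C4
-- (t1_staging is a hypothetical table that must already exist).
LOAD DATA LOCAL INPATH '/software/home/hadoop/dat_files/' INTO TABLE t1_staging;

-- Step 2: copy into the final table, adding the load date as D1.
INSERT INTO TABLE T1
SELECT from_unixtime(unix_timestamp(), 'yyyy-MM-dd') AS d1, c1, c2, c3, c4
FROM t1_staging;
EOF

echo "Wrote $hql_file; run it with: hive -f $hql_file"
```

Keeping the date logic in the SELECT means the load step stays a plain file move, which matches the reasoning given in the message.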
Loading a flat file + one additional field to a Hive table
From: Raj Hadoop
Date: Fri, Jul 5, 2013 10:30 PM

Hi,

Can anyone please suggest the best way to do the following in Hive?

Load 'today's date stamp' + << ALL FIELDS C1,C2,C3,C4 IN A FILE F1 >> to a Hive table T1 (D1,C1,C2,C3,C4)

Can the following command be modified in some way to achieve the above?

    hive> load data local inpath '/software/home/hadoop/dat_files/' into table T1;

My requirement is to append a date stamp to a web log file and then load it into a Hive table.

Thanks,
Raj