Assuming you can reproduce this problem with, say, 100K records in the log
file versus the records that end up in Hive, this is how I would start
debugging:
Export the Hive table and sort it:
hive -e 'select * from my_hive_table' | sort > hive.out.check1.sorted
Say your original log file is log.original.txt; sort it too:
sort log.original.txt > log.original.txt.sorted
Make sure the field separators in hive.out.check1.sorted and
log.original.txt.sorted are the same.
diff log.original.txt.sorted hive.out.check1.sorted
Look at what you get, and analyze why the extra records came in.
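The sort-and-diff steps above can be scripted roughly as follows. This is a sketch that assumes the Hive output has already been exported to a local file (for example with hive -e 'select * from my_hive_table' > hive.out.check1, which writes tab-separated rows) and that both files use the same field separator. The function name extra_in_hive is hypothetical. It uses comm -13 instead of diff so that only the lines present in the Hive export but missing from the log are printed.

```bash
#!/usr/bin/env bash
# Sketch: print records that are in the Hive export but not in the original log.
# Assumes both files use the same field separator and have no header line.

extra_in_hive() {
  local log_file="$1"   # e.g. log.original.txt
  local hive_file="$2"  # e.g. hive.out.check1 (output of hive -e 'select * ...')

  # comm needs both inputs sorted with the same collation, so pin the locale.
  LC_ALL=C sort "$log_file" > "${log_file}.sorted"
  LC_ALL=C sort "$hive_file" > "${hive_file}.sorted"

  # -1 hides lines only in the log, -3 hides lines common to both files,
  # leaving the lines that occur only in the Hive export. Duplicates are
  # matched one-for-one, so a row loaded twice shows up once here.
  LC_ALL=C comm -13 "${log_file}.sorted" "${hive_file}.sorted"
}

# Usage (file names are examples):
# extra_in_hive log.original.txt hive.out.check1 | wc -l
```

Because duplicates are matched one-for-one, the line count of this output should equal the difference between the two record counts when the log is a subset of what Hive returns.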
My guess: there may be another PARTITION in your Hive metastore pointing to
a data location that contains data from some earlier logs.
sanjay
From: Raj Hadoop <hadoop...@yahoo.com>
Reply-To: user@hive.apache.org, Raj Hadoop <hadoop...@yahoo.com>
Date: Friday, July 5, 2013 3:27 PM
To: user@hive.apache.org
Subject: Re: Loading a flat file + one additional field to a Hive table
Thanks, Sanjay. I will look into this.
Also, one more question.
When I load the log file into Hive and compare the counts like this:
select count(*) from Table
versus
wc -l File
I see a few hundred more records in the table than in the file. How should I
debug this? Any tips, please?
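One quick thing to rule out before diffing: wc -l counts newline characters only, while Hadoop's default text line reader (which, as far as I know, is what Hive text tables use) also treats a bare carriage return (\r) as a line terminator. A log with stray \r characters inside lines can therefore yield more rows in Hive than wc -l reports. The hypothetical helper check_line_endings below reports the wc -l count, how many lines contain a \r followed by more text, and whether the last line lacks a trailing newline (which wc -l does not count). It is a sketch meant for a local copy of the file.

```bash
#!/usr/bin/env bash
# Sketch: look for reasons Hive's count(*) may exceed `wc -l` for a text file.
# Assumption: the table is a plain text table read with Hadoop's default
# line reader, which splits on \n, \r, and \r\n.

check_line_endings() {
  local f="$1"
  local newlines mid_cr

  # wc -l counts \n characters only; strip the padding some wc versions add.
  newlines=$(wc -l < "$f" | tr -d ' ')

  # Lines with a \r followed by more text (i.e. not a plain CRLF ending).
  # Each such \r would start an extra record on the Hadoop side.
  mid_cr=$(LC_ALL=C grep -c $'\r.' "$f" || true)

  echo "wc -l: $newlines"
  echo "lines with an embedded CR: $mid_cr"

  # $(...) drops a trailing newline, so a non-empty result means the file
  # does not end with \n and wc -l missed the last line.
  if [ -n "$(tail -c 1 "$f")" ]; then
    echo "last line has no trailing newline"
  fi
}

# Usage (file name is an example):
# check_line_endings log.original.txt
```

Plain CRLF (Windows) line endings are deliberately not counted, since a \r\n pair still ends just one record.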
From: Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
To: user@hive.apache.org; Raj Hadoop <hadoop...@yahoo.com>
Sent: Saturday, July 6, 2013 4:32 AM
Subject: Re: Loading a flat file + one additional field to a Hive table
How about this?
Assume you have a log file called oompaloopa.log:
TIMESTAMP=$(date +%Y_%m_%d_T%H_%M_%S)
mv oompaloopa.log oompaloopa.log.${TIMESTAMP}
cat oompaloopa.log.${TIMESTAMP} | hdfs dfs -put - /user/sasubramanian/oompaloopa.log.${TIMESTAMP}
This puts the file directly on HDFS, and you can put it in the LOCATION
specified by your Hive table definition.
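To find that directory, the table's LOCATION can be read from DESCRIBE FORMATTED. The sketch below assumes the output contains a row whose first field is Location: followed by the URI, which to my knowledge is how Hive prints it. The function name table_location is hypothetical; it reads the DESCRIBE output from standard input so it can be tried without a cluster.

```bash
#!/usr/bin/env bash
# Sketch: extract the table directory from `DESCRIBE FORMATTED` output.
# Assumption: the row looks like "Location:   hdfs://.../warehouse/t1".

table_location() {
  # Print the second whitespace-separated field of the first "Location:" row.
  awk '$1 == "Location:" { print $2; exit }'
}

# Usage (table name is an example):
# loc=$(hive -e 'DESCRIBE FORMATTED T1' | table_location)
# hdfs dfs -put "oompaloopa.log.${TIMESTAMP}" "$loc/"
```

For a partitioned table, DESCRIBE FORMATTED with a PARTITION (...) clause prints that partition's own location, which is also a quick way to check whether a stray partition points at old data.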
sanjay
From: manishbh...@rocketmail.com
Reply-To: user@hive.apache.org
Date: Friday, July 5, 2013 10:39 AM
To: Raj Hadoop <hadoop...@yahoo.com>, Hive <user@hive.apache.org>
Subject: Re: Loading a flat file + one additional field to a Hive table
Raj,
You should load the data into a temp table first and then move it into the
final table with a SELECT query:
INSERT INTO TABLE T1 SELECT to_date(from_unixtime(unix_timestamp())), c1, c2, c3, c4 FROM temp_table;
Reason: it is better to avoid custom operations during the load unless they
are necessary.
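The staging-table approach above could be driven from the shell roughly like this. It is a sketch under several assumptions: a staging table named t1_staging (hypothetical) with four STRING columns and tab-separated input, and an existing target T1 (D1, C1, C2, C3, C4) where D1 is a STRING column. The load date is computed in the shell and passed in as a literal; the helper make_load_script (also hypothetical) only prints the HiveQL, so the text can be checked before it is run.

```bash
#!/usr/bin/env bash
# Sketch: build the HiveQL for "load into a staging table, then INSERT ... SELECT
# with a date column". Table names, column types and the tab delimiter are assumptions.
# LOAD ... OVERWRITE empties the staging table first, so rows from earlier runs
# are not inserted into T1 a second time.

make_load_script() {
  local load_date="$1"   # e.g. 2013-07-05
  local input_dir="$2"   # local directory holding the log files

  cat <<EOF
CREATE TABLE IF NOT EXISTS t1_staging (c1 STRING, c2 STRING, c3 STRING, c4 STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA LOCAL INPATH '${input_dir}' OVERWRITE INTO TABLE t1_staging;
INSERT INTO TABLE T1 SELECT '${load_date}', c1, c2, c3, c4 FROM t1_staging;
EOF
}

# Usage (path as in the original question):
# hive -e "$(make_load_script "$(date +%Y-%m-%d)" /software/home/hadoop/dat_files/)"
```

Using OVERWRITE on the staging load matters for the count question later in this thread: a staging table that keeps old files would quietly add their rows to T1 again on every run.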
- Reply message -
From: Raj Hadoop <hadoop...@yahoo.com>
To: Hive <user@hive.apache.org>
Subject: Loading a flat file + one additional field to a Hive table
Date: Fri, Jul 5, 2013 10:30 PM
Hi,
Can anyone please suggest the best way to do the following in Hive?
Load today's date stamp plus ALL FIELDS C1,C2,C3,C4 in a file F1 into a Hive
table T1 (D1,C1,C2,C3,C4).
Can the following command be modified in some way to achieve this?
hive> load data local inpath '/software/home/hadoop/dat_files/' into table T1;
My requirement is to add a date stamp to a web log file and then load it into
a Hive table.
Thanks,
Raj
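An alternative that skips the staging table is to add the date stamp to every line before the file is loaded, so that a plain LOAD DATA works against T1 (D1, C1, C2, C3, C4). A minimal sketch, assuming tab-separated fields that match the table's delimiter; the helper name add_date_column and the file names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: prepend a date column to every line of a tab-separated file.
# Assumption: fields are tab-separated, matching the table's FIELDS TERMINATED BY.

add_date_column() {
  local stamp="$1" in_file="$2" out_file="$3"
  # Print the stamp, a tab, then the original line unchanged.
  awk -v d="$stamp" '{ printf "%s\t%s\n", d, $0 }' "$in_file" > "$out_file"
}

# Usage (paths are examples; T1's first column must be D1):
# add_date_column "$(date +%Y-%m-%d)" F1 /software/home/hadoop/dat_files/F1.dated
# hive -e "LOAD DATA LOCAL INPATH '/software/home/hadoop/dat_files/F1.dated' INTO TABLE T1"
```

The trade-off against the staging table is that the file on disk is rewritten, but Hive then only has to do a plain file move.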
CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the
intended recipient(s) and may contain confidential and privileged information.
Any unauthorized review, use, disclosure or distribution is prohibited. If you
are not the intended recipient, please contact the sender by reply email and
destroy all copies of the original message along with any attachments, from
your computer system. If you are the intended recipient, please be advised that
the content of this message is subject to access, review and disclosure by the
sender's Email System Administrator.