Re: Loading a flat file + one additional field to a Hive table

2013-07-06 Thread Sanjay Subramanian
Assuming that you can reproduce this problem with, say, 100K records in the log
file versus the records in Hive, this is how I would start debugging:

hive -e 'select * from my_hive_table' | sort > hive.out.check1.sorted

Say your original log file is log.original.txt:

sort log.original.txt > log.original.txt.sorted

diff log.original.txt.sorted hive.out.check1.sorted

See what you get, and analyze why the extra records came in.
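
The comparison above can be sketched as a small shell helper. This is only a sketch under assumptions: compare_exports is a hypothetical name, and it assumes the Hive export has already been written to a local file whose lines are in the same format as the raw log lines (the Hive CLI prints columns tab-separated, so a log with a different delimiter would need converting first). It uses comm instead of diff so that only the lines present in the Hive export and missing from the log are printed.

```shell
#!/bin/sh
# compare_exports ORIGINAL HIVE_EXPORT   (hypothetical helper name)
# Sorts both files and prints the lines that occur in the Hive export
# but not in the original log file.
compare_exports() {
    original=$1
    hive_export=$2
    # Use the same collation for sort and comm so the ordering agrees.
    LC_ALL=C sort "$original" > "$original.sorted"
    LC_ALL=C sort "$hive_export" > "$hive_export.sorted"
    # comm -13 hides lines unique to file 1 and lines common to both,
    # leaving only the lines unique to file 2 (the Hive export).
    LC_ALL=C comm -13 "$original.sorted" "$hive_export.sorted"
}
```

For example, compare_exports log.original.txt hive.out.check1 would list the extra records. A record that appears once in the log but twice in Hive is printed once, so duplicates show up as well.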

My guess is that in your Hive metastore there is possibly another PARTITION
pointing to a data location that contains data from some previous logs.
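
One way to check that guess is to list every partition of the table together with its data location. The sketch below is an assumption-laden illustration, not a definitive tool: partition_spec and show_partition_locations are hypothetical helper names, my_hive_table is the placeholder table name used above, and the script assumes the hive CLI is on the PATH and that quoted partition values are accepted for your partition column types.

```shell
#!/bin/sh
# partition_spec NAME   (hypothetical helper name)
# Turns a partition name as printed by SHOW PARTITIONS, for example
# "dt=2013-07-05/hr=01", into the body of a PARTITION clause:
# dt='2013-07-05',hr='01'
partition_spec() {
    printf '%s\n' "$1" | sed -e "s/=/='/g" -e "s#/#',#g" -e "s/\$/'/"
}

# show_partition_locations TABLE   (hypothetical helper name)
# Prints each partition of TABLE followed by the Location line(s)
# from DESCRIBE FORMATTED. Starts one hive process per partition,
# so it is slow on tables with many partitions.
show_partition_locations() {
    hive -S -e "SHOW PARTITIONS $1" | while read -r part; do
        printf '%s\n' "$part"
        # </dev/null keeps hive from consuming the loop's input.
        hive -S -e "DESCRIBE FORMATTED $1 PARTITION ($(partition_spec "$part"))" \
            < /dev/null | grep 'Location'
    done
}
```

Running show_partition_locations my_hive_table and looking for a location you did not expect would show whether an older directory is still attached to the table.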

sanjay

CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message along with any attachments, from 
your computer system. If you are the intended recipient, please be advised that 
the content of this message is subject to access, review and disclosure by the 
sender's Email System Administrator.





Re: Loading a flat file + one additional field to a Hive table

2013-07-05 Thread Raj Hadoop
Thanks Sanjay. I will look into this.

Also, one more question.

When I load a log file into Hive and compare the counts like this

select count(*) from <>

versus

wc -l <>

I see a few hundred more records in <>. How should I debug it? Any
tips, please?





Re: Loading a flat file + one additional field to a Hive table

2013-07-05 Thread Sanjay Subramanian
How about this?

Assume you have a log file called
oompaloompa.log

TIMESTAMP=$(date +%Y_%m_%d_T%H_%M_%S)
mv oompaloompa.log oompaloompa.log.${TIMESTAMP}
cat oompaloompa.log.${TIMESTAMP} | hdfs dfs -put - /user/sasubramanian/oompaloompa.log.${TIMESTAMP}

This puts the file directly on HDFS, and you can put it in the LOCATION
specified by your Hive table definition.
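
The same steps can be split into two small functions, which makes the rename easy to check before anything is uploaded. This is a sketch only: rotate_with_timestamp and upload_to_hdfs are hypothetical names, and the upload passes the local path to hdfs dfs -put directly rather than piping through cat, which is a swapped-in variation on the command above.

```shell
#!/bin/sh
# rotate_with_timestamp FILE   (hypothetical helper name)
# Renames FILE to FILE.<timestamp> and prints the new name.
rotate_with_timestamp() {
    ts=$(date +%Y_%m_%d_T%H_%M_%S)
    stamped="$1.$ts"
    mv "$1" "$stamped" && printf '%s\n' "$stamped"
}

# upload_to_hdfs LOCAL_FILE HDFS_DIR   (hypothetical helper name)
# Copies LOCAL_FILE into HDFS_DIR under the same base name.
# Assumes the hdfs CLI is on the PATH and HDFS_DIR already exists.
upload_to_hdfs() {
    hdfs dfs -put "$1" "$2/$(basename "$1")"
}
```

Usage would look like: stamped=$(rotate_with_timestamp oompaloompa.log) && upload_to_hdfs "$stamped" /user/sasubramanian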

sanjay





Re: Loading a flat file + one additional field to a Hive table

2013-07-05 Thread manishbh...@rocketmail.com
Raj,

You should load the data into a temp table first and then move it into the
final table with a select query:
select to_date(from_unixtime(unix_timestamp())), c1, c2, ... from temp_table;
Reason: we should avoid custom operations in the load unless they are necessary.
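
A minimal sketch of that two-step load, written as a shell function that generates the HiveQL so it can be reviewed before it is run. Several things here are assumptions rather than facts from this thread: the staging table name t1_staging, the STRING column types, the tab field delimiter, and the helper name write_load_script. T1 is assumed to already exist with columns (D1, C1, C2, C3, C4).

```shell
#!/bin/sh
# write_load_script OUT_FILE LOCAL_DATA_PATH   (hypothetical helper name)
# Writes a HiveQL script that loads the raw file into a staging table
# and then copies it into T1 with today's date as the first column.
write_load_script() {
    cat > "$1" <<EOF
-- Staging table: name, column types and delimiter are assumptions.
CREATE TABLE IF NOT EXISTS t1_staging (c1 STRING, c2 STRING, c3 STRING, c4 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- OVERWRITE so that a rerun does not stack old rows in the staging table.
LOAD DATA LOCAL INPATH '$2' OVERWRITE INTO TABLE t1_staging;

-- Add the date stamp while copying into the final table.
INSERT INTO TABLE T1
SELECT to_date(from_unixtime(unix_timestamp())), c1, c2, c3, c4
FROM t1_staging;
EOF
}
```

After write_load_script load_t1.hql /software/home/hadoop/dat_files/ the generated file can be run with hive -f load_t1.hql.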



Loading a flat file + one additional field to a Hive table

2013-07-05 Thread Raj Hadoop
Hi,

Can anyone please suggest the best way to do the following in Hive?

Load 'today's date stamp' + << ALL FIELDS C1,C2,C3,C4 IN A FILE F1 >> to a Hive
table T1 (D1,C1,C2,C3,C4).

Can the following command be modified in some way to achieve the above?
hive> load data local inpath '/software/home/hadoop/dat_files/' into table T1;

My requirement is to append a date stamp to a web log file and then load it
into a Hive table.
 
Thanks,
Raj
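
One possible reading of this requirement is to add the date stamp to every record before the file is handed to LOAD DATA. The sketch below does that with awk. It is an illustration under assumptions, not a recommendation from this thread: prepend_date_stamp is a hypothetical name, the fields are assumed to be tab-separated, and the stamp is written as the first field to match the (D1,C1,C2,C3,C4) column order.

```shell
#!/bin/sh
# prepend_date_stamp IN_FILE OUT_FILE [STAMP]   (hypothetical helper name)
# Writes OUT_FILE with STAMP (default: today's date as YYYY-MM-DD)
# added as a new first tab-separated field on every line of IN_FILE.
prepend_date_stamp() {
    stamp=${3:-$(date +%Y-%m-%d)}
    awk -v d="$stamp" '{ print d "\t" $0 }' "$1" > "$2"
}
```

The stamped output file could then be loaded with the load data local inpath command from the question, pointing at the new file.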