RE: NULL values and != operations

2014-02-09 Thread Dima Machlin
Look here : 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-RelationalOperators
If one of the sides of != is NULL, the result is NULL (not true, but not false 
either).
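
This three-valued logic is standard SQL, not Hive-specific, so it can be demonstrated with any engine. A minimal sketch using Python's built-in sqlite3 (standing in for Hive here; SQLite follows the same rule that a comparison against NULL yields NULL, and CASE WHEN treats NULL like false):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both comparisons against NULL evaluate to NULL, so CASE WHEN falls
# through to the ELSE branch in both cases.
row = conn.execute(
    "SELECT CASE WHEN NULL != 1 THEN 1 ELSE 0 END, "
    "CASE WHEN NULL = 1 THEN 1 ELSE 0 END"
).fetchone()
print(row)  # (0, 0)
```

This is exactly why the = query below looks "right" (NULL = x is NULL, ELSE gives 0, which happens to match expectations) while the != query looks "wrong" (NULL != x is also NULL, so ELSE again gives 0).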


From: Blaine Elliott [mailto:bla...@chegg.com]
Sent: Wednesday, February 05, 2014 8:50 PM
To: user@hive.apache.org
Subject: NULL values and != operations

I have come across a strange situation in Hive and I want to know if there is 
an explanation.  The CASE expression below does not work when the operator is != 
but does work when the operator is =.  Maybe it is true that an = comparison is 
valid when a value is NULL, but an != comparison is invalid when a value is NULL.  
That seems bizarre.  Is this a bug, or can it be explained?

I am using Amazon EMR w/hadoop v1.0.3  hive v0.11.0

-- the following SQL results are expected such that the last column is 1 or 0
SELECT
user_name
  , val0
  , val1
  , CASE WHEN val0 = val1 THEN 1 ELSE 0 END
FROM
(
SELECT
user_name
  , MIN(STR_TO_MAP(kvp, , =)['val0']) AS val0
  , MIN(STR_TO_MAP(kvp, , =)['val1']) AS val1
FROM
stgdb.fact_webrequest
GROUP BY
user_name
) x;

user0  42.01    42.01    1
user1  NULL     14.1301  0
user2  NULL     15.03    0
user3  NULL     43.01    0
user4  NULL     40.05    0
user5  NULL     13.1305  0
user6  51.0913  51.0913  1
user7  NULL     11.0701  0
user8  NULL     52.02    0

-- the following SQL results are strange such that the last column is always 0
SELECT
user_name
  , val0
  , val1
  , CASE WHEN val0 != val1 THEN 1 ELSE 0 END
FROM
(
SELECT
user_name
  , MIN(STR_TO_MAP(kvp, , =)['val0']) AS val0
  , MIN(STR_TO_MAP(kvp, , =)['val1']) AS val1
FROM
stgdb.fact_webrequest
GROUP BY
user_name
) x;

user0  42.01    42.01    0
user1  NULL     14.1301  0
user2  NULL     15.03    0
user3  NULL     43.01    0
user4  NULL     40.05    0
user5  NULL     13.1305  0
user6  51.0913  51.0913  0
user7  NULL     11.0701  0
user8  NULL     52.02    0
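
To get a 1 for genuinely unequal values even when one side is NULL, the NULL checks have to be spelled out explicitly (the wiki page linked in the reply also documents a NULL-safe equality operator, <=>, though availability depends on the Hive release). A sketch of the spelled-out form, again using Python's sqlite3 as a stand-in for Hive, with column names borrowed from the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val0 REAL, val1 REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(42.01, 42.01), (None, 14.1301), (51.0913, 51.0913)])
# plain_ne: the naive !=, which silently yields 0 whenever a side is NULL.
# null_safe_ne: treats two NULLs as equal and NULL-vs-value as unequal.
rows = conn.execute("""
    SELECT val0, val1,
           CASE WHEN val0 != val1 THEN 1 ELSE 0 END AS plain_ne,
           CASE WHEN val0 = val1 OR (val0 IS NULL AND val1 IS NULL)
                THEN 0 ELSE 1 END AS null_safe_ne
    FROM t
""").fetchall()
for r in rows:
    print(r)
```

The NULL/14.1301 row comes back as (None, 14.1301, 0, 1): the plain != still reports 0, while the spelled-out version reports 1 as the poster expected.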


Blaine Elliott
Chegg | Senior Data Engineer
805 637 4556 | bla...@chegg.com








Finding Hive and Hadoop version from command line

2014-02-09 Thread Raj Hadoop
All,

Is there any way from the command prompt I can find which hive version I am 
using and Hadoop version too?


Thanks in advance.

Regards,
Raj

Re: Finding Hive and Hadoop version from command line

2014-02-09 Thread Stephen Sprague
I like:

hive version:  $ schematool -info -dbType <your metastore dbtype>

hadoop version:  $ hadoop version


Lefty plugged schematool recently, so props to her on that one.
Under the covers you'll see that it's a shortcut for hive --service
schemaTool, which is the official way to run it. Either way will do it.
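
If you need to check versions programmatically rather than by eye, the same CLIs can be invoked from a script. A minimal sketch (assuming hadoop and hive are standard installs on PATH; note that the --version flag's availability varies by Hive release, and the function degrades to None when a CLI is missing):

```python
import shutil
import subprocess

def cli_first_line(cmd):
    """Run a version command and return the first line of its output,
    or None when the CLI is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Some CLIs write version/banner text to stderr rather than stdout.
    text = proc.stdout or proc.stderr
    lines = text.splitlines()
    return lines[0] if lines else None

print(cli_first_line(["hadoop", "version"]))   # first line, or None if absent
print(cli_first_line(["hive", "--version"]))
```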


On Sun, Feb 9, 2014 at 8:32 AM, Raj Hadoop hadoop...@yahoo.com wrote:

 All,

 Is there any way from the command prompt I can find which hive version I
 am using and Hadoop version too?

 Thanks in advance.

 Regards,
 Raj



How to use the Hive API to monitor Hive jobs

2014-02-09 Thread yankunhad...@gmail.com
Hi all,
   How can I use the Hive API to monitor running Hive jobs?




yankunhad...@gmail.com

Add few record(s) to a Hive table or a HDFS file on a daily basis

2014-02-09 Thread Raj Hadoop



Hi,

My requirement is a typical data warehouse and ETL requirement. I need to 
accomplish:

1) Insert daily transaction records into a Hive table or an HDFS file. This table 
or file is not big (approximately 10 records per day). I don't want to 
partition the table / file.


I am reading a few articles on this. They mention that we need to 
load into a staging table in Hive first, and then insert like the below:

insert overwrite table finaltable select * from staging;


I am not following this logic. How should I populate the staging table daily?

Thanks,
Raj

Re: Add few record(s) to a Hive table or a HDFS file on a daily basis

2014-02-09 Thread pandees waran
Why not INSERT INTO for appending new records?

a)load the new records into a staging table
b)INSERT INTO final table from the staging table
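
The two steps above can be sketched end to end. A minimal sketch using Python's sqlite3 as a stand-in for Hive (the staging/finaltable names and sample records are made up for illustration; in Hive the same INSERT INTO ... SELECT appends rather than overwrites):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (id INTEGER, amount REAL);
    CREATE TABLE finaltable (id INTEGER, amount REAL);
""")

# Day 1: load the day's records into staging, then append to the final table.
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
conn.execute("INSERT INTO finaltable SELECT * FROM staging")
conn.execute("DELETE FROM staging")  # clear staging for the next day's load

# Day 2: repeat with new records; the append leaves day 1's rows intact.
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(3, 30.0)])
conn.execute("INSERT INTO finaltable SELECT * FROM staging")

count = conn.execute("SELECT COUNT(*) FROM finaltable").fetchone()[0]
print(count)  # 3
```

The point of the staging table is just isolation: each day's raw load lands in staging, and only the append into the final table is the committed step.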
On 10-Feb-2014 8:16 am, Raj Hadoop hadoop...@yahoo.com wrote:



 Hi,

 My requirement is a typical Datawarehouse and ETL requirement. I need to
 accomplish

 1) Daily Insert transaction records to a Hive table or a HDFS file. This
 table or file is not a big table ( approximately 10 records per day). I
 don't want to Partition the table / file.


 I am reading a few articles on this. It was being mentioned that we need
 to load to a staging table in Hive. And then insert like the below :

 insert overwrite table finaltable select * from staging;

 I am not getting this logic. How should I populate the staging table daily.

 Thanks,
 Raj