Hive Avro union data access

2014-05-29 Thread Valluri, Sathish
Hi,

 

I have an Hive table created with 3 different union data types for alias_host 
column name as shown. (array,string, null).

 

CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES 
('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type":
 [{

  "type" : "array",

  "items" : "string"

},"string","null"]}]}

') STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 
'/arrayTests';

 

How to access and query the contents of this table in where clause.

The queries below like these can be possible if the datatype is not union but 
when once I set the datatype as union the following queries are failing.

 

Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 
1000;

Error: Error while processing statement: FAILED: SemanticException [Error 
10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of 
EQUAL  is expected to a primitive type, but union is found 
(state=42000,code=10016) 

 

Can anyone suggest how to access and query the contents of union data types.

 

Regards

Sathish Valluri

 

 

 

 

 



smime.p7s
Description: S/MIME cryptographic signature


Re: Control_A_Error

2014-05-29 Thread Rishabh Bhardwaj



Try putting the ^A character using vim editor via pressing ctrl+V then 1 then 
spacebar.

Regards,Rishabh. 


On , Rishabh Bhardwaj  wrote:
 



Try putting the ^A character using vim editor via pressing ctrl+V then 1 then 
spacebar.

Regards,
Rishabh.



On Friday, 30 May 2014 7:51 AM, Ashish Garg  wrote:
 


But, default is ^A, right? My question is why it is not taking? Is it an issue 
with hive version?
Thanks and Regards,
Ashish Garg



On Thu, May 29, 2014 at 9:11 PM, Siddharth Tiwari  
wrote:


>You can write a SerDe to handle the control character.
>
>**
>Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
>"Every duty is holy, and devotion to duty is the highest form of worship of 
>God.” 
>"Maybe other people will try to limit me but I don't limit myself"
>
>
>
>
>
>Date: Thu, 29 May 2014 19:26:56 -0400
>Subject: Control_A_Error
>From: gargcreation1...@gmail.com
>To: user@hive.apache.org
>
>
>
>Can you try to fix this problem?
>
>I have a local file called mytest.txt (restored in hdfs already). The content 
>is like this:
>$ cat -A  HDFSLOAD_DIR/mytest.txt
>49139801^A25752451^Aunknown$
>49139801^A24751754^Aunknown$
>49139801^A2161696^Anice$
> 
>To load this raw data above, I then defined the table like this in HQL:
> 
>create table my_test(
>userid  BIGINT,
>movieId BIGINT,
>comment STRING
>)
>ROW FORMAT DELIMITED
>FIELDS TERMINATED BY '\001'
>STORED AS TEXTFILE;
> 
>My problem is that when I “SELECT * FROM my_test;” , I got this:
> 
>NULL    NULL    NULL
>NULL    NULL    NULL
>NULL    NULL    NULL
> 
>I then replace “^A” with “^” in mytest.txt, and also re-defined my table 
>structure by using :
>FIELDS TERMINATED BY '^’
> 
>So when I select all, I got the correct results.
> 
>Any thoughts??? Thanks in advance.

Re: Control_A_Error

2014-05-29 Thread Rishabh Bhardwaj

Try putting the ^A character using vim editor via pressing ctrl+V then 1 then 
spacebar.

Regards,
Rishabh.



On Friday, 30 May 2014 7:51 AM, Ashish Garg  wrote:
 


But, default is ^A, right? My question is why it is not taking? Is it an issue 
with hive version?
Thanks and Regards,
Ashish Garg



On Thu, May 29, 2014 at 9:11 PM, Siddharth Tiwari  
wrote:


>You can write a SerDe to handle the control character.
>
>**
>Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
>"Every duty is holy, and devotion to duty is the highest form of worship of 
>God.” 
>"Maybe other people will try to limit me but I don't limit myself"
>
>
>
>
>
>Date: Thu, 29 May 2014 19:26:56 -0400
>Subject: Control_A_Error
>From: gargcreation1...@gmail.com
>To: user@hive.apache.org
>
>
>
>Can you try to fix this problem?
>
>I have a local file called mytest.txt (restored in hdfs already). The content 
>is like this:
>$ cat -A  HDFSLOAD_DIR/mytest.txt
>49139801^A25752451^Aunknown$
>49139801^A24751754^Aunknown$
>49139801^A2161696^Anice$
> 
>To load this raw data above, I then defined the table like this in HQL:
> 
>create table my_test(
>userid  BIGINT,
>movieId BIGINT,
>comment STRING
>)
>ROW FORMAT DELIMITED
>FIELDS TERMINATED BY '\001'
>STORED AS TEXTFILE;
> 
>My problem is that when I “SELECT * FROM my_test;” , I got this:
> 
>NULL    NULL    NULL
>NULL    NULL    NULL
>NULL    NULL    NULL
> 
>I then replace “^A” with “^” in mytest.txt, and also re-defined my table 
>structure by using :
>FIELDS TERMINATED BY '^’
> 
>So when I select all, I got the correct results.
> 
>Any thoughts??? Thanks in advance.

Re: Control_A_Error

2014-05-29 Thread Ashish Garg
But, default is ^A, right? My question is why it is not taking? Is it an
issue with hive version?
Thanks and Regards,
Ashish Garg


On Thu, May 29, 2014 at 9:11 PM, Siddharth Tiwari  wrote:

>
> You can write a SerDe to handle the control character.
>
> ****
> *Cheers !!!*
> *Siddharth Tiwari*
> Have a refreshing day !!!
> *"Every duty is holy, and devotion to duty is the highest form of worship
> of God.” *
> *"Maybe other people will try to limit me but I don't limit myself"*
>
>
> --
> Date: Thu, 29 May 2014 19:26:56 -0400
> Subject: Control_A_Error
> From: gargcreation1...@gmail.com
> To: user@hive.apache.org
>
>
> Can you try to fix this problem?
>
> I have a local file called mytest.txt (restored in hdfs already). The
> content is like this:
>
> $ cat -A  HDFSLOAD_DIR/mytest.txt
>
> 49139801^A25752451^Aunknown$
>
> 49139801^A24751754^Aunknown$
>
> 49139801^A2161696^Anice$
>
>
>
> To load this raw data above, I then defined the table like this in HQL:
>
>
>
> create table my_test(
>
> userid  BIGINT,
>
> movieId BIGINT,
>
> comment STRING
>
> )
>
> ROW FORMAT DELIMITED
>
> FIELDS TERMINATED BY '\001'
>
> STORED AS TEXTFILE;
>
>
>
> My problem is that when I “SELECT * FROM my_test;” , I got this:
>
>
>
> NULLNULLNULL
>
> NULLNULLNULL
>
> NULLNULLNULL
>
>
>
> I then replace “^A” with “^” in mytest.txt, and also re-defined my table
> structure by using :
>
> FIELDS TERMINATED BY '^’
>
>
>
> So when I select all, I got the correct results.
>
>
>
> Any thoughts??? Thanks in advance.
>


RE: Control_A_Error

2014-05-29 Thread Siddharth Tiwari
You can write a SerDe to handle the control character.

**

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of 
God.” 

"Maybe other people will try to limit me but I don't limit myself"


Date: Thu, 29 May 2014 19:26:56 -0400
Subject: Control_A_Error
From: gargcreation1...@gmail.com
To: user@hive.apache.org

Can you try to fix this problem?
I have a local file called mytest.txt (restored in hdfs already). The content 
is like this:
$ cat -A  HDFSLOAD_DIR/mytest.txt49139801^A25752451^Aunknown$
49139801^A24751754^Aunknown$49139801^A2161696^Anice$
 To load this raw data above, I then defined the table like this in HQL:
 create table my_test(
userid  BIGINT,
movieId BIGINT,comment STRING
)
ROW FORMAT DELIMITEDFIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;
 My problem is that when I “SELECT * FROM my_test;” , I got this:
 NULLNULLNULL
NULLNULLNULL
NULLNULLNULL 
I then replace “^A” with “^” in mytest.txt, and also re-defined my table 
structure by using :FIELDS TERMINATED BY '^’
 
So when I select all, I got the correct results.
 Any thoughts??? Thanks in advance.
  

Help with query

2014-05-29 Thread Software Dev
We have a table with user entered queries, their IP. How could we
write a query that will count and order queries by their count having
a unique IP count > X. For example if we had the same IP enter the
same query Y times we wouldnlt want to include this in the final
result unless there have been X-Y other IP's that searched for that
query.

Is this perhaps better suited fro Pig?

Thanks


Re: problem with delimiters (control A)

2014-05-29 Thread Sanjay Subramanian
Hi Jack 

Since u already have your data with columns separated by CtrlA then u need to 
define the HIVE table as follows 
(by default Hive will assume CtrlA as column delimiter)

create table if not exists my_test(
    userid  BIGINT,
    movieId BIGINT,
    comment STRING
);


regards

sanjay


From: Petter von Dolwitz (Hem) 
To: "user@hive.apache.org"  
Sent: Thursday, May 22, 2014 8:00 AM
Subject: Re: problem with delimiters (control A)
 


Hi Jack,

could it be that your source file does not use \001 as delimiter but the 
character sequence "^A"? I get this feeling when looking at your cat printout 
and also when you say you can do a search and replace. If I do a print out of a 
file with the same delimiter I get the following

2014-04-097846478510
2014-04-107851178558

These are actually three columns but you cannot see the ^A because it is not a 
printable character. Check your file in a hex editor or something to verify 
that you have the correct delimiter.

Br,
Petter













2014-05-18 6:40 GMT+02:00 Jack Yang :

Hi All, 
>I have a local file called mytest.txt (restored in hdfs already). The content 
>is like this:
>$ cat -A  HDFSLOAD_DIR/mytest.txt
>49139801^A25752451^Aunknown$
>49139801^A24751754^Aunknown$
>49139801^A2161696^Anice$
> 
>To load this raw data above, I then defined the table like this in HQL:
> 
>create table my_test(
>userid  BIGINT,
>movieId BIGINT,
>comment STRING
>) 
>ROW FORMAT DELIMITED 
>FIELDS TERMINATED BY '\001'
>STORED AS TEXTFILE;
> 
>My problem is that when I “SELECT * FROM my_test;” , I got this:
> 
>NULL    NULL    NULL
>NULL    NULL    NULL
>NULL    NULL    NULL
> 
>I then replace “^A” with “^” in mytest.txt, and also re-defined my table 
>structure by using :
>FIELDS TERMINATED BY '^’
> 
>So when I select all, I got the correct results.
> 
>Any thoughts??? Thanks in advance.
> 
> 
>Best regards,
>Jack 
> 
> 

Help with query

2014-05-29 Thread Software Dev
We have a table with user entered queries, their IP. How could we
write a query that will count and order queries by their count having
a unique IP count > X. For example if we had the same IP enter the
same query Y times we wouldnlt want to include this in the final
result unless there have been X-Y other IP's that searched for that
query.

Is this perhaps better suited fro Pig?

Thanks


Control_A_Error

2014-05-29 Thread Ashish Garg
Can you try to fix this problem?

I have a local file called mytest.txt (restored in hdfs already). The
content is like this:

$ cat -A  HDFSLOAD_DIR/mytest.txt

49139801^A25752451^Aunknown$

49139801^A24751754^Aunknown$

49139801^A2161696^Anice$



To load this raw data above, I then defined the table like this in HQL:



create table my_test(

userid  BIGINT,

movieId BIGINT,

comment STRING

)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\001'

STORED AS TEXTFILE;



My problem is that when I “SELECT * FROM my_test;” , I got this:



NULLNULLNULL

NULLNULLNULL

NULLNULLNULL



I then replace “^A” with “^” in mytest.txt, and also re-defined my table
structure by using :

FIELDS TERMINATED BY '^’



So when I select all, I got the correct results.



Any thoughts??? Thanks in advance.


Hive unit tests cannot load Mysql driver

2014-05-29 Thread Pavel Chadnov
Hello,

I'm trying to run Hive unit tests on my local machine. At first Hive could
not load Mysql driver (some tests for ql are failing). I put the mysql
connector jar to the conf dir. It didn't help (tests failing) but when I
choose some of the tests and run it individually (just one test class), it
passes.
So the question is: how can I make the unit tests not to fail while trying
to load mysql driver?

-- 
Regards,
Pavel Chadnov


RE: ObjectInspectors

2014-05-29 Thread Doug Christie
I need to access the actual values sent in on the HIVEQL line for the argument 
list for a HIVE UDTF. Is there any way to access the those values via the 
ObjectInspector Array passed in to the initialize function? Thank you.



Doug



Fwd: Hive unit testing and MySQL driver

2014-05-29 Thread Pavel Chadnov
Hello,

I'm trying to make sure that unit tests are passed on my local machine for
Hive 0.13 but for ql module I get a java.lang.ClassNotFoundException:
com.mysql.jdbc.Driver.
I'm running the tests using line: "mvn test -Phadoop-2"
How can I have the tests passed?
Thank you
-- 
Regards,
Pavel Chadnov


Hive unit testing and MySQL driver

2014-05-29 Thread Pavel Chadnov
Hello,

I'm trying to make sure that unit tests are passed on my local machine for
Hive 0.13 but for ql module I get a java.lang.ClassNotFoundException:
com.mysql.jdbc.Driver.
I'm running the tests using line: "mvn test -Phadoop-2"
How can I have the tests passed?
Thank you

-- 
Regards,
Pavel Chadnov


Hive Avro union data access

2014-05-29 Thread Valluri, Sathish
 

Hi,

 

I have an Hive table created with 3 different union data types for alias_host 
column name as shown. (array,string, null).

 

CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES 
('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type":
 [{

  "type" : "array",

  "items" : "string"

},"string","null"]}]}

') STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 
'/arrayTests';

 

How to access and query the contents of this table in where clause.

The queries below like these can be possible if the datatype is not union but 
when once I set the datatype as union the following queries are failing.

 

Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 
1000;

Error: Error while processing statement: FAILED: SemanticException [Error 
10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of 
EQUAL  is expected to a primitive type, but union is found 
(state=42000,code=10016) 

 

Can anyone suggest how to access and query the contents of union data types.

 

Regards

Sathish Valluri

 

 

 

 

 



smime.p7s
Description: S/MIME cryptographic signature


Hive Avro union data access

2014-05-29 Thread Valluri, Sathish
Hi,

 

I have an Hive table created with 3 different union data types for alias_host 
column name as shown. (array,string, null).

 

CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES 
('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type":
 [{

  "type" : "array",

  "items" : "string"

},"string","null"]}]}

') STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 
'/arrayTests';

 

How to access and query the contents of this table in where clause.

The queries below like these can be possible if the datatype is not union but 
when once I set the datatype as union the following queries are failing.

 

Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 
1000;

Error: Error while processing statement: FAILED: SemanticException [Error 
10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of 
EQUAL  is expected to a primitive type, but union is found 
(state=42000,code=10016) 

 

Can anyone suggest how to access and query the contents of union data types.

 

Regards

Sathish Valluri

 

 

 

 

 



smime.p7s
Description: S/MIME cryptographic signature


Hive Avro union data access

2014-05-29 Thread Valluri, Sathish
Hi,



I have an Hive table created with 3 different union data types for alias_host 
column name as shown. (array,string, null).



CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES 
('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type":
 [{

  "type" : "array",

  "items" : "string"

},"string","null"]}]}

') STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 
'/arrayTests';



How to access and query the contents of this table in where clause.

The queries below like these can be possible if the datatype is not union but 
when once I set the datatype as union the following queries are failing.



Eg: select alias_host from array_tests where alias_host like ‘%test%’ limit 
1000;

Error: Error while processing statement: FAILED: SemanticException [Error 
10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of 
EQUAL  is expected to a primitive type, but union is found 
(state=42000,code=10016)



Can anyone suggest how to access and query the contents of union data types.



Regards

Sathish Valluri













Re: Hive can I contribute to Hive confluence wiki documents?

2014-05-29 Thread Lefty Leverenz
Thank you for pointing this out.  The 'hive.cache.expr.evaluation'
parameter was added by
HIVE-4209 and
it needs to be included in the wiki.  Its description is in
hive-default.xml.template:

If true, evaluation result of deterministic expression referenced twice or
> more will be cached. For example, in filter condition like ".. where key +
> 10 > 10 or key + 10 = 0" "key + 10" will be evaluated/cached once and
> reused for following expression ("key + 10 = 0"). Currently, this is
> applied only to expressions in select or filter operator.


Do you want to provide more information than that?  To contribute to the
wiki, create a Confluence
accountand send a
message to this list asking for wiki editing privilege and
giving your account name.  But if you just want the description to be added
to the wiki, I can take care of that.

If you have additional information, it might not belong in Configuration
Properties
(although
once the parameter is there it can link to another page for details).  The
Selectpage
is one possibility, or maybe you'll find a better place for it.

If you want to revise the description in hive-default.xml.template, you
need to open a JIRA and patch the file.


-- Lefty


On Sat, May 24, 2014 at 1:01 AM, 郭士伟  wrote:

> I have used a configuration property 'hive.cache.expr.evaluation' in hive
> 0.12 recently and found it useful. But I cannot find any documents about it
> on the wiki page:
> https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties.
> Indeed, I pick it up from the source code.
>
> I think it maybe useful for other hive users if I can write some documents
> about the property.
>
> So, can anyone give me some pointers that how can I make the change happen
> ?
>
> Thanks, anyway.
>