Re: multiple users for hive access

2015-07-07 Thread Jeff Zhang
Have you tried starting the Hive CLI as these two users? What issue did you
see?

On Wed, Jul 8, 2015 at 11:50 AM, Jack Yang  wrote:

> Thanks, mate. I have MySQL running as my metadata store.
>
>
>
> What is the next step? When I start Hive (version 0.13), I just type
> hive on the command line.
>
>
>
> Now, say I have two users, A and B. I would like both A and B to access
> Hive tables using the Hive CLI.
>
>
>
> How can I do that?
>
>
>
>
>
> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
> *Sent:* Tuesday, 7 July 2015 4:22 PM
> *To:* user@hive.apache.org
> *Subject:* Re: multiple users for hive access
>
>
>
> Hive supports multi-user scenarios as long as the metadata store supports
> multi-user access. By default Hive uses Derby in embedded mode, which doesn't
> support multi-user access. You can configure Derby in server mode or use
> another metadata store such as MySQL. Here's the tutorial on how to
> configure Derby server mode:
>
>
>
> https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode
>
>
>
>
>
>
>
> On Tue, Jul 7, 2015 at 1:50 PM, Jack Yang  wrote:
>
> Hi all,
>
>
>
> I would like multiple users to be able to access Hive.
>
> Has anyone tried that before?
>
> Is there any tutorial or link I can study?
>
>
>
> Best regards,
>
> Jack
>
>
>
>
>
>
>
> --
>
> Best Regards
>
> Jeff Zhang
>



-- 
Best Regards

Jeff Zhang
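A minimal sketch of the shared-metastore configuration Jeff describes, assuming every user (A and B) runs the Hive CLI with the same hive-site.xml, for example from a shared Hive conf directory. The four javax.jdo.option.* names are the standard metastore connection properties; the host name, database name and credentials are hypothetical placeholders, and the matching JDBC driver jar (MySQL Connector/J or the Derby network client) has to be on Hive's classpath. The Python only renders the XML so the MySQL and Derby server-mode variants can be compared side by side.

```python
import xml.etree.ElementTree as ET


def metastore_site_xml(jdbc_url, driver, user, password):
    """Render a minimal hive-site.xml <configuration> that points Hive at a
    shared JDBC metastore (sketch only; real setups usually set more keys)."""
    conf = ET.Element("configuration")
    props = [
        ("javax.jdo.option.ConnectionURL", jdbc_url),
        ("javax.jdo.option.ConnectionDriverName", driver),
        ("javax.jdo.option.ConnectionUserName", user),
        ("javax.jdo.option.ConnectionPassword", password),
    ]
    for name, value in props:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


# Hypothetical host, database and credentials: replace with your own.
mysql_xml = metastore_site_xml(
    "jdbc:mysql://metastore-host:3306/hive_metastore?createDatabaseIfNotExist=true",
    "com.mysql.jdbc.Driver",
    "hive",
    "changeme",
)

# Derby in network-server mode, as on the HiveDerbyServerMode wiki page
# (hypothetical host; 1527 is Derby's default network port).
derby_xml = metastore_site_xml(
    "jdbc:derby://metastore-host:1527/metastore_db;create=true",
    "org.apache.derby.jdbc.ClientDriver",
    "hive",
    "changeme",
)

print(mysql_xml)
```

Once each client's hive-site.xml carries the same connection settings, several CLI sessions read and write the same table definitions instead of each getting a private embedded Derby database.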


RE: multiple users for hive access

2015-07-07 Thread Jack Yang
Thanks, mate. I have MySQL running as my metadata store.

What is the next step? When I start Hive (version 0.13), I just type hive
on the command line.

Now, say I have two users, A and B. I would like both A and B to access Hive
tables using the Hive CLI.

How can I do that?


From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Tuesday, 7 July 2015 4:22 PM
To: user@hive.apache.org
Subject: Re: multiple users for hive access

Hive supports multi-user scenarios as long as the metadata store supports
multi-user access. By default Hive uses Derby in embedded mode, which doesn't
support multi-user access. You can configure Derby in server mode or use another
metadata store such as MySQL. Here's the tutorial on how to configure Derby
server mode:

https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode



On Tue, Jul 7, 2015 at 1:50 PM, Jack Yang <j...@uow.edu.au> wrote:
Hi all,

I would like multiple users to be able to access Hive.
Has anyone tried that before?
Is there any tutorial or link I can study?

Best regards,
Jack




--
Best Regards

Jeff Zhang


Re: Hive Tez support matrix

2015-07-07 Thread Jim Green
Thanks Vikram. That looks great.


On Tue, Jul 7, 2015 at 4:27 PM, Vikram Dixit  wrote:

>  Hi Jim,
>
>  I just created a page with the matrix of supported releases.
>
>  https://cwiki.apache.org/confluence/display/Hive/Hive-tez+compatibility
>
>  Although pom is a source of truth, we also work with versions of tez
> where there have been no API changes (compared to the version in the pom).
>
>  Yes, the 1.2 release of Hive is the latest and greatest. Although the
> Tez version there is 0.5.3, it does work with Tez 0.7.0.
>
>  -Vikram.
>
>   From: Bikas Saha 
> Reply-To: "u...@tez.apache.org" 
> Date: Tuesday, July 7, 2015 at 3:12 PM
> To: "u...@tez.apache.org" , "user@hive.apache.org" <
> user@hive.apache.org>
> Subject: RE: Hive Tez support matrix
>
>   I don’t think Hive has documentation for that. The source of truth is
> probably the release pom.xml :)
>
>
>
> Bikas
>
>
>
> *From:* Jim Green [mailto:openkbi...@gmail.com ]
> *Sent:* Tuesday, July 07, 2015 2:58 PM
> *To:* user@hive.apache.org
> *Cc:* u...@tez.apache.org
> *Subject:* Re: Hive Tez support matrix
>
>
>
> Do you know where the Hive documentation about it is? Or do you mean it
> will be added?
>
>
>
> I saw that many issues were in Hive code and were fixed in Hive 1.2.
>
> BTW, which combination of hive/tez is the most stable one?
>
> My assumption is Hive 1.2+Tez 0.7. Am I right?
>
>
>
> On Tue, Jul 7, 2015 at 1:17 PM, Bikas Saha  wrote:
>
>  That would be in the hive documentation because it’s the dependent
> project and determines its compatibility with downstream projects like Tez.
>
>
>
> *From:* Jim Green [mailto:openkbi...@gmail.com]
> *Sent:* Tuesday, July 07, 2015 10:38 AM
> *To:* u...@tez.apache.org
> *Cc:* user@hive.apache.org
> *Subject:* Re: Hive Tez support matrix
>
>
>
> Thanks Hitesh.
>
>
>
> Should we put a support matrix in the documentation? Or maybe I missed it if it
> is already there?
>
>
>
>
>
> On Tue, Jul 7, 2015 at 10:34 AM, Hitesh Shah  wrote:
>
> From a Tez perspective, there was a major compatibility change between Tez
> 0.4 and Tez 0.5. However, Tez-0.7.x and Tez-0.6.x are compatible with
> Tez-0.5.x.
>
> I believe Hive 0.13 is compatible only with Tez 0.4.
> For Hive 0.14 onwards ( including the Hive-1.x. releases ), they should
> work with anything in the range of Tez versions: 0.5.2 <= x <= 0.7.x .
>
> thanks
> — Hitesh
>
>
> On Jul 7, 2015, at 10:12 AM, Jim Green  wrote:
>
> > Hi Team,
> >
> > Is there any Hive <-> Tez support matrix?
> > For example, Hive 1.2 should be on Tez which version?
> > Tez 0.5.3 only supports which versions of Hive?
> > etc…
> >
> > My understanding is that it does not matter which version of Hive and
> which version of Tez.
> >
> > --
> > Thanks,
> > www.openkb.info
> > (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>
>
>
>
>
> --
>
> Thanks,
>
> www.openkb.info
>
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>
>
>
>
>
> --
>
> Thanks,
>
> www.openkb.info
>
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>



-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


Re: Hive Tez support matrix

2015-07-07 Thread Vikram Dixit
Hi Jim,

I just created a page with the matrix of supported releases.

https://cwiki.apache.org/confluence/display/Hive/Hive-tez+compatibility

Although pom is a source of truth, we also work with versions of tez where 
there have been no API changes (compared to the version in the pom).

Yes, the 1.2 release of Hive is the latest and greatest. Although the Tez
version there is 0.5.3, it does work with Tez 0.7.0.

-Vikram.

From: Bikas Saha <bi...@hortonworks.com>
Reply-To: "u...@tez.apache.org" <u...@tez.apache.org>
Date: Tuesday, July 7, 2015 at 3:12 PM
To: "u...@tez.apache.org" <u...@tez.apache.org>, "user@hive.apache.org" <user@hive.apache.org>
Subject: RE: Hive Tez support matrix

I don't think hive has documentation for that. The source of truth is probably 
the release pom.xml :)

Bikas

From: Jim Green [mailto:openkbi...@gmail.com]
Sent: Tuesday, July 07, 2015 2:58 PM
To: user@hive.apache.org
Cc: u...@tez.apache.org
Subject: Re: Hive Tez support matrix

Do you know where the Hive documentation about it is? Or do you mean it will be
added?

I saw that many issues were in Hive code and were fixed in Hive 1.2.
BTW, which combination of hive/tez is the most stable one?
My assumption is Hive 1.2+Tez 0.7. Am I right?

On Tue, Jul 7, 2015 at 1:17 PM, Bikas Saha <bi...@hortonworks.com> wrote:
That would be in the hive documentation because it's the dependent project and 
determines its compatibility with downstream projects like Tez.

From: Jim Green [mailto:openkbi...@gmail.com]
Sent: Tuesday, July 07, 2015 10:38 AM
To: u...@tez.apache.org
Cc: user@hive.apache.org
Subject: Re: Hive Tez support matrix

Thanks Hitesh.

Should we put a support matrix in the documentation? Or maybe I missed it if it is
already there?


On Tue, Jul 7, 2015 at 10:34 AM, Hitesh Shah <hit...@apache.org> wrote:
From a Tez perspective, there was a major compatibility change between Tez 0.4
and Tez 0.5. However, Tez-0.7.x and Tez-0.6.x are compatible with Tez-0.5.x.

I believe Hive 0.13 is compatible only with Tez 0.4.
For Hive 0.14 onwards ( including the Hive-1.x. releases ), they should work 
with anything in the range of Tez versions: 0.5.2 <= x <= 0.7.x .

thanks
- Hitesh

On Jul 7, 2015, at 10:12 AM, Jim Green <openkbi...@gmail.com> wrote:

> Hi Team,
>
> Is there any Hive <-> Tez support matrix?
> For example, Hive 1.2 should be on Tez which version?
> Tez 0.5.3 only supports which versions of Hive?
> etc...
>
> My understanding is that it does not matter which version of Hive and which 
> version of Tez.
>
> --
> Thanks,
> www.openkb.info
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)



--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)



--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


RE: Limiting outer join

2015-07-07 Thread Bennie Leo
It went from about 60 mins to 3 mins. Hive was traversing the whole table 
multiple times, which is obviously inefficient!
 
> Date: Tue, 7 Jul 2015 15:55:19 -0700
> Subject: Re: Limiting outer join
> From: gop...@apache.org
> To: user@hive.apache.org
> 
> 
> > Never mind, I got it working with UDF. I just pass the file location to
> >my evaluate function. Thanks! :)
> 
> Nice. Would be very interested in looking at performance of such a UDF, if
> you have numbers before/after.
> 
> I suspect it will be an order of magnitude or more faster than the BETWEEN/JOIN
> clauses.
> 
> Cheers,
> Gopal
> 
> 
  

Re: Limiting outer join

2015-07-07 Thread Gopal Vijayaraghavan

> Never mind, I got it working with UDF. I just pass the file location to
>my evaluate function. Thanks! :)

Nice. Would be very interested in looking at performance of such a UDF, if
you have numbers before/after.

I suspect it will be an order of magnitude or more faster than the BETWEEN/JOIN
clauses.

Cheers,
Gopal




RE: Hive Tez support matrix

2015-07-07 Thread Bikas Saha
I don’t think hive has documentation for that. The source of truth is probably 
the release pom.xml ☺

Bikas

From: Jim Green [mailto:openkbi...@gmail.com]
Sent: Tuesday, July 07, 2015 2:58 PM
To: user@hive.apache.org
Cc: u...@tez.apache.org
Subject: Re: Hive Tez support matrix

Do you know where the Hive documentation about it is? Or do you mean it will be
added?

I saw that many issues were in Hive code and were fixed in Hive 1.2.
BTW, which combination of hive/tez is the most stable one?
My assumption is Hive 1.2+Tez 0.7. Am I right?

On Tue, Jul 7, 2015 at 1:17 PM, Bikas Saha <bi...@hortonworks.com> wrote:
That would be in the hive documentation because it’s the dependent project and 
determines its compatibility with downstream projects like Tez.

From: Jim Green [mailto:openkbi...@gmail.com]
Sent: Tuesday, July 07, 2015 10:38 AM
To: u...@tez.apache.org
Cc: user@hive.apache.org
Subject: Re: Hive Tez support matrix

Thanks Hitesh.

Should we put a support matrix in the documentation? Or maybe I missed it if it is
already there?


On Tue, Jul 7, 2015 at 10:34 AM, Hitesh Shah <hit...@apache.org> wrote:
From a Tez perspective, there was a major compatibility change between Tez 0.4 
and Tez 0.5. However, Tez-0.7.x and Tez-0.6.x are compatible with Tez-0.5.x.

I believe Hive 0.13 is compatible only with Tez 0.4.
For Hive 0.14 onwards ( including the Hive-1.x. releases ), they should work 
with anything in the range of Tez versions: 0.5.2 <= x <= 0.7.x .

thanks
— Hitesh

On Jul 7, 2015, at 10:12 AM, Jim Green <openkbi...@gmail.com> wrote:

> Hi Team,
>
> Is there any Hive <-> Tez support matrix?
> For example, Hive 1.2 should be on Tez which version?
> Tez 0.5.3 only supports which versions of Hive?
> etc…
>
> My understanding is that it does not matter which version of Hive and which 
> version of Tez.
>
> --
> Thanks,
> www.openkb.info
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)



--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)



--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
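Following Bikas's point that the release pom.xml is the source of truth, here is a sketch that pulls the declared Tez version out of a Hive POM. It assumes the version is declared as a `tez.version` entry in the POM's `<properties>` section (check the actual pom.xml of your release); the sample POM is a hypothetical excerpt whose 0.5.3 value only mirrors Vikram's note about Hive 1.2. As Vikram points out, this gives the version Hive was built against, not necessarily the only one that works.

```python
import xml.etree.ElementTree as ET

MAVEN_NS = "http://maven.apache.org/POM/4.0.0"


def tez_version_from_pom(pom_text):
    """Return the <tez.version> property declared in a Maven POM, or None."""
    root = ET.fromstring(pom_text)
    props = root.find("{%s}properties" % MAVEN_NS)
    if props is None:
        return None
    wanted = "{%s}tez.version" % MAVEN_NS
    # Compare tags directly instead of relying on path syntax for dotted names.
    for child in props:
        if child.tag == wanted and child.text:
            return child.text.strip()
    return None


# Hypothetical POM excerpt, not copied from a real Hive release.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <tez.version>0.5.3</tez.version>
  </properties>
</project>"""

print(tez_version_from_pom(SAMPLE_POM))
```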


Re: Hive Tez support matrix

2015-07-07 Thread Jim Green
Do you know where the Hive documentation about it is? Or do you mean it
will be added?

I saw that many issues were in Hive code and were fixed in Hive 1.2.
BTW, which combination of hive/tez is the most stable one?
My assumption is Hive 1.2+Tez 0.7. Am I right?

On Tue, Jul 7, 2015 at 1:17 PM, Bikas Saha  wrote:

>  That would be in the hive documentation because it’s the dependent
> project and determines its compatibility with downstream projects like Tez.
>
>
>
> *From:* Jim Green [mailto:openkbi...@gmail.com]
> *Sent:* Tuesday, July 07, 2015 10:38 AM
> *To:* u...@tez.apache.org
> *Cc:* user@hive.apache.org
> *Subject:* Re: Hive Tez support matrix
>
>
>
> Thanks Hitesh.
>
>
>
> Should we put a support matrix in the documentation? Or maybe I missed it if it
> is already there?
>
>
>
>
>
> On Tue, Jul 7, 2015 at 10:34 AM, Hitesh Shah  wrote:
>
> From a Tez perspective, there was a major compatibility change between Tez
> 0.4 and Tez 0.5. However, Tez-0.7.x and Tez-0.6.x are compatible with
> Tez-0.5.x.
>
> I believe Hive 0.13 is compatible only with Tez 0.4.
> For Hive 0.14 onwards ( including the Hive-1.x. releases ), they should
> work with anything in the range of Tez versions: 0.5.2 <= x <= 0.7.x .
>
> thanks
> — Hitesh
>
>
> On Jul 7, 2015, at 10:12 AM, Jim Green  wrote:
>
> > Hi Team,
> >
> > Is there any Hive <-> Tez support matrix?
> > For example, Hive 1.2 should be on Tez which version?
> > Tez 0.5.3 only supports which versions of Hive?
> > etc…
> >
> > My understanding is that it does not matter which version of Hive and
> which version of Tez.
> >
> > --
> > Thanks,
> > www.openkb.info
> > (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>
>
>
>
>
> --
>
> Thanks,
>
> www.openkb.info
>
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>



-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


RE: Limiting outer join

2015-07-07 Thread Bennie Leo
Never mind, I got it working with UDF. I just pass the file location to my 
evaluate function. Thanks! :)
 
From: tben...@hotmail.com
To: user@hive.apache.org
Subject: RE: Limiting outer join
Date: Tue, 7 Jul 2015 09:59:22 -0700




Thanks for your replies.
 
I see how extracting the first country would work; however, I was hoping to 
speed up my query by stopping the search once a country has been found.
 
Are you suggesting that I pass the whole IP table to a UDF and perform the 
search myself? I've only programmed simple UDFs so far (ex: reformat a string), 
so any additional details would be appreciated. I am mostly concerned about 
importing said table (currently stored in Hive) and distributing the task 
across nodes (note that I use Tez).
 
Regards,
B
 
> Date: Mon, 6 Jul 2015 18:18:44 -0700
> Subject: Re: Limiting outer join
> From: gop...@apache.org
> To: user@hive.apache.org
> 
> 
> > In the following query, is it possible to limit the number of entries
> >returned by an outer join to a single value? I want to obtain a single
> >country from ipv4geotable for each entry in logontable.
> 
> Yes, the PTF DENSE_RANK()/ROW_NUMBER() basically gives you that - you can
> read the first row out of each logon.IP, except there's no way to force
> which country wins over the other without an ORDER BY country in the
> OVER() clause as well.
> 
> That said, producing 1 row per group will only make it slower: because of
> the distributed nature of the SQL engine, the reduction of data happens
> after an ordering shuffle.
> 
> You're doing range joins in a SQL engine without theta joins, and MapReduce
> had no way to implement those at runtime (Tez has, with EdgeManager
> plugins).
> 
> The easiest/traditional approach for doing geo-IP lookups is a compact
> UDF model without any joins at all.
> 
> There are some old threads discussing this as a built-in, and some code
> (with potential licensing issues) -
> http://markmail.org/message/w54j4upwg2wbh3xg
> 
> Cheers,
> Gopal
> 
> 
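To illustrate the join-free "compact UDF" approach Gopal suggests, here is a Python sketch of the lookup logic only; a real Hive UDF would be written in Java and would load the range file once (for example from the file location Bennie passes to evaluate) before answering per-row calls. It assumes the geo table is a list of non-overlapping IPv4 ranges given as (start_ip, end_ip, country); the class name, the row layout and the sample ranges (RFC 5737 documentation blocks with made-up country codes) are all hypothetical. Sorting once and binary-searching with `bisect` replaces the range join with an O(log n) lookup per IP.

```python
import bisect
import ipaddress


class GeoIpLookup:
    """Hypothetical join-free geo-IP lookup: load sorted, non-overlapping
    [start, end] ranges once, then binary-search each IP."""

    def __init__(self, rows):
        # rows: iterable of (start_ip, end_ip, country) strings.
        ranges = sorted(
            (int(ipaddress.IPv4Address(s)), int(ipaddress.IPv4Address(e)), c)
            for s, e, c in rows
        )
        self._starts = [r[0] for r in ranges]
        self._ranges = ranges

    def evaluate(self, ip):
        """Return the country whose range contains ip, or None."""
        n = int(ipaddress.IPv4Address(ip))
        # Index of the last range starting at or before n.
        i = bisect.bisect_right(self._starts, n) - 1
        if i >= 0:
            _start, end, country = self._ranges[i]
            if n <= end:
                return country
        return None


lookup = GeoIpLookup([
    ("192.0.2.0", "192.0.2.255", "AA"),
    ("198.51.100.0", "198.51.100.255", "BB"),
    ("203.0.113.0", "203.0.113.255", "CC"),
])
print(lookup.evaluate("198.51.100.7"))
```

Because only the nearest preceding range is checked, overlapping ranges would need to be merged or split before loading.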

  

RE: Hive Tez support matrix

2015-07-07 Thread Bikas Saha
That would be in the hive documentation because it’s the dependent project and 
determines its compatibility with downstream projects like Tez.

From: Jim Green [mailto:openkbi...@gmail.com]
Sent: Tuesday, July 07, 2015 10:38 AM
To: u...@tez.apache.org
Cc: user@hive.apache.org
Subject: Re: Hive Tez support matrix

Thanks Hitesh.

Should we put a support matrix in the documentation? Or maybe I missed it if it is 
already there?


On Tue, Jul 7, 2015 at 10:34 AM, Hitesh Shah <hit...@apache.org> wrote:
From a Tez perspective, there was a major compatibility change between Tez 0.4 
and Tez 0.5. However, Tez-0.7.x and Tez-0.6.x are compatible with Tez-0.5.x.

I believe Hive 0.13 is compatible only with Tez 0.4.
For Hive 0.14 onwards ( including the Hive-1.x. releases ), they should work 
with anything in the range of Tez versions: 0.5.2 <= x <= 0.7.x .

thanks
— Hitesh

On Jul 7, 2015, at 10:12 AM, Jim Green <openkbi...@gmail.com> wrote:

> Hi Team,
>
> Is there any Hive <-> Tez support matrix?
> For example, Hive 1.2 should be on Tez which version?
> Tez 0.5.3 only supports which versions of Hive?
> etc…
>
> My understanding is that it does not matter which version of Hive and which 
> version of Tez.
>
> --
> Thanks,
> www.openkb.info
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)



--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


Re: Hive Tez support matrix

2015-07-07 Thread Jim Green
Thanks Hitesh.

Should we put a support matrix in the documentation? Or maybe I missed it if it
is already there?


On Tue, Jul 7, 2015 at 10:34 AM, Hitesh Shah  wrote:

> From a Tez perspective, there was a major compatibility change between Tez
> 0.4 and Tez 0.5. However, Tez-0.7.x and Tez-0.6.x are compatible with
> Tez-0.5.x.
>
> I believe Hive 0.13 is compatible only with Tez 0.4.
> For Hive 0.14 onwards ( including the Hive-1.x. releases ), they should
> work with anything in the range of Tez versions: 0.5.2 <= x <= 0.7.x .
>
> thanks
> — Hitesh
>
> On Jul 7, 2015, at 10:12 AM, Jim Green  wrote:
>
> > Hi Team,
> >
> > Is there any Hive <-> Tez support matrix?
> > For example, Hive 1.2 should be on Tez which version?
> > Tez 0.5.3 only supports which versions of Hive?
> > etc…
> >
> > My understanding is that it does not matter which version of Hive and
> which version of Tez.
> >
> > --
> > Thanks,
> > www.openkb.info
> > (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>
>


-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


Re: Hive Tez support matrix

2015-07-07 Thread Hitesh Shah
From a Tez perspective, there was a major compatibility change between Tez 0.4 
and Tez 0.5. However, Tez-0.7.x and Tez-0.6.x are compatible with Tez-0.5.x. 

I believe Hive 0.13 is compatible only with Tez 0.4. 
For Hive 0.14 onwards ( including the Hive-1.x. releases ), they should work 
with anything in the range of Tez versions: 0.5.2 <= x <= 0.7.x .  

thanks
— Hitesh

On Jul 7, 2015, at 10:12 AM, Jim Green  wrote:

> Hi Team,
> 
> Is there any Hive <-> Tez support matrix?
> For example, Hive 1.2 should be on Tez which version?
> Tez 0.5.3 only supports which versions of Hive?
> etc…
> 
> My understanding is that it does not matter which version of Hive and which 
> version of Tez.
> 
> -- 
> Thanks,
> www.openkb.info 
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
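Hitesh's rule of thumb can be written down as a small check. This is a sketch, not an official matrix: it encodes only the ranges stated in his reply (Hive 0.13 with Tez 0.4.x; Hive 0.14 and later, including 1.x, with 0.5.2 <= Tez <= 0.7.x), treats every other combination as unknown (False), and the function names are hypothetical. The Hive-tez compatibility wiki page linked in this thread remains the better reference.

```python
def parse_version(version):
    """'0.5.2' -> (0, 5, 2). Assumes purely numeric dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def hive_tez_compatible(hive_version, tez_version):
    """True if the Hive/Tez pair falls inside the ranges quoted in this thread."""
    hive = parse_version(hive_version)[:2]
    tez = parse_version(tez_version)
    if hive == (0, 13):
        # "Hive 0.13 is compatible only with Tez 0.4"
        return tez[:2] == (0, 4)
    if hive >= (0, 14):
        # "0.5.2 <= x <= 0.7.x" for Hive 0.14 onwards, including 1.x
        return tez >= (0, 5, 2) and tez[:2] <= (0, 7)
    # Combinations the thread does not cover
    return False


for hive, tez in [("1.2.0", "0.7.0"), ("1.2.0", "0.4.1"), ("0.13.1", "0.4.1")]:
    print(hive, tez, hive_tez_compatible(hive, tez))
```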



Hive 1.1 arg!

2015-07-07 Thread Edward Capriolo
Hey all. I am using Cloudera 5.4.something, which uses (almost) Hive 1.1.

I am getting bit by this error:
https://issues.apache.org/jira/browse/HIVE-10437

So I am trying to update my test setup to 1.1 so I can include the
annotation.


@SerDeSpec(schemaProps = {serdeConstants.LIST_COLUMNS,
  serdeConstants.LIST_COLUMN_TYPES,
  serdeConstants.TIMESTAMP_FORMATS})

I added this annotation. Now during my testing I am seeing this:

My serde does not read any table meta-data. It always returns the same list
of columns.

There are a lot of deeply nested columns. I have a unit test that is
creating a table using this serde.

Hive is angry:

Caused by: java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'Video fields in beacon: vidId, vidAdViewed, vidTime, vidStat&' to length 256.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)

Does anyone understand why Hive is attempting to edit the metastore? It
should just always read the values from this SerDe, and not need to persist
the columns.

AFAIK there is NO documentation anywhere as to what schema props should be
set when

What does serdeConstants.LIST_COLUMNS do? When should someone use it? When
should someone not use it?
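One way to catch this before creating the table, assuming (as the Derby message above suggests) that the failing value is a column comment supplied by the SerDe and that the limit is the 256 characters reported in the error: scan the column definitions first. The function name and sample columns are hypothetical, and the real column width depends on the metastore schema in use. This does not explain why Hive persists the SerDe's columns; it only flags values that will not fit.

```python
# 256 is taken from the Derby error above; other metastore schemas may differ.
COMMENT_LIMIT = 256


def overlong_comments(columns, limit=COMMENT_LIMIT):
    """Return (name, length) for each column whose comment exceeds `limit`.

    `columns` is an iterable of (name, type_string, comment) tuples, such as
    the list a custom SerDe reports; comment may be None.
    """
    return [
        (name, len(comment))
        for name, _type, comment in columns
        if comment is not None and len(comment) > limit
    ]


# Hypothetical column list with one comment that is too long.
cols = [
    ("vid", "struct<vidId:string>", "Video fields in beacon: " + "x" * 300),
    ("ts", "bigint", "event time"),
    ("flags", "int", None),
]
print(overlong_comments(cols))
```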


Hive Tez support matrix

2015-07-07 Thread Jim Green
Hi Team,

Is there any Hive <-> Tez support matrix?
For example, Hive 1.2 should be on Tez which version?
Tez 0.5.3 only supports which versions of Hive?
etc…

My understanding is that it does not matter which version of Hive and which
version of Tez.

-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)


RE: Limiting outer join

2015-07-07 Thread Bennie Leo
Thanks for your replies.
 
I see how extracting the first country would work; however, I was hoping to 
speed up my query by stopping the search once a country has been found.
 
Are you suggesting that I pass the whole IP table to a UDF and perform the 
search myself? I've only programmed simple UDFs so far (ex: reformat a string), 
so any additional details would be appreciated. I am mostly concerned about 
importing said table (currently stored in Hive) and distributing the task 
across nodes (note that I use Tez).
 
Regards,
B
 
> Date: Mon, 6 Jul 2015 18:18:44 -0700
> Subject: Re: Limiting outer join
> From: gop...@apache.org
> To: user@hive.apache.org
> 
> 
> > In the following query, is it possible to limit the number of entries
> >returned by an outer join to a single value? I want to obtain a single
> >country from ipv4geotable for each entry in logontable.
> 
> Yes, the PTF DENSE_RANK()/ROW_NUMBER() basically gives you that - you can
> read the first row out of each logon.IP, except there's no way to force
> which country wins over the other without an ORDER BY country in the
> OVER() clause as well.
> 
> That said, producing 1 row per group will only make it slower: because of
> the distributed nature of the SQL engine, the reduction of data happens
> after an ordering shuffle.
> 
> You're doing range joins in a SQL engine without theta joins, and MapReduce
> had no way to implement those at runtime (Tez has, with EdgeManager
> plugins).
> 
> The easiest/traditional approach for doing geo-IP lookups is a compact
> UDF model without any joins at all.
> 
> There are some old threads discussing this as a built-in, and some code
> (with potential licensing issues) -
> http://markmail.org/message/w54j4upwg2wbh3xg
> 
> Cheers,
> Gopal
> 
> 
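The semantics Gopal describes, keeping only the row where ROW_NUMBER() OVER (PARTITION BY ip ORDER BY country) equals 1, can be sketched in Python on the (ip, country) pairs an outer join would produce. This only shows which row survives; as Gopal notes, in Hive the same result needs an ordering shuffle, so it will not be faster. The function name is hypothetical, and this sketch sorts NULL (None) countries last, which may differ from Hive's own NULL ordering.

```python
from itertools import groupby
from operator import itemgetter


def first_country_per_ip(joined_rows):
    """Emulate ROW_NUMBER() OVER (PARTITION BY ip ORDER BY country) = 1.

    joined_rows: iterable of (ip, country) pairs; country may be None when
    the outer join found no match.
    """
    # Order by ip, then country, with None sorted after real values.
    ordered = sorted(joined_rows, key=lambda r: (r[0], r[1] is None, r[1] or ""))
    # groupby needs its input sorted by the key it groups on (ip).
    return {ip: next(group)[1] for ip, group in groupby(ordered, key=itemgetter(0))}


rows = [("10.0.0.1", "US"), ("10.0.0.1", "CA"), ("10.0.0.2", None)]
print(first_country_per_ip(rows))
```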
  

Re: WHERE ... NOT IN (...) + NULL values = BUG

2015-07-07 Thread Furcy Pin
Thanks, matshyeq,

You are right; I tested it on other SQL engines and the result is the same.
(but I still find this confusing...)

SELECT 1 IN (1,2,3,NULL) ;
> true

SELECT 1 IN (2,3) ;
> false

SELECT 1 IN (2,3,NULL) ;
> NULL

SELECT 1 NOT IN (1,2,3,NULL) ;
> false

SELECT 1 NOT IN (2,3,NULL) ;
> NULL

SELECT 1 NOT IN (2,3) ;
> true






On Tue, Jul 7, 2015 at 5:24 PM, Grant Overby (groverby) 
wrote:

>  "I call it my billion-dollar mistake. It was the invention of the null
> reference in 1965.”
> — Tony Hoare
>
>
> *Grant Overby*
> Software Engineer
> Cisco.com 
> grove...@cisco.com
> Mobile: *865 724 4910*
>
>
>
>Think before you print.
>
> This email may contain confidential and privileged material for the sole
> use of the intended recipient. Any review, use, distribution or disclosure
> by others is strictly prohibited. If you are not the intended recipient (or
> authorized to receive for the recipient), please contact the sender by
> reply email and delete all copies of this message.
>
> Please click here
>  for
> Company Registration Information.
>
>
>
>
>   From: matshyeq 
> Reply-To: "user@hive.apache.org" 
> Date: Tuesday, July 7, 2015 at 9:25 AM
> To: user 
> Subject: Re: WHERE ... NOT IN (...) + NULL values = BUG
>
>   >Obviously, the expected answer is always 2.
>
>  That's incorrect.
> It's expected behaviour per the SQL standard, and I would expect every other DB
> to behave the same way.
> A direct comparison to NULL never returns TRUE: it returns NULL (unknown), which
> a WHERE clause treats like FALSE, and that applies to the NULL comparisons inside
> <>, =, IN and NOT IN alike.
> IS (NOT) NULL is the right way to handle such cases. COALESCE is another
> alternative.
>
>  Thank you,
> Kind Regards
> ~Maciek
>
> On Tue, Jul 7, 2015 at 11:46 AM, Furcy Pin  wrote:
>
>> Hi folks,
>>
>>  just to let my fellow Hive users know that we found a bug with subquery
>> in where clauses and created a JIRA for it.
>>
>>  https://issues.apache.org/jira/browse/HIVE-11192
>>
>>  The latest version seems to be affected.
>>
>>  Regards,
>>
>>  Furcy Pin
>>
>
>
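Furcy's results are standard three-valued logic, and they can be reproduced outside Hive. The sketch below uses Python's built-in sqlite3 module purely as a stand-in SQL engine (it is not Hive); SQLite returns 1/0 for true/false and None for NULL (unknown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Evaluate one scalar SQL expression; SQL NULL comes back as None."""
    return conn.execute("SELECT " + expr).fetchone()[0]


expressions = [
    "1 IN (1,2,3,NULL)",
    "1 IN (2,3)",
    "1 IN (2,3,NULL)",
    "1 NOT IN (1,2,3,NULL)",
    "1 NOT IN (2,3,NULL)",
    "1 NOT IN (2,3)",
]
results = {expr: evaluate(expr) for expr in expressions}
for expr, value in results.items():
    print(expr, "->", value)
```

A row survives a WHERE clause only when the predicate is TRUE, so both NULL results filter the row out, which is why NOT IN over a list containing NULL never keeps anything.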


Re: WHERE ... NOT IN (...) + NULL values = BUG

2015-07-07 Thread Grant Overby (groverby)
"I call it my billion-dollar mistake. It was the invention of the null 
reference in 1965.”
— Tony Hoare



From: matshyeq <matsh...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, July 7, 2015 at 9:25 AM
To: user <user@hive.apache.org>
Subject: Re: WHERE ... NOT IN (...) + NULL values = BUG

>Obviously, the expected answer is always 2.

That's incorrect.
It's expected behaviour per the SQL standard, and I would expect every other DB to behave
the same way.
A direct comparison to NULL never returns TRUE: it returns NULL (unknown), which a WHERE
clause treats like FALSE, and that applies to the NULL comparisons inside <>, =, IN and
NOT IN alike.
IS (NOT) NULL is the right way to handle such cases. COALESCE is another
alternative.

Thank you,
Kind Regards
~Maciek

On Tue, Jul 7, 2015 at 11:46 AM, Furcy Pin <furcy@flaminem.com> wrote:
Hi folks,

just to let my fellow Hive users know that we found a bug with subquery in 
where clauses and created a JIRA for it.

https://issues.apache.org/jira/browse/HIVE-11192

The latest version seems to be affected.

Regards,

Furcy Pin



Re: WHERE ... NOT IN (...) + NULL values = BUG

2015-07-07 Thread matshyeq
>Obviously, the expected answer is always 2.

That's incorrect.
It's expected behaviour per the SQL standard, and I would expect every other DB
to behave the same way.
A direct comparison to NULL never returns TRUE: it returns NULL (unknown), which
a WHERE clause treats like FALSE, and that applies to the NULL comparisons inside
<>, =, IN and NOT IN alike.
IS (NOT) NULL is the right way to handle such cases. COALESCE is another
alternative.

Thank you,
Kind Regards
~Maciek

On Tue, Jul 7, 2015 at 11:46 AM, Furcy Pin  wrote:

> Hi folks,
>
> just to let my fellow Hive users know that we found a bug with subquery in
> where clauses and created a JIRA for it.
>
> https://issues.apache.org/jira/browse/HIVE-11192
>
> The latest version seems to be affected.
>
> Regards,
>
> Furcy Pin
>
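For the subquery case, here is a sketch (again with sqlite3 as a stand-in engine; the tables t and s and their values are hypothetical) of why NOT IN returns nothing once the subquery yields a NULL, and of the IS NOT NULL filter matshyeq mentions. NOT EXISTS is shown as a second common workaround; whether a particular Hive version accepts each form is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    CREATE TABLE s (y INTEGER);
    INSERT INTO t VALUES (1), (2), (3);
    INSERT INTO s VALUES (1), (NULL);
""")


def xs(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# NOT IN against a subquery containing NULL: every non-matching x yields NULL,
# and x = 1 yields FALSE, so no row passes the WHERE clause.
naive = xs("SELECT x FROM t WHERE x NOT IN (SELECT y FROM s) ORDER BY x")
# Workaround 1: drop NULLs inside the subquery with IS NOT NULL.
filtered = xs(
    "SELECT x FROM t WHERE x NOT IN (SELECT y FROM s WHERE y IS NOT NULL) ORDER BY x"
)
# Workaround 2: NOT EXISTS is not affected by NULLs in s.y.
not_exists = xs(
    "SELECT x FROM t WHERE NOT EXISTS (SELECT 1 FROM s WHERE s.y = t.x) ORDER BY x"
)
print(naive, filtered, not_exists)
```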


Re: array become struct<> when doing select

2015-07-07 Thread Karan Kumar
The issue is most probably the Thrift version you are using.
https://issues.apache.org/jira/browse/THRIFT-2172

I used thrift-0.9.2 to generate my thrift classes which solved this kind of
issue.

On Tue, Jul 7, 2015 at 6:14 PM, Binglin Chang  wrote:

> Sorry, I forgot to mention that the table uses the Thrift SerDe, but 'show
> create table' shows the table as ROW FORMAT DELIMITED, which I think is a
> bug.
> When selecting from a simple text-format table, the query runs fine, but
> selecting from the Thrift table produces the error.
>
> original create table statement:
>
> CREATE EXTERNAL TABLE xxx(commonUserId
> STRUCT,lastActiveTime BIGINT,searchWords
> ARRAY,calls MAP>,hasUsedRecharge
> INT,hasUsedExpress INT,hasUsedViolateRegulation
> INT,hasUsedLicensePlateLottery INT,interestedShops ARRAY) ROW
> FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH
> SERDEPROPERTIES('serialization.format'='org.apache.thrift.protocol.TCompactProtocol','serialization.class'='com.xiaomi.data.spec.platform.xxx')
> STORED AS SEQUENCEFILE LOCATION ''
>
>
>
>
> On Tue, Jul 7, 2015 at 7:44 PM, Binglin Chang  wrote:
>
>> Hi,
>>
>> I have a table with some array fields. When I preview them using "select
>> ... limit" in Beeline, I get the following error; it seems the typeinfo
>> string is changed from array to struct<>.
>> I am using hive-0.13.1
>>
>> 0: jdbc:hive2://lg-hadoop-hive01.bj:32203/> show create table xxx;
>>
>> createtab_stmt
>> CREATE EXTERNAL TABLE `xxx`(
>>   `commonuserid` struct,
>>   `lastactivetime` bigint,
>>   `searchwords` array,
>>   `calls` map>,
>>   `hasusedrecharge` int,
>>   `hasusedexpress` int,
>>   `hasusedviolateregulation` int,
>>   `hasusedlicenseplatelottery` int,
>>   `interestedshops` array)
>> ROW FORMAT DELIMITED
>> STORED AS INPUTFORMAT
>>   'org.apache.hadoop.mapred.SequenceFileInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
>>
>>
>> 0: jdbc:hive2://lg-hadoop-hive01.bj:32203/> select searchwords from
>>  yellowpage.yp_user_actions limit 1;
>> Error: Error while compiling statement: FAILED: SemanticException
>> java.lang.IllegalArgumentException: Error: name expected at the position 7
>> of 'struct<>' but '>' is found. (state=42000,code=4)
>>
>> Full stack:
>>
>> org.apache.hadoop.hive.ql.parse.SemanticException: java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found.
>>  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:5949)
>>  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:5845)
>>  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8235)
>>  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8126)
>>  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8956)
>>  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9209)
>>  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:206)
>>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:435)
>>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:333)
>>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:989)
>>  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:982)
>>  at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:123)
>>  at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:197)
>>  at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:734)
>>  at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:376)
>>  at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:362)
>>  at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:240)
>>  at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:378)
>>  at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
>>  at org.apache.hiv

Re: array become struct<> when doing select

2015-07-07 Thread Binglin Chang
Sorry, I forgot to mention that the table uses the Thrift SerDe, but 'show create
table' reports it as ROW FORMAT DELIMITED, which I think is a bug.
Selecting from a simple text-format table works fine; the error occurs only
when selecting from the Thrift table.

Original CREATE TABLE statement:

CREATE EXTERNAL TABLE xxx(
  commonUserId STRUCT,
  lastActiveTime BIGINT,
  searchWords ARRAY,
  calls MAP>,
  hasUsedRecharge INT,
  hasUsedExpress INT,
  hasUsedViolateRegulation INT,
  hasUsedLicensePlateLottery INT,
  interestedShops ARRAY)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer'
WITH SERDEPROPERTIES(
  'serialization.format'='org.apache.thrift.protocol.TCompactProtocol',
  'serialization.class'='com.xiaomi.data.spec.platform.xxx')
STORED AS SEQUENCEFILE LOCATION ''




On Tue, Jul 7, 2015 at 7:44 PM, Binglin Chang  wrote:

> Hi,
>
> I have a table with some array fields. When I preview them using "select ... limit"
> in beeline, I get the following error. It seems the typeinfo string has changed
> from array to struct<>.
> I am using hive-0.13.1.
>
> 0: jdbc:hive2://lg-hadoop-hive01.bj:32203/> show create table xxx;
> +---------------------------------------------------------------+
> | createtab_stmt                                                |
> +---------------------------------------------------------------+
> | CREATE EXTERNAL TABLE `xxx`(                                  |
> |   `commonuserid` struct,                                      |
> |   `lastactivetime` bigint,                                    |
> |   `searchwords` array,                                        |
> |   `calls` map>,                                               |
> |   `hasusedrecharge` int,                                      |
> |   `hasusedexpress` int,                                       |
> |   `hasusedviolateregulation` int,                             |
> |   `hasusedlicenseplatelottery` int,                           |
> |   `interestedshops` array)                                    |
> | ROW FORMAT DELIMITED                                          |
> | STORED AS INPUTFORMAT                                         |
> |   'org.apache.hadoop.mapred.SequenceFileInputFormat'          |
> | OUTPUTFORMAT                                                  |
> |   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat' |
> +---------------------------------------------------------------+
>
>
> 0: jdbc:hive2://lg-hadoop-hive01.bj:32203/> select searchwords from yellowpage.yp_user_actions limit 1;
> Error: Error while compiling statement: FAILED: SemanticException
> java.lang.IllegalArgumentException: Error: name expected at the position 7
> of 'struct<>' but '>' is found. (state=42000,code=4)
>
> Full stack:
>
> org.apache.hadoop.hive.ql.parse.SemanticException:
> java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found.
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:5949)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:5845)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8235)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8126)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8956)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9209)
>   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:206)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:435)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:333)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:989)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:982)
>   at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:123)
>   at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:197)
>   at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:734)
>   at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:376)
>   at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:362)
>   at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:240)
>   at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:378)
>   at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
>   at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S

array become struct<> when doing select

2015-07-07 Thread Binglin Chang
Hi,

I have a table with some array fields. When I preview them using "select ... limit"
in beeline, I get the following error. It seems the typeinfo string has changed
from array to struct<>.
I am using hive-0.13.1.

0: jdbc:hive2://lg-hadoop-hive01.bj:32203/> show create table xxx;
+---------------------------------------------------------------+
| createtab_stmt                                                |
+---------------------------------------------------------------+
| CREATE EXTERNAL TABLE `xxx`(                                  |
|   `commonuserid` struct,                                      |
|   `lastactivetime` bigint,                                    |
|   `searchwords` array,                                        |
|   `calls` map>,                                               |
|   `hasusedrecharge` int,                                      |
|   `hasusedexpress` int,                                       |
|   `hasusedviolateregulation` int,                             |
|   `hasusedlicenseplatelottery` int,                           |
|   `interestedshops` array)                                    |
| ROW FORMAT DELIMITED                                          |
| STORED AS INPUTFORMAT                                         |
|   'org.apache.hadoop.mapred.SequenceFileInputFormat'          |
| OUTPUTFORMAT                                                  |
|   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat' |
+---------------------------------------------------------------+


0: jdbc:hive2://lg-hadoop-hive01.bj:32203/> select searchwords from yellowpage.yp_user_actions limit 1;
Error: Error while compiling statement: FAILED: SemanticException
java.lang.IllegalArgumentException: Error: name expected at the position 7
of 'struct<>' but '>' is found. (state=42000,code=4)

Full stack:

org.apache.hadoop.hive.ql.parse.SemanticException:
java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found.
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:5949)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:5845)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8235)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8126)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:8956)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9209)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:206)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:435)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:333)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:989)
  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:982)
  at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:123)
  at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:197)
  at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:734)
  at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:376)
  at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:362)
  at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:240)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:378)
  at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
  at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:677)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found.
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:354)
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:478)
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
  at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUt
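The innermost frames (TypeInfoUtils$TypeInfoParser.expect and parseType) belong to Hive's parser for type strings such as array<...>, map<...,...> and struct<name:type,...>. The Python sketch below is a simplified recursive-descent parser of that general shape, written only for illustration and not taken from Hive's source; its tokenizing rule, its error wording for cases other than the one quoted above, and the helper names `parse_type` and `type_to_string` are assumptions. It makes one point visible: a writer that renders a struct with an empty field list produces 'struct<>', and the parser then expects a field name at position 7 but finds '>'. Whether that is what happens to the Thrift-backed array columns here is not established in this thread.

```python
import re

# Simplified tokenizer (an assumption, not Hive's exact rule): a run of word
# characters is one token; any other non-space character is its own token.
_TOKEN = re.compile(r"\w+|\S")


def type_to_string(t):
    """Render a nested type tuple back into a type string."""
    if isinstance(t, str):
        return t
    if t[0] == "array":
        return f"array<{type_to_string(t[1])}>"
    if t[0] == "map":
        return f"map<{type_to_string(t[1])},{type_to_string(t[2])}>"
    # struct: an empty field list renders as 'struct<>'
    fields = ",".join(f"{name}:{type_to_string(ft)}" for name, ft in t[1])
    return f"struct<{fields}>"


def parse_type(type_str):
    """Parse strings such as 'map<string,array<int>>' into nested tuples."""
    tokens = [(m.group(), m.start()) for m in _TOKEN.finditer(type_str)]
    i = 0

    def expect(what, literal=None):
        # Consume one token: either the given literal or a word-like name.
        nonlocal i
        text, at = tokens[i] if i < len(tokens) else ("", len(type_str))
        ok = text == literal if literal else re.fullmatch(r"\w+", text) is not None
        if not ok:
            raise ValueError(f"Error: {what} expected at the position {at} "
                             f"of '{type_str}' but '{text}' is found.")
        i += 1
        return text

    def parse():
        nonlocal i
        name = expect("type").lower()
        if name == "array":
            expect("'<'", "<")
            elem = parse()
            expect("'>'", ">")
            return ("array", elem)
        if name == "map":
            expect("'<'", "<")
            key = parse()
            expect("','", ",")
            value = parse()
            expect("'>'", ">")
            return ("map", key, value)
        if name == "struct":
            expect("'<'", "<")
            fields = []
            while True:
                field = expect("name")  # 'struct<>' fails here: '>' is not a name
                expect("':'", ":")
                fields.append((field, parse()))
                if i < len(tokens) and tokens[i][0] == ",":
                    i += 1
                    continue
                expect("'>'", ">")
                return ("struct", fields)
        return name  # primitive such as int, bigint, string

    result = parse()
    if i != len(tokens):
        raise ValueError(f"Error: unexpected trailing input in '{type_str}'")
    return result
```

For example, parse_type(type_to_string(("struct", []))) raises the same "name expected at the position 7 of 'struct<>'" error as the query above, while non-empty structs, arrays and maps round-trip cleanly.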

WHERE ... NOT IN (...) + NULL values = BUG

2015-07-07 Thread Furcy Pin
Hi folks,

Just letting my fellow Hive users know that we found a bug with subqueries in
WHERE clauses (WHERE ... NOT IN (...) combined with NULL values) and created a JIRA for it.

https://issues.apache.org/jira/browse/HIVE-11192

The latest version seems to be affected.

Regards,

Furcy Pin
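When checking a NOT IN result by hand, the reference point is standard SQL's three-valued logic: `x NOT IN (subquery)` is TRUE when the subquery returns no rows; otherwise it is TRUE only if x is non-NULL, differs from every returned value, and none of the returned values is NULL. If any returned value is NULL, the predicate is at best UNKNOWN, and WHERE drops every row for which it is not TRUE. The Python sketch below models those standard rules with None standing in for NULL; the function names are hypothetical, it models the SQL standard rather than Hive's implementation, and it does not describe the specific symptom reported in HIVE-11192 (see the JIRA for that).

```python
def sql_in(value, candidates):
    """Three-valued `value IN (candidates)`: True, False, or None for UNKNOWN."""
    if not candidates:
        return False  # IN over an empty set is FALSE, even for a NULL value
    if value is None:
        return None
    if any(c is not None and c == value for c in candidates):
        return True
    # No match: UNKNOWN if the set contains a NULL, otherwise FALSE.
    return None if any(c is None for c in candidates) else False


def sql_not(v):
    """Three-valued NOT: UNKNOWN stays UNKNOWN."""
    return None if v is None else not v


def where_not_in(rows, subquery_values):
    """Rows for which `row NOT IN (subquery_values)` is TRUE; WHERE drops UNKNOWN."""
    return [r for r in rows if sql_not(sql_in(r, subquery_values)) is True]
```

With this model, where_not_in([1, 2, 3], [2]) keeps 1 and 3, but where_not_in([1, 2, 3], [2, None]) keeps nothing, which is the standard-SQL result to compare Hive's output against.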


hbase column without prefix

2015-07-07 Thread Wojciech Indyk
Hi!
I use HBase column regex matching to create a map column in Hive, like:
"hbase.columns.mapping" = ":key,s:ap_.*"
Then the column contains values like:
{"ap_col1":"23","ap_col2":"7"}
Is it possible to strip the ap_ prefix so the values look like this?
{"col1":"23","col2":"7"}

Kind regards
Wojciech Indyk
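The requested transformation can be stated precisely as a small function. The Python sketch below is not Hive's HBase storage handler and does not correspond to any Hive option: it models one HBase row's cells as a dict keyed by (family, qualifier), treats the ap_ in s:ap_.* as a plain qualifier prefix (a simplifying assumption), and builds the map column either with the full qualifiers, as observed above, or with the prefix removed, as requested. The name `map_column` and the `hide_prefix` flag are hypothetical.

```python
def map_column(cells, family, qualifier_prefix, hide_prefix=False):
    """Collect the cells of one column family whose qualifier starts with
    `qualifier_prefix` into a dict, optionally dropping that prefix from the keys."""
    out = {}
    for (fam, qualifier), value in cells.items():
        if fam == family and qualifier.startswith(qualifier_prefix):
            key = qualifier[len(qualifier_prefix):] if hide_prefix else qualifier
            out[key] = value
    return out
```

For a row with cells s:ap_col1=23 and s:ap_col2=7, map_column(cells, "s", "ap_") yields {"ap_col1": "23", "ap_col2": "7"}, and the same call with hide_prefix=True yields {"col1": "23", "col2": "7"}.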