Re: LIKE filter pushdown for tables and partitions

2013-08-26 Thread Sergey Shelukhin
Adding user list. Any objections to removing LIKE support from
getPartitionsByFilter?

On Mon, Aug 26, 2013 at 2:54 PM, Ashutosh Chauhan wrote:

> Couple of questions:
>
> 1. What about the LIKE operator for Hive itself? Will that continue to work
> (presumably, because there is an alternative path for that)?
> 2. This will nonetheless break other direct consumers of the metastore client
> API (like HCatalog).
>
> I see your point that we have a buggy implementation, so what's out there is
> not safe to use. The question then really is: shall we remove this code, thereby
> breaking people for whom the current buggy implementation is good enough (or,
> you can say, salvage them from breaking in the future)? Or shall we try to fix
> it now?
> My take is that if there are no users of this anyway, then there is no point
> fixing it for non-existing users, but if there are, we probably have to. I
> suggest you send an email to users@hive to ask if there are users
> for this.
>
> Thanks,
> Ashutosh
>
>
>
> On Mon, Aug 26, 2013 at 2:08 PM, Sergey Shelukhin  >wrote:
>
> > Since there's no response, I am assuming nobody cares about this code...
> > The JIRA is HIVE-5134; I will attach a patch with the removal this week.
> >
> > On Wed, Aug 21, 2013 at 2:28 PM, Sergey Shelukhin <
> ser...@hortonworks.com
> > >wrote:
> >
> > > Hi.
> > >
> > > I think there are issues with the way Hive can currently do LIKE
> > > operator JDO pushdown, and the code should be removed for partitions
> > > and tables.
> > > Are there objections to removing LIKE from Filter.g and related areas?
> > > If not, I will file a JIRA and do it.
> > >
> > > Details:
> > > There's code in the metastore that is capable of pushing down LIKE
> > > expressions into JDO for string partition keys, as well as for tables.
> > > The code for tables doesn't appear to be used, and the partition code
> > > definitely doesn't run in Hive proper, because the metastore client
> > > doesn't send LIKE expressions to the server. It may be used in e.g.
> > > HCat and other places, but after asking some people here, I found out
> > > it probably isn't.
> > > I was trying to make it run and noticed some problems:
> > > 1) For partitions, Hive sends SQL patterns in a filter for like, e.g.
> > > "%foo%", whereas metastore passes them into matches() JDOQL method
> > > which expects Java regex.
> > > 2) Converting the pattern to a Java regex via the UDFLike method, I
> > > found that not all regexes appear to work in DN. ".*foo" seems to work,
> > > but anything complex (such as escaping the pattern using
> > > Pattern.quote, which UDFLike does) breaks and no longer matches
> > > properly.
> > > 3) I tried to implement common cases using the JDO methods
> > > startsWith/endsWith/indexOf (I will file a JIRA), but when I run tests
> > > on Derby, they also appear to have problems with some strings. For
> > > example, a partition with a backslash in the name cannot be matched by
> > > LIKE "%\%" (a single backslash in a string) after being converted to
> > > .indexOf(param) where param is "\" (escaping the backslash once again
> > > doesn't work either, and anyway there's no documented reason why it
> > > shouldn't work properly), while other characters match correctly, even
> > > e.g. "%".
> > >
> > > For tables, there's no SQL LIKE; it expects a Java regex, but I am not
> > > convinced all Java regexes are going to work.
> > >
> > > So, I think that for future correctness' sake it's better to remove
> > > this code.
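The SQL-LIKE-versus-Java-regex mismatch from points (1) and (2) can be illustrated with a small sketch. This is Python for brevity, and `like_to_regex` is an illustrative stand-in, not Hive's actual UDFLike implementation (which does the analogous thing in Java with `Pattern.quote`):

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regex string.
    '%' matches any run of characters, '_' matches exactly one;
    everything else is escaped so regex metacharacters stay literal."""
    out = []
    for ch in pattern:
        if ch == '%':
            out.append('.*')
        elif ch == '_':
            out.append('.')
        else:
            out.append(re.escape(ch))  # per-char escape; UDFLike uses Pattern.quote
    return ''.join(out)

# The metastore passed the raw SQL pattern into JDOQL's matches(),
# which expects a regex -- so "%foo%" would be treated as the literal
# regex "%foo%" and fail to match what LIKE was supposed to match.
sql_pattern = "%foo%"
regex = like_to_regex(sql_pattern)               # ".*foo.*"
assert re.fullmatch(regex, "a_foo_b")            # matches after conversion
assert not re.fullmatch(sql_pattern, "a_foo_b")  # raw pattern fails as a regex
```

The point of the thread is that even after this conversion, DataNucleus did not evaluate all such regexes correctly, which is why the pushdown was proposed for removal.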
> > >
> >
> > --
> > CONFIDENTIALITY NOTICE
> > NOTICE: This message is intended for the use of the individual or entity
> to
> > which it is addressed and may contain information that is confidential,
> > privileged and exempt from disclosure under applicable law. If the reader
> > of this message is not the intended recipient, you are hereby notified
> that
> > any printing, copying, dissemination, distribution, disclosure or
> > forwarding of this communication is strictly prohibited. If you have
> > received this communication in error, please contact the sender
> immediately
> > and delete it from your system. Thank You.
> >
>



Re: LIKE filter pushdown for tables and partitions

2013-08-27 Thread Sergey Shelukhin
This method is used to prune partitions for the job (separately from
actually processing data).
There are a few ways to get partitions from Hive for a query (to avoid
reading all partitions when filtering involves partition columns) -
the get-by-filter call that I want to modify is one of them. Hive itself uses
it as a perf optimization; the normal path gets all partition column values
(via partition names) and applies the filter locally, whereas the optimized
path converts the filter to JDOQL for DataNucleus (which the Hive metastore
uses internally), which in turn converts it to SQL queries for e.g. MySQL.
This normally happens before the MR job is even run.

Hive uses the latter (JDOQL pushdown) path for a restricted set of filters.
These are enforced in the Hive metastore client, not the server; the server
supports a wider set of filters, but Hive itself doesn't use them. While
trying to enable Hive to use a wider set, I noticed that the LIKE filter
doesn't work properly - both regex and the indexOf/... functions in DN seem
to have some weird edge cases. DN may be sending some things directly to the
datastore which would not actually work.
However, they would work for simple regexes (the definition of "simple" is
not clear and may not be the same for all datastores).

Given that there's a normal path to filter partitions in the Hive client, and
the pre-job perf optimization for LIKE is not that important, I want to
remove this for Hive.
I assume that other products using this path must apply filtering on the
client too sometimes (because getPartitionsByFilter doesn't support all
filters even on the server, e.g. such operators as NOT, BETWEEN, etc.).
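The "normal path" described above - fetching partition names and applying the filter locally - can be sketched roughly as follows. All names here are illustrative, not Hive's actual API; Hive encodes partition column values in partition names, which is what makes this client-side fallback possible:

```python
def parse_partition_name(name: str) -> dict:
    """Split a partition name like 'ds=2013-08-26/region=us'
    back into a dict of partition-column values."""
    return dict(kv.split('=', 1) for kv in name.split('/'))

def filter_partitions(partition_names, predicate):
    """Client-side fallback: apply the filter predicate to every
    partition's column values instead of pushing it into JDOQL/SQL."""
    return [n for n in partition_names
            if predicate(parse_partition_name(n))]

names = ["ds=2013-08-26/region=us", "ds=2013-08-27/region=eu"]
matched = filter_partitions(names, lambda p: p["region"] == "us")
assert matched == ["ds=2013-08-26/region=us"]
```

The JDOQL pushdown path avoids listing every partition name first, which is why it matters as a perf optimization for tables with many partitions.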

On Tue, Aug 27, 2013 at 9:13 AM, Stephen Sprague  wrote:

> sorry to be a dumb-ass, but what does that translate into in the HSQL dialect?
>
> Judging from the name you use, getPartitionsByFilter, you're saying you
> want to remove the use case of using like clause on a partition column?
>
> if so, um, yeah, i would think that's surely used.
>
>
>
> On Mon, Aug 26, 2013 at 7:48 PM, Sergey Shelukhin  >wrote:
>
> > Adding user list. Any objections to removing LIKE support from
> > getPartitionsByFilter?
> >

Re: [DISCUSS] Proposed Changes to the Apache Hive Project Bylaws

2013-12-27 Thread Sergey Shelukhin
I actually have a patch out on a jira that says it will be committed in 24
hours... from long ago ;)

Is the 24h rule needed at all? In other projects, I've seen patches simply
reverted by the author (or someone else). It's a rare occurrence, and it
should be possible to revert a patch if someone -1s it after commit, esp.
within the same 24 hours, when not many other changes are in.


On Fri, Dec 27, 2013 at 1:03 PM, Thejas Nair  wrote:

> I agree with Ashutosh that the 24 hour waiting period after +1 is
> cumbersome, I have also forgotten to commit patches after +1,
> resulting in patches going stale.
>
> But I think 24 hours wait between creation of jira and patch commit is
> not very useful, as the thing to be examined is the patch and not the
> jira summary/description.
> I think having a waiting period of 24 hours between a jira being made
> 'patch available' and committing is better and sufficient.
>
>
> On Fri, Dec 27, 2013 at 11:44 AM, Ashutosh Chauhan 
> wrote:
> > Proposed changes look good to me, both suggested by Carl and Thejas.
> > Another one I would like to add for consideration is: the 24 hour rule
> > between +1 and commit. Since this exists only in Hive (in no other Apache
> > project that I am aware of), it surprises new contributors. More
> > importantly, I have seen multiple cases where a patch didn't get committed
> > because the committer, after +1, forgot to commit after 24 hours had
> > passed. I propose to modify that one such that there must be a 24 hour
> > duration between creation of the jira and patch commit; that will ensure
> > that there is sufficient time for folks to see changes which are
> > happening on trunk.
> >
> > Thanks,
> > Ashutosh
> >
> >
> > On Fri, Dec 27, 2013 at 9:33 AM, Thejas Nair 
> wrote:
> >
> >> The changes look good to me.
> >> Only concern I have is with the 7 days for release candidate voting.
> >> Based on my experience with releases, it often takes a few cycles to get
> >> the candidate out, and people tend to vote closer to the end of the
> >> voting period. This can mean that it takes several weeks to get a
> >> release out. But this will not be so much of a problem as long as
> >> people don't wait for the end of the voting period to vote, or if they
> >> look at the candidate branch even before the release candidate is out.
> >>
> >> Should we also include a provision for branch merges? I think we
> >> should have a longer voting period for branch merges (3 days instead
> >> of 1?) and require 3 +1s (this part is also in the Hadoop bylaws).
> >>
> >>
> >> On Thu, Dec 26, 2013 at 7:08 PM, Carl Steinbach  wrote:
> >> > I think we should make several changes to the Apache Hive Project
> Bylaws.
> >> > The proposed changes are available for review here:
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38568856
> >> >
> >> > Most of the changes were directly inspired by provisions found in the
> >> Apache
> >> > Hadoop Project Bylaws.
> >> >
> >> > Summary of proposed changes:
> >> >
> >> > * Add provisions for branch committers and speculative branches.
> >> >
> >> > * Define the responsibilities of a release manager.
> >> >
> >> > * PMC Chairs serve for one year and are elected by the PMC using
> Single
> >> > Transferable Vote (STV) voting.
> >> >
> >> > * With the exception of code change votes, the minimum length of all
> >> voting
> >> > periods is extended to seven days.
> >> >
> >> > Thanks.
> >> >
> >> > Carl
> >>
> >>
>
>


Re: [ANNOUNCE] New Hive Committer - Vikram Dixit

2014-01-06 Thread Sergey Shelukhin
Congrats Vikram!


On Mon, Jan 6, 2014 at 8:58 AM, Carl Steinbach  wrote:

> The Apache Hive PMC has voted to make Vikram Dixit a committer on the
> Apache Hive Project.
>
> Please join me in congratulating Vikram!
>
> Thanks.
>
> Carl
>



Re: [ANNOUNCE] New Hive Committers - Sergey Shelukhin and Jason Dere

2014-01-27 Thread Sergey Shelukhin
Thanks guys!


On Mon, Jan 27, 2014 at 9:24 AM, Jarek Jarcec Cecho wrote:

> Congratulations Sergey and Jason, good job!
>
> Jarcec
>
> On Mon, Jan 27, 2014 at 08:36:37AM -0800, Carl Steinbach wrote:
> > The Apache Hive PMC has voted to make Sergey Shelukhin and Jason Dere
> > committers on the Apache Hive Project.
> >
> > Please join me in congratulating Sergey and Jason!
> >
> > Thanks.
> >
> > Carl
>



Re: [ANNOUNCE] New Hive Committers - Alan Gates, Daniel Dai, and Sushanth Sowmyan

2014-04-14 Thread Sergey Shelukhin
Congrats!


On Mon, Apr 14, 2014 at 10:55 AM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> Congratulations everyone!!
>
> Thanks
> Prasanth Jayachandran
>
> On Apr 14, 2014, at 10:51 AM, Carl Steinbach  wrote:
>
> > The Apache Hive PMC has voted to make Alan Gates, Daniel Dai, and
> Sushanth
> > Sowmyan committers on the Apache Hive Project.
> >
> > Please join me in congratulating Alan, Daniel, and Sushanth!
> >
> > - Carl
>
>
>



Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-27 Thread Sergey Shelukhin
Congrats!


On Mon, Jun 23, 2014 at 11:05 AM, Jayesh Senjaliya 
wrote:

> Congratulations Gopal and Szehon !!
>
>
> On Mon, Jun 23, 2014 at 10:35 AM, Vikram Dixit 
> wrote:
>
> > Congrats Gopal and Szehon!
> >
> >
> > On Mon, Jun 23, 2014 at 10:34 AM, Jason Dere 
> > wrote:
> >
> >> Congrats!
> >>
> >> On Jun 23, 2014, at 10:28 AM, Hari Subramaniyan <
> >> hsubramani...@hortonworks.com> wrote:
> >>
> >> > congrats to Gopal and Szehon!
> >> >
> >> > Thanks
> >> > Hari
> >> >
> >> >
> >> > On Mon, Jun 23, 2014 at 9:59 AM, Xiaobing Zhou  >
> >> > wrote:
> >> >
> >> >> Congrats!
> >> >>
> >> >>
> >> >>
> >> >> On Mon, Jun 23, 2014 at 9:52 AM, Vaibhav Gumashta <
> >> >> vgumas...@hortonworks.com
> >> >>> wrote:
> >> >>
> >> >>> Congrats Gopal and Szehon!
> >> >>>
> >> >>> --Vaibhav
> >> >>>
> >> >>>
> >> >>> On Mon, Jun 23, 2014 at 8:48 AM, Szehon Ho 
> >> wrote:
> >> >>>
> >>  Thank you all very much, and congrats Gopal!
> >>  Szehon
> >> 
> >> 
> >>  On Sun, Jun 22, 2014 at 8:42 PM, Carl Steinbach 
> >> >> wrote:
> >> 
> >> > The Apache Hive PMC has voted to make Gopal Vijayaraghavan and
> >> Szehon
> >> >>> Ho
> >> > committers on the Apache Hive Project.
> >> >
> >> > Please join me in congratulating Gopal and Szehon!
> >> >
> >> > Thanks.
> >> >
> >> > - Carl
> >> >
> >> 
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
> >
> >
> >
>


wiki edit access

2014-09-02 Thread Sergey Shelukhin
Hi.
Can I get wiki edit access for Hive? Confluence username sershe



Re: [ANNOUNCE] New Hive Committers -- Chao Sun, Chengxiang Li, and Rui Li

2015-02-11 Thread Sergey Shelukhin
Congrats!

On 15/2/10, 19:00, "Xu, Cheng A"  wrote:

>Congrats!
>
>-Original Message-
>From: Li, Rui [mailto:rui...@intel.com]
>Sent: Wednesday, February 11, 2015 10:26 AM
>To: user@hive.apache.org; d...@hive.apache.org
>Subject: RE: [ANNOUNCE] New Hive Committers -- Chao Sun, Chengxiang Li,
>and Rui Li
>
>Thanks guys. It's a great honor!
>
>Cheers,
>Rui Li
>
>
>-Original Message-
>From: Vaibhav Gumashta [mailto:vgumas...@hortonworks.com]
>Sent: Tuesday, February 10, 2015 6:12 AM
>To: user@hive.apache.org; d...@hive.apache.org
>Subject: Re: [ANNOUNCE] New Hive Committers -- Chao Sun, Chengxiang Li,
>and Rui Li
>
>Congratulations to all.
>
>
>On 2/9/15, 2:06 PM, "Prasanth Jayachandran"
> wrote:
>
>>Congratulations!
>>
>>> On Feb 9, 2015, at 1:57 PM, Na Yang  wrote:
>>> 
>>> Congratulations!
>>> 
>>> On Mon, Feb 9, 2015 at 1:06 PM, Vikram Dixit K 
>>> wrote:
>>> 
 Congrats guys!
 
 On Mon, Feb 9, 2015 at 12:42 PM, Szehon Ho 
wrote:
 
> Congratulations guys !
> 
> On Mon, Feb 9, 2015 at 3:38 PM, Jimmy Xiang 
>wrote:
> 
>> Congrats!!
>> 
>> On Mon, Feb 9, 2015 at 12:36 PM, Alexander Pivovarov <
> apivova...@gmail.com
>>> 
>> wrote:
>> 
>>> Congrats!
>>> 
>>> On Mon, Feb 9, 2015 at 12:31 PM, Carl Steinbach 
> wrote:
>>> 
 The Apache Hive PMC has voted to make Chao Sun, Chengxiang Li, and
 Rui
>> Li
 committers on the Apache Hive Project.
 
 Please join me in congratulating Chao, Chengxiang, and Rui!
 
 Thanks.
 
 - Carl
 
 
>>> 
>> 
> 
 
 
 
 --
 Nothing better than when appreciated for hard work.
 -Mark
 
>>
>



Re: [ANNOUNCE] New Hive PMC Member - Sergey Shelukhin

2015-02-25 Thread Sergey Shelukhin
Thanks guys!

On 15/2/25, 16:02, "Xiaobing Zhou"  wrote:

>Congrats Sergey!
>
>On Feb 25, 2015, at 1:56 PM, Prasanth Jayachandran
> wrote:
>
>> Congrats Sergey!
>> 
>> On Feb 25, 2015, at 1:50 PM, Alexander Pivovarov wrote:
>> 
>> Congrats!
>> 
>> On Wed, Feb 25, 2015 at 12:33 PM, Vaibhav Gumashta wrote:
>> Congrats Sergey!
>> 
>> On 2/25/15, 9:06 AM, "Vikram Dixit" wrote:
>> 
>>> Congrats Sergey!
>>> 
>>> On 2/25/15, 8:43 AM, "Carl Steinbach" wrote:
>>> 
>>>> I am pleased to announce that Sergey Shelukhin has been elected to the
>>>> Hive
>>>> Project Management Committee. Please join me in congratulating Sergey!
>>>> 
>>>> Thanks.
>>>> 
>>>> - Carl
>>> 
>> 
>> 
>> 
>



Re: [ANNOUNCE] New Hive Committers - Jimmy Xiang, Matt McCline, and Sergio Pena

2015-03-23 Thread Sergey Shelukhin
Congrats!

From: Carl Steinbach 
Reply-To: "user@hive.apache.org" 
Date: Monday, March 23, 2015 at 10:08
To: "user@hive.apache.org", "d...@hive.apache.org", Matthew McCline, "jxi...@apache.org", Sergio Pena 
Subject: [ANNOUNCE] New Hive Committers - Jimmy Xiang, Matt McCline, and Sergio Pena

The Apache Hive PMC has voted to make Jimmy Xiang, Matt McCline, and Sergio 
Pena committers on the Apache Hive Project.

Please join me in congratulating Jimmy, Matt, and Sergio.

Thanks.

- Carl



Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-15 Thread Sergey Shelukhin
Congrats!

From: Xu, Cheng A 
Reply-To: "user@hive.apache.org" 
Date: Tuesday, April 14, 2015 at 18:03
To: "user@hive.apache.org", "d...@hive.apache.org", Chris Drome 
Cc: "mit...@apache.org" 
Subject: RE: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan


Congrats Mithun!

From: Gunther Hagleitner [mailto:ghagleit...@hortonworks.com]
Sent: Wednesday, April 15, 2015 8:10 AM
To: d...@hive.apache.org; Chris Drome; 
user@hive.apache.org
Cc: mit...@apache.org
Subject: Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan


Congrats Mithun!



Thanks,

Gunther.


From: Chao Sun 
Sent: Tuesday, April 14, 2015 3:48 PM
To: d...@hive.apache.org; Chris Drome
Cc: user@hive.apache.org; mit...@apache.org
Subject: Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

Congrats Mithun!

On Tue, Apr 14, 2015 at 3:29 PM, Chris Drome wrote:
Congratulations Mithun!



On Tuesday, April 14, 2015 2:57 PM, Carl Steinbach wrote:


 The Apache Hive PMC has voted to make Mithun Radhakrishnan a committer on the 
Apache Hive Project.
Please join me in congratulating Mithun.
Thanks.
- Carl






--
Best,
Chao


Re: Hive CBO - Calcite Interface

2015-08-14 Thread Sergey Shelukhin
You can also take a look at https://issues.apache.org/jira/browse/HIVE-11471 
(although there’s no patch yet).

From: John Pullokkaran 
Reply-To: "user@hive.apache.org" 
Date: Friday, August 14, 2015 at 12:11
To: "user@hive.apache.org", "d...@hive.apache.org" 
Subject: Re: Hive CBO - Calcite Interface

Hi Raajay,

#1 No, there is no API for this.
#2 If you enable Logging (BaseSemanticAnalyzer) then CalcitePlanner will print 
out the plan with cost.

John
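John's second point - enabling logging on BaseSemanticAnalyzer so that CalcitePlanner prints the plan with cost - would typically be done in the log4j configuration. A sketch only; the exact logger categories are assumptions and may vary by Hive version:

```properties
# Print the Calcite plan (with cost) produced during semantic analysis
log4j.logger.org.apache.hadoop.hive.ql.parse.CalcitePlanner=DEBUG
log4j.logger.org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer=DEBUG
```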

From: Raajay 
Reply-To: "user@hive.apache.org" 
Date: Monday, August 10, 2015 at 8:48 AM
To: "user@hive.apache.org", "d...@hive.apache.org" 
Subject: Hive CBO - Calcite Interface

Is there an interface for Hive to get the absolute cost (based on Hive Cost
Factory) of an operator tree returned by Calcite?


Re: MapJoin bug?

2015-08-14 Thread Sergey Shelukhin
This looks like a real bug. Matthew might know if there’s already a fix or a 
ticket, otherwise you should open a JIRA.

From: Ted Xu 
Reply-To: "user@hive.apache.org" 
Date: Friday, August 14, 2015 at 03:56
To: "user@hive.apache.org" 
Subject: MapJoin bug?

Hi all,

I was doing a TPC-H benchmark on Hive recently when I found that some queries
went wrong.

Following are the two cases; both involve MapJoin where the join key is of
bigint type. After disabling auto-convert join, the error is gone.

Case 1.
Query (TPC-H query4):

create table q4_result as
select
    o_orderpriority,
    count(*) as order_count
from
    orders o
join (
    select distinct l_orderkey
    from (
        select *
        from lineitem
        where l_commitdate < l_receiptdate
    ) tab1
) tab2
on tab2.l_orderkey = o.o_orderkey
where
    o.o_orderdate >= '1993-07-01' and o.o_orderdate < '1993-10-01'
group by
    o_orderpriority
order by
    o_orderpriority;

The query causes data loss if MapJoin is enabled. Both sides of the join have
the expected output, but some data can't be joined together here. (Note that
l_orderkey & o_orderkey are bigint.)
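For anyone hitting similar symptoms, the workaround mentioned above (disabling automatic map-join conversion) is a session-level Hive setting; `hive.auto.convert.join` controls whether Hive rewrites eligible common joins into map joins:

```sql
-- Work around the suspected MapJoin bug by forcing a common
-- (shuffle) join instead of the auto-converted map join.
set hive.auto.convert.join=false;
-- then re-run the affected query
```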

Case 2:
Query (TPC-H query9):

create table q9_result as
select
    nation,
    o_year,
    sum(amount) as sum_profit
from (
    select
        n_name as nation,
        substr(o_orderdate, 1, 4) as o_year,
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from
        supplier s
        join lineitem l on s.s_suppkey = l.l_suppkey
        join partsupp ps on ps.ps_suppkey = l.l_suppkey and ps.ps_partkey = l.l_partkey
        join part p on p.p_partkey = l.l_partkey
        join orders o on o.o_orderkey = l.l_orderkey
        join nation n on s.s_nationkey = n.n_nationkey
    where
        p_name like '%green%'
) profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc;


The error occurs when joining tables s and n; we get an exception as follows:

Error: Failure while running task: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:368)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorFilterO

Re: HiveServer2 & Kerberos

2015-08-24 Thread Sergey Shelukhin
If that is the case it sounds like a bug…

From: Jary Du mailto:jary...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Thursday, August 20, 2015 at 08:56
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: Re: HiveServer2 & Kerberos

My understanding is that it will always ask you for a user/password even though you 
don’t need them. It is just the way Hive is set up.

On Aug 20, 2015, at 8:28 AM, Loïc Chanel 
mailto:loic.cha...@telecomnancy.net>> wrote:

!connect 
jdbc:hive2://192.168.6.210:1/db;principal=hive/hiveh...@westeros.wl
 org.apache.hive.jdbc.HiveDriver
scan complete in 13ms
Connecting to 
jdbc:hive2://192.168.6.210:1/db;principal=hive/hiveh...@westeros.wl
Enter password for 
jdbc:hive2://192.168.6.210:1/chaneldb;principal=hive/hiveh...@westeros.wl:

And if I press Enter, everything works perfectly because I am using Kerberos 
authentication. That's actually why I was asking what Hive is asking for: in my 
case, it seems I shouldn't be asked for a password when connecting.

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-08-20 17:06 GMT+02:00 Jary Du 
mailto:jary...@gmail.com>>:
How does Beeline ask you? What happens if you just press enter?



On Aug 20, 2015, at 12:15 AM, Loïc Chanel 
mailto:loic.cha...@telecomnancy.net>> wrote:

Indeed, I don't need the password, but why is Beeline asking me for one? What does 
it correspond to?

Thanks again,


Loïc

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-08-19 18:22 GMT+02:00 Jary Du 
mailto:jary...@gmail.com>>:
Correct me if I am wrong, but my understanding is that when using Kerberos 
authentication, you probably don’t need the password.

Hope it helps

Thanks,
Jary


On Aug 19, 2015, at 9:09 AM, Loïc Chanel 
mailto:loic.cha...@telecomnancy.net>> wrote:

By the way, thanks a lot for your help, because your solution works, but I'm 
still interested in knowing what is the password I did not enter.

Thanks again,


Loïc

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-08-19 18:07 GMT+02:00 Loïc Chanel 
mailto:loic.cha...@telecomnancy.net>>:
All right, but then, what is the password Hive asks for? Hive's own? How do I 
know its value?

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-08-19 17:51 GMT+02:00 Jary Du 
mailto:jary...@gmail.com>>:
For Beeline connection string, it should be "!connect 
jdbc:hive2://:/;principal=”. 
Please make sure it is the hive’s principal, not the user’s. And when you 
kinit, it should be kinit user’s keytab, not the hive’s keytab.





On Aug 19, 2015, at 8:46 AM, Loïc Chanel 
mailto:loic.cha...@telecomnancy.net>> wrote:

Yeah, I forgot to mention it, but each time I did a kinit user/hive before 
launching beeline, as I read somewhere that Beeline does not handle Kerberos 
connection.

So, since I can run klist before launching beeline and get a good result, the 
problem does not come from this. Thanks a lot for your response though.
Do you have another idea ?

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-08-19 17:42 GMT+02:00 Jary Du 
mailto:jary...@gmail.com>>:
"The Beeline client must have a valid Kerberos ticket in the ticket cache 
before attempting to connect." 
(http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_dataintegration/content/ch_using-hive-clients-examples.html)

So you need to kinit first to have a valid Kerberos ticket in the ticket cache 
before using beeline to connect to HS2.

Jary

On Aug 19, 2015, at 8:36 AM, Loïc Chanel 
mailto:loic.cha...@telecomnancy.net>> wrote:

Hi again,

While searching for another way to run some queries with Kerberos enabled on 
HiveServer, I found that this command should do the same:
!connect 
jdbc:hive2://192.168.6.210:1/default;principal=user/h...@westeros.wl
 org.apache.hive.jdbc.HiveDriver
But now I've got another error:
Error: Could not open client transport with JDBC Uri: 
jdbc:hive2://192.168.6.210:1/default;principal=user/h...@westeros.wl:
 Peer indicated failure: GSS initiate failed (state=08S01,code=0)

As it looked like it might be a simple Kerberos-ticket-related problem, I tried 
re-generating the Kerberos keytabs and ensuring that Hive has the path to access 
its keytab, but nothing changed.

Does anyone have an idea about how to solve this issue?

Thanks in advance for your help :)


Loïc

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at

Re: Run multiple queries simultaneously

2015-08-25 Thread Sergey Shelukhin
You can start HiveServer2, then submit queries to it using JDBC. If you open 
multiple sessions using multiple threads, you will be able to submit queries in 
parallel, although the compilation is still currently serialized.
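A minimal sketch of that pattern. The `QueryRunner` interface here is a hypothetical stand-in for "open a session, run one query, close it"; a real run would open a fresh `java.sql.Connection` via the Hive JDBC driver inside `run` rather than sharing one connection:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelQueries {

    // Hypothetical per-call session; a real implementation must be
    // thread-safe, e.g. by opening a new JDBC connection per invocation.
    interface QueryRunner {
        String run(String sql) throws Exception;
    }

    // Submits every query on its own pool thread and collects results
    // in the original order.
    static List<String> runInParallel(QueryRunner runner, List<String> queries)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(queries.size());
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String q : queries) {
                futures.add(pool.submit(() -> runner.run(q)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that query finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing a query alone versus under load is then a matter of calling `runInParallel` with a single-element list versus a multi-element one.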

From: Raajay mailto:raaja...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Tuesday, August 25, 2015 at 06:21
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: Run multiple queries simultaneously

Hello,

I want to compare the running time of a query when run alone against its run 
time in the presence of other queries.

What is the ideal setup for this experiment? Should I have two Hive CLIs open 
and issue queries simultaneously? How can I script such an experiment in Hive?

Raajay


Re: HiveServer2 & Kerberos

2015-08-25 Thread Sergey Shelukhin
Sure!

From: Loïc Chanel 
mailto:loic.cha...@telecomnancy.net>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Date: Tuesday, August 25, 2015 at 00:23
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Subject: Re: HiveServer2 & Kerberos

It is the case.
Would you like me to fill a JIRA about it ?

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

2015-08-24 19:24 GMT+02:00 Sergey Shelukhin 
mailto:ser...@hortonworks.com>>:
If that is the case it sounds like a bug…


Re: sql mapjoin very slow

2015-08-27 Thread Sergey Shelukhin
Are you using MR or Tez? You could try the optimized hash table in the case of Tez, 
although it’s meant to improve memory usage, not necessarily perf.

Can you also share characteristics of the query and data? It is surprising to 
see so much time for HashMap.get.

From: "r7raul1...@163.com" 
mailto:r7raul1...@163.com>>
Reply-To: user mailto:user@hive.apache.org>>
Date: Thursday, August 27, 2015 at 18:03
To: user mailto:user@hive.apache.org>>
Subject: sql mapjoin very slow


When I enable mapjoin, I see the MapJoin task run very slowly. My environment is 
Hadoop 2.3.0, Hive 1.1.0.

Attached are one map task's hive log and that map's xprof log.

In the map's xprof log, I see:
Compiled + native Method
92.3% 643527 + 0 java.util.HashMap.get
2.8% 19856 + 0 java.util.HashMap.put
1.2% 8623 + 0 
org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper$GetAdaptor.setFromRow
0.1% 953 + 0 org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate
0.1% 576 + 0 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject


r7raul1...@163.com


Re: sql mapjoin very slow

2015-08-27 Thread Sergey Shelukhin
Is the small-side table large, does it have a lot of rows for the same keys, or 
does it have a lot of skew?
Are there lots of misses (where there’d be no value in the small table for the 
large table value)?

If you have enough memory you can try increasing the initial size and decreasing 
the load factor, although without low-level debugging it’s hard to tell whether 
the issue is something non-obvious (i.e., other than the above).
If there’s no obvious problem you might consider not using map join.
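A hedged sketch of that tuning, using the property names that appear later in this thread; the values are illustrative, not recommendations, and should be sized to the small table's key count:

```sql
-- Pre-size the MapJoin hash table to avoid repeated rehashing, and lower
-- the load factor to shorten collision chains (costs more memory).
set hive.hashtable.initialCapacity=1000000;
set hive.hashtable.loadfactor=0.5;
```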


From: "r7raul1...@163.com<mailto:r7raul1...@163.com>" 
mailto:r7raul1...@163.com>>
Reply-To: user mailto:user@hive.apache.org>>
Date: Thursday, August 27, 2015 at 18:51
To: user mailto:user@hive.apache.org>>
Subject: Re: Re: sql mapjoin very slow

I use MR.
My mapjoin config is as shown in the attached screenshots (inline images not 
included here).


r7raul1...@163.com<mailto:r7raul1...@163.com>



Re: sql mapjoin very slow

2015-08-28 Thread Sergey Shelukhin
Can you check if this is actually being used in your case?

From: "r7raul1...@163.com<mailto:r7raul1...@163.com>" 
mailto:r7raul1...@163.com>>
Reply-To: user mailto:user@hive.apache.org>>
Date: Friday, August 28, 2015 at 00:53
To: user mailto:user@hive.apache.org>>
Subject: Re: Re: sql mapjoin very slow

I found a method in the HashMapWrapper class. I think Hive uses table statistics 
to adjust the threshold automatically:
public static int calculateTableSize(
    float keyCountAdj, int threshold, float loadFactor, long keyCount) {
  if (keyCount >= 0 && keyCountAdj != 0) {
    // We have statistics for the table. Size appropriately.
    threshold = (int) Math.ceil(keyCount / (keyCountAdj * loadFactor));
  }
  LOG.info("Key count from statistics is " + keyCount + "; setting map size to "
      + threshold);
  return threshold;
}
I have a question. I use Hive 1.1.0, so the hive.stats.dbclass default value is fs, 
meaning statistics are stored in the local filesystem. Can anyone tell me the file 
path where the statistics are stored?
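To make the sizing logic quoted above concrete, here is the same arithmetic as a standalone method; the adjustment and load-factor constants in the example are illustrative, not Hive's actual defaults:

```java
public class MapJoinSizing {

    // Mirrors the quoted HashMapWrapper.calculateTableSize logic (minus the
    // logging): when key-count statistics are available, the hash table is
    // pre-sized from them instead of using the configured threshold.
    static int calculateTableSize(float keyCountAdj, int threshold,
                                  float loadFactor, long keyCount) {
        if (keyCount >= 0 && keyCountAdj != 0) {
            // ceil(keyCount / (adjustment * loadFactor)) buckets
            threshold = (int) Math.ceil(keyCount / (keyCountAdj * loadFactor));
        }
        return threshold;
    }
}
```

For example, with a 1,500,000-key estimate, an adjustment of 2.0 and a load factor of 0.75, the table is sized to 1,000,000; with no statistics (keyCount = -1) the configured threshold is used unchanged.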


r7raul1...@163.com<mailto:r7raul1...@163.com>

From: r7raul1...@163.com<mailto:r7raul1...@163.com>
Date: 2015-08-28 13:03
To: user<mailto:user@hive.apache.org>
Subject: Re: Re: sql mapjoin very slow
I increased hive.hashtable.initialCapacity to 100 and decreased 
hive.hashtable.loadfactor to 0.5. The query runs faster.

________
r7raul1...@163.com<mailto:r7raul1...@163.com>



Re: [ANNOUNCE] New Hive Committer - Lars Francke

2015-09-08 Thread Sergey Shelukhin
Congrats!

From: Daniel Lopes mailto:dan...@bankfacil.com.br>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Tuesday, September 8, 2015 at 15:02
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Cc: "kulkarni.swar...@gmail.com" 
mailto:kulkarni.swar...@gmail.com>>, 
"d...@hive.apache.org" 
mailto:d...@hive.apache.org>>
Subject: Re: [ANNOUNCE] New Hive Committer - Lars Francke

Congrats!

Daniel Lopes, B.Eng
Data Scientist - BankFacil
CREA/SP 
5069410560
Mob +55 (18) 99764-2733
Ph +55 (11) 3522-8009
http://about.me/dannyeuu

Av. Nova Independência, 956, São Paulo, SP
Bairro Brooklin Paulista
CEP 04570-001
https://www.bankfacil.com.br


On Tue, Sep 8, 2015 at 6:34 PM, Lars Francke 
mailto:lars.fran...@gmail.com>> wrote:
Thank you so much everyone!

Looking forward to continue working with all of you.

On Tue, Sep 8, 2015 at 3:26 AM, 
kulkarni.swar...@gmail.com 
mailto:kulkarni.swar...@gmail.com>> wrote:
Congrats!

On Mon, Sep 7, 2015 at 3:54 AM, Carl Steinbach 
mailto:c...@apache.org>> wrote:
The Apache Hive PMC has voted to make Lars Francke a committer on the Apache 
Hive Project.

Please join me in congratulating Lars!

Thanks.

- Carl




--
Swarnim




Re: mapjoin with left join

2015-09-11 Thread Sergey Shelukhin
As far as I know it’s not currently supported.
The large table will be streamed in multiple tasks with the small table in 
memory, so there’s not one place that knows for sure there was no row in the 
large table for a particular small table row in any of the locations. It could 
have no match in one task but a match in other task.
You can try rewriting the query as inner join unioned with not in, but “not in” 
might still be slow…
IIRC there was actually a JIRA to solve this, but no work has been done so far.
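For illustration, the rewrite mentioned might look like this for the anti-join below (an untested sketch; note that `NOT IN` has NULL-handling caveats and, as said, may still be slow):

```sql
-- Anti-join without relying on a left-outer MapJoin:
-- keep only small-table rows whose key never appears in large.
SELECT s.*
FROM small s
WHERE s.id NOT IN (SELECT l.id FROM large l);
```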

From: Steve Howard mailto:stevedhow...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Friday, September 11, 2015 at 09:48
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: mapjoin with left join

We would like to utilize mapjoin for the following SQL construct:

select small.* from small s left join large l on s.id = 
l.id where l.id is null;

We can easily fit small into RAM, but large is over 1TB according to optimizer 
stats. Unless we set hive.auto.convert.join.noconditionaltask.size to at least the 
size of "large", the optimizer falls back to a common (shuffle) join, which is 
incredibly slow.

Given the fact it is a left join, which means we won't always have rows in 
large for each row in small, is this behavior expected? Could it be that 
reading the large table would miss the new rows in small, so the large one has 
to be the one that is probed for matches?

We simply want to load the 81K rows into RAM, then for each row in large, check 
the small hash table, and if a row in small is not in large, add it to large.

Again, the optimizer will use a mapjoin if we set 
hive.auto.convert.join.noconditionaltask.size = 1TB (the size of the large 
table). This is of course, not practical. The small table is only 50MB.

At the link below is the entire test case with two tables, one of which has 
three rows and other has 96. We can duplicate it with tables this small, which 
leads me to believe I am missing something, or this is a bug.

The link has the source code that shows each table create, as well as the 
explain with an argument for hive.auto.convert.join.noconditionaltask.size that 
is passed at the command line. The output shows a mergejoin when the 
hive.auto.convert.join.noconditionaltask.size size is less than 192 (the size 
of the larger table), and a mapjoin when 
hive.auto.convert.join.noconditionaltask.size is larger than 192 (large table 
fits).

http://pastebin.com/Qg6hb8yV

The business case is loading only new rows into a large fact table.  The new 
rows are the ones that are small in number.


Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

2015-09-16 Thread Sergey Shelukhin
Congrats!

From: Alpesh Patel mailto:alpeshrpa...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Wednesday, September 16, 2015 at 13:24
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

Congratulations Ashutosh

On Wed, Sep 16, 2015 at 1:23 PM, Pengcheng Xiong 
mailto:pxi...@apache.org>> wrote:
Congratulations Ashutosh!

On Wed, Sep 16, 2015 at 1:17 PM, John Pullokkaran 
mailto:jpullokka...@hortonworks.com>> wrote:
Congrats Ashutosh!

From: Vaibhav Gumashta 
mailto:vgumas...@hortonworks.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Wednesday, September 16, 2015 at 1:01 PM
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>, 
"d...@hive.apache.org" 
mailto:d...@hive.apache.org>>
Cc: Ashutosh Chauhan mailto:hashut...@apache.org>>
Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

Congrats Ashutosh!

—Vaibhav

From: Prasanth Jayachandran 
mailto:pjayachand...@hortonworks.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Wednesday, September 16, 2015 at 12:50 PM
To: "d...@hive.apache.org" 
mailto:d...@hive.apache.org>>, 
"user@hive.apache.org" 
mailto:user@hive.apache.org>>
Cc: "d...@hive.apache.org" 
mailto:d...@hive.apache.org>>, Ashutosh Chauhan 
mailto:hashut...@apache.org>>
Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

Congratulations Ashutosh!





On Wed, Sep 16, 2015 at 12:48 PM -0700, "Xuefu Zhang" 
mailto:xzh...@cloudera.com>> wrote:

Congratulations, Ashutosh!. Well-deserved.

Thanks to Carl also for the hard work in the past few years!

--Xuefu

On Wed, Sep 16, 2015 at 12:39 PM, Carl Steinbach 
mailto:c...@apache.org>> wrote:

> I am very happy to announce that Ashutosh Chauhan is taking over as the
> new VP of the Apache Hive project. Ashutosh has been a longtime contributor
> to Hive and has played a pivotal role in many of the major advances that
> have been made over the past couple of years. Please join me in
> congratulating Ashutosh on his new role!
>




Re: Q: UDFs & Threading

2015-09-17 Thread Sergey Shelukhin
1 – I don’t believe there’s anything like that… I could be mistaken. You can 
file a JIRA and add a patch, seems like it could be useful. I am not very 
familiar with user UDF lifecycle; IIRC, the object could live longer than the 
query runtime though; is it ok for your use case if the handle to the library 
is kept alive and reused between queries? The UDF object doesn’t seem like the 
most natural place for query completion notification…
2 – as of now, nothing will access the UDF from multiple threads. However, with 
HIVE-7926, it would change. Note that this feature will be off by default.
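As a sketch of one pragmatic workaround for Q1 (not an official Hive API): keep the native handle in a lazily-initialized static holder and release it from a JVM shutdown hook, accepting that cleanup happens at task-container exit rather than query end. `NativeLib` below is a hypothetical stand-in for the JNI wrapper:

```java
public class NativeLibHolder {

    // Hypothetical JNI wrapper; replace with the real native bindings.
    static class NativeLib {
        static long init() { return 42L; }            // expensive native init
        static void destroy(long handle) { /* release native state */ }
    }

    private static volatile long handle = 0L;

    // Lazily initialize once. Synchronized defensively, in case
    // HIVE-7926-style intra-query parallelism is ever enabled.
    static synchronized long getHandle() {
        if (handle == 0L) {
            handle = NativeLib.init();
            // Release native state when the JVM (task container) exits.
            Runtime.getRuntime().addShutdownHook(new Thread(() ->
                    NativeLib.destroy(handle)));
        }
        return handle;
    }
}
```

This trades prompt cleanup for safety: the library stays initialized across queries in the same JVM (which also avoids re-paying the init cost), and is torn down exactly once.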

From: Chris Losinger mailto:chris.losin...@sas.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Thursday, September 17, 2015 at 09:16
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: Q: UDFs & Threading


Hi,

I'm writing some Hive UDFs, using JNI to talk to a native C library. The C 
library requires some expensive initialization, and maintains its internal 
state via a handle. To avoid re-initializing this library at every row, I 
initialize the library on the first row, then store the handle as a static 
variable in the Java world and fetch that for subsequent rows. This is all 
working fine.

The tough part is that the library also requires the caller to do cleanup, to 
release that internal state. Being Java, there are no destructors, of course. 
And I can't rely on 'finalize'. So I can't figure out where to clean up this 
library.

Q 1: Is there anything in the Hive + UDF world that will tell my Java code when 
the query is finished, so that I can cleanup that library? Or, is there any 
Java mechanism that I can use to do this?

I'm using the 'UDF' class not 'GenericUDF', but I don't think that matters. I 
don't see anything in either that looks like a cleanup, and GenericUDF's 
'close' doesn't ever get called, AFAICT.

Q 2: Because I’m storing the library’s internal state handle as a static 
variable in the Java code, it would be available to any threads that use the 
Java code. That would be a problem. So, my question is: Will a single UDF 
instance ever be accessed by more than one thread ? In other words, are UDFs 
thread-safe ? Even if the query contains multiple UDF calls ? I need to know if 
my assumption about being able to store this C-library’s state as a Java 
‘static’ is a safe assumption or not.



Thanks in advance

-c



Re: Hive Start Up Time Manifolds Greater than Execution Time

2015-09-18 Thread Sergey Shelukhin
Which version of Hive, and which file format, are you using?
It could be either reading file footers for ORC - in recent versions there’s a way 
to disable that (set hive.exec.orc.split.strategy=BI) - or some similar feature 
for other formats that I’m not immediately familiar with.
It could also be slow metastore calls.
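For reference, the setting mentioned applies to ORC only, and its availability depends on the Hive version:

```sql
-- Use the BI split strategy: skip reading ORC file footers during split
-- generation, which can cut job-launch time on many small files / S3.
set hive.exec.orc.split.strategy=BI;
```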

From: Sreenath mailto:sreenaths1...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Friday, September 18, 2015 at 02:24
To: "d...@hive.apache.org" 
mailto:d...@hive.apache.org>>, 
"user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: Hive Start Up Time Manifolds Greater than Execution Time

Hi All,

Something interesting came to my notice the other day when I was using Hive for 
some queries. The time taken by Hive to launch a MapReduce job was many times 
higher than the time taken by Hadoop to actually execute it.
These are the details of the table the query is fired on:

CREATE EXTERNAL TABLE A
(
  user_id string,
  stage string,
  url string
)
PARTITIONED BY (dt string, id string)

All the data for the table is stored in S3, and each day around 2000 unique ids, 
i.e. 2000 partitions, are added. We can assume that each partition holds on 
average 100MB of gzip-compressed data.
Now when I run a query like "SELECT DISTINCT user_id FROM A WHERE dt>='20150101' 
and dt <= '20150401'", i.e. over a period of 3 months (approx 6 partitions), it 
takes Hive approximately 2 hrs to launch the MapReduce job, while the launched job 
finishes in just 20 min. So I was wondering if someone can help me understand what 
Hive is doing in these 2 hrs.
Would really appreciate some help here. Thanks in advance 


Best,
Sreenath



Re: Hive Start Up Time Manifolds Greater than Execution Time

2015-09-18 Thread Sergey Shelukhin
Actually, on second thought, even listing the directories (which is necessary to
launch the job) could take long.
If there are any client logs, you can try to take a look to see where the
time is spent.
If you are running under Hive CLI, the logs would be in
/tmp/$USER/hive.log by default.




Re: ORA-8177 with Hive transactions

2015-09-18 Thread Sergey Shelukhin
There’s HIVE-11831 and 
https://issues.apache.org/jira/browse/HIVE-11833 that try to address this.
We can do a patch similar to the first one; can you file a JIRA?

From: Steve Howard <stevedhow...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, September 18, 2015 at 10:54
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: ORA-8177 with Hive transactions

While troubleshooting an issue with transactions shortly after enabling them, I 
noticed the following in an Oracle trace, which is our metastore for hive...

ORA-8177: can't serialize access for this transaction

These were thrown on "insert into HIVE_LOCKS..."

Traditionally in Oracle, if an application actually needs serializable 
transactions, the fix is to set initrans and maxtrans to the number of 
concurrent writers.  When I ran what is below on a table similar to HIVE_LOCKS, 
this exception was thrown everywhere.  The fix is to recreate the table with 
higher values for initrans (only 1 is the default for initrans, and 255 is the 
default for maxtrans).  When I did this and re-ran what is below, the 
exceptions were no longer thrown.

Does anyone have any feedback on this performance hint?  The exceptions in hive 
are thrown from the checkRetryable method in the TxnHandler class, but I 
couldn't find what class.method throws them.  Perhaps the exceptions are not 
impactful, but given the fact the method expects them as it checks for the 
string in the exception message, I thought I would ask for feedback before we 
recreate the HIVE_LOCKS table with a higher value for INITRANS.
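For concreteness, the Oracle-side change being weighed might look like the following (a sketch only: the INITRANS value and the index name are illustrative, not taken from the Hive schema scripts):

```sql
-- MOVE rewrites the table segment, so the new INITRANS applies to
-- already-allocated blocks as well as new ones.
ALTER TABLE HIVE_LOCKS MOVE INITRANS 50;

-- A table MOVE invalidates its indexes; rebuild them afterwards.
-- (Index name is illustrative - list the real ones from USER_INDEXES.)
ALTER INDEX HIVE_LOCKS_IDX REBUILD INITRANS 50;
```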


import java.sql.*;

public class testLock implements Runnable {
  public static void main(String[] args) throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    for (int i = 1; i <= 100; i++) {
      testLock tl = new testLock();
    }
  }

  public testLock() {
    Thread t = new Thread(this);
    t.start();
  }

  public void run() {
    try {
      Connection conn =
        DriverManager.getConnection("jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice");
      conn.createStatement().execute("alter session set isolation_level = serializable");
      PreparedStatement pst = conn.prepareStatement("update test set a = ?");
      for (int j = 1; j <= 1; j++) {
        pst.setInt(1, j);
        pst.execute();
        conn.commit();
        System.out.println("worked");
      }
    }
    catch (Exception e) {
      System.out.println(e.getMessage());
    }
  }
}


Re: Hive s3 external table with sub directories

2015-10-22 Thread Sergey Shelukhin
I don’t think Hive picks up partitions automatically in this scenario. Maybe a 
ticket could be added to add partitions based on some additional syntax, as 
this seems to be an occasionally used scenario. I’ve seen msck used as a hack 
to “restore” partitions into the metastore (it will find the directories and create 
the partitions if all goes well); note that partitions added later also won’t be 
picked up automatically.
Make sure to try it first on a test directory so you can see whether it works for 
you.

From: Hafiz Mujadid <hafizmujadi...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, October 22, 2015 at 03:57
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Hive s3 external table with sub directories


I have following s3 directory structure.

Data/
  Year=2015/
    Month=01/
      Day=01/
        files
      Day=02/
        files
    Month=02/
      Day=01/
        files
      Day=02/
        files
    ...
  Year=2014/
    Month=01/
      Day=01/
        files
      Day=02/
        files
    Month=02/
      Day=01/
        files
      Day=02/
        files
So I am creating the Hive external table as follows:

CREATE external TABLE trips
(
 trip_id  STRING,probe_id STRING,provider_id STRING,
 is_moving TINYINT,is_completed BOOLEAN,start_time STRING,
 start_lat  DOUBLE,start_lon DOUBLE,start_lat_adj DOUBLE)
  PARTITIONED BY (year INT,month INT,day INT)
  STORED AS TEXTFILE
  LOCATION 's3n://accesskey:secretkey@bucket/data/';

When I run a query on this table, no data is returned, without any exception. If I 
place the same files in one directory only, without partitioning, then it runs 
fine. I also tried setting

set mapred.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;

Any idea where I am going wrong?
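Given the msck hack mentioned in the reply, the partitions could be registered in the metastore like this (a sketch using the trips table above; the S3 path is illustrative, and directory names may need to match the partition column names for discovery to work):

```sql
-- Discover partition directories already present under the table location
MSCK REPAIR TABLE trips;

-- Or add a single partition explicitly, pointing at its directory
ALTER TABLE trips ADD IF NOT EXISTS PARTITION (year=2015, month=1, day=1)
LOCATION 's3n://bucket/data/Year=2015/Month=01/Day=01/';
```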


Re: SemanticException Unable to fetch table t. null

2015-11-03 Thread Sergey Shelukhin
This error is not related to the issue; it’s just an old DB-type check that 
runs db-specific queries and sees which ones don’t fail. It has recently been 
fixed to no longer use db-specific queries.

From: Mich Talebzadeh <m...@peridale.co.uk>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, November 3, 2015 at 07:59
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: RE: SemanticException Unable to fetch table t. null

Correct.

I am getting this error in hive.log

2015-11-03 11:20:28,470 INFO  [main]: hive.metastore 
(HiveMetaStoreClient.java:open(319)) - Trying to connect to metastore with URI 
thrift://localhost:9083
2015-11-03 11:20:28,471 INFO  [main]: hive.metastore 
(HiveMetaStoreClient.java:open(410)) - Connected to metastore.
2015-11-03 11:20:28,471 INFO  [pool-3-thread-83]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(713)) - 59: source:127.0.0.1 get_table : 
db=asehadoop tbl=t
2015-11-03 11:20:28,476 INFO  [pool-3-thread-83]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(339)) - ugi=hduser  ip=127.0.0.1
cmd=source:127.0.0.1 get_table : db=asehadoop tbl=t
2015-11-03 11:20:28,476 INFO  [pool-3-thread-83]: metastore.HiveMetaStore 
(HiveMetaStore.java:newRawStore(556)) - 59: Opening raw store with 
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-11-03 11:20:28,476 INFO  [pool-3-thread-83]: metastore.ObjectStore 
(ObjectStore.java:initialize(264)) - ObjectStore, initialize called
2015-11-03 11:20:28,478 INFO  [pool-3-thread-83]: metastore.MetaStoreDirectSql 
(MetaStoreDirectSql.java:<init>(109)) - MySQL check failed, assuming we are not 
on mysql: ORA-00922: missing or invalid option


My metastore is Oracle and this indicates

ORA-00922:

missing or invalid option

Cause:

An invalid option was specified in defining a column or storage clause. The 
valid option in specifying a column is NOT NULL to specify that the column 
cannot contain any NULL values. Only constraints may follow the datatype. 
Specifying a maximum length on a DATE or LONG datatype also causes this error.


So I may have to hack the metadata manually.


Mich Talebzadeh

Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7.
co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4
Publications due shortly:
Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Ltd, its subsidiaries nor their employees accept any 
responsibility.

From: Rajkumar Singh [mailto:rkstech.j...@gmail.com]
Sent: 03 November 2015 11:11
To: user@hive.apache.org
Subject: Re: SemanticException Unable to fetch table t. null

it seems that you are having a problem with the metastore.

On Tue, Nov 3, 2015 at 1:46 PM, Mich Talebzadeh 
<m...@peridale.co.uk> wrote:
Hi,

Has anyone got a quick fix for dropping such table please?

hive> drop table t;
FAILED: SemanticException Unable to fetch table t. null
hive> desc t;
FAILED: SemanticException Unable to fetch table t. null

Thanks,

Mich Talebzadeh


Re: hive metastore update from 0.12 to 1.0

2015-11-03 Thread Sergey Shelukhin
Is your metastore schema version actually 0.12? The upgrade should be able to 
determine the schema version, so there should be no need to force it to use 
0.12.
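One way to check what the metastore actually records before forcing a starting version (a sketch; schematool's -info option prints the recorded schema version, assuming the tool is on the PATH):

```
# Show the schema version currently recorded in the metastore
schematool -dbType derby -info

# Upgrade letting schematool detect the current version itself
schematool -dbType derby -upgradeSchema
```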

From: Sanjeev Verma <sanjeev.verm...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, November 3, 2015 at 02:56
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: hive metastore update from 0.12 to 1.0

Hi

I am trying to update the metastore using schematool but am getting an error:

schematool -dbType derby -upgradeSchemaFrom 0.12

Upgrade script upgrade-0.12.0-to-0.13.0.derby.sql
Error: Table/View 'TXNS' already exists in Schema 'APP'. 
(state=X0Y32,code=3)
org.apache.hadoop.hive.metastore.HiveMetaException: Upgrade FAILED! Metastore 
state would be inconsistent !!

Could you let me know what's going wrong here?

Thanks


Re: create table as ORC with SORTED BY fails

2015-11-03 Thread Sergey Shelukhin
IIRC, SORTED BY is only supported on bucketed tables.

From: Mich Talebzadeh <m...@peridale.co.uk>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, November 3, 2015 at 01:23
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: create table as ORC with SORTED BY fails

Hi,

Any idea why this simple create table fails?

hive> create table test (
>  owner   varchar(30)
> ,object_name varchar(30)
> ,object_id   bigint
> )
> SORTED BY (object_id)
> STORED AS ORC
>  TBLPROPERTIES ("orc.compress"="SNAPPY"
> ,"orc.create,index"="true");
FAILED: ParseException line 6:0 missing EOF at 'SORTED' near ')'

If I remove SORTED BY clause it works!
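If sorting is what's wanted, a variant of the statement above that should parse is sketched below (the bucket count is illustrative; note also that the original TBLPROPERTIES contains a likely typo, "orc.create,index" instead of "orc.create.index"):

```sql
create table test (
  owner       varchar(30),
  object_name varchar(30),
  object_id   bigint
)
CLUSTERED BY (object_id) SORTED BY (object_id) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY",
               "orc.create.index"="true");
```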

Thanks

http://talebzadehmich.wordpress.com




Re: SemanticException Unable to fetch table t. null

2015-11-03 Thread Sergey Shelukhin
https://issues.apache.org/jira/browse/HIVE-11123

However the point is, it’s not a real error, i.e. not the cause of this bug.

From: Mich Talebzadeh <m...@peridale.co.uk>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, November 3, 2015 at 12:07
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: RE: SemanticException Unable to fetch table t. null

Thanks where is it fixed? Is there any patch available.



Mich Talebzadeh


From: Sergey Shelukhin [mailto:ser...@hortonworks.com]
Sent: 03 November 2015 19:52
To: user@hive.apache.org
Subject: Re: SemanticException Unable to fetch table t. null

This error is not related to the issue, it’s just an old DB type check that 
runs db-specific queries and sees which ones don’t fail. It has been fixed to 
not use db-specific queries lately.

From: Mich Talebzadeh <m...@peridale.co.uk>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, November 3, 2015 at 07:59
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: RE: SemanticException Unable to fetch table t. null

Correct.

I am getting this error in hive.log

2015-11-03 11:20:28,470 INFO  [main]: hive.metastore 
(HiveMetaStoreClient.java:open(319)) - Trying to connect to metastore with URI 
thrift://localhost:9083
2015-11-03 11:20:28,471 INFO  [main]: hive.metastore 
(HiveMetaStoreClient.java:open(410)) - Connected to metastore.
2015-11-03 11:20:28,471 INFO  [pool-3-thread-83]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(713)) - 59: source:127.0.0.1 get_table : 
db=asehadoop tbl=t
2015-11-03 11:20:28,476 INFO  [pool-3-thread-83]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(339)) - ugi=hduser  ip=127.0.0.1
cmd=source:127.0.0.1 get_table : db=asehadoop tbl=t
2015-11-03 11:20:28,476 INFO  [pool-3-thread-83]: metastore.HiveMetaStore 
(HiveMetaStore.java:newRawStore(556)) - 59: Opening raw store with 
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-11-03 11:20:28,476 INFO  [pool-3-thread-83]: metastore.ObjectStore 
(ObjectStore.java:initialize(264)) - ObjectStore, initialize called
2015-11-03 11:20:28,478 INFO  [pool-3-thread-83]: metastore.MetaStoreDirectSql 
(MetaStoreDirectSql.java:<init>(109)) - MySQL check failed, assuming we are not 
on mysql: ORA-00922: missing or invalid option


My metastore is Oracle and this indicates

ORA-00922:

missing or invalid option

Cause:

An invalid option was specified in defining a column or storage clause. The 
valid option in specifying a column is NOT NULL to specify that the column 
cannot contain any NULL values. Only constraints may follow the datatype. 
Specifying a maximum length on a DATE or LONG datatype also causes this error.


So I may have to hack the metadata manually.


Mich Talebzadeh


Re: SemanticException Unable to fetch table t. null

2015-11-03 Thread Sergey Shelukhin
Nice catch! Can you file a JIRA?

From: Mich Talebzadeh <m...@peridale.co.uk>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, November 3, 2015 at 13:17
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: RE: SemanticException Unable to fetch table t. null

The cause of the problem was passing an empty value in the table creation 
attribute "orc.bloom.filter.columns"="", i.e. a blank column list:

create table t (
owner   varchar(30)
,object_name varchar(30)
,subobject_name  varchar(30)
,object_id   bigint
,data_object_id  bigint
,-
,op_time timestamp
)
CLUSTERED BY (object_id) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="SNAPPY",
"orc.create.index"="true",
"orc.bloom.filter.columns"="",
"orc.bloom.filter.fpp"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="1" )

That creates the table OK on the face of it, but crucially any further operation 
on the table comes back with an error message, including DROP TABLE:

FAILED: SemanticException Unable to fetch table t. null

I believe Hive should not have created the table in the first place.

HTH
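A corrected version of the properties block might look like this (a sketch, naming a real column from the DDL above; alternatively, just omit the bloom-filter property entirely):

```sql
TBLPROPERTIES ("orc.compress"="SNAPPY",
               "orc.create.index"="true",
               "orc.bloom.filter.columns"="object_id",  -- name real columns, never ""
               "orc.bloom.filter.fpp"="0.05")
```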

Mich Talebzadeh


From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: 03 November 2015 20:08
To: user@hive.apache.org
Subject: RE: SemanticException Unable to fetch table t. null

Thanks where is it fixed? Is there any patch available.



Mich Talebzadeh


From: Sergey Shelukhin [mailto:ser...@hortonworks.com]
Sent: 03 November 2015 19:52
To: user@hive.apache.org
Subject: Re: SemanticException Unable to fetch table t. null

This error is not related to the issue, it’s just an old DB type check that 
runs db-specific queries and sees which ones don’t fail. It has been fixed to 
not use db-specific queries lately.

From: Mich Talebzadeh <m...@peridale.co.uk>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, November 3, 2015 at 07:59
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: RE: SemanticException Unable to fetch table t. null

Correct.

I am getting this 

Re: [VOTE] Hive 2.0 release plan

2015-11-16 Thread Sergey Shelukhin
Including the user list.

On 15/11/13, 17:54, "Lefty Leverenz"  wrote:

>The Hive bylaws require this to be submitted on the user@hive mailing list
>(even though users don't get to vote).  See Release Plan in Actions
><https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Actions>.
>
>-- Lefty
>
>On Fri, Nov 13, 2015 at 7:33 PM, Thejas Nair 
>wrote:
>
>> +1
>>
>> On Fri, Nov 13, 2015 at 2:26 PM, Vaibhav Gumashta
>>  wrote:
>> > +1
>> >
>> > Thanks,
>> > --Vaibhav
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Nov 13, 2015 at 2:24 PM -0800, "Tristram de Lyones" <
>> delyo...@gmail.com> wrote:
>> >
>> > +1
>> >
>> > On Fri, Nov 13, 2015 at 1:38 PM, Sergey Shelukhin <
>> ser...@hortonworks.com>
>> > wrote:
>> >
>> >> Hi.
>> >> With no strong objections on DISCUSS thread, some issues raised and
>> >> addressed, and a reminder from Carl about the bylaws for the release
>> >> process, I propose we release the first version of Hive 2 (2.0), and
>> >> nominate myself as release manager.
>> >> The goal is to have the first release of Hive with aggressive set of
>>new
>> >> features, some of which are ready to use and some are at experimental
>> >> stage and will be developed in future Hive 2 releases, in line with
>>the
>> >> Hive-1-Hive-2 split discussion.
>> >> If the vote passes, the timeline to create a branch should be around
>>the
>> >> end of next week (to minimize merging in the wake of the release),
>>and
>> the
>> >> timeline to release would be around the end of November, depending on
>> the
>> >> issues found during the RC cutting process, as usual.
>> >>
>> >> Please vote:
>> >> +1 proceed with the release plan
>> >> +-0 don’t care
>> >> -1 don’t proceed with the release plan, for such and such reasons
>> >>
>> >> The vote will run for 3 days.
>> >>
>> >>
>>



Re: [VOTE] Hive 2.0 release plan

2015-11-16 Thread Sergey Shelukhin
With 8 binding +1s and 0 -1s the vote passes.
The release activities will now proceed according to the plan. I will look
at the features that are targeted at the 2.0 release and create the branch
~EOW, balancing waiting for large commits against too much delay.

On 15/11/16, 10:32, "Sergey Shelukhin"  wrote:

>Including the user list.
>
>On 15/11/13, 17:54, "Lefty Leverenz"  wrote:
>
>>The Hive bylaws require this to be submitted on the user@hive mailing
>>list
>>(even though users don't get to vote).  See Release Plan in Actions
>><https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Actions>.
>>
>>-- Lefty
>>
>>On Fri, Nov 13, 2015 at 7:33 PM, Thejas Nair 
>>wrote:
>>
>>> +1
>>>
>>> On Fri, Nov 13, 2015 at 2:26 PM, Vaibhav Gumashta
>>>  wrote:
>>> > +1
>>> >
>>> > Thanks,
>>> > --Vaibhav
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Fri, Nov 13, 2015 at 2:24 PM -0800, "Tristram de Lyones" <
>>> delyo...@gmail.com> wrote:
>>> >
>>> > +1
>>> >
>>> > On Fri, Nov 13, 2015 at 1:38 PM, Sergey Shelukhin <
>>> ser...@hortonworks.com>
>>> > wrote:
>>> >
>>> >> Hi.
>>> >> With no strong objections on DISCUSS thread, some issues raised and
>>> >> addressed, and a reminder from Carl about the bylaws for the release
>>> >> process, I propose we release the first version of Hive 2 (2.0), and
>>> >> nominate myself as release manager.
>>> >> The goal is to have the first release of Hive with aggressive set of
>>>new
>>> >> features, some of which are ready to use and some are at
>>>experimental
>>> >> stage and will be developed in future Hive 2 releases, in line with
>>>the
>>> >> Hive-1-Hive-2 split discussion.
>>> >> If the vote passes, the timeline to create a branch should be around
>>>the
>>> >> end of next week (to minimize merging in the wake of the release),
>>>and
>>> the
>>> >> timeline to release would be around the end of November, depending
>>>on
>>> the
>>> >> issues found during the RC cutting process, as usual.
>>> >>
>>> >> Please vote:
>>> >> +1 proceed with the release plan
>>> >> +-0 don’t care
>>> >> -1 don’t proceed with the release plan, for such and such reasons
>>> >>
>>> >> The vote will run for 3 days.
>>> >>
>>> >>
>>>
>



Re: [VOTE] Hive 2.0 release plan

2015-11-19 Thread Sergey Shelukhin
Hmm. I looked at the JIRAs targeting the release and it looks like there’s a
large number of features still pending.
I am going to postpone creating the branch to next week.
I am also going to unassign JIRAs from the release at that time.

On 15/11/16, 18:09, "Sergey Shelukhin"  wrote:

>With 8 binding +1s and 0 -1s the vote passes.
>The release activities will now proceed according to the plan. I will look
>at the features that are targeted at the 2.0 release and create the branch
>~EOW, balancing waiting for large commits against too much delay.
>
>On 15/11/16, 10:32, "Sergey Shelukhin"  wrote:
>
>>Including the user list.
>>
>>On 15/11/13, 17:54, "Lefty Leverenz"  wrote:
>>
>>>The Hive bylaws require this to be submitted on the user@hive mailing
>>>list
>>>(even though users don't get to vote).  See Release Plan in Actions
>>><https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Actions>
>>>.
>>>
>>>-- Lefty
>>>
>>>On Fri, Nov 13, 2015 at 7:33 PM, Thejas Nair 
>>>wrote:
>>>
>>>> +1
>>>>
>>>> On Fri, Nov 13, 2015 at 2:26 PM, Vaibhav Gumashta
>>>>  wrote:
>>>> > +1
>>>> >
>>>> > Thanks,
>>>> > --Vaibhav
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Nov 13, 2015 at 2:24 PM -0800, "Tristram de Lyones" <
>>>> delyo...@gmail.com> wrote:
>>>> >
>>>> > +1
>>>> >
>>>> > On Fri, Nov 13, 2015 at 1:38 PM, Sergey Shelukhin <
>>>> ser...@hortonworks.com>
>>>> > wrote:
>>>> >
>>>> >> Hi.
>>>> >> With no strong objections on DISCUSS thread, some issues raised and
>>>> >> addressed, and a reminder from Carl about the bylaws for the
>>>>release
>>>> >> process, I propose we release the first version of Hive 2 (2.0),
>>>>and
>>>> >> nominate myself as release manager.
>>>> >> The goal is to have the first release of Hive with aggressive set
>>>>of
>>>>new
>>>> >> features, some of which are ready to use and some are at
>>>>experimental
>>>> >> stage and will be developed in future Hive 2 releases, in line with
>>>>the
>>>> >> Hive-1-Hive-2 split discussion.
>>>> >> If the vote passes, the timeline to create a branch should be
>>>>around
>>>>the
>>>> >> end of next week (to minimize merging in the wake of the release),
>>>>and
>>>> the
>>>> >> timeline to release would be around the end of November, depending
>>>>on
>>>> the
>>>> >> issues found during the RC cutting process, as usual.
>>>> >>
>>>> >> Please vote:
>>>> >> +1 proceed with the release plan
>>>> >> +-0 don’t care
>>>> >> -1 don’t proceed with the release plan, for such and such reasons
>>>> >>
>>>> >> The vote will run for 3 days.
>>>> >>
>>>> >>
>>>>
>>
>



Re: [ANNOUNCE] New PMC Member : John Pullokkaran

2015-11-24 Thread Sergey Shelukhin
Congrats!

From: Jimmy Xiang <jxi...@cloudera.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, November 24, 2015 at 15:07
To: "d...@hive.apache.org" <d...@hive.apache.org>
Cc: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: [ANNOUNCE] New PMC Member : John Pullokkaran

Congrats!!

On Tue, Nov 24, 2015 at 3:04 PM, Szehon Ho 
<sze...@cloudera.com> wrote:
Congratulations!

On Tue, Nov 24, 2015 at 3:02 PM, Xuefu Zhang 
<xzh...@cloudera.com> wrote:

> Congratulations, John!
>
> --Xuefu
>
> On Tue, Nov 24, 2015 at 3:01 PM, Prasanth J 
> <j.prasant...@gmail.com>
> wrote:
>
>> Congratulations and Welcome John!
>>
>> Thanks
>> Prasanth
>>
>> On Nov 24, 2015, at 4:59 PM, Ashutosh Chauhan 
>> <hashut...@apache.org>
>> wrote:
>>
>> On behalf of the Hive PMC I am delighted to announce John Pullokkaran is
>> joining Hive PMC.
>> John is a long time contributor in Hive and is focusing on compiler and
>> optimizer areas these days.
>> Please give John a warm welcome to the project!
>>
>> Ashutosh
>>
>>
>>
>



Re: [VOTE] Hive 2.0 release plan

2015-11-30 Thread Sergey Shelukhin
So far there’s no plan for Hive 1.3. If a committer (PMC member? I don’t 
recall) volunteers, they can do the release; it should not be difficult, although 
there are some JIRAs that would need backporting, I think (I marked them with a 
target version recently).

In fact, 2.0 branch creation is delayed because we are waiting for a replacement 
branch for master :(


From: John Omernik <j...@omernik.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Monday, November 30, 2015 at 09:25
To: "user@hive.apache.org" <user@hive.apache.org>
Cc: Gopal Vijayaraghavan <go...@hortonworks.com>, 
"d...@hive.apache.org" <d...@hive.apache.org>
Subject: Re: Re: [VOTE] Hive 2.0 release plan

Agreed, any plans for Hive 1.3?  Will Hive 2.0 be a breaking release for those 
running 1.x?




On Sun, Nov 15, 2015 at 7:07 PM, Wangwenli 
<wangwe...@huawei.com> wrote:
Good News,   Any release plan for hive 1.3  ???


Wangwenli

From: Gopal Vijayaraghavan <go...@hortonworks.com>
Date: 2015-11-14 14:21
To: d...@hive.apache.org
CC: user@hive.apache.org
Subject: Re: [VOTE] Hive 2.0 release plan

(+user@)

+1.

Cheers,
Gopal

On 11/13/15, 5:54 PM, "Lefty Leverenz" 
<leftylever...@gmail.com> wrote:

>The Hive bylaws require this to be submitted on the user@hive mailing list
>(even though users don't get to vote).  See Release Plan in Actions
><https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Actions>.
>
>-- Lefty
...
>> > On Fri, Nov 13, 2015 at 1:38 PM, Sergey Shelukhin <
>> ser...@hortonworks.com>
>> > wrote:
>> >
>> >> Hi.
>> >> With no strong objections on DISCUSS thread, some issues raised and
>> >> addressed, and a reminder from Carl about the bylaws for the release
>> >> process, I propose we release the first version of Hive 2 (2.0), and
>> >> nominate myself as release manager.
>> >> The goal is to have the first release of Hive with aggressive set of
>>new
>> >> features, some of which are ready to use and some are at experimental
>> >> stage and will be developed in future Hive 2 releases, in line with
>>the
>> >> Hive-1-Hive-2 split discussion.
>> >> If the vote passes, the timeline to create a branch should be around
>>the
>> >> end of next week (to minimize merging in the wake of the release),
>>and
>> the
>> >> timeline to release would be around the end of November, depending on
>> the
>> >> issues found during the RC cutting process, as usual.
>> >>
>> >> Please vote:
>> >> +1 proceed with the release plan
>> >> +-0 don't care
>> >> -1 don't proceed with the release plan, for such and such reasons
>> >>
>> >> The vote will run for 3 days.
>> >>
>> >>
>>





Re: query execution

2015-12-03 Thread Sergey Shelukhin
If you are using Tez, you can set hive.tez.exec.print.summary=true; in CLI to 
see the breakdown.
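
For example (the table name below is a placeholder), the flag can be set per session before running the query whose breakdown you want:

```sql
-- Enable Tez execution summaries for this session
set hive.tez.exec.print.summary=true;

-- The next query prints a per-stage timing breakdown after it finishes
SELECT count(*) FROM my_table;  -- my_table is a placeholder
```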

From: Shirley Cohen 
mailto:shirley.co...@rackspace.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Thursday, December 3, 2015 at 08:06
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: query execution

Hi,

I want to characterize the overhead for each step of a Hive query. The explain 
output doesn’t give me the actual execution times, so how would I find those 
out?

Thanks in advance,

Shirley


Re: Cannot drop a table after creating an index and then renaming to a different database

2015-12-04 Thread Sergey Shelukhin
That looks like a bug in rename. Can you please file a JIRA?
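
Until that is fixed, a possible (untested) workaround sketch, using the names from the reproduction below, is to drop the index before the cross-database rename and recreate it afterwards:

```sql
-- Untested workaround sketch: drop the index, rename, then recreate it
DROP INDEX idx1 ON db1.test;
ALTER TABLE db1.test RENAME TO db2.test;
CREATE INDEX idx1 ON TABLE db2.test(col1) AS 'compact' WITH DEFERRED REBUILD;
ALTER INDEX idx1 ON db2.test REBUILD;
```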

From: Toby Allsopp 
mailto:toby.alls...@wherescape.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Thursday, December 3, 2015 at 18:24
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: Cannot drop a table after creating an index and then renaming to a 
different database

Hi, a sequence of commands should make things clearer. I'm using the 
Hortonworks Sandbox VM with HDP 2.3.

Connected to: Apache Hive (version 1.2.1.2.3.0.0-2557)
Driver: Hive JDBC (version 1.2.1.2.3.0.0-2557)
0: jdbc:hive2://localhost:1> create database db1;
No rows affected (0.997 seconds)
0: jdbc:hive2://localhost:1> create database db2;
No rows affected (0.968 seconds)
0: jdbc:hive2://localhost:1> create table db1.test (col1 int);
No rows affected (1.758 seconds)
0: jdbc:hive2://localhost:1> create index idx1 on table db1.test(col1) as 
'compact' with deferred rebuild;
No rows affected (0.287 seconds)
0: jdbc:hive2://localhost:1> alter index idx1 on db1.test rebuild;
INFO  : Tez session hasn't been created yet. Opening session
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id 
application_1449025977131_0007)

INFO  : Map 1: -/-  Reducer 2: 0/1
INFO  : Map 1: -/-  Reducer 2: 0(+1)/1
INFO  : Map 1: -/-  Reducer 2: 0/1
INFO  : Map 1: -/-  Reducer 2: 1/1
INFO  : Loading data to table db1.db1__test_idx1__ from 
hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/db1.db/b1__test_idx1__/.hive-staging_hive_2015-12-04_02-02-47_278_3621654884902999047-10/-ext-1
INFO  : Table db1.db1__test_idx1__ stats: [numFiles=1, numRows=0, totalSize=0, 
rawDataSize=0]
No rows affected (7.792 seconds)
0: jdbc:hive2://localhost:1> alter table db1.test rename to db2.test;
No rows affected (0.261 seconds)
0: jdbc:hive2://localhost:1> drop table db2.test;
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:db2.db1__test_idx1__ table not found) (state=08S01,code=1)

Basically it looks like the rename to a different database left the index table 
in the old database.

Is this a known issue? Should I be dropping the indexes before renaming tables 
to different databases?

Cheers,
Toby.


Re: Hive UDF accessing https request

2016-01-08 Thread Sergey Shelukhin
To start with, you can remove the try-catch so that the exception is not 
swallowed and you can see if an error occurs.
However, note that this is an anti-pattern for any reasonable-sized dataset.
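
As a sketch of an alternative (table and file names are hypothetical), the lookups can be fetched once outside Hive, loaded as a table, and joined, instead of issuing one HTTPS call per row from the UDF:

```sql
-- Hypothetical alternative: fetch the API responses once offline, load them
-- as a lookup table, and join, rather than one HTTPS call per row in a UDF.
CREATE TABLE ip_info (ip STRING, info STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/tmp/ip_info.tsv' INTO TABLE ip_info;

SELECT r.ip, i.info
FROM r_distinct_ips_temp r
LEFT OUTER JOIN ip_info i ON r.ip = i.ip;
```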

From: Prabhu Joseph 
mailto:prabhujose.ga...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Friday, January 8, 2016 at 00:51
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>, 
"d...@hive.apache.org" 
mailto:d...@hive.apache.org>>
Subject: Hive UDF accessing https request

Hi Experts,

   I am trying to write a Hive UDF that makes an HTTPS request and returns a 
result based on the response. When run from plain Java the HTTPS response comes 
back, but when accessed from the UDF the result is null.

Can anyone review the code below and share the correct steps to do this?


create temporary function profoundIP as 'com.network.logs.udf.ProfoundIp';

select ip,profoundIP(ip) as info from r_distinct_ips_temp;
 //returns NULL


//Below UDF program

package com.network.logs.udf;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

import javax.net.ssl.HttpsURLConnection;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class ProfoundNew extends UDF {

private Text evaluate(Text input) {

String url = "https://api2.profound.net/ip/" + input.toString() 
+"?view=enterprise";

URL obj;
try {
obj = new URL(url);

HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();

con.setRequestMethod("GET");
con.setRequestProperty("Authorization","ProfoundAuth 
apikey=cisco-065ccfec619011e38f");

int responseCode = con.getResponseCode();

BufferedReader in = new BufferedReader(new 
InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();

while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
return new Text(response.toString());
} catch (Exception e) {
e.printStackTrace();
}
return null;

}
}



Thanks,
Prabhu Joseph



Re: Hive UDF accessing https request

2016-01-11 Thread Sergey Shelukhin
aker.serverCertificate(ClientHandshaker.java:1323)
... 35 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable 
to find valid certification path to requested target
at 
sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:196)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:268)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:380)
... 41 more




Thanks,
Prabhu Joseph

On Sat, Jan 9, 2016 at 12:33 AM, Sergey Shelukhin 
mailto:ser...@hortonworks.com>> wrote:
To start with, you can remove the try-catch so that the exception is not 
swallowed and you can see if an error occurs.
However, note that this is an anti-pattern for any reasonable-sized dataset.

From: Prabhu Joseph 
mailto:prabhujose.ga...@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Date: Friday, January 8, 2016 at 00:51
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>, 
"d...@hive.apache.org<mailto:d...@hive.apache.org>" 
mailto:d...@hive.apache.org>>
Subject: Hive UDF accessing https request

Hi Experts,

   I am trying to write a Hive UDF that makes an HTTPS request and returns a 
result based on the response. When run from plain Java the HTTPS response comes 
back, but when accessed from the UDF the result is null.

Can anyone review the code below and share the correct steps to do this?


create temporary function profoundIP as 'com.network.logs.udf.ProfoundIp';

select ip,profoundIP(ip) as info from r_distinct_ips_temp;
 //returns NULL


//Below UDF program

package com.network.logs.udf;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

import javax.net.ssl.HttpsURLConnection;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class ProfoundNew extends UDF {

private Text evaluate(Text input) {

String url = "https://api2.profound.net/ip/" + input.toString() 
+"?view=enterprise";

URL obj;
try {
obj = new URL(url);

HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();

con.setRequestMethod("GET");
con.setRequestProperty("Authorization","ProfoundAuth 
apikey=cisco-065ccfec619011e38f");

int responseCode = con.getResponseCode();

BufferedReader in = new BufferedReader(new 
InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();

while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
return new Text(response.toString());
} catch (Exception e) {
e.printStackTrace();
}
return null;

}
}



Thanks,
Prabhu Joseph




Re: [VOTE] Hive 2.0 release plan

2016-01-19 Thread Sergey Shelukhin
Hi.
There are 2 blockers for Hive 2.0 currently. One is about to be committed, and 
another is in progress, or may be pushed out soon.
I am planning to cut an RC for Hive 2.0 this week.

From: Hanish Bansal 
mailto:hanish.bansal.agar...@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Date: Tuesday, January 19, 2016 at 10:08
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Subject: Re: [VOTE] Hive 2.0 release plan

Hi,

Is there any update on the release plan for Hive 1.3.0 or 2.0.0?

On Tue, Dec 1, 2015 at 12:56 AM, Alan Gates 
mailto:alanfga...@gmail.com>> wrote:
Hive 2.0 will not be 100% backwards compatible with 1.x.  The following JIRA 
link shows JIRAs already committed to 2.0 that break compatibility:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%202.0.0%20AND%20%22Hadoop%20Flags%22%20%3D%20%22Incompatible%20change%22

HIVE-12429 is not yet committed but may also make the list.

In summary, the biggest changes are that Hadoop 1.x is no longer supported, 
MapReduce as an engine is deprecated (though still supported for now), and 
HIVE-12429 proposes to switch the standard authorization model to SQL Standard 
Auth instead of the current default.

The goal from the beginning was for 2.0 to be allowed to break compatibility 
where necessary while branch-1 and subsequent 1.x releases would maintain 
backwards compatibility with the 1.x line.

Alan.

John Omernik<mailto:j...@omernik.com>
November 30, 2015 at 9:25
Agreed, any plans for Hive 1.3?  Will Hive 2.0 be a breaking release for those 
running 1.x?





Wangwenli<mailto:wangwe...@huawei.com>
November 15, 2015 at 17:07
Good News,   Any release plan for hive 1.3  ???


Wangwenli
Gopal Vijayaraghavan<mailto:gop...@apache.org>
November 13, 2015 at 22:21

(+user@)

+1.

Cheers,
Gopal

On 11/13/15, 5:54 PM, "Lefty Leverenz" 
<mailto:leftylever...@gmail.com> wrote:



The Hive bylaws require this to be submitted on the user@hive mailing list
(even though users don't get to vote).  See Release Plan in Actions
<https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Actions>.

-- Lefty


...


On Fri, Nov 13, 2015 at 1:38 PM, Sergey Shelukhin <


ser...@hortonworks.com<mailto:ser...@hortonworks.com>>


wrote:



Hi.
With no strong objections on DISCUSS thread, some issues raised and
addressed, and a reminder from Carl about the bylaws for the release
process, I propose we release the first version of Hive 2 (2.0), and
nominate myself as release manager.
The goal is to have the first release of Hive with aggressive set of new
features, some of which are ready to use and some are at experimental
stage and will be developed in future Hive 2 releases, in line with the
Hive-1-Hive-2 split discussion.
If the vote passes, the timeline to create a branch should be around the
end of next week (to minimize merging in the wake of the release), and the
timeline to release would be around the end of November, depending on the
issues found during the RC cutting process, as usual.

Please vote:
+1 proceed with the release plan
+-0 don't care
-1 don't proceed with the release plan, for such and such reasons

The vote will run for 3 days.




Lefty Leverenz<mailto:leftylever...@gmail.com>
November 13, 2015 at 17:54
The Hive bylaws require this to be submitted on the user@hive mailing list
(even though users don't get to vote). See Release Plan in Actions
<https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Actions>.

-- Lefty


Thejas Nair<mailto:thejas.n...@gmail.com>
November 13, 2015 at 16:33
+1

On Fri, Nov 13, 2015 at 2:26 PM, Vaibhav Gumashta



--
Thanks & Regards
Hanish Bansal


Re: [VOTE] Hive 2.0 release plan

2016-01-26 Thread Sergey Shelukhin
Yeah I will send an update. There were a few more blockers and now again one 
remains.

From: Hanish Bansal 
mailto:hanish.bansal.agar...@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Date: Tuesday, January 19, 2016 at 22:00
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Subject: Re: [VOTE] Hive 2.0 release plan


Thanks Sergey for the quick response.

Please update this group when you cut the RC for 2.0.

Regards,
Hanish Bansal

On 20-Jan-2016 12:21 am, "Sergey Shelukhin" 
mailto:ser...@hortonworks.com>> wrote:
Hi.
There are 2 blockers for Hive 2.0 currently. One is about to be committed, and 
another is in progress, or may be pushed out soon.
I am planning to cut an RC for Hive 2.0 this week.

From: Hanish Bansal 
mailto:hanish.bansal.agar...@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Date: Tuesday, January 19, 2016 at 10:08
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Subject: Re: [VOTE] Hive 2.0 release plan

Hi,

Is there any update on the release plan for Hive 1.3.0 or 2.0.0?

On Tue, Dec 1, 2015 at 12:56 AM, Alan Gates 
mailto:alanfga...@gmail.com>> wrote:
Hive 2.0 will not be 100% backwards compatible with 1.x.  The following JIRA 
link shows JIRAs already committed to 2.0 that break compatibility:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%202.0.0%20AND%20%22Hadoop%20Flags%22%20%3D%20%22Incompatible%20change%22

HIVE-12429 is not yet committed but may also make the list.

In summary, the biggest changes are that Hadoop 1.x is no longer supported, 
MapReduce as an engine is deprecated (though still supported for now), and 
HIVE-12429 proposes to switch the standard authorization model to SQL Standard 
Auth instead of the current default.

The goal from the beginning was for 2.0 to be allowed to break compatibility 
where necessary while branch-1 and subsequent 1.x releases would maintain 
backwards compatibility with the 1.x line.

Alan.

John Omernik<mailto:j...@omernik.com>
November 30, 2015 at 9:25
Agreed, any plans for Hive 1.3?  Will Hive 2.0 be a breaking release for those 
running 1.x?





Wangwenli<mailto:wangwe...@huawei.com>
November 15, 2015 at 17:07
Good News,   Any release plan for hive 1.3  ???


Wangwenli
Gopal Vijayaraghavan<mailto:gop...@apache.org>
November 13, 2015 at 22:21

(+user@)

+1.

Cheers,
Gopal

On 11/13/15, 5:54 PM, "Lefty Leverenz" 
<mailto:leftylever...@gmail.com> wrote:



The Hive bylaws require this to be submitted on the user@hive mailing list
(even though users don't get to vote).  See Release Plan in Actions
<https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Actions>.

-- Lefty


...


On Fri, Nov 13, 2015 at 1:38 PM, Sergey Shelukhin <


ser...@hortonworks.com<mailto:ser...@hortonworks.com>>


wrote:



Hi.
With no strong objections on DISCUSS thread, some issues raised and
addressed, and a reminder from Carl about the bylaws for the release
process, I propose we release the first version of Hive 2 (2.0), and
nominate myself as release manager.
The goal is to have the first release of Hive with aggressive set of new
features, some of which are ready to use and some are at experimental
stage and will be developed in future Hive 2 releases, in line with the
Hive-1-Hive-2 split discussion.
If the vote passes, the timeline to create a branch should be around the
end of next week (to minimize merging in the wake of the release), and the
timeline to release would be around the end of November, depending on the
issues found during the RC cutting process, as usual.

Please vote:
+1 proceed with the release plan
+-0 don't care
-1 don't proceed with the release plan, for such and such reasons

The vote will run for 3 days.




Lefty Leverenz<mailto:leftylever...@gmail.com>
November 13, 2015 at 17:54
The Hive bylaws require this to be submitted on the user@hive mailing list
(even though users don't get to vote). See Release Plan in Actions
<https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Actions>.

-- Lefty


Thejas Nair<mailto:thejas.n...@gmail.com>
November 13, 2015 at 16:33
+1

On Fri, Nov 13, 2015 at 2:26 PM, Vaibhav Gumashta



--
Thanks & Regards
Hanish Bansal


Re: NPE from simple nested ANSI Join

2016-02-04 Thread Sergey Shelukhin
The stack below looks like a bug; Hive should support joins like these, or
it should fail with a parse error, not an NPE. Can you open a JIRA?

On 16/2/4, 15:15, "Nicholas Hakobian" 
wrote:

>I'm only aware of this:
>https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
>but its unclear if it supports your syntax or not.
>
>Nicholas Szandor Hakobian
>Data Scientist
>Rally Health
>nicholas.hakob...@rallyhealth.com
>
>On Thu, Feb 4, 2016 at 12:57 PM, Dave Nicodemus
> wrote:
>> Thanks Nick,
>>
>> I did a few experiments and found that the version of the query below
>>does
>> work. So I'm not sure about your theory. Do you know if there is a
>>document
>> that spells out the exact accepted syntax ?
>>
>> SELECT COUNT(*)
>> FROM (nation n INNER JOIN customer c ON n.n_nationkey = c.c_nationkey)
>>INNER
>> JOIN orders o ON c.c_custkey = o.o_custkey;
>>
>>
>>
>>
>> On Thu, Feb 4, 2016 at 3:45 PM, Nicholas Hakobian
>>  wrote:
>>>
>>> I don't believe Hive supports that join format. Its expecting either a
>>> table name or a subquery. If its a subquery, it usually requires it to
>>> have a table name alias so it can be referenced in an outer statement.
>>>
>>> -Nick
>>>
>>> Nicholas Szandor Hakobian
>>> Data Scientist
>>> Rally Health
>>> nicholas.hakob...@rallyhealth.com
>>>
>>> On Thu, Feb 4, 2016 at 11:28 AM, Dave Nicodemus
>>>  wrote:
>>> > Using hive 1.2.1.2.3  Connecting using JDBC, issuing the following
>>>query
>>> > :
>>> >
>>> > SELECT COUNT(*)
>>> > FROM nation n
>>> > INNER JOIN (customer c
>>> >  INNER JOIN orders o ON c.c_custkey =
>>> > o.o_custkey)
>>> >  ON n.n_nationkey = c.c_nationkey;
>>> >
>>> > Generates the NPE and stack below. Fields are integer data type
>>> >
>>> > Does anyone know if this is a known issue  and whether it's fixed
>>> > someplace
>>> > ?
>>> >
>>> > Thanks,
>>> > Dave
>>> >
>>> > Stack
>>> > 
>>> > Caused by: java.lang.NullPointerException: Remote
>>> > java.lang.NullPointerException: null
>>> >
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPresent(SemanticAnaly
>>>zer.java:2046)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.parseJoinCondPopulateAl
>>>ias(SemanticAnalyzer.java:2109)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.parseJoinCondPopulateAl
>>>ias(SemanticAnalyzer.java:2185)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.parseJoinCondition(Sema
>>>nticAnalyzer.java:2445)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.parseJoinCondition(Sema
>>>nticAnalyzer.java:2386)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinTree(SemanticAna
>>>lyzer.java:8192)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genJoinTree(SemanticAna
>>>lyzer.java:8131)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyze
>>>r.java:9709)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyze
>>>r.java:9636)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnaly
>>>zer.java:10109)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.
>>>java:329)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(Semanti
>>>cAnalyzer.java:10120)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePl
>>>anner.java:211)
>>> > at
>>> >
>>> > 
>>>org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanti
>>>cAnalyzer.java:227)
>>> > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:454)
>>> > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:314)
>>> > at
>>> > org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1164)
>>> > at
>>> > org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1158)
>>> > at
>>> >
>>> > 
>>>org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.
>>>java:110)
>>> >
>>> >
>>> >
>>> >
>>
>>
>



Re: list bucketing join

2016-02-15 Thread Sergey Shelukhin
It’s probably a bug. Can you file a JIRA with the full callstack? As far as I 
know, list bucketing is not widely used, so the bug might have been introduced 
unwittingly, but it’s hard to tell without seeing the callstack.
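
For reference when filing the JIRA, a minimal list-bucketing table of the kind described below (all names are hypothetical) can be declared as:

```sql
-- Illustrative list-bucketing DDL: a partitioned RCFile table with a skewed
-- column whose hot values get their own subdirectories
CREATE TABLE skewed_part (k STRING, v STRING)
PARTITIONED BY (ds STRING)
SKEWED BY (k) ON ('hot_key1', 'hot_key2') STORED AS DIRECTORIES
STORED AS RCFILE;
```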

From: Shangzhong zhu mailto:shanzh...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Friday, February 12, 2016 at 18:59
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: list bucketing join

Hi All,

List bucketing seems to be a very nice feature to further partition on skewed 
values. However, when I test this feature out, it doesn't seem to work well 
when joining a list bucketing table with a non-list bucketing table.

List bucketing tables: Partitioned table. I defined two skewed columns. RCFile 
format.
Non list bucketing tables: Non-partitioned table, RCFile format.

When join these two tables, I got the following error, even just doing a 
"explain":

FAILED: NullPointerException null

Does anybody have any clues here? Can someone give me a working join example 
with a list bucketing table?

Appreciate your help!

Thanks,

Shanzhong


[ANNOUNCE] Apache Hive 2.0.0 Released

2016-02-16 Thread Sergey Shelukhin
The Apache Hive team is proud to announce the release of Apache Hive
version 2.0.0.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top of
Apache Hadoop (TM), it provides:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce and Apache Tez frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.0.0 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12332641&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team




Re: Hive-2.0.1 Release date

2016-02-29 Thread Sergey Shelukhin
Hi. It will be released when a critical mass of bugfixes has accumulated. We 
have already found some issues that would be nice to fix, so it may be some 
time in March. Is there a particular fix that interests you?

From: Oleksiy MapR 
mailto:osayankin.maprt...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Monday, February 29, 2016 at 00:43
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: Hive-2.0.1 Release date

Hi all!

Are you planning to release Hive 2.0.1? If yes, when might that be?

Thanks,
Oleksiy.


Re: Hive-2.0.1 Release date

2016-02-29 Thread Sergey Shelukhin
HPLSQL is available as part of Hive 2.0. I am not sure to what extent the 
integration goes, as I wasn't involved in that work.
As far as I understand, HPLSQL and Hive on Spark are more or less orthogonal…

Hive 2.0.1 is purely a bug fix release for Hive 2.0; Hive 2.1 will be the next 
feature release if some major feature is missing.

From: Mich Talebzadeh 
mailto:mich.talebza...@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Date: Monday, February 29, 2016 at 15:53
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Subject: Re: Hive-2.0.1 Release date

Hi Sergey,

Will HPLSQL be part of 2.0.1.release?

I am using 2.0 and found Hive on Spark to be much more stable.

Thanks


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com<http://talebzadehmich.wordpress.com/>



On 29 February 2016 at 23:46, Sergey Shelukhin 
mailto:ser...@hortonworks.com>> wrote:
Hi. It will be released when a critical mass of bugfixes has accumulated. We 
have already found some issues that would be nice to fix, so it may be some 
time in March. Is there a particular fix that interests you?

From: Oleksiy MapR 
mailto:osayankin.maprt...@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Date: Monday, February 29, 2016 at 00:43
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
mailto:user@hive.apache.org>>
Subject: Hive-2.0.1 Release date

Hi all!

Are you planning to release Hive 2.0.1? If yes, when might that be?

Thanks,
Oleksiy.



Re: Wrong column is picked in HIVE 2.0.0 + TEZ 0.8.2 left join

2016-03-01 Thread Sergey Shelukhin
Can you please open a Hive JIRA? It is a bug.

On 16/3/1, 10:28, "Gopal Vijayaraghavan"  wrote:

>(Bcc: Tez, Cross-post to hive)
>
>> I added "set hive.execution.engine=mr;" at top of the script, seems the
>>result is correct…
>
>Pretty sure it's due to the same table aliases for both dummy tables
>(they're both called _dummy_table) during auto join conversion.
>
>hive> set hive.auto.convert.join=false;
>
>
>Should go back to using slower tagged joins even in Tez, which will add a
>table-tag, i.e. the first table will be (, 0) and the 2nd table will be
>(, 1).
>
>I suspect the difference between the MR and Tez runs are lookup between
>the table-name + expr (both equal for _dummy_table.11).
>
>> per Jeff Zhang's thinking if you were to set the exec engine to 'mr'
>>would it still fail?   if so, then its not Tez . :)
>
>Hive has a a whole set of join algorithms which can only work on Tez, so
>it's not always that easy.
>
>Considering this is on hive-2.0.0, I recommend filing a JIRA on 2.0.0 and
>marking it with 2.0.1 as a target version.
>
>Cheers,
>Gopal



Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-09 Thread Sergey Shelukhin
Congrats!

From: Szehon Ho mailto:sze...@cloudera.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Wednesday, March 9, 2016 at 17:40
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Cc: "d...@hive.apache.org" 
mailto:d...@hive.apache.org>>, 
"w...@apache.org" 
mailto:w...@apache.org>>
Subject: Re: [ANNOUNCE] New Hive Committer - Wei Zheng

Congratulations Wei!

On Wed, Mar 9, 2016 at 5:26 PM, Vikram Dixit K 
mailto:vik...@apache.org>> wrote:
The Apache Hive PMC has voted to make Wei Zheng a committer on the Apache Hive 
Project. Please join me in congratulating Wei.

Thanks
Vikram.



Re: [VOTE] Bylaws change to allow some commits without review

2016-04-21 Thread Sergey Shelukhin
+1

From: Tim Robertson 
mailto:timrobertson...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Wednesday, April 20, 2016 at 06:17
To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Subject: Re: [VOTE] Bylaws change to allow some commits without review

+1

On Wed, Apr 20, 2016 at 1:24 AM, Jimmy Xiang 
mailto:jxi...@apache.org>> wrote:
+1

On Tue, Apr 19, 2016 at 2:58 PM, Alpesh Patel 
mailto:alpeshrpa...@gmail.com>> wrote:
> +1
>
> On Tue, Apr 19, 2016 at 1:29 PM, Lars Francke 
> mailto:lars.fran...@gmail.com>>
> wrote:
>>
>> Thanks everyone! Vote runs for at least one more day. I'd appreciate it if
>> you could ping/bump your colleagues to chime in here.
>>
>> I'm not entirely sure how many PMC members are active and how many votes
>> we need but I think a few more are probably needed.
>>
>> On Mon, Apr 18, 2016 at 8:02 PM, Thejas Nair 
>> mailto:the...@hortonworks.com>>
>> wrote:
>>>
>>> +1
>>>
>>> 
>>> From: Wei Zheng mailto:wzh...@hortonworks.com>>
>>> Sent: Monday, April 18, 2016 10:51 AM
>>> To: user@hive.apache.org
>>> Subject: Re: [VOTE] Bylaws change to allow some commits without review
>>>
>>> +1
>>>
>>> Thanks,
>>> Wei
>>>
>>> From: Siddharth Seth mailto:ss...@apache.org>>
>>> Reply-To: "user@hive.apache.org" 
>>> mailto:user@hive.apache.org>>
>>> Date: Monday, April 18, 2016 at 10:29
>>> To: "user@hive.apache.org" 
>>> mailto:user@hive.apache.org>>
>>> Subject: Re: [VOTE] Bylaws change to allow some commits without review
>>>
>>> +1
>>>
>>> On Wed, Apr 13, 2016 at 3:58 PM, Lars Francke 
>>> mailto:lars.fran...@gmail.com>>
>>> wrote:

 Hi everyone,

 we had a discussion on the dev@ list about allowing some forms of
 contributions to be committed without a review.

 The exact sentence I propose to add is: "Minor issues (e.g. typos, code
 style issues, JavaDoc changes. At committer's discretion) can be committed
 after soliciting feedback/review on the mailing list and not receiving
 feedback within 2 days."

 The proposed bylaws can also be seen here
 

 This vote requires a 2/3 majority of all Active PMC members so I'd love
 to get as many votes as possible. The vote will run for at least six days.

 Thanks,
 Lars
>>>
>>>
>>
>



Re: Jira Hive-13574 raised to resolve Standard deviation calculation in Hive

2016-04-22 Thread Sergey Shelukhin
Patches welcome ;)

From: Mich Talebzadeh 
mailto:mich.talebza...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Thursday, April 21, 2016 at 10:46
To: user mailto:user@hive.apache.org>>, Alan Gates 
mailto:alanfga...@gmail.com>>
Subject: Jira Hive-13574 raised to resolve Standard deviation calculation in 
Hive

Hi,

Jira HIVE-13574 has been raised 
to fix the Hive standard deviation function STDDEV(), which is incorrect at the 
moment.

Please vote for it.

Thanks


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Sergey Shelukhin
This parameter has indeed been removed; it is treated as always true now, 
because setting it to false just produced incorrect tables.

From: Mich Talebzadeh 
mailto:mich.talebza...@gmail.com>>
Reply-To: "user@hive.apache.org" 
mailto:user@hive.apache.org>>
Date: Friday, April 29, 2016 at 02:51
To: user mailto:user@hive.apache.org>>
Subject: Hive configuration parameter hive.enforce.bucketing does not exist in 
Hive 2

Is the parameter

--set hive.enforce.bucketing = true;

deprecated in Hive 2, as it causes HQL code not to work?

hive> set hive.enforce.bucketing = true;
Query returned non-zero code: 1, cause: hive configuration 
hive.enforce.bucketing does not exists.



Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Hive configuration parameter hive.enforce.bucketing does not exist in Hive 2

2016-04-29 Thread Sergey Shelukhin
You can set hive.conf.validation to false to disable this :)

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, April 29, 2016 at 11:16
To: user <user@hive.apache.org>
Subject: Re: Hive configuration parameter hive.enforce.bucketing does not exist 
in Hive 2

Well having it in the old code causes the query to crash as well!


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 29 April 2016 at 18:33, Sergey Shelukhin <ser...@hortonworks.com> wrote:
This parameter has indeed been removed; it is treated as always true now, 
because setting it to false just produced incorrect tables.

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, April 29, 2016 at 02:51
To: user <user@hive.apache.org>
Subject: Hive configuration parameter hive.enforce.bucketing does not exist in 
Hive 2

Is the parameter

--set hive.enforce.bucketing = true;

deprecated in Hive 2? It causes HQL code not to work:

hive> set hive.enforce.bucketing = true;
Query returned non-zero code: 1, cause: hive configuration 
hive.enforce.bucketing does not exists.



Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com





Re: NullPointerException when dropping database backed by S3

2016-05-06 Thread Sergey Shelukhin
Hi. Do you have access to logs? A callstack would be helpful.

From: Marcin Tustin <mtus...@handybook.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, May 6, 2016 at 09:29
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: NullPointerException when dropping database backed by S3

Hi All,

I have a database backed by an s3 bucket. When I try to drop that database, I 
get a NullPointerException:


hive> drop database services_csvs cascade;

FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:java.lang.NullPointerException)


hive> describe database services_csvs;

OK

services_csvs s3a://ID:SECRETWITHOUTSLASHES@services-csvs/mtustin USER

→ hive --version

WARNING: Use "yarn jar" to launch YARN applications.

Hive 1.2.1.2.3.4.0-3485

Subversion 
git://c66-slave-20176e25-2/grid/0/jenkins/workspace/HDP-build-centos6/bigtop/build/hive/rpm/BUILD/hive-1.2.1.2.3.4.0
 -r efb067075854961dfa41165d5802a62ae334a2db

Compiled by jenkins on Wed Dec 16 04:01:39 UTC 2015

From source with checksum 4ecc763ed826fd070121da702cbd17e9

Any ideas or suggestions would be greatly appreciated.


Thanks,

Marcin



[ANNOUNCE] Apache Hive 2.0.1 Released

2016-05-31 Thread Sergey Shelukhin
The Apache Hive team is proud to announce the release of Apache Hive
version 2.0.1.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top of
Apache Hadoop (TM), it provides:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce and Apache Tez frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.0.1 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12334886&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team




Re: [ANNOUNCE] Apache Hive 2.0.1 Released

2016-05-31 Thread Sergey Shelukhin
Oh. I just copy-pasted the Wiki text, perhaps it should be updated.

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, May 31, 2016 at 14:01
To: user <user@hive.apache.org>
Cc: "d...@hive.apache.org" <d...@hive.apache.org>, "annou...@apache.org" <annou...@apache.org>
Subject: Re: [ANNOUNCE] Apache Hive 2.0.1 Released

Thanks Sergey,

Congratulations.

May I add that Hive 0.14 and above can also deploy Spark as their execution
engine, and with Hive on Spark you have a winning combination.

BTW, we are just discussing the merits of Tez + LLAP versus Spark as the
execution engine for Hive. With Hive on Spark vs Hive on MapReduce, the
performance gains are an order of magnitude.

HTH




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 31 May 2016 at 21:39, Sergey Shelukhin <ser...@apache.org> wrote:
The Apache Hive team is proud to announce the release of Apache Hive
version 2.0.1.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top of
Apache Hadoop (TM), it provides:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce and Apache Tez frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.0.1 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12334886&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team





Re: hive concurrency not working

2016-08-03 Thread Sergey Shelukhin
Can you elaborate on "not working"? Is it giving an error, or hanging (and if
so, does it queue and eventually execute)? Are you using HS2? What
commands/actions do the users perform?
Also, what version of Hive is this?

From: Raj hadoop <raj.had...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, August 3, 2016 at 06:14
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: hive concurrency not working

Dear All,

In need of your help.

We have a Hortonworks 4-node cluster, and the problem is that Hive allows
only one user at a time.

If a second user needs to log in, Hive does not work.

Could someone please help me with this?

Thanks,
Rajesh


Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key

2016-08-25 Thread Sergey Shelukhin
I can repro this on master. I’ll file a bug...

From: Stephen Sprague <sprag...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, August 25, 2016 at 13:34
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key

Hi Gopal,
Thank you for this insight, good stuff. The thing is, there is no 'foo' for
etl_database_source, so that filter, if anything, should be short-circuited to
'true' (i.e. double nots: 1. not in, 2. 'foo' not present).

it doesn't matter what i put in that "not in" clause; the filter always
comes back false if the column is a partition_key, of course.

thanks for the tip on explain extended; that's some crazy output, so i'm
sifting for clues in it now. i hear you though: something in there with
the metastore is at play.

Cheers,
Stephen.

On Thu, Aug 25, 2016 at 1:12 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:

> anybody run up against this one?  hive 2.1.0 + using a  "not in" on a
>list + the column is a partition key participant.

The partition filters are run before the plan is generated.

>AND etl_source_database not in ('foo')

Is there a 'foo' in etl_source_database?

> predicate: false (type: boolean)   this kills any hope
>of the query returning anything.
...
>  Select Operator###doesn't even mention a filter

This is probably good news, because that's an optimization.

PrunedPartitionList getPartitionsFromServer(Table tab, final
ExprNodeGenericFuncDesc compactExpr ...) {
...
  hasUnknownPartitions = Hive.get().getPartitionsByExpr(
  tab, compactExpr, conf, partitions);
}


goes into the metastore and evaluates the IN and NOT IN for partitions
ahead of time.


So, this could mean that the partition pruning evaluation returned no
partitions at all (or just exactly matched partitions only, skipping the
filter per-row).
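To illustrate why a NOT IN over a partition column can legitimately collapse
to a constant, here is a small Python sketch of SQL's three-valued NOT IN
semantics, which the pruner has to evaluate per partition value (illustrative
only; the function name and structure are not Hive's):

```python
def sql_not_in(value, in_list):
    """SQL semantics for `value NOT IN (in_list)`: True, False, or None (UNKNOWN)."""
    # a definite match against a non-NULL list element makes NOT IN false
    if value is not None and value in [v for v in in_list if v is not None]:
        return False
    # any comparison involving NULL is UNKNOWN, which filters the row out
    if value is None or any(v is None for v in in_list):
        return None
    return True

assert sql_not_in('bar', ['foo']) is True    # no match: partition kept
assert sql_not_in('foo', ['foo']) is False   # match: partition pruned
assert sql_not_in(None, ['foo']) is None     # NULL value: UNKNOWN, not kept
print("three-valued NOT IN checks pass")
```

If pruning already kept only partitions where the predicate is definitely
true, dropping the per-row filter (or folding it to a constant) is sound.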

In 2.x, you might notice it does a bit fancier things there as well, like

select count(1) from table where year*1 + month*100 + day >= 20160101;

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L468


You can try "explain extended" and see which partitions are selected (&
validate that the filter removed was applied already).

Cheers,
Gopal







Re: hive 2.1.0 + drop view

2016-08-29 Thread Sergey Shelukhin
An alternative workaround in the Postgres metastore DB is to replace the
literal string values 'NULL::character varying' that were inserted without the
setting with actual NULLs, in the TBLS and SDS tables (and potentially others,
but I don’t know if there are any).
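To make the failure mode concrete, here is a small sketch using SQLite in
place of the Postgres metastore (illustrative only; the table layout is
simplified): a stored literal string 'NULL::character varying' is not SQL
NULL, so code expecting NULL misbehaves until the literal is replaced.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TBLS (TBL_ID INTEGER, VIEW_EXPANDED_TEXT TEXT)")
con.execute("INSERT INTO TBLS VALUES (1, 'NULL::character varying')")

# The literal string is not NULL, so an IS NULL check does not match:
n = con.execute(
    "SELECT COUNT(*) FROM TBLS WHERE VIEW_EXPANDED_TEXT IS NULL").fetchone()[0]
assert n == 0

# The repair described above: swap the literal for a real NULL.
con.execute("UPDATE TBLS SET VIEW_EXPANDED_TEXT = NULL "
            "WHERE VIEW_EXPANDED_TEXT = 'NULL::character varying'")
n = con.execute(
    "SELECT COUNT(*) FROM TBLS WHERE VIEW_EXPANDED_TEXT IS NULL").fetchone()[0]
assert n == 1
print("literal 'NULL::character varying' replaced with a real NULL")
```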

From: Stephen Sprague <sprag...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, August 26, 2016 at 21:08
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: hive 2.1.0 + drop view

just to cap this discussion... thank you Ashutosh for that link; it was very
helpful.

I did the following based on my reading of it.


1. added the following to hive-site.xml

<property>
  <name>datanucleus.rdbms.initializeColumnInfo</name>
  <value>NONE</value>
</property>


this allows one to create and drop views; however, it does not allow you to
drop views previously created without that setting.

so...

2. did a show create table on all the views and saved to file.


3. surgically went into the hive metastore and deleted the views from table 
"TBLS" (but first had to delete from "TABLE_PARAMS" and "TBL_PRIVS" due to ref 
constraints.)


4. recreated the views as best as possible; given that some views depend on
other views, this needs multiple passes.


That was my workaround anyway.


Cheers,
Stephen
PS. altering the table to 'varchar' did nothing on postgres - that's just a
synonym for 'character varying'

On Fri, Aug 26, 2016 at 1:40 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:
Its a bug in DataNucleus. See discussion on : 
https://issues.apache.org/jira/browse/HIVE-14322

On Fri, Aug 26, 2016 at 1:34 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Actually I don't understand why they have defined TBL_NAME and TBL_TYPE as 
NVARCHAR (this is from Sybase similar to yours)

[Inline images 1]

Oracle seems to be correct.

And if we look further

Use the fixed-length datatype, nchar(n) , and the variable-length datatype, 
nvarchar(n), for both single-byte and multibyte character sets, such as 
Japanese. The difference between nchar(n) and char(n) and nvarchar(n) and 
varchar(n) is that both nchar(n) and nvarchar(n) allocate storage based on n 
times the number of bytes per character (based on the default character set). 
char(n) and varchar(n) allocate n bytes of storage.

What character set are you using for your server/database?



Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk.Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 26 August 2016 at 21:03, Stephen Sprague <sprag...@gmail.com> wrote:
thanks.  what i gotta try is altering the table and changing "character 
varying(767)" to "varchar(767)" - I think.

On Fri, Aug 26, 2016 at 12:59 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
You don't really want to mess around with the schema.

This is what I have in Oracle 12c schema for TBLS. The same as yours


[Inline images 1]

But this is Oracle, a serious database :)

HTH


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk.Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 26 August 2016 at 20:32, Stephen Sprague <sprag...@gmail.com> wrote:
yeah... so after the hive upgrade scripts ran we have this in pg for table
"TBLS"

{quote}
dwr_prod_2_1_0=> \d "TBLS"
  Table "public.TBLS"
       Column       |          Type          |            Modifiers
--------------------+------------------------+---------------------------------
 TBL_ID | bigint | not null
 CREATE_TIME| bigint | not null
 DB_ID  | bigint |
 LAST_ACCESS_TIME   | bigint | not null
 OWNER  | character varying(767) | default NULL::character varying
 RETENTION  | bigint | not null
 SD_ID  | bigint |
 TBL_NAME   | character varying(128) | default NULL::character varying
 TBL_TYPE   | character varying(128) | default NULL::character varying
 VIEW_EXPANDED_TEXT | text   |
 VIEW_ORIGINAL_TEXT | text   |

{quote}

wonder if i can perform some surgery here. :o 

Re: Hive on Tez CTAS query breaks

2016-11-10 Thread Sergey Shelukhin
Can you try specifying an explicit name for the COUNT() column in the union 
(and any other columns that are not just plain columns already)?
I wonder if CBO is just generating a weird name for it that cannot be used in 
CTAS.
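A quick way to see the suggestion in action, sketched with SQLite instead of
Hive (so the exact generated names differ from what CBO would produce): an
unaliased aggregate gets an auto-generated column name that may not be a valid
identifier for a created table, while an explicit alias is.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (ip TEXT)")
con.executemany("INSERT INTO t1 (ip) VALUES (?)", [("a",), ("a",), ("b",)])

# Unaliased aggregate: the engine invents a column name for COUNT(0).
cur = con.execute("SELECT COUNT(0), ip FROM t1 GROUP BY ip")
print([d[0] for d in cur.description])  # auto-generated name such as 'COUNT(0)'

# Explicit alias: a plain identifier, safe to reuse as a table column name.
cur = con.execute("SELECT COUNT(0) AS cnt, ip FROM t1 GROUP BY ip")
print([d[0] for d in cur.description])  # ['cnt', 'ip']
```

So in the failing CTAS, writing `SELECT COUNT(0) AS cnt, ip ...` on both sides
of the UNION ALL is the change being proposed.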

From: Premal Shah <premal.j.s...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, November 9, 2016 at 23:16
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive on Tez CTAS query breaks

Hi Gopal,
Thanx for the suggestion. It works with the setting you suggested.

What does this mean? Do I need to special-case this query?

Also, I am trying different things to see what is breaking. Looks like I have a 
UNION ALL and both sides have a query with a GROUP BY.

This breaks.

CREATE TABLE unique_ip_tmp AS
SELECT DISTINCT
new.ip
FROM
(
SELECT COUNT(0) , ip
FROM t1
WHERE dt BETWEEN '2016-11-08' AND '2016-11-08'
GROUP BY ip

UNION ALL

SELECT COUNT(0) , ip
FROM t2
WHERE dt BETWEEN '2016-11-08' AND '2016-11-08'
GROUP BY ip
) new
LEFT JOIN unique_ip old
ON old.ip = new.ip
WHERE
old.ip IS NULL
;


If I remove one of the queries in the UNION, it works

CREATE TABLE unique_ip_tmp AS
SELECT DISTINCT
new.ip
FROM
(
SELECT
COUNT(0)
, ip
FROM
map_activity
WHERE
dt BETWEEN '2016-11-08' AND '2016-11-08'
GROUP BY
ip
) new
LEFT JOIN unique_ip old
ON old.ip = new.ip
WHERE
old.ip IS NULL
;


If I create tmp tables from the group by queries and use them, that works too

CREATE TABLE unique_ip_tmp AS
SELECT DISTINCT
new.ip
FROM
(
SELECT * FROM dropme_t1
UNION ALL
SELECT * FROM dropme_t2
) new
LEFT JOIN unique_ip old
ON old.ip = new.ip
WHERE
old.ip IS NULL
;


Turning off CBO cluster-wide won't be the right thing to do, would it?



On Wed, Nov 9, 2016 at 10:49 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:


> If I run a query with CREATE TABLE AS, it breaks with the error below. 
> However, just running the query works if I don't try to create a table from 
> the results. It does not happen to all CTAS queries.

Not sure if that's related to Tez at all.

Can try running it with

set hive.cbo.enable=false;

Cheers,
Gopal





--
Regards,
Premal Shah.


Re: Hive Runtime Error processing row

2016-11-11 Thread Sergey Shelukhin
Hi. Can you file a JIRA with exception callstack? Seems to be a bug. Thanks!

From: George Liaw <george.a.l...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, November 10, 2016 at 17:27
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive Runtime Error processing row

Gopal,

Tested it out and that seemed to resolve the issue. Guess it'll have to be kept 
disabled for the time being.

Thanks!

On Thu, Nov 10, 2016 at 3:58 PM, George Liaw <george.a.l...@gmail.com> wrote:
I'll give it a try. This is Hive 2.0.1

On Thu, Nov 10, 2016 at 3:26 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:

> I'm running into the below error occasionally and I'm not 100% certain what's 
> going on. Does anyone have a hunch what might be happening here or where we 
> can dig for more ideas? Removed row contents but there are multiple columns.

You can try a repro run by doing

set hive.mapjoin.hybridgrace.hashtable=false;

to dig into the issue.

> at 
> org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.setFromOutput(HybridHashTableContainer.java:844)
> at 
> org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$GetAdaptor.setFromRow(HybridHashTableContainer.java:725)

Which version of Hive is this?

Cheers,
Gopal





--
George A. Liaw

(408) 318-7920
george.a.l...@gmail.com
LinkedIn



--
George A. Liaw

(408) 318-7920
george.a.l...@gmail.com
LinkedIn


Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan

2016-12-14 Thread Sergey Shelukhin
Congratulations!

From: Chao Sun <sunc...@apache.org>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, December 14, 2016 at 10:52
To: "d...@hive.apache.org" <d...@hive.apache.org>
Cc: "user@hive.apache.org" <user@hive.apache.org>, "rbalamo...@apache.org" <rbalamo...@apache.org>
Subject: Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan

Congrats Rajesh!

On Wed, Dec 14, 2016 at 9:26 AM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:
Congrats Rajesh!

On Wed, Dec 14, 2016 at 1:54 AM, Jesus Camacho Rodriguez <jcamachorodrig...@hortonworks.com> wrote:

> Congrats Rajesh, well deserved! :)
>
> --
> Jesús
>
>
>
>
> On 12/14/16, 8:41 AM, "Lefty Leverenz" <leftylever...@gmail.com> wrote:
>
> >Congratulations Rajesh!
> >
> >-- Lefty
> >
> >
> >On Tue, Dec 13, 2016 at 11:58 PM, Rajesh Balamohan <rbalamo...@apache.org> wrote:
> >
> >> Thanks a lot for providing this opportunity and to all for their
> messages.
> >> :)
> >>
> >> ~Rajesh.B
> >>
> >> On Wed, Dec 14, 2016 at 11:33 AM, Dharmesh Kakadia <dhkaka...@gmail.com> wrote:
> >>
> >> > Congrats Rajesh !
> >> >
> >> > Thanks,
> >> > Dharmesh
> >> >
> >> > On Tue, Dec 13, 2016 at 7:37 PM, Vikram Dixit K <vikram.di...@gmail.com> wrote:
> >> >
> >> >> Congrats Rajesh! :)
> >> >>
> >> >> On Tue, Dec 13, 2016 at 9:36 PM, Pengcheng Xiong <pxi...@apache.org> wrote:
> >> >>
> >> >>> Congrats Rajesh! :)
> >> >>>
> >> >>> On Tue, Dec 13, 2016 at 6:51 PM, Prasanth Jayachandran <prasan...@apache.org> wrote:
> >> >>>
> >> >>> > The Apache Hive PMC has voted to make Rajesh Balamohan a
> committer on
> >> >>> the
> >> >>> > Apache Hive Project. Please join me in congratulating Rajesh.
> >> >>> >
> >> >>> > Congratulations Rajesh!
> >> >>> >
> >> >>> > Thanks
> >> >>> > Prasanth
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Nothing better than when appreciated for hard work.
> >> >> -Mark
> >> >>
> >> >
> >> >
> >>
>



Re: tez session timesout?

2017-01-12 Thread Sergey Shelukhin
That should only happen when InetAddress.getLocalHost().getHostName()
throws UnknownHostException… do you have any other suspicious logs or
activity around that time?
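A quick way to check the usual culprit behind that exception (the local
hostname not resolving), approximated here in Python rather than the Java
InetAddress call in question:

```python
# Diagnostic sketch: InetAddress.getLocalHost() typically fails when the
# machine's own hostname has no DNS or /etc/hosts entry; this mirrors
# that lookup from Python.
import socket

hostname = socket.gethostname()
try:
    print(hostname, "resolves to", socket.gethostbyname(hostname))
except socket.gaierror:
    print(hostname, "does not resolve; check /etc/hosts or DNS")
```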

On 17/1/12, 07:54, "Brotanek, Jan"  wrote:

>Hello, I am running insert statement via CLI interface under Hive on Tez
>on HDP 2.4.0.:
>
>hive -hiveconf hive.cli.errors.ignore=true -v -f hive_pl_new7.sql
>
>hive_pl_new7.sql consists of couple of insert into partition statements
>which take quite long time - about 1200s each.
>
>insert into table table (part_col = '2015-12')
>select col1, col2
>from table
>where col2 >= '2015-12-01 00:00:00'
>and col2 <= '2015-12-31 23:59:59';
>
>insert into table table (part_col = '2016-01')
>select col1, col2
>from table
>where col2 >= '2016-01-01 00:00:00'
>and col2 <= '2016-01-31 23:59:59';
>
>insert into table table (part_col = '2016-02')
>select col1, col2
>from table
>where col2 >= '2016-02-01 00:00:00'
>and col2 <= '2016-02-31 23:59:59';
>
>First two statements run just fine. When 3rd is launched, I get following
>error. There are no syntax/semantic errors in statements, I tested that.
>When using the MR execution engine, it runs just fine. This is a serious issue
>for running automated batch jobs. Can anyone explain?
>
>Versions:
>Hive 1.2.1000.2.4.0.0-169
>HDP: 2.4.0
>Hadoop 2.7.1
>Hcatalog: 1.2.1
>Hbase: 1.1.2
>
>Exception in thread "main" java.lang.RuntimeException: Unable to
>determine our local host!
>   at 
>org.apache.hadoop.hive.metastore.LockRequestBuilder.build(LockRequestBuild
>er.java:56)
>   at 
>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.j
>ava:227)
>   at 
>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.j
>ava:92)
>   at 
>org.apache.hadoop.hive.ql.Driver.acquireLocksAndOpenTxn(Driver.java:1047)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1244)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
>   at 
>org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
>   at 
>org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
>   at 
>org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379)
>   at 
>org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:314)
>   at 
>org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:412)
>   at 
>org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:428)
>   at 
>org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:717)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
>62)
>   at 
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
>pl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>
>
>-Original Message-
>From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of
>Gopal Vijayaraghavan
>Sent: čtvrtek 12. ledna 2017 0:20
>To: user@hive.apache.org
>Subject: Re: Vectorised Queries in Hive
>
>
>
>> I have also noticed that this execution mode is only applicable to
>>single predicate search. It does not work with multiple predicates
>>searches. Can someone confirms this please?
>
>Can you explain what you mean?
>
>Vectorization supports multiple & nested AND+OR predicates - with some
>extra SIMD efficiencies in place for constants or repeated values.
>
>Cheers,
>Gopal
>
>



Re: Hive LLAP

2017-02-22 Thread Sergey Shelukhin
Hi.
While it’s theoretically possible, it’s not really a supported scenario,
especially not for production use, where we rely on Slider for packaging,
failure recovery, etc.
Is there a reason why you don’t want to use Slider?

From: Oleksiy S <osayankin.superu...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, February 22, 2017 at 00:34
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive LLAP

Hi all!

It is a good question. Please help!

Oleksiy.

On Tue, Feb 21, 2017 at 5:19 PM, Vlad Gudikov <vgoo...@gmail.com> wrote:
Hi everyone,

Recently I was looking for some guides to configure LLAP. I've found out that
it's possible to set up LLAP using Apache Slider. Is there any possibility of
setting up LLAP without using Slider? Maybe a configuration guide or something
like that.

Thanks in advance,
Vlad



--
Oleksiy


Re: Hive LLAP

2017-02-22 Thread Sergey Shelukhin
It is possible to set up LLAP using YARN/Slider directly, bypassing Ambari, if 
that’s what you want to avoid.

From: Sergey Shelukhin <ser...@hortonworks.com>
Date: Wednesday, February 22, 2017 at 17:03
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive LLAP

Hi.
While it’s theoretically possible it’s not really a supported scenario, esp. 
not for production use where we rely on Slider for packaging, failure recovery, 
etc.
Is there a reason why you don’t want to use Slider?

From: Oleksiy S <osayankin.superu...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, February 22, 2017 at 00:34
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive LLAP

Hi all!

It is a good question. Please help!

Oleksiy.

On Tue, Feb 21, 2017 at 5:19 PM, Vlad Gudikov <vgoo...@gmail.com> wrote:
Hi everyone,

Recently I was looking for some guides to configure LLAP. I've found out that
it's possible to set up LLAP using Apache Slider. Is there any possibility of
setting up LLAP without using Slider? Maybe a configuration guide or something
like that.

Thanks in advance,
Vlad



--
Oleksiy


Re: TezSessionPoolManager session null exception

2017-03-02 Thread Sergey Shelukhin
Can you file a JIRA?
The lines are:

boolean doAsEnabled =
    conf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_ENABLE_DOAS);
// either variable will never be null because a default value is returned
// in case of absence
if (doAsEnabled !=
    session.getConf().getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_ENABLE_DOAS)) {

session is checked for null so somehow session.getConf() is probably null.



On 17/3/2, 00:06, "邓志华"  wrote:

>Hue 3.11.0
>
>Hive: apache-hive-2.1.1
>the hs2 stack:
>2017-03-02T10:50:31,986 ERROR [HiveServer2-Background-Pool: Thread-982]
>exec.Task: Failed to execute tez graph.
>java.lang.NullPointerException
>at 
>org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.canWorkWithSameSe
>ssion(TezSessionPoolManager.java:430)
>at 
>org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSes
>sionPoolManager.java:451)
>at 
>org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSes
>sionPoolManager.java:396)
>at 
>org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:134)
>at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>at 
>org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:10
>0)
>at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
>at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
>at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
>at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
>at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1166)
>at 
>org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.j
>ava:242)
>at 
>org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation
>.java:91)
>at 
>org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQ
>LOperation.java:334)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
>org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.
>java:1660)
>at 
>org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLO
>peration.java:347)
>at 
>java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:
>1142)
>at 
>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
>:617)
>at java.lang.Thread.run(Thread.java:745)
>
>code in TezSessionPoolManager :
>
>  if (doAsEnabled !=
>session.getConf().getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_ENABLE_DOAS))
>
>it seems that the session got from the threadlocal has not been opened yet,
>so session.getConf() returns null.
>
>Am i right?What may the root cause?
>
>
>
>
>
>



Re: Hive 2.1 and 1.2 fails on insert queries when metastore db is MSSQL SERVER > 2008

2017-05-23 Thread Sergey Shelukhin
This is fixed in https://issues.apache.org/jira/browse/HIVE-16106, which is
unfortunately not yet in any release.

From: Артем Великородный <artem@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Tuesday, May 23, 2017 at 05:13
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Hive 2.1 and 1.2 fails on insert queries when metastore db is MSSQL 
SERVER > 2008

ENV:
MS SQL 2012 on WINDOWS 10
HIVE 1.2.1, 2.1.1
Java 8

Metastore created by schematool (without any error)

Insert query using MS SQL 2012 as metastore fails with:
hive> CREATE TABLE test(i int);
hive> INSERT INTO TABLE test values (1), (2);


2017-05-23T19:54:03,172 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: 
HMSHandler Fatal error: javax.jdo.JDOException: Exception thrown when executing 
query : SELECT 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS 
NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID
 FROM SDS A0 WHERE A0.CD_ID = ? OFFSET 0 ROWS FETCH NEXT ROW ONLY
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:677)
at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:388)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:225)
at org.apache.hadoop.hive.metastore.ObjectStore.listStorageDescriptorsWithCD(ObjectStore.java:3420)
at org.apache.hadoop.hive.metastore.ObjectStore.removeUnusedColumnDescriptor(ObjectStore.java:3364)
at org.apache.hadoop.hive.metastore.ObjectStore.copyMSD(ObjectStore.java:3330)
at org.apache.hadoop.hive.metastore.ObjectStore.alterTable(ObjectStore.java:3185)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
at com.sun.proxy.$Proxy21.alterTable(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTableUpdateTableColumnStats(HiveAlterHandler.java:706)
at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:242)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3704)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:3675)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy22.alter_table_with_environment_context(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:11238)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:11222)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
NestedThrowablesStackTrace:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near 'OFFSET'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:217)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1655)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:440)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:385)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7505)
at com.microsoft.sqlserver.jdbc.SQLServerConnec
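[Editor's note: the root cause here is that DataNucleus generated an `OFFSET ... FETCH` clause, which SQL Server only accepts from the 2012 release onward; an older SQL Server backing the metastore rejects it. A sketch of the two query shapes (column list elided; the usual remedy is upgrading the backing database or adjusting the DataNucleus adapter, not hand-editing the query):]

```sql
-- What DataNucleus emitted; requires SQL Server 2012+:
SELECT ... FROM SDS A0 WHERE A0.CD_ID = ? OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY;

-- Pre-2012 equivalent of "first row only":
SELECT TOP 1 ... FROM SDS A0 WHERE A0.CD_ID = ?;
```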

Re: Wildcard and Automatic partitioning

2017-08-03 Thread Sergey Shelukhin
The typical, although not technically intended for the purpose, approach is to 
use msck to “repair” the table and create the partitions. The partitions have 
to be in the standard Hive format (key=value/key=value etc.) and the table must 
be created with the corresponding partition keys.
It may actually be good to have a feature that does this in a standard manner for 
external tables; however, it would probably be restricted to the same format. 
So the example below probably won’t work because of the star in the middle.
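[Editor's note: the msck flow described above can be sketched as follows (hypothetical table and paths, for illustration; the partition directories must already follow the key=value layout):]

```sql
-- Assumed directory layout on HDFS:
--   /data/events/dt=2017-08-01/...
--   /data/events/dt=2017-08-02/...
CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
PARTITIONED BY (dt STRING)
LOCATION '/data/events';

-- Scan the table location and register any key=value directories
-- not yet known to the metastore as partitions:
MSCK REPAIR TABLE events;
```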


From: Nirav Patel <npa...@xactlycorp.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, August 3, 2017 at 11:00
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Wildcard and Automatic partitioning
Subject: Wildcard and Automatic partitioning

Hi, is there a way in Hive, when I create an external table, to specify a 
wildcard in LOCATION and have Hive automatically identify the partitions?

I have opened HIVE-17236 for 
wildcard support. At the same time I also have the issue of specifying partitions. I can 
use a tedious ALTER TABLE to add each partition directory, but since the data already 
exists in separate partition directories, why can't Hive identify them?

Here's an example of such a directory:
/user/mycomp/customers/*/departments/partition/*

I can have n customers, and for each one, m partitions for the 
departments object.

I think that if I use the following SQL to create the external table, it should 
be able to identify all the partitions.

CREATE EXTERNAL TABLE testTable (val map)
PARTITIONED BY (period string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/mycomp/customers/*/departments/partition/*';

If my partition directory (let's say p-12345) has multiple files inside it 
that don't start with the "part-" prefix, then I should be able to specify that 
prefix so Hive can find the right filesets.

Thanks





Re: Wildcard and Automatic partitioning

2017-08-03 Thread Sergey Shelukhin
How would Hive determine partition keys for partitioning from arbitrary 
directory structure? There has to be some format, and there already is. Also 
columns for keys need to be added to the table, with types.
For reading it without partitions, Hive already supports 
mapred.input.dir.recursive, which would read all the nested directories. In 
fact iirc it’s on by default if Tez is used.
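[Editor's note: a sketch of the non-partitioned alternative mentioned above; `mapred.input.dir.recursive` is the property named in the email, and `hive.mapred.supports.subdirectories` is assumed here as its usual companion setting — check your version's documentation:]

```sql
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;  -- assumed companion setting

-- Unpartitioned external table over the whole tree;
-- nested directories are then read recursively:
CREATE EXTERNAL TABLE all_departments (val STRING)
LOCATION '/user/mycomp/customers';
```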

From: Nirav Patel <npa...@xactlycorp.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, August 3, 2017 at 11:41
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Wildcard and Automatic partitioning
Subject: Re: Wildcard and Automatic partitioning

What is the point if I have to rename HDFS directories to the Hive format 
(key=value/key=value etc.)? Why can't it just be a normal directory layout like 
everyone has? The entire directory could be treated as a key ("period" in my 
example), and Hive could add all of its values as partitions.



Re: Hive query starts own session for LLAP

2017-09-25 Thread Sergey Shelukhin
Hello.
Hive would create a new Tez AM to coordinate the query (or use an existing
one if HS2 session pool is used). However, the YARN app for Tez should
only have a single container. Is this not the case?
If it’s running additional containers, what is hive.llap.execution.mode
set to? It should be set to all or only by default (“all” means run
everything in LLAP if at all possible; “only” is the same with fallback to
containers disabled - so the query would fail if it cannot run in LLAP).
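[Editor's note: the setting described above can be checked and changed per session; a sketch, using the two values named in the email:]

```sql
-- Run everything in LLAP where possible, falling back to Tez containers:
SET hive.llap.execution.mode=all;

-- Or require LLAP: the query fails rather than falling back to containers:
SET hive.llap.execution.mode=only;
```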

From: Rajesh Narayanan
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, September 22, 2017 at 11:59
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Hive query starts own session for LLAP
Subject:  Hive query starts own session for LLAP


Hi all,
When I execute a Hive query, it starts its own session and creates new
YARN jobs rather than using the LLAP-enabled application.
Can you please provide some suggestions?

Thanks
Rajesh



does anyone care about list bucketing stored as directories?

2017-10-03 Thread Sergey Shelukhin
1) There seem to be some bugs and limitations in LB (e.g. incorrect cleanup - 
https://issues.apache.org/jira/browse/HIVE-14886) and nobody appears to as much 
as watch JIRAs ;) Does anyone actually use this stuff? Should we nuke it in 
3.0, and by 3.0 I mean I’ll remove it from master in a few weeks? :)

2) I actually wonder, on top of the same SQL syntax, wouldn’t it be much easier 
to add logic to partitioning to write skew values into partitions and non-skew 
values into a new type of default partition? It won’t affect nearly as many low 
level codepaths in obscure and unobvious ways, instead keeping all the logic in 
metastore and split generation, and would integrate with Hive features like PPD 
automatically.
Esp. if we are ok with the same limitations - e.g. if you add a new skew value 
right now, I’m not sure what happens to the rows with that value already 
sitting in the non-skew directories, but I don’t expect anything reasonable...



Re: does anyone care about list bucketing stored as directories?

2017-10-06 Thread Sergey Shelukhin
Looks like nobody does… I’ll file a ticket to remove it shortly.




Re: does anyone care about list bucketing stored as directories?

2017-10-09 Thread Sergey Shelukhin
OK, here’s a synopsis that is hopefully clearer.
1) LB, when stored as directories, adds a lot of low-level complexity to Hive 
tables that has to be accounted for in many places in the code where the files 
are written or modified - from FSOP to ACID/replication/export.
2) While working on some FSOP code I noticed that some of that logic is broken 
- e.g. the duplicate file removal from tasks, a pretty fundamental correctness 
feature in Hive, may be broken. LB also doesn’t appear to be compatible with 
e.g. regular bucketing.
3) The feature hasn’t seen development activity in a while; it also doesn’t 
appear to be used a lot.

Keeping with the theme of cleaning up “legacy” code for 3.0, I was proposing we 
remove it.

(2) also suggested that, if needed, it might be easier to implement similar 
functionality by adding some flexibility to partitions (which LB directories 
look like anyway); that would also keep the logic on a higher level of 
abstraction (split generation, partition pruning) as opposed to many low-level 
places like FSOP, etc.



From: Xuefu Zhang <xu...@apache.org>
Date: Sunday, October 8, 2017 at 20:56
To: "d...@hive.apache.org" <d...@hive.apache.org>
Cc: "user@hive.apache.org" <user@hive.apache.org>, Sergey Shelukhin <ser...@hortonworks.com>
Subject: Re: does anyone care about list bucketing stored as directories?
Subject: Re: does anyone care about list bucketing stored as directories?

Lack of a response doesn't necessarily mean "don't care". Maybe you can give a 
good description of the problem and the proposed solution. Frankly, I cannot make 
much sense out of the previous email.

Thanks,
Xuefu
