Re: [VOTE] Should we release Hive Storage API 2.7.3-rc2?

2021-08-02 Thread Alan Gates
CentOS, java-1.8.0-openjdk-devel

Alan.

On Mon, Aug 2, 2021 at 5:20 PM Owen O'Malley  wrote:

> Ok, I was able to reproduce it in ORC's Ubuntu 18 docker image. I'll track
> it down.
>
> On Tue, Aug 3, 2021 at 12:06 AM Owen O'Malley 
> wrote:
>
> > Hmm, I just reran the build both from git and the tar ball and
> > TestBytesColumnVector passed both times. Which jvm were you using?
> >
> > On Tue, Aug 3, 2021 at 12:02 AM Pavan Lanka 
> > wrote:
> >
> >> +1 (non-binding)
> >>
> >> * Built Hive Storage API (did not get the failure that Alan called out)
> >> using the tag
> >> * Ran the individual test TestBytesColumnVector successfully
> >> * Built ORC using 2.7.3 successfully
> >>
> >> Regards,
> >> Pavan
> >>
> >> > On Aug 2, 2021, at 4:03 PM, Alan Gates  wrote:
> >> >
> >> > +1.  Checked the keys and the sha checksum.  Did a build and ran the
> >> > tests.  Ran rat.  I did get one failing test, but it doesn't look crucial:
> >> >
> >> > TestBytesColumnVector.testConcatAndPadding:109 row 0 expected:<[?]> but
> >> > was:<[???]>
> >> >
> >> > Alan.
> >> >
> >> > On Thu, Jul 29, 2021 at 3:14 PM Owen O'Malley wrote:
> >> >
> >> >> Hello all,
> >> >>
> >> >> Following on previous discussions, I would like to propose a new
> >> >> storage-api release including HIVE-25400.
> >> >>
> >> >> Shall we release the following artifacts as Hive Storage API 2.7.3?
> >> >>
> >> >> tar: http://home.apache.org/~omalley/hive-storage-2.7.3/
> >> >> tag: https://github.com/apache/hive/releases/tag/storage-release-2.7.3-rc2
> >> >> jiras: https://issues.apache.org/jira/projects/HIVE/versions/12350287
> >> >>
> >> >> .. Owen
> >> >>
> >>
> >>
>


Re: [VOTE] Should we release Hive Storage API 2.8.1 rc2?

2021-08-02 Thread Alan Gates
+1.  Built it, ran rat, checked the signature and the hash.

Alan.
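
For readers who want to repeat the hash check mentioned in this vote, here is a minimal sketch. It assumes the published digest is a plain hex string and defaults to SHA-256; the thread does not say which algorithm or file names were used, and `file_digest` and `checksum_matches` are hypothetical helper names. It does not cover the signature check, which needs a GPG tool and the project's KEYS file.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex, algorithm="sha256"):
    """Compare the computed digest with a published one.

    Surrounding whitespace and letter case in `expected_hex` are ignored.
    """
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Passing `algorithm="sha512"` would cover releases that publish a SHA-512 digest instead.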

On Thu, Jul 29, 2021 at 3:13 PM Owen O'Malley 
wrote:

> Hello all,
>
> I would like to propose a new storage-api release including HIVE-25400.
>
> Shall we release the following artifacts as Hive Storage API 2.8.1?
>
> tar: http://home.apache.org/~omalley/hive-storage-2.8.1/
> tag: https://github.com/apache/hive/releases/tag/storage-release-2.8.1-rc2
> jiras: https://issues.apache.org/jira/projects/HIVE/versions/12350456
>
> .. Owen
>


Re: [VOTE] Should we release Hive Storage API 2.7.3-rc2?

2021-08-02 Thread Alan Gates
+1.  Checked the keys and the sha checksum.  Did a build and ran the
tests.  Ran rat.  I did get one failing test, but it doesn't look crucial:

TestBytesColumnVector.testConcatAndPadding:109 row 0 expected:<[?]> but
was:<[???]>

Alan.
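
This thread does not establish the cause of the failure above. One common way to get three characters where one was expected is to decode the UTF-8 bytes of a single three-byte character with a charset that cannot represent them, for example when a JVM's or compiler's default encoding is not UTF-8. The sketch below only illustrates that effect; the character U+2030 and the function name are assumptions chosen for illustration and are not taken from the test.

```python
def decode_with_wrong_charset(text, wrong_charset="ascii"):
    """Encode `text` as UTF-8, then decode the bytes with a different charset.

    Bytes the charset cannot represent become U+FFFD, one per byte.
    """
    return text.encode("utf-8").decode(wrong_charset, errors="replace")


original = "\u2030"  # one character, three bytes in UTF-8
mangled = decode_with_wrong_charset(original)
print(len(original), len(mangled))
```

Here the one-character input comes back as three replacement characters, which matches the shape of the assertion message but says nothing certain about its actual cause.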

On Thu, Jul 29, 2021 at 3:14 PM Owen O'Malley 
wrote:

> Hello all,
>
> Following on previous discussions, I would like to propose a new
> storage-api release including HIVE-25400.
>
> Shall we release the following artifacts as Hive Storage API 2.7.3?
>
> tar: http://home.apache.org/~omalley/hive-storage-2.7.3/
> tag: https://github.com/apache/hive/releases/tag/storage-release-2.7.3-rc2
> jiras: https://issues.apache.org/jira/projects/HIVE/versions/12350287
>
> .. Owen
>


Re: Time to Remove Hive-on-Spark

2020-07-22 Thread Alan Gates
An important point here is I don't believe David is proposing to remove
Hive on Spark from the 2 or 3 lines, but only from trunk.  Continuing to
support it in existing 2 and 3 lines makes sense, but since no one has
maintained it on trunk for some time and it does not work with many of the
newer features it should be removed from trunk.

Alan.

On Tue, Jul 21, 2020 at 4:10 PM Chao Sun  wrote:

> Thanks David. FWIW Uber is still running Hive on Spark (2.3.4) on a very
> large scale in production right now and I don't think we have any plan to
> change it soon.
>
>
>
> On Tue, Jul 21, 2020 at 11:28 AM David  wrote:
>
> > Hello,
> >
> > Thanks for the feedback.
> >
> > Just a quick recap: I did propose this @dev and I received unanimous +1's
> > from the community.  After a couple months, I created the PR.
> >
> > Certainly open to discussion, but there hasn't been any discussion thus far
> > because there have been no objections until this point.
> >
> > HoS has low adoption, heavy technical debt, and the manner in which its
> > build process is set up is impeding some other work that is not even related
> > to HoS.
> >
> > We can deprecate in Hive 3.x and remove in Hive 4.x.  The plan would be to
> > use Tez moving forward.
> >
> > My point about the vendor's move to Tez is that HoS adoption is very low,
> > it's only going lower, and while I don't know the specifics of it, there
> > must be some migration plan in place there (i.e., it must be possible to do
> > it already).
> >
> > Thanks,
> > David
> >
> > On Tue, Jul 21, 2020 at 12:23 PM Xuefu Zhang  wrote:
> >
> > > Hi David,
> > >
> > > While a vendor may not support a component in an open source project,
> > > removing it or not is a decision by and for the community. While I certainly
> > > understand that the vendor you mentioned has contributed a great deal
> > > (including my personal effort while working there), it's not up to the
> > > vendor to make a call like what is proposed here.
> > >
> > > As a community, we should have gone through a thorough discussion and
> > > reached a consensus before actually making such a big change, in my
> > > opinion.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Tue, Jul 21, 2020 at 8:49 AM David  wrote:
> > >
> > > > Hey,
> > > >
> > > > Thanks for the input.
> > > >
> > > > FYI. Cloudera (Cloudera + Hortonworks) have removed HoS from their latest
> > > > offering.
> > > >
> > > > "Tez is now the only supported execution engine, existing queries that
> > > > change execution mode to Spark or MapReduce within a session, for example,
> > > > fail."
> > > >
> > > > https://docs.cloudera.com/cdp/latest/upgrade-post/topics/ug_hive_configuration_changes.html
> > > >
> > > > So I don't know who will be supporting this feature moving forward, but
> > > > there has been a lot of work done to make this change as painless as
> > > > possible.  Simply setting the engine to 'tez' and removing the HoS-related
> > > > settings should address many use cases.
> > > >
> > > > Thanks.
> > > >
> > > > On Tue, Jul 21, 2020 at 11:36 AM Xuefu Z  wrote:
> > > >
> > > > > Sorry for chiming in late. However, I don't think we should remove Hive on
> > > > > Spark just because of a technical problem. This is rather a big decision
> > > > > that we need to be careful about. There are users that will be left high
> > > > > and dry by this move.
> > > > >
> > > > > If the community decides to desupport and eventually remove it, I think we
> > > > > need to have a due process. We also need a deprecation plan if that's what we
> > > > > decide to do. Before that, I'm -1 on this proposal.
> > > > >
> > > > > Thanks,
> > > > > Xuefu
> > > > >
> > > > > On Tue, Jul 21, 2020 at 7:57 AM David  wrote:
> > > > >
> > > > > > Hello Team,
> > > > > >
> > > > > > https://github.com/apache/hive/pull/1285
> > > > > >
> >

Re: Time to Remove Hive-on-Spark

2020-06-03 Thread Alan Gates
+1.

Alan.

On Wed, Jun 3, 2020 at 1:40 PM Prasanth Jayachandran
 wrote:

> +1
>
> > On Jun 3, 2020, at 1:38 PM, Ashutosh Chauhan 
> wrote:
> >
> > +1
> >
> > On Wed, Jun 3, 2020 at 1:23 PM David Mollitor  wrote:
> >
> >> Hello Gang,
> >>
> >> I have spent some time working on upgrading Avro (far less than others):
> >>
> >> https://issues.apache.org/jira/browse/HIVE-21737
> >>
> >> This should be a relatively easy thing to do, but is blocked by
> >> Hive-on-Spark.  HoS has a weird thing where it downloads some
> >> cloud-storage-hosted file of Spark-Hadoop as part of its maven run.
> >>
> >> Since HoS is not going to receive updates from the major vendors, is it
> >> time to simply remove it?
> >>
> >> Tests are currently disabled:
> >> https://issues.apache.org/jira/browse/HIVE-23137
> >>
> >> Thanks.
> >>
>
>


Re: What's master key in hive metastore?

2020-05-04 Thread Alan Gates
It is used in storing delegation tokens, which HS2 and the metastore need
to act on the user's behalf when reading files, launching jobs, etc.  See
DelegationTokenStore.

Alan.
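
As a rough illustration of why a token store keeps master keys: in Hadoop-style delegation tokens, the token password is a keyed hash (HMAC) of the token identifier computed with a master key, so the server can later recompute and check it. The sketch below is conceptual only. The class, its method names, and the choice of HMAC-SHA1 are assumptions for illustration and are not the API of Hive's DelegationTokenStore.

```python
import hashlib
import hmac
import os


class ToyTokenStore:
    """Hypothetical in-memory store of master keys, indexed by sequence number."""

    def __init__(self):
        self._keys = {}
        self._next_seq = 1

    def add_master_key(self, key=None):
        """Store a new master key (random if none is given) and return its sequence number."""
        seq = self._next_seq
        self._keys[seq] = key if key is not None else os.urandom(20)
        self._next_seq += 1
        return seq

    def remove_master_key(self, seq):
        """Drop a key; tokens signed with it can no longer be verified."""
        return self._keys.pop(seq, None) is not None

    def sign(self, seq, identifier):
        """Compute the token password for `identifier` using master key `seq`."""
        return hmac.new(self._keys[seq], identifier, hashlib.sha1).digest()

    def verify(self, seq, identifier, password):
        """Recompute the password and compare it in constant time."""
        if seq not in self._keys:
            return False
        return hmac.compare_digest(self.sign(seq, identifier), password)
```

Keeping several keys by sequence number is what lets keys be rolled over while tokens issued under an older key remain verifiable until that key is removed.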

On Mon, May 4, 2020 at 2:00 AM Xi Chen  wrote:

> Dear all,
>
> I found several APIs about the master key in the Hive metastore and
> IMetastoreClient. I searched around but didn't find any usage in Hive DDL
> tasks. Any idea what this master key is designed for?
>
> Thanks in advance,
> bargitta
>
> int addMasterKey(String key) throws MetaException, TException;
>
> void updateMasterKey(Integer seqNo, String key)
> throws NoSuchObjectException, MetaException, TException;
>
> boolean removeMasterKey(Integer keySeq) throws TException;
>
> String[] getMasterKeys() throws TException;
>


[ANNOUNCE] Apache Hive 2.3.7 Released

2020-04-19 Thread Alan Gates
The Apache Hive team is proud to announce the release of Apache Hive
version 2.3.7.


The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)
* A mechanism to impose structure on a variety of data formats
* Access to files stored either directly in Apache HDFS (TM) or in
other data storage systems such as Apache HBase (TM) or Amazon's S3
(TM).
* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache
Spark frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.3.7 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346056&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release possible.

Regards,

The Apache Hive Team


Re: [RESULT][VOTE] Apache Hive 2.3.7 Release Candidate 0

2020-04-17 Thread Alan Gates
With three +1s (Alan, Owen, and Peter) and no other votes this vote
passes.  Thanks Owen and Peter for voting.

Alan.

On Fri, Apr 17, 2020 at 12:05 AM Peter Vary 
wrote:

> +1 for the release.
> - Downloaded the artifacts
> - Verified the signatures
> - Built the project (failed 2 times with parallel compilation - I was
> afraid that I had found a problem, but finally compiled without parallel
> :) )
> - Ran some tests
>
> > On Apr 16, 2020, at 22:33, Owen O'Malley  wrote:
> >
> > I'm +1 for the release.
> >
> >   - I checked the signature & sha.
> >   - I built the project.
> >   - I ran a handful of unit tests.
> >
> > .. Owen
> >
> > On Tue, Apr 7, 2020 at 8:34 PM Hyukjin Kwon  wrote:
> >
> >> Thank you so much Alan for doing this.
> >>
> >> On Wed, Apr 8, 2020 at 9:26 AM, Alan Gates wrote:
> >>
> >>> Apache Hive 2.3.7 Release Candidate 0 is available here:
> >>> https://people.apache.org/~gates/apache-hive-2.3.7-rc0/
> >>>
> >>> Maven artifacts are available here:
> >>> https://repository.apache.org/content/repositories/orgapachehive-1100/
> >>>
> >>> The tag release-2.3.7-rc0 has been applied to the source for this
> >>> release in github, you can see it
> >>> at https://github.com/apache/hive/tree/release-2.3.7-rc0
> >>>
> >>> Voting will conclude in 72 hours (or whenever I scrounge together enough
> >>> votes).
> >>>
> >>> Hive PMC Members: Please test and vote.
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> Alan.
> >>>
> >>
>
>


[VOTE] Apache Hive 2.3.7 Release Candidate 0

2020-04-07 Thread Alan Gates
Apache Hive 2.3.7 Release Candidate 0 is available here:
https://people.apache.org/~gates/apache-hive-2.3.7-rc0/

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1100/

The tag release-2.3.7-rc0 has been applied to the source for this
release in github, you can see it
at https://github.com/apache/hive/tree/release-2.3.7-rc0

Voting will conclude in 72 hours (or whenever I scrounge together enough votes).

Hive PMC Members: Please test and vote.

Thanks.


Alan.


Re: HIVE-21508 and Hive 2.3.7 question

2020-04-07 Thread Alan Gates
I agree with Adrian that HIVE-21469 is beyond the scope of 2.3.7 for now.
If you want to backport it to the 2 branch I think it makes sense to put it
there in anticipation of a 2.4 release.

Alan.

On Wed, Apr 1, 2020 at 4:40 AM Mass Dosage  wrote:

> I think, given that we're so close to potentially cutting a 2.3.7 release
> (see Alan's separate post to the mailing list) that we shouldn't add
> anything else at this stage. This could potentially be of interest for
> a 2.3.8 or 2.4.0 release if the rest of the Hive community agrees.
>
> Thanks,
>
> Adrian
>
> On Tue, 31 Mar 2020 at 13:24, David Mollitor  wrote:
>
>> Hello Team,
>>
>> Just to throw one more thing in there, a while ago I put a good chunk of
>> time into shoring up the ZK Lock Manager because I worked with a lot of
>> folks on locking issues. HDP/CLDR moved away from ZK and is using a RDBMS
>> and therefore never paid it much mind. Any interest in rolling it into
>> Hive 2?
>>
>> HIVE-21469
>>
>> On Tue, Mar 31, 2020, 5:20 AM Mass Dosage  wrote:
>>
>> > Hey all,
>> >
>> > We've made some progress on this and are getting closer to a 2.3.7 release.
>> > Alan has identified 2 tests failing on the 2.3 branch that are fixed in
>> > newer versions of Hive so he is proposing to backport the fixes for them.
>> > The ticket for that is https://issues.apache.org/jira/browse/HIVE-23086 if
>> > you want to watch it and vote it up. Hopefully we can get that merged soon
>> > and then we'll be good to go.
>> >
>> > Thanks,
>> >
>> > Adrian
>> >
>> > On Sun, 8 Mar 2020 at 02:41, Hyukjin Kwon  wrote:
>> >
>> > > Thank you so much, Alan and all.
>> > >
>> > > On Sun, Mar 8, 2020 at 10:36 AM, Yuming Wang wrote:
>> > >
>> > >> Great, thank you Alan and Adrian.
>> > >>
>> > >> On Sun, Mar 8, 2020 at 8:13 AM Alan Gates wrote:
>> > >>
>> > >>> I'm working with Adrian on getting a 2.3.7 release out.  That will pick
>> > >>> up everything that is already on the 2.3 branch.
>> > >>>
>> > >>> Alan.
>> > >>>
>> > >>> On Sat, Mar 7, 2020 at 6:02 AM Yuming Wang wrote:
>> > >>>
>> > >>>> Hi Alan and Owen,
>> > >>>>
>> > >>>> Are there any plans to release Hive 2.3.7 or Hive 2.4.0? It may be the
>> > >>>> only one that supports Java 11. Hive 3.x can not support it because of
>> > >>>> HIVE-22097 <https://issues.apache.org/jira/browse/HIVE-22097>.
>> > >>>>
>> > >>>> On Tue, Feb 11, 2020 at 7:32 PM Mass Dosage wrote:
>> > >>>>
>> > >>>>> +1.
>> > >>>>>
>> > >>>>> At Expedia Group we are big users of Hive and are also experiencing
>> > >>>>> issues with not being able to use Hive 2.3.x on Java >8 which is starting
>> > >>>>> to seriously impact some of our applications which require Java 11. We
>> > >>>>> worked on HIVE-21508 in order to get it merged into the various branches
>> > >>>>> and have been asking for a Hive 2.3.7 release for months with no replies to
>> > >>>>> our questions on this mailing list.
>> > >>>>>
>> > >>>>> Could someone from the Hive community please answer and let us know if
>> > >>>>> there is the possibility of a Hive 2.3.7 release? I've seen at least two
>> > >>>>> other requests for this on the list over the past few months.
>> > >>>>>
>> > >>>>> If not we will be forced to fork the current 2.3 branch and release
>> > >>>>> our own version of Hive 2.3.7 to Maven Central (with a different group id)
>> > >>>>> so that we can use it (it sounds like this would be useful to others out
>> > >>>>> there too). We'd really rather not do this but I don't see any other
>> > >>>>> solutions.
>> > >>>>>
>> > >>>>> Thanks,
>> > >>>>>

Re: [DISCUSS] I propose a Hive 2.3.7 release

2020-04-07 Thread Alan Gates
I've seen no replies on this, so I'll assume people are ok with me going
ahead with the release.

Alan.

On Tue, Mar 31, 2020 at 6:20 PM Alan Gates  wrote:

> Hope everyone is well.
>
> There have been several requests on the Hive mailing list for a 2.3.7
> release [1]. I've been working with Adrian Woodhead (aka Mass Dosage) on
> testing the 2.3 branch.  We found two test failures there, which I've
> posted a patch for at HIVE-23086 [2] (thanks Jésus for the review).
>
> If everyone is agreeable I'll commit the changes to HIVE-23086 and put up
> a release candidate for Hive 2.3.7.
>
> Alan.
>
> 1.
> https://lists.apache.org/thread.html/r1b5040489451e668275b94f2d2f67fca9050f83e94907674d19f864f%40%3Cdev.hive.apache.org%3E
>  ,
>
> https://lists.apache.org/thread.html/8473e04420b70d90502e221e663b341c877224cbc617515e40e8e787%40%3Cdev.hive.apache.org%3E
>
> 2. https://issues.apache.org/jira/browse/HIVE-23086
>


[DISCUSS] I propose a Hive 2.3.7 release

2020-03-31 Thread Alan Gates
Hope everyone is well.

There have been several requests on the Hive mailing list for a 2.3.7
release [1]. I've been working with Adrian Woodhead (aka Mass Dosage) on
testing the 2.3 branch.  We found two test failures there, which I've
posted a patch for at HIVE-23086 [2] (thanks Jésus for the review).

If everyone is agreeable I'll commit the changes to HIVE-23086 and put up a
release candidate for Hive 2.3.7.

Alan.

1.
https://lists.apache.org/thread.html/r1b5040489451e668275b94f2d2f67fca9050f83e94907674d19f864f%40%3Cdev.hive.apache.org%3E
 ,
https://lists.apache.org/thread.html/8473e04420b70d90502e221e663b341c877224cbc617515e40e8e787%40%3Cdev.hive.apache.org%3E

2. https://issues.apache.org/jira/browse/HIVE-23086


[jira] [Created] (HIVE-23086) Two tests fail on branch-2.3

2020-03-26 Thread Alan Gates (Jira)
Alan Gates created HIVE-23086:
-

 Summary: Two tests fail on branch-2.3
 Key: HIVE-23086
 URL: https://issues.apache.org/jira/browse/HIVE-23086
 Project: Hive
  Issue Type: Bug
  Components: Test
Affects Versions: 2.3.6
Reporter: Alan Gates
Assignee: Alan Gates


TestPerfCli query88.q and TestMiniLlapLocalCliDriver union_fast_stats.q fail on 
the 2.3 branch.  

The TestMiniLlapLocalCliDriver failure is fixed in HIVE-14977 where 
union_fast_stats.q is removed from the list of tests run with 
TestMiniLlapLocalCliDriver.

The TestPerfCli failure is fixed in HIVE-16602 where one line is added to
query88.q to allow cartesian products:

{{set hive.strict.checks.cartesian.product=false;}}

I propose to backport these two changes to branch-2.3.





Re: HIVE-21508 and Hive 2.3.7 question

2020-03-07 Thread Alan Gates
I'm working with Adrian on getting a 2.3.7 release out.  That will pick up
everything that is already on the 2.3 branch.

Alan.

On Sat, Mar 7, 2020 at 6:02 AM Yuming Wang  wrote:

> Hi Alan and Owen,
>
> Are there any plans to release Hive 2.3.7 or Hive 2.4.0? It may be the only
> one that supports Java 11. Hive 3.x can not support it because of
> HIVE-22097.
>
> On Tue, Feb 11, 2020 at 7:32 PM Mass Dosage  wrote:
>
>> +1.
>>
>> At Expedia Group we are big users of Hive and are also experiencing
>> issues with not being able to use Hive 2.3.x on Java >8 which is starting
>> to seriously impact some of our applications which require Java 11. We
>> worked on HIVE-21508 in order to get it merged into the various branches
>> and have been asking for a Hive 2.3.7 release for months with no replies to
>> our questions on this mailing list.
>>
>> Could someone from the Hive community please answer and let us know if
>> there is the possibility of a Hive 2.3.7 release? I've seen at least two
>> other requests for this on the list over the past few months.
>>
>> If not we will be forced to fork the current 2.3 branch and release our
>> own version of Hive 2.3.7 to Maven Central (with a different group id) so
>> that we can use it (it sounds like this would be useful to others out there
>> too). We'd really rather not do this but I don't see any other solutions.
>>
>> Thanks,
>>
>> Adrian
>> --
>> Adrian Woodhead
>> Principal Engineer
>> Expedia Group - 407 St John Street, London, EC1V 4EX
>>
>>
>> On Thu, 30 Jan 2020 at 07:34, Hyukjin Kwon  wrote:
>>
>>> Hi Hive dev team,
>>>
>>> As informed earlier, I, Yuming and many people from Spark dev have made
>>> huge efforts to let Spark use an official Hive release. Thanks Alan and all
>>> Hive dev for all the efforts for Hive 2.3.6 to make Spark support JDK 11.
>>>
>>> A few months ago, an unexpected problem was found. Spark throws
>>> ClassCastException when initializing HiveMetaStoreClient.
>>> Please see SPARK-29245 <https://issues.apache.org/jira/browse/SPARK-29245>
>>> for more details. This has been fixed by HIVE-21508.
>>> We postponed the Hive release request to the Spark code freeze schedule to
>>> avoid multiple requests.
>>>
>>> Spark is going to freeze code on 31st January (tomorrow), and I currently
>>> foresee the RC starting around March. So, this will hopefully be the last
>>> request for a Hive release for Spark 3.0.
>>>
>>> I was wondering if we could release Hive 2.3.7 soon so Spark can use it.
>>>
>>> Thanks.
>>>
>>


Re: Datanucleus questions

2020-01-28 Thread Alan Gates
I don't have answers to all of your questions, but I'll reply to the ones I
know or have opinions on.

On Tue, Jan 28, 2020 at 6:20 AM Zoltan Chovan 
wrote:

> Hey,
>
> Additionally, I've been looking at TxnHandler and CompactionTxnHandler
> classes, these only use directSql, which makes testing any changes to them
> quite difficult. Are there any plans to introduce JDO/Datanucleus in those
> parts?
>
I didn't use JDO when I wrote this because an object model is a really bad
abstraction for transactions and locks.  The solution for this is not to
move it into the object model but rather to move it out of constant
database operations and into something that will be faster and lighter
weight.  I believe Olli Draese has some thoughts on how to do this,
something along the lines of an in-memory cache for reads and a WAL for
writing out locks and transaction open, commit, and abort events.
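
The idea is described here only in outline, so the following is a generic sketch of the "in-memory cache for reads, write-ahead log for writes" pattern and not a Hive design. The class name, the record format, and the file-based log are all assumptions made for illustration.

```python
import json


class ToyTxnLog:
    """Hypothetical write-ahead log plus in-memory state for transactions."""

    EVENTS = ("open", "committed", "aborted")

    def __init__(self, log_path):
        self._log_path = log_path
        self._state = {}  # transaction id -> last recorded event

    def record(self, txn_id, event):
        """Append the event to the log first, then update the in-memory state."""
        if event not in self.EVENTS:
            raise ValueError(event)
        with open(self._log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"txn": txn_id, "event": event}) + "\n")
        self._state[txn_id] = event

    def status(self, txn_id):
        """Reads are served from memory only; unknown ids return None."""
        return self._state.get(txn_id)

    @classmethod
    def recover(cls, log_path):
        """Rebuild the in-memory state by replaying the log from the start."""
        recovered = cls(log_path)
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                recovered._state[entry["txn"]] = entry["event"]
        return recovered
```

A real implementation would also need to force each append to stable storage and to handle concurrent writers, both of which this sketch leaves out.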

>
> Also how committed are we to Datanucleus as the JDO/ORM lib? Is there or
> were there any plans to switch? I know about Hibernate and MyBatis as
> possible alternatives. Sidenote: afaik Hibernate has LGPL license which
> might be a problem, is that correct? MyBatis on the other hand is Apache
> License 2.0.
>
LGPL excludes Hibernate as an option.


>
> I am aware that these changes would be quite enormous, but in the long run
> it might give us better performance and testability, also most likely
> easier/quicker development in the related parts. My intention is to start a
> discussion on the topic.
>
Given the huge impact this change would have on developers and the existing
user base, the expected benefit would have to be very large for it to make
sense to pay the cost.

Alan.

>
> What are your thoughts?
>
> Best regards,
> Zoltan
>
> [1] https://github.com/apache/hive/blob/master/pom.xml#L131
> [2] http://www.datanucleus.org/documentation/products.html
> [3] http://www.datanucleus.org/products/accessplatform_5_1/migration.html
>


Re: [VOTE] Apache Hive 3.1.3 Release Candidate 0

2020-01-20 Thread Alan Gates
+1.  I checked the signature and hash, did a build, and checked the rat
output.

Alan.

On Wed, Jan 15, 2020 at 2:08 PM Naveen Gangam 
wrote:

> Apache Hive 3.1.3 Release Candidate 0 is available here:
> https://people.apache.org/~ngangam/hive-3.1.3-rc-0
>
>
> Maven artifacts are available
> here:
> https://repository.apache.org/content/repositories/orgapachehive-1099/
>
>
> The tag release-3.1.3-rc0 has been applied to the source for this
> release in github, you can see it
> at https://github.com/apache/hive/tree/release-3.1.3-rc0
>
> Voting will conclude in 72 hours.
>
> Hive PMC Members: Please test and vote.
>
> Thanks.
>


Re: HMS Hive database schema+upgrade scripts

2020-01-13 Thread Alan Gates
The hive directory isn't for installing new instances of HMS, since the HMS
can't use Hive itself as its database.  It's for the information schema
support.  So when I moved the HMS schemas over into standalone-metastore I
didn't move the hive directory, as it served a different purpose.  I agree
it would be better to make that clear in the source tree.

Alan.

On Mon, Jan 13, 2020 at 1:51 AM Zoltan Chovan 
wrote:

> Hi all,
>
> I've recently done some work on the Hive database schema/upgrade files and
> found the following:
>
>- there are two locations where we store the schema/upgrade files:
>   - hive/metastore/scripts/upgrade (old)
>   - hive/standalone-metastore/metastore-server/src/main/sql (new)
>- the old location includes the 'hive' folder and the hive database
>  schema (this schema includes the sys and information_schema databases)
>- the new location is missing this entirely
>- looking at the packaging target directory, the hive schema makes its
>  way in there, as the directory
>  'hive/packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/scripts/metastore/upgrade'
>  includes the hive folder with the schema and upgrade scripts
>
>
> With all this I'm a bit confused why the hive folder is not in the
> standalone-metastore module. Could someone shed some light on this?
>
> Thanks,
> Zoltan
>


Re: [VOTE] Should we release Hive Storage API 2.7.1rc0?

2019-11-27 Thread Alan Gates
+1.  Did a build, checked the signature and hash, ran rat.  At some point we
should update the copyright in the NOTICE file, it still says 2017.

Alan.

On Tue, Nov 26, 2019 at 11:57 PM Jesus Camacho Rodriguez <
jcama...@apache.org> wrote:

> +1
>
> Downloaded tar, built storage-api and ran test suite, ran rat check, and
> verified checksum and signature.
>
> Thanks Owen!
>
> On Tue, Nov 26, 2019 at 4:39 PM Owen O'Malley 
> wrote:
>
> > All,
> > > I'd like to make a storage-api release with HIVE-22405 in it.
> >
> > Should we release the following artifacts as Hive Storage API 2.7.1?
> >
> > tar: http://home.apache.org/~omalley/storage-api-2.7.1/
> > > tag: https://github.com/apache/hive/releases/tag/storage-release-2.7.1rc0
> > jiras: https://issues.apache.org/jira/projects/HIVE/versions/12346553
> >
> > Thanks!
> >
>


Re: Submitting a Patch Against Branch 3

2019-10-31 Thread Alan Gates
My thought would be to not worry about Yetus for branches, since it doesn't
work.  As long as it passes the regression tests for the branch it should
be fine.

Alan.

On Thu, Oct 31, 2019 at 10:05 AM David Mollitor  wrote:

> Hello Peter,
>
> Is there a way then to build against branch-3?
>
> Directions I got are from here:
>
> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CreatingaPatch
>
>
> Thanks!
>
> On Thu, Oct 31, 2019 at 1:03 PM Peter Vary 
> wrote:
>
> > Hi David,
> >
> > Unfortunately Yetus as it is now, does not understand the concept of
> > branches based on the patch names. :(
> >
> > Thanks,
> > Peter
> >
> > > On Oct 29, 2019, at 00:22, David Mollitor  wrote:
> > >
> > > Hello Gang,
> > >
> > > I have attempted a couple of times now to submit a patch for branch-3
> of
> > > Hive.  None of my attempts have been successful and I'm not sure why
> they
> > > are failing.  The following JIRA is a very trivial change but YETUS
> won't
> > > build it.
> > >
> > > Any thoughts?
> > >
> > > https://issues.apache.org/jira/browse/HIVE-18415
> > >
> > > Thanks!
> >
> >
>


Re: Query about itests profile

2019-10-26 Thread Alan Gates
Yes, Hive CI runs the itest tests.  But it does not run them with just 'mvn
install' on the command line.  There are a number of things that happen
under the hood in the CI infrastructure to control which .q files are run
for which driver.  You can get an idea by looking in
the itests/src/test/resources/testconfiguration.properties file.

Alan.

On Fri, Oct 25, 2019 at 5:25 PM Sandeep Katta <
sandeep0102.opensou...@gmail.com> wrote:

> Hey guys, sorry for the silly question. Does the Hive CI run the itests? When I
> ran them on my local system I saw failures.
>
> So I wanted to know whether that's normal.
>
> Regards
> Sandeep Katta
>


[ANNOUNCE] Apache Hive 3.1.2 released

2019-08-27 Thread Alan Gates
The Apache Hive team is proud to announce the release of Apache Hive
version 3.1.2.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache
Spark frameworks.

For Hive release details and downloads, please
visit: https://hive.apache.org/downloads.html

Hive 3.1.2 Release Notes are available
here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12344397&styleName=Html&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team


Re: [RESULT][VOTE] Apache Hive 3.1.2 Release Candidate 1

2019-08-26 Thread Alan Gates
With 3 +1s (Alan, Zoltan, and Owen) and no -1s the vote passes.  Thank you
Owen and Zoltan for voting.  I'll roll the release.

Alan.

On Mon, Aug 26, 2019 at 1:04 PM Owen O'Malley 
wrote:

> +1
>
> * checked source checksum & signature
> * built the code
> * ran some orc tests
> * ran rat
>
> .. Owen
>
> On Mon, Aug 26, 2019 at 4:06 AM Zoltan Haindrich  wrote:
>
> > * deployed the binary in a single node environment (tez-0.9.1;
> > hadoop-3.1.1)
> > * ran some simple queries
> > * checked signatures
> > * compiled sources without using any cache servers
> >
> > I haven't encountered any issues
> >
> > +1
> >
> >
> > On 8/23/19 1:21 AM, Alan Gates wrote:
> > > NOTE:  This is a separate release from the recently voted on Hive 2.3.6.
> > > Apache Hive 3.1.2 Release Candidate 1 is available here:
> > >
> > > https://people.apache.org/~gates/apache-hive-3.1.2-rc-1/
> > >
> > > Maven artifacts are available here:
> > >
> > > https://repository.apache.org/content/repositories/orgapachehive-1097/
> > >
> > > The tag release-3.1.2-rc1 has been applied to the source for this release
> > > in github, you can see it at
> > > https://github.com/apache/hive/tree/release-3.1.2-rc1
> > >
> > > Voting will conclude in 72 hours.
> > >
> > > Hive PMC Members: Please test and vote.
> > >
> > > Thanks.
> > >
> >
>


[ANNOUNCE] Apache Hive 2.3.6 Released

2019-08-23 Thread Alan Gates
The Apache Hive team is proud to announce the release of Apache Hive
version 2.3.6.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:
* Tools to enable easy data extract/transform/load (ETL)
* A mechanism to impose structure on a variety of data formats
* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)
* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.3.6 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345603&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team


[VOTE] Apache Hive 3.1.2 Release Candidate 1

2019-08-22 Thread Alan Gates
NOTE:  This is a separate release from the recently voted on Hive 2.3.6.
Apache Hive 3.1.2 Release Candidate 1 is available here:

https://people.apache.org/~gates/apache-hive-3.1.2-rc-1/

Maven artifacts are available here:

https://repository.apache.org/content/repositories/orgapachehive-1097/

The tag release-3.1.2-rc1 has been applied to the source for this release
in github, you can see it at
https://github.com/apache/hive/tree/release-3.1.2-rc1

Voting will conclude in 72 hours.

Hive PMC Members: Please test and vote.

Thanks.


Re: [RESULT][VOTE] Apache Hive 2.3.6 Release Candidate 0

2019-08-22 Thread Alan Gates
With 3 +1s (Alan, Owen, and Zoltan) and no -1s this vote passes.  Thank you
Owen and Zoltan for reviewing the release.

Alan.

On Wed, Aug 21, 2019 at 4:28 PM Alan Gates  wrote:

> Unless that fix is really important I say we don't push it into the 2.3.6
> as this release is mainly for Spark and they've already tested with it.  My
> take is HIVE-21980 isn't a crucial fix, so I say we leave it on the 2.3
> branch and it will come out in a 2.3.7 release if someone needs it.
>
> Alan.
>
> On Wed, Aug 21, 2019 at 9:24 AM Zoltan Haindrich  wrote:
>
>> +1
>> * checked signatures
>> * downloaded the binary release; run some queries in a small installation
>>
>> I'm undecided about the following: for branch-3 there was an ask to
>> include HIVE-18624; which was already present on branch-2 which is good;
>> but I know about another similar issue HIVE-21980; I've just landed it on
>> all relevant branches - including branch-2.3
>> I'm not sure if we want to roll another RC to include it - since it needs
>> a specifically constructed large query.
>>
>> Thank you Alan for taking up releasing 2.3.6!
>>
>>
>> On 8/19/19 6:49 PM, Alan Gates wrote:
>> > Apache Hive 2.3.6 Release Candidate 0 is available here:
>> >
>> > https://people.apache.org/~gates/apache-hive-2.3.6-rc-0/
>> >
>> > Maven artifacts are available here:
>> > https://repository.apache.org/content/repositories/orgapachehive-1096/
>> >
>> > The tag release-2.3.6-rc0 has been applied to the source for this
>> > release in github, you can see it at
>> > https://github.com/apache/hive/tree/release-2.3.6-rc0
>> >
>> > Voting will stay open for at least 72 hours.
>> >
>> > Hive PMC Members: Please test and vote.
>> >
>> > Thanks.
>> >
>>
>
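
The tally in the result message above follows the usual ASF release-vote rule as commonly stated: at least three binding +1 votes, and more +1 than -1 votes. A minimal Python sketch of that check is below. The function name and the vote dictionary are illustrative assumptions, not part of any Hive or ASF tooling, and the sketch assumes all three listed votes were binding.

```python
def release_vote_passes(binding_plus_ones: int, binding_minus_ones: int) -> bool:
    """Return True when a release vote passes under majority approval.

    Assumed rule: at least three binding +1 votes and more +1 than -1.
    """
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones


# Votes reported for 2.3.6 RC0: +1 from Alan, Owen, and Zoltan; no -1.
# (Treating every listed vote as binding is an assumption.)
votes = {"Alan": 1, "Owen": 1, "Zoltan": 1}
plus = sum(1 for v in votes.values() if v > 0)
minus = sum(1 for v in votes.values() if v < 0)
print(release_vote_passes(plus, minus))
```

The check only covers the arithmetic; whether the voting window (72 hours here) has elapsed is a separate condition.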


Re: [VOTE] Apache Hive 2.3.6 Release Candidate 0

2019-08-21 Thread Alan Gates
Unless that fix is really important I say we don't push it into the 2.3.6
as this release is mainly for Spark and they've already tested with it.  My
take is HIVE-21980 isn't a crucial fix, so I say we leave it on the 2.3
branch and it will come out in a 2.3.7 release if someone needs it.

Alan.

On Wed, Aug 21, 2019 at 9:24 AM Zoltan Haindrich  wrote:

> +1
> * checked signatures
> * downloaded the binary release; run some queries in a small installation
>
> I'm undecided about the following: for branch-3 there was an ask to
> include HIVE-18624; which was already present on branch-2 which is good;
> but I know about another similar issue HIVE-21980; I've just landed it on
> all relevant branches - including branch-2.3
> I'm not sure if we want to roll another RC to include it - since it needs
> a specifically constructed large query.
>
> Thank you Alan for taking up releasing 2.3.6!
>
>
> On 8/19/19 6:49 PM, Alan Gates wrote:
> > Apache Hive 2.3.6 Release Candidate 0 is available here:
> >
> > https://people.apache.org/~gates/apache-hive-2.3.6-rc-0/
> >
> > Maven artifacts are available here:
> > https://repository.apache.org/content/repositories/orgapachehive-1096/
> >
> > The tag release-2.3.6-rc0 has been applied to the source for this release
> > in github, you can see it at
> > https://github.com/apache/hive/tree/release-2.3.6-rc0
> >
> > Voting will stay open for at least 72 hours.
> >
> > Hive PMC Members: Please test and vote.
> >
> > Thanks.
> >
>


[VOTE] Apache Hive 2.3.6 Release Candidate 0

2019-08-19 Thread Alan Gates
Apache Hive 2.3.6 Release Candidate 0 is available here:

https://people.apache.org/~gates/apache-hive-2.3.6-rc-0/

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1096/

The tag release-2.3.6-rc0 has been applied to the source for this release
in github, you can see it at
https://github.com/apache/hive/tree/release-2.3.6-rc0

Voting will stay open for at least 72 hours.

Hive PMC Members: Please test and vote.

Thanks.


Re: Question about feasibility of porting HIVE-21584 and minor release 2.3.6

2019-08-13 Thread Alan Gates
I've committed the patch for HIVE-22096 and created a snapshot for Hive
2.3.6-rc0, which should be in the Apache repo.  Let me know if this works,
and if so I'll push out the 2.3.6 release.

Alan.

On Sun, Aug 11, 2019 at 10:28 PM Hyukjin Kwon  wrote:

> I sincerely appreciate, Alan, for a quick positive response.
> Thanks Yuming for quickly starting to work on that.
>
> For quick updates:
>   - Seems Yuming already started to investigate it at
> https://issues.apache.org/jira/browse/HIVE-22096 - I will try to reach
> him and get synced.
>   (He works in a different company and lives in a different country
> from me so some delays are expected)
>   - Spark side, it's being investigated under
> https://issues.apache.org/jira/browse/SPARK-28684
>   - .. and .. actually I am now synced with Yuming, almost all of the tests
> pass with only HIVE-22096
>
> Alan, would you mind if I ask to make a snapshot after HIVE-22096 fix if
> it makes sense?
>
> Sounds like, in his local, almost all Spark test cases pass except one
> test case with HIVE-22096 (with JDK 11).
> Looks like he wants to verify via Spark CI as well to show official test
> results to show up which can be easily done if there's a snapshot.
>
>
> On Sat, Aug 10, 2019 at 6:35 AM Alan Gates  wrote:
>
>> I think it's fine to backport this and do a release.  Are we sure that's
>> enough to make it run on JDK11?  As noted in the bug there isn't an
>> umbrella issue to make it JDK11 compatible.  I don't know if anyone has
>> tested Hive 2 on JDK11 or not.
>>
>> Are you available to do the backport?  If so, and we don't find any other
>> JDK11 related issues, I can create a 2.3.6 release once you're done.
>>
>> Alan.
>>
>> On Thu, Aug 8, 2019 at 6:23 PM Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>>
>>> I am from Spark dev and had a question about feasibility of porting
>>> HIVE-21584 and minor release 2.3.x.
>>>
>>> Just to share full context, please take a look at
>>>
>>> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16822802&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822802
>>> TL;DR: with Hive 2.3.5, Spark is now very close to getting through the
>>> complicated situation that has been locked for years. Thank you Yuming
>>> Wang and Alan Gates - it would have been impossible without all the
>>> efforts from you guys.
>>>
>>>
>>> One problem found lately though,
>>> Spark is trying to support JDK 11 but this seems blocked by HIVE-21584.
>>> So, I and the Spark community are trying to find a way to get through.
>>>
>>> Firstly, I (and presumably some Spark community guys) thought Spark
>>> should find its own workaround or try to upgrade Hive to 4.0.0 in the
>>> upcoming release. This possibility was checked and seems difficult.
>>> It was already a pretty radical change to upgrade Hive 1.2.x to 2.3.5,
>>> so it seems difficult to target an upgrade to 4.0.0 in this release at
>>> this moment.
>>>
>>> I understand HIVE-21584 was fixed in 4.0.0 and it might be difficult to
>>> backport through branch-3.x and 2.x; however, I wanted to at least ask
>>> now because I thought porting HIVE-21584 into the Hive 2.3.x branch is
>>> the option that needs the least effort to permanently resolve all
>>> related issues, apparently blocked for multiple years ..
>>>
>>>
>>> Thanks for consideration in advance.
>>>
>>


Heads up for two Hive releases

2019-08-12 Thread Alan Gates
All, this has been covered in other email chains here, but just to
summarize, we have proposals for two point releases:

1. Hive 3.1.2.  This will include only commits already on the 3.1 branch.
See
https://lists.apache.org/thread.html/722f6eb65e1ab8add8fcd938192098eb1426cb7b74a62c0252e06ea1@%3Cdev.hive.apache.org%3E
for details

2. Hive 2.3.6 This includes one (so far) backport that Spark needs to
update to Hive 2.  See
https://lists.apache.org/thread.html/e8f7406d046bd94b4c4a6108c7ae66d8f35014fb9132a6042c0b83f4@%3Cdev.hive.apache.org%3E
for details.

I've agreed to help with both of these.

Alan.


Re: Question about feasibility of porting HIVE-21584 and minor release 2.3.6

2019-08-09 Thread Alan Gates
I think it's fine to backport this and do a release.  Are we sure that's
enough to make it run on JDK11?  As noted in the bug there isn't an
umbrella issue to make it JDK11 compatible.  I don't know if anyone has
tested Hive 2 on JDK11 or not.

Are you available to do the backport?  If so, and we don't find any other
JDK11 related issues, I can create a 2.3.6 release once you're done.

Alan.

On Thu, Aug 8, 2019 at 6:23 PM Hyukjin Kwon  wrote:

> Hi all,
>
>
> I am from Spark dev and had a question about feasibility of porting
> HIVE-21584 and minor release 2.3.x.
>
> Just to share full context, please take a look at
>
> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16822802&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822802
> TL;DR: with Hive 2.3.5, Spark is now very close to getting through the
> complicated situation that has been locked for years. Thank you Yuming
> Wang and Alan Gates - it would have been impossible without all the
> efforts from you guys.
>
>
> One problem found lately though,
> Spark is trying to support JDK 11 but this seems blocked by HIVE-21584.
> So, I and the Spark community are trying to find a way to get through.
>
> Firstly, I (and presumably some Spark community guys) thought Spark should
> find its own workaround or try to upgrade Hive to 4.0.0 in the upcoming
> release. This possibility was checked and seems difficult.
> It was already a pretty radical change to upgrade Hive 1.2.x to 2.3.5,
> so it seems difficult to target an upgrade to 4.0.0 in this release at
> this moment.
>
> I understand HIVE-21584 was fixed in 4.0.0 and it might be difficult to
> backport through branch-3.x and 2.x; however, I wanted to at least ask
> now because I thought porting HIVE-21584 into the Hive 2.3.x branch is
> the option that needs the least effort to permanently resolve all
> related issues, apparently blocked for multiple years ..
>
>
> Thanks for consideration in advance.
>


Re: Release timing for 3.1.2?

2019-08-09 Thread Alan Gates
I'm not aware of any discussions to push a 3.1.2 release.  I can work on
putting together a release of what's currently in the 3.1.2 line.  If we
hit issues with tests not passing or other such things are you available to
help?

Alan.

On Mon, Jul 29, 2019 at 6:18 PM Kevin Marr  wrote:

> Hello Hive Dev Community,
>
> I'm Kevin, a Product Manager at Looker (recently acquired by Google). We
> make business intelligence software and connect to 40+ relational data
> warehouses, including Hive.
>
> Recently we've been struggling with a Hive bug (HIVE-18624), where
> parsing time is
> extremely high for complex SELECT expressions. This is problematic for us
> because Looker can sometimes generate very complex SQL to represent our
> customer's reporting logic (we do querying in-database rather than
> ingesting data and manipulating it in proprietary systems).
>
> It appears that the bug has been fixed for version 3.1.2 but has not yet
> been released. Would it be possible to make a release of 3.1.2 so that we
> and our customers can take advantage of the bug fix?
>
> Thank you,
> Kevin
>


[jira] [Created] (HIVE-21850) branch-3 metastore installation installs wrong version

2019-06-07 Thread Alan Gates (JIRA)
Alan Gates created HIVE-21850:
-

 Summary: branch-3 metastore installation installs wrong version
 Key: HIVE-21850
 URL: https://issues.apache.org/jira/browse/HIVE-21850
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.2.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 3.2.0


hive.version.shortname in standalone-metastore/pom.xml was not properly updated 
in branch-3.  It is still set to 3.1.0, which causes the MetastoreSchemaTool to 
install the wrong version.  Part of this Jira should include updating the 
HowToRelease doc to include updating this value.
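
A check like the one described above could be scripted as part of a release checklist. The following is a minimal Python sketch, assuming the property sits in the pom's top-level <properties> block. The sample pom text, the function name, and the expected value "3.2.0" are illustrative assumptions, not the actual contents of standalone-metastore/pom.xml.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM XML namespace.
POM_NS = "http://maven.apache.org/POM/4.0.0"

# Hypothetical, trimmed-down pom used only for illustration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <properties>
    <hive.version.shortname>3.1.0</hive.version.shortname>
  </properties>
</project>
"""


def read_property(pom_text, name):
    """Return the value of a top-level <properties> entry, or None if absent."""
    root = ET.fromstring(pom_text)
    props = root.find(f"{{{POM_NS}}}properties")
    if props is None:
        return None
    for child in props:
        if child.tag == f"{{{POM_NS}}}{name}":
            return (child.text or "").strip()
    return None


expected = "3.2.0"  # assumed short name for the branch being released
actual = read_property(SAMPLE_POM, "hive.version.shortname")
if actual != expected:
    print(f"hive.version.shortname is {actual}, expected {expected}")
```

Running this against the real pom would mean reading the file contents in place of SAMPLE_POM.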



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21826) Backport HIVE-21786 to branch-3

2019-06-03 Thread Alan Gates (JIRA)
Alan Gates created HIVE-21826:
-

 Summary: Backport HIVE-21786 to branch-3
 Key: HIVE-21826
 URL: https://issues.apache.org/jira/browse/HIVE-21826
 Project: Hive
  Issue Type: Bug
Reporter: Alan Gates
Assignee: Alan Gates


Missed branch-3 in the original fix.  Need to apply the patch here as well.





[jira] [Created] (HIVE-21809) Backport HIVE-21786 to branch-2.3

2019-05-30 Thread Alan Gates (JIRA)
Alan Gates created HIVE-21809:
-

 Summary: Backport HIVE-21786 to branch-2.3
 Key: HIVE-21809
 URL: https://issues.apache.org/jira/browse/HIVE-21809
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.3.5
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 2.3.6


Need to update the URLs in the pom on branch-2.3 as well.





[jira] [Created] (HIVE-21808) Backport HIVE-21786 to branch-3.1

2019-05-30 Thread Alan Gates (JIRA)
Alan Gates created HIVE-21808:
-

 Summary: Backport HIVE-21786 to branch-3.1
 Key: HIVE-21808
 URL: https://issues.apache.org/jira/browse/HIVE-21808
 Project: Hive
  Issue Type: Bug
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 3.1.2


Need to update the URLs in the poms for branch 3 as well.





[jira] [Created] (HIVE-21786) Update repo URLs in poms

2019-05-23 Thread Alan Gates (JIRA)
Alan Gates created HIVE-21786:
-

 Summary: Update repo URLs in poms
 Key: HIVE-21786
 URL: https://issues.apache.org/jira/browse/HIVE-21786
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 2.3.5, 3.1.1, 4.0.0
Reporter: Alan Gates
Assignee: Alan Gates


Need to update repo URLs in the poms.





[jira] [Created] (HIVE-21758) DBInstall tests broken on master and branch-3.1

2019-05-20 Thread Alan Gates (JIRA)
Alan Gates created HIVE-21758:
-

 Summary: DBInstall tests broken on master and branch-3.1
 Key: HIVE-21758
 URL: https://issues.apache.org/jira/browse/HIVE-21758
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Tests
Affects Versions: 3.1.1
Reporter: Alan Gates
Assignee: Alan Gates


The Oracle and SqlServer install and upgrade tests in standalone-metastore fail 
in master and branch-3.1.  In the Oracle case it appears the docker container 
that was used no longer exists.  For SqlServer the cause of the failures is not 
immediately clear.





Re: [ANNOUNCE] Apache Hive 2.3.5 Released

2019-05-16 Thread Alan Gates
It appears I made some mistake in the upload process, as I didn't see it
even in Apache's repo.  I've redone the upload and it's showing up in
Apache's repo now.  Not sure how long before it replicates to maven central.

Alan.

On Wed, May 15, 2019 at 7:24 PM Wang, Yuming 
wrote:

> Thank you, Alan.
>
> The Hive 2.3.5 jars are not in the Maven repository
> <https://repo.maven.apache.org/maven2/org/apache/hive/hive-exec/>. Do we
> need to wait for a while?
>
>
>
>
>
> On 16/05/2019, 02:43, "Alan Gates"  wrote:
>
> The Apache Hive team is proud to announce the release of Apache Hive
> version 2.3.5.
>
> The Apache Hive (TM) data warehouse software facilitates querying and
> managing large datasets residing in distributed storage. Built on top
> of Apache Hadoop (TM), it provides, among others:
>
> * Tools to enable easy data extract/transform/load (ETL)
>
> * Interactive query over terabytes sized datasets
>
> * A mechanism to impose structure on a variety of data formats
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>   data storage systems such as Apache HBase (TM)
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
> frameworks.
>
> For Hive release details and downloads, please visit:
> https://hive.apache.org/downloads.html
>
> Hive 2.3.5 Release Notes are available here:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345394&styleName=Text&projectId=12310843
>
> We would like to thank the many contributors who made this release
> possible.
>
> Regards,
>
> The Apache Hive Team
>


[ANNOUNCE] Apache Hive 2.3.5 Released

2019-05-15 Thread Alan Gates
The Apache Hive team is proud to announce the release of Apache Hive
version 2.3.5.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* Interactive query over terabytes sized datasets

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.3.5 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345394&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team


Re: [RESULT][VOTE] Apache Hive 2.3.5 Release Candidate 0

2019-05-14 Thread Alan Gates
With 3 +1 votes (Alan, Owen, and Peter) and no -1 votes the vote passes.
Thanks to Owen and Peter for voting.  I'll push out the release.

Alan.

On Mon, May 13, 2019 at 5:15 AM Peter Vary 
wrote:

> +1 LGTM
>
> Checked:
> - Signatures
> - Source is building
> - Run some unit tests
>
>
> > On May 10, 2019, at 00:42, Owen O'Malley  wrote:
> >
> > +1
> >
> > For the source release:
> > * check signatures of tarball
> > * asked Alan to update the KEYS file with his current key
> > * built code
> > * ran a set of the unit tests
> >
> > .. Owen
> >
> > On Tue, May 7, 2019 at 4:57 PM Alan Gates  wrote:
> >
> >> Apache Hive 2.3.5 Release Candidate 0 is available
> >> here: http://people.apache.org/~gates/apache-hive-2.3.5-rc-0/
> >>
> >> Maven artifacts are available here:
> >> https://repository.apache.org/content/repositories/orgapachehive-1094/
> >>
> >> Source tag for RC0 is release-2.3.5-rc0
> >>
> >> Voting will conclude in 72 hours.
> >>
> >> Hive PMC Members: Please test and vote.
> >>
> >> Thanks.
> >>
> >>
> >> Alan.
> >>
>
>


Re: HIVE-21639 and Hive 2.3.5 release

2019-05-08 Thread Alan Gates
I've rolled a Hive 2.3.5 RC0 and posted it to
http://people.apache.org/~gates/apache-hive-2.3.5-rc-0/

Alan.

On Tue, May 7, 2019 at 8:02 AM Yuming Wang  wrote:

> Thank you Alan. We're fine with this version. I think we could release Hive
> 2.3.5 now.
>
> On Tue, May 7, 2019 at 3:11 AM Alan Gates  wrote:
>
>> I've pushed another version of 2.3.5-SNAPSHOT, so you can now test with
>> the latest patch in.
>>
>> Alan.
>>
>> On Sat, May 4, 2019 at 6:42 AM Hyukjin Kwon  wrote:
>>
>>> Awesome, thanks!
>>>
>>> On Fri, 3 May 2019, 22:07 Alan Gates,  wrote:
>>>
>>>> Ok, I'll do a new build with that patch and post another snapshot so
>>>> you can test with it.
>>>>
>>>> Alan.
>>>>
>>>> On Fri, May 3, 2019 at 2:15 AM Yuming Wang  wrote:
>>>>
>>>>> Yes. This is the last one.
>>>>>
>>>>> On Fri, May 3, 2019 at 4:15 PM Hyukjin Kwon 
>>>>> wrote:
>>>>>
>>>>>> Thank you so much Alan.
>>>>>>
>>>>>> As far as I can tell, we're all good except HIVE-21680 for now.
>>>>>> Yuming, is this the last one? Or are you still investigating?
>>>>>>
>>>>>> On Fri, Apr 26, 2019 at 6:56 AM Alan Gates  wrote:
>>>>>>
>>>>>>> Yuming and Hyukjin, I've committed HIVE-21639 and HIVE-21536 to
>>>>>>> branch-2.3.  Let me know once you've tested this against Spark 3 and are
>>>>>>> ready to start the release process.
>>>>>>>
>>>>>>> Alan.
>>>>>>>
>>>>>>> On Thu, Apr 25, 2019 at 9:47 AM Alan Gates 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yuming Wang and Hyukjin Kwon have proposed releasing a Hive 2.3.5
>>>>>>>> that Spark can use.  They need to push a couple of back ports into
>>>>>>>> branch-2.3 first.  See [1] and [2] for details.
>>>>>>>>
>>>>>>>> I'm willing to work with them to get this done.
>>>>>>>>
>>>>>>>> Alan.
>>>>>>>>
>>>>>>>> 1.
>>>>>>>> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16822802&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16822802
>>>>>>>> 2.
>>>>>>>> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16826113&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16826113
>>>>>>>>
>>>>>>>>


[VOTE] Apache Hive 2.3.5 Release Candidate 0

2019-05-07 Thread Alan Gates
Apache Hive 2.3.5 Release Candidate 0 is available
here: http://people.apache.org/~gates/apache-hive-2.3.5-rc-0/

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1094/

Source tag for RC0 is release-2.3.5-rc0

Voting will conclude in 72 hours.

Hive PMC Members: Please test and vote.

Thanks.


Alan.


Re: HIVE-21639 and Hive 2.3.5 release

2019-05-06 Thread Alan Gates
I've pushed another version of 2.3.5-SNAPSHOT, so you can now test with the
latest patch in.

Alan.

On Sat, May 4, 2019 at 6:42 AM Hyukjin Kwon  wrote:

> Awesome, thanks!
>
> On Fri, 3 May 2019, 22:07 Alan Gates,  wrote:
>
>> Ok, I'll do a new build with that patch and post another snapshot so you
>> can test with it.
>>
>> Alan.
>>
>> On Fri, May 3, 2019 at 2:15 AM Yuming Wang  wrote:
>>
>>> Yes. This is the last one.
>>>
>>> On Fri, May 3, 2019 at 4:15 PM Hyukjin Kwon  wrote:
>>>
>>>> Thank you so much Alan.
>>>>
>>>> As far as I can tell, we're all good except HIVE-21680 for now.
>>>> Yuming, is this the last one? Or are you still investigating?
>>>>
>>>> On Fri, Apr 26, 2019 at 6:56 AM Alan Gates  wrote:
>>>>
>>>>> Yuming and Hyukjin, I've committed HIVE-21639 and HIVE-21536 to
>>>>> branch-2.3.  Let me know once you've tested this against Spark 3 and are
>>>>> ready to start the release process.
>>>>>
>>>>> Alan.
>>>>>
>>>>> On Thu, Apr 25, 2019 at 9:47 AM Alan Gates 
>>>>> wrote:
>>>>>
>>>>>> Yuming Wang and Hyukjin Kwon have proposed releasing a Hive 2.3.5
>>>>>> that Spark can use.  They need to push a couple of back ports into
>>>>>> branch-2.3 first.  See [1] and [2] for details.
>>>>>>
>>>>>> I'm willing to work with them to get this done.
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>> 1.
>>>>>> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16822802&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16822802
>>>>>> 2.
>>>>>> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16826113&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16826113
>>>>>>
>>>>>>


Re: HIVE-21639 and Hive 2.3.5 release

2019-05-03 Thread Alan Gates
Ok, I'll do a new build with that patch and post another snapshot so you
can test with it.

Alan.

On Fri, May 3, 2019 at 2:15 AM Yuming Wang  wrote:

> Yes. This is the last one.
>
> On Fri, May 3, 2019 at 4:15 PM Hyukjin Kwon  wrote:
>
>> Thank you so much Alan.
>>
>> As far as I can tell, we're all good except HIVE-21680 for now.
>> Yuming, is this the last one? Or are you still investigating?
>>
>> On Fri, Apr 26, 2019 at 6:56 AM Alan Gates  wrote:
>>
>>> Yuming and Hyukjin, I've committed HIVE-21639 and HIVE-21536 to
>>> branch-2.3.  Let me know once you've tested this against Spark 3 and are
>>> ready to start the release process.
>>>
>>> Alan.
>>>
>>> On Thu, Apr 25, 2019 at 9:47 AM Alan Gates  wrote:
>>>
>>>> Yuming Wang and Hyukjin Kwon have proposed releasing a Hive 2.3.5 that
>>>> Spark can use.  They need to push a couple of back ports into branch-2.3
>>>> first.  See [1] and [2] for details.
>>>>
>>>> I'm willing to work with them to get this done.
>>>>
>>>> Alan.
>>>>
>>>> 1.
>>>> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16822802&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16822802
>>>> 2.
>>>> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16826113&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16826113
>>>>
>>>>


Re: Question about 2.3.5-SNAPSHOT availability

2019-04-26 Thread Alan Gates
I've pushed a 2.3.5-SNAPSHOT to Apache's maven repository.  You should be
able to use that for testing.  I did not restart the nightly build, so if
we add more patches we'll need to re-upload a new snapshot.

Alan.

On Fri, Apr 26, 2019 at 12:25 AM Hyukjin Kwon  wrote:

> Hi Hive team,
>
> I and Yuming are trying for Apache Spark to upgrade Hive to 2.3.5, which I
> suspect to be released soon.
> So, we wanted to test this via automated CI tool with SNAPSHOT if
> available.
>
> Is it possible to revive
>
> https://repository.apache.org/content/repositories/snapshots/org/apache/hive/hive/2.3.5-SNAPSHOT
>  ?
>
> Or would it be better to use RC when the vote is open?
>


Re: HIVE-21639 and Hive 2.3.5 release

2019-04-25 Thread Alan Gates
Yuming and Hyukjin, I've committed HIVE-21639 and HIVE-21536 to
branch-2.3.  Let me know once you've tested this against Spark 3 and are
ready to start the release process.

Alan.

On Thu, Apr 25, 2019 at 9:47 AM Alan Gates  wrote:

> Yuming Wang and Hyukjin Kwon have proposed releasing a Hive 2.3.5 that
> Spark can use.  They need to push a couple of back ports into branch-2.3
> first.  See [1] and [2] for details.
>
> I'm willing to work with them to get this done.
>
> Alan.
>
> 1.
> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16822802&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16822802
> 2.
> https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16826113&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16826113
>
>


Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-25 Thread Alan Gates
My suggestion does require a change to your ETL process, but it doesn't
require you to copy the data into HDFS or to create storage clusters.  Hive
managed tables can reside in S3 with no problem.

Alan.

On Thu, Apr 25, 2019 at 2:18 PM Thai Bui  wrote:

> Your suggested workflow will work and it would require us to re-ETL data
> from S3 to all over the place to multiple clusters. This is a cumbersome
> approach since most of our data reside on S3 and clusters are somewhat
> transient in nature (in the order of a few months for a redeployment &
> don't have large HDFS capacity).
>
> We do scale clusters up and down for compute but not for storage since HDFS
> is not easy to scale down on demand. It would be much more preferable
> in this architecture to have Hive behave as a pure compute engine that can
> be accelerated through query result caching and materialized views.
>
> I'm not that familiar with Hive 3 implementation to know if this feature
> would be simple to make. I was hoping to change only the front-end of Hive
> and keep the ACID back-end implementation intact. For example, we could
> reuse the transactional_properties and add 'read_only' as a new value. With
> read-only tables, all INSERT, UPDATE, DELETE statements will fail at Hive
> front-end. Thus, it ensures that the ACID properties are guaranteed and the
> rest of ACID assumptions on the backend could continue to work. For DDL
> operations, since it has to go through the metastore I think it would
> automatically work with the current ACID code base and the only thing we
> need to do is to enable (where it was disabled) and test it.
>
> On Wed, Apr 24, 2019 at 6:05 PM Alan Gates  wrote:
>
> > Would a workflow like the following work then:
> > 1. Non-Hive tool produces data
> > 2. Do a Hive load into a managed table.  This effectively takes a
> > snapshot of the data.
> > 3. Now you still have the data for Non-Hive tools to operate on, and in
> > Hive you get all the Hive 3 goodness.
> >
> > This would introduce an additional copy of the data.  It would be
> > interesting to look at adding a copy on write semantic to a partition to
> > avoid this copy, but you don't need that to get going.
> >
> > I'm not opposed to what you're suggesting, I'm just wondering if there
> > are other ways that will save you work and that will keep Hive more simple.
> >
> > Alan.
> >
> > On Wed, Apr 24, 2019 at 2:07 PM Thai Bui  wrote:
> >
> > > As I understand, read-only ACID tables only work if your table is a
> > > managed table (so you'll have to create your table with CREATE TABLE
> > > .. TBLPROPERTIES ('transactional_properties'='insert_only') ) and Hive
> > > will control the data layout.
> > >
> > > Unfortunately, in my case, I'm concerned with external tables where
> > > data is written by other tools such as Spark, PySpark, Sqoop or older
> > > Hive clusters and Hadoop-based systems to cloud storage such as S3. My
> > > wish is to have materialized views and query result caching work
> > > directly on those data if and only if the table is registered as an
> > > external, read-only table in Hive 3 via the same ACID mechanism.
> > >
> > > On Wed, Apr 24, 2019 at 3:35 PM Alan Gates  wrote:
> > >
> > > > Have you looked at the insert only ACID tables in Hive 3 (
> > > > https://issues.apache.org/jira/browse/HIVE-14535 )?  These were
> > > > designed specifically with the cloud in mind, since the way Hive
> > > > traditionally adds new data doesn't work well in the cloud.  And they
> > > > do not require ORC, they work with any file format.
> > > >
> > > > Alan.
> > > >
> > > > On Wed, Apr 24, 2019 at 12:04 PM Thai Bui  wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > Hive 3 has brought significant changes to the community with the
> > > > > support for ACID tables as default managed tables. With ACID tables,
> > > > > we can use features such as materialized views, query result caching
> > > > > for BI tools and more. But without ACID tables such as external
> > > > > tables, Hive doesn't support any of these advanced features which
> > > > > makes a majority of cloud-native users like me sad :(.
> > > > >
> > > > > I propose we shoul
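
The front-end check Thai Bui describes in the quoted message above (fail INSERT, UPDATE, and DELETE when a table's transactional_properties is 'read_only') can be illustrated with a toy Python sketch. This is not Hive code: 'read_only' is only the value proposed in this thread, and the function name, the keyword matching, and the error type are assumptions made for illustration.

```python
# Statement types the proposal says should fail on a read-only table.
WRITE_STATEMENTS = ("INSERT", "UPDATE", "DELETE")


def check_statement(sql: str, table_properties: dict) -> None:
    """Raise PermissionError if a write statement targets a read-only table.

    table_properties stands in for the table's TBLPROPERTIES; the
    'read_only' value is the one proposed in the thread, not an existing
    Hive setting.
    """
    parts = sql.split()
    keyword = parts[0].upper() if parts else ""
    mode = table_properties.get("transactional_properties")
    if mode == "read_only" and keyword in WRITE_STATEMENTS:
        raise PermissionError(f"{keyword} is not allowed on a read-only table")


props = {"transactional_properties": "read_only"}
check_statement("SELECT * FROM sales", props)  # reads pass through
try:
    check_statement("INSERT INTO sales VALUES (1)", props)
except PermissionError as err:
    print(err)
```

A real implementation would inspect the parsed statement rather than the first keyword; the sketch only shows where the rejection would sit relative to the ACID back end.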

HIVE-21639 and Hive 2.3.5 release

2019-04-25 Thread Alan Gates
Yuming Wang and Hyukjin Kwon have proposed releasing a Hive 2.3.5 that
Spark can use.  They need to push a couple of back ports into branch-2.3
first.  See [1] and [2] for details.

I'm willing to work with them to get this done.

Alan.

1.
https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16822802&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16822802
2.
https://issues.apache.org/jira/browse/HIVE-21639?focusedCommentId=16826113&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16826113


Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-24 Thread Alan Gates
Would a workflow like the following work then:
1. Non-Hive tool produces data
2. Do a Hive load into a managed table.  This effectively takes a snapshot
of the data.
3. Now you still have the data for Non-Hive tools to operate on, and in
Hive you get all the Hive 3 goodness.

This would introduce an additional copy of the data.  It would be
interesting to look at adding a copy on write semantic to a partition to
avoid this copy, but you don't need that to get going.

I'm not opposed to what you're suggesting, I'm just wondering if there are
other ways that will save you work and that will keep Hive more simple.

Alan.

On Wed, Apr 24, 2019 at 2:07 PM Thai Bui  wrote:

> As I understand, read-only ACID tables only work if your table is a managed
> table (so you'll have to create your table with CREATE TABLE
> .. TBLPROPERTIES ('transactional_properties'='insert_only') ) and Hive will
> control the data layout.
>
> Unfortunately, in my case, I'm concerned with external tables where data is
> written by other tools such as Spark, PySpark, Sqoop or older Hive clusters
> and Hadoop-based systems to cloud storage such as S3. My wish is to have
> materialized views and query result caching work directly on those data if
> and only if the table is registered as an external, read-only table in Hive
> 3 via the same ACID mechanism.
>
> On Wed, Apr 24, 2019 at 3:35 PM Alan Gates  wrote:
>
> > Have you looked at the insert only ACID tables in Hive 3 (
> > https://issues.apache.org/jira/browse/HIVE-14535 )?  These were designed
> > specifically with the cloud in mind, since the way Hive traditionally
> adds
> > new data doesn't work well in the cloud.  And they do not require ORC,
> they
> > work with any file format.
> >
> > Alan.
> >
> > On Wed, Apr 24, 2019 at 12:04 PM Thai Bui  wrote:
> >
> > > Hello all,
> > >
> > > Hive 3 has brought significant changes to the community with the
> support
> > > for ACID tables as default managed tables. With ACID tables, we can use
> > > features such as materialized views, query result caching for BI tools
> > and
> > > more. But without ACID tables such as external tables, Hive doesn't
> > support
> > > any of these advanced features which makes a majority of cloud-native
> > users
> > > like me sad :(.
> > >
> > > I propose we should support a more limited version of read-only
> external
> > > tables such that materialized views and query result caching would
> work.
> > > For example:
> > >
> > > CREATE EXTERNAL TABLE table_name (..) STORED AS ORC
> > > LOCATION 's3://some-bucket/some-dir'
> > > TBLPROPERTIES ('read-only'='true');
> > >
> > > In such tables, any data modification operations such as INSERT and
> > UPDATE
> > > would fail and DDL operations that "add" or "remove" partitions to the
> > > table would succeed such as "ALTER TABLE ... ADD PARTITION". This would
> > > make it possible for Hive to invalidate the cache and materialized
> views
> > > even when the table is an external table.
> > >
> > > Let me know what you guys think and maybe I can start writing a wiki
> > > document describing the approach in greater detail.
> > >
> > > Thanks,
> > > Thai
> > >
> >
>
>
> --
> Thai
>


Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-24 Thread Alan Gates
Have you looked at the insert only ACID tables in Hive 3 (
https://issues.apache.org/jira/browse/HIVE-14535 )?  These were designed
specifically with the cloud in mind, since the way Hive traditionally adds
new data doesn't work well in the cloud.  And they do not require ORC, they
work with any file format.

Alan.

On Wed, Apr 24, 2019 at 12:04 PM Thai Bui  wrote:

> Hello all,
>
> Hive 3 has brought significant changes to the community with the support
> for ACID tables as default managed tables. With ACID tables, we can use
> features such as materialized views, query result caching for BI tools and
> more. But without ACID tables such as external tables, Hive doesn't support
> any of these advanced features which makes a majority of cloud-native users
> like me sad :(.
>
> I propose we should support a more limited version of read-only external
> tables such that materialized views and query result caching would work.
> For example:
>
> CREATE EXTERNAL TABLE table_name (..) STORED AS ORC
> LOCATION 's3://some-bucket/some-dir'
> TBLPROPERTIES ('read-only'='true');
>
> In such tables, any data modification operations such as INSERT and UPDATE
> would fail and DDL operations that "add" or "remove" partitions to the
> table would succeed such as "ALTER TABLE ... ADD PARTITION". This would
> make it possible for Hive to invalidate the cache and materialized views
> even when the table is an external table.
>
> Let me know what you guys think and maybe I can start writing a wiki
> document describing the approach in greater detail.
>
> Thanks,
> Thai
>


[jira] [Created] (HIVE-21616) Implement JSON_VALUE, JSON_QUERY, and IS [NOT] JSON

2019-04-15 Thread Alan Gates (JIRA)
Alan Gates created HIVE-21616:
-

 Summary: Implement JSON_VALUE, JSON_QUERY, and IS [NOT] JSON
 Key: HIVE-21616
 URL: https://issues.apache.org/jira/browse/HIVE-21616
 Project: Hive
  Issue Type: Sub-task
  Components: UDF
Affects Versions: 4.0.0
Reporter: Alan Gates
Assignee: Alan Gates






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21615) Implement SQL:2016 JSON functions

2019-04-15 Thread Alan Gates (JIRA)
Alan Gates created HIVE-21615:
-

 Summary: Implement SQL:2016 JSON functions
 Key: HIVE-21615
 URL: https://issues.apache.org/jira/browse/HIVE-21615
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 4.0.0
Reporter: Alan Gates
Assignee: Alan Gates


SQL:2016 specifies several new functions for processing JSON in SQL.  Hive 
should implement these functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Running Hive Without Hadoop?

2019-04-06 Thread Alan Gates
It depends on what you mean by run without Hadoop.  You can run HS2 without
a Hadoop cluster.  Take a look at
https://github.com/alanfgates/sqltest/blob/master/dbs/hive/v3_1/Dockerfile
for an example Dockerfile.  But Hadoop code is still needed in the path.
Hive depends on Hadoop code for a number of things and cannot build without
the Hadoop jars.

Alan.

On Fri, Apr 5, 2019 at 12:12 PM David M  wrote:

> All,
>
> I'm trying to run a Hiveserver2 instance locally with the latest code out
> of the repo to test out a few JIRAs. There is some documentation here on
> how to run Hive without Hadoop, but it is based on ant build protocols:
> https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-RunningHiveWithoutaHadoopCluster
>
> I've tried running mvn package, but when I try to run bin/hive, I get the
> following error:
>
> Missing Hive Execution Jar: /root/hive/lib/hive-exec-*.jar
>
> What command do I need to use with maven in order to build the binaries in
> a way that will let me run the binaries in the way I need to do some
> testing?
>
> Thanks!
>
> David McGinnis
>


Re: JSON license listed in NOTICE

2019-03-12 Thread Alan Gates
See https://issues.apache.org/jira/browse/HIVE-15144

That patch neglected to remove it from the NOTICE file.

Alan.

On Tue, Mar 12, 2019 at 1:28 AM Justin Mclean  wrote:

> Hi,
>
> A podling (IoTDB) was looking at including some code from Hive and I
> noticed your NOTICE file has a line about the software including code under
> the JSON license. Given the JSON license is Category X [1] I'm not sure how
> this is possible. Can anyone explain how this came to be?
>
> Please CC me on any replies as I'm not subscribed to this list.
>
> Thanks,
> Justin
>
> 1. https://www.apache.org/legal/resolved.html#category-x
>


Re: [DISCUSS] Move gitbox notification emails to another list?

2019-02-28 Thread Alan Gates
+1 to sending the github mail to a separate list.

I agree with Peter that seeing PR reviews is good.  Wouldn't it be possible
to craft a filter that only allowed through these mails?

Alan.



On Thu, Feb 28, 2019 at 12:11 AM Peter Vary 
wrote:

> The github mails in this form are just white noise, so I have added
> filters to my mailbox to drop every github mail to another folder, just as
> proposed by Jesús. So if we can do it by the infra it would be better. +1
> from me.
>
> On the other hand, I miss the "review request created" messages of the
> review board. These types of messages are lost in the shower of the github
> letters. If anyone has an idea how to detect these that would be awesome.
>
> Thanks,
> Peter
>
> > On Feb 28, 2019, at 08:59, Mani M  wrote:
> >
> > Good.
> >
> > With Regards
> > M.Mani
> > +61 432 461 087
> >
> > On Thu, 28 Feb 2019, 17:39 Jesus Camacho Rodriguez, <
> > jcamachorodrig...@hortonworks.com> wrote:
> >
> >> We have had a similar discussion in the Calcite project too.
> >>
> >> Gitbox emails are being sent to the dev@ list. These emails are produced
> >> every time there is activity in the Hive Github repository and are
> creating
> >> quite a lot of noise. The result is that it is really difficult to
> follow
> >> any activity in the list.
> >>
> >> A possible alternative would be to send them to another list, e.g.
> >> gitbox@h.a.o. The idea is that having them in a list may still be
> useful
> >> as they may serve as a searchable archive of activity in the repo.
> >>
> >> What do you think? Should we open an INFRA ticket to request this?
> >>
> >> Thanks,
> >> Jesús
> >>
> >>
>
>


Re: review board

2019-01-23 Thread Alan Gates
Not that I know of, but you could talk to Apache infra,
https://www.apache.org/dev/infra-contact

Alan.

On Fri, Jan 18, 2019 at 11:58 PM Malcolm Taylor  wrote:

> Hi,
> I want to log on to the review board, but can't remember my password. Is
> there a way to reset it ?
> thanks,
> Malcolm
>


Re: Hive Code Contribution

2018-12-17 Thread Alan Gates
Hi Mani, and welcome to the Hive community.  We have a newbie label that we
use in JIRA to mark issues for new contributors.

https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20status%20%3D%20Open%20AND%20labels%20%3D%20newbie

Alan.



On Sat, Dec 15, 2018 at 6:59 PM Mani M  wrote:

> Hi Team,
>
> I'm planning to work as a code contributor for the Hive project.
>
> To start with, I'd like to take JIRA tickets with easy fixes.
>
> Can anyone help me find JIRA tickets suitable for starters?
>
> With Regards
> M.Mani
> +61 432 461 087
>


Re: [NOTICE] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-11 Thread Alan Gates
IIUC it's the only step necessary to have one place where we could review
and merge pull requests.  There'd be no more need for attaching patches to
JIRA.  Of course we'd need to update our CI processes to match.

Alan.

On Tue, Dec 11, 2018 at 12:46 PM Slim Bouguerra  wrote:

> +1 on Early move.
>
> Having direct write access to Github is a good first step to move toward
> one place that can be used to review and merge pull requests IMO.
>
> On Dec 11, 2018, at 12:06 PM, Alan Gates  wrote:
>
> This includes the hive project, as we're on the older git-wip-us.  Do we
> want to move sooner or later?
>
> Alan.
>
> ---------- Forwarded message ---------
> From: Daniel Gruno 
> Date: Fri, Dec 7, 2018 at 8:52 AM
> Subject: [NOTICE] Mandatory relocation of Apache git repositories on
> git-wip-us.apache.org
> To: us...@infra.apache.org 
>
>
> [IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
>  DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]
>
> Hello Apache projects,
>
> I am writing to you because you may have git repositories on the
> git-wip-us server, which is slated to be decommissioned in the coming
> months. All repositories will be moved to the new gitbox service which
> includes direct write access on github as well as the standard ASF
> commit access via gitbox.apache.org.
>
> ## Why this move? ##
> The move comes as a result of retiring the git-wip service, as the
> hardware it runs on is longing for retirement. In lieu of this, we
> have decided to consolidate the two services (git-wip and gitbox), to
> ease the management of our repository systems and future-proof the
> underlying hardware. The move is fully automated, and ideally, nothing
> will change in your workflow other than added features and access to
> GitHub.
>
> ## Timeframe for relocation ##
> Initially, we are asking that projects voluntarily request to move
> their repositories to gitbox, hence this email. The voluntary
> timeframe is between now and January 9th 2019, during which projects
> are free to either move over to gitbox or stay put on git-wip. After
> this phase, we will be requiring the remaining projects to move within
> one month, after which we will move the remaining projects over.
>
> To have your project moved in this initial phase, you will need:
>
> - Consensus in the project (documented via the mailing list)
> - File a JIRA ticket with INFRA to voluntarily move your project repos
>   over to gitbox (as stated, this is highly automated and will take
>   between a minute and an hour, depending on the size and number of
>   your repositories)
>
> To sum up the preliminary timeline;
>
> - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
>   relocation
> - January 9th -> February 6th: Mandated (coordinated) relocation
> - February 7th: All remaining repositories are mass migrated.
>
> This timeline may change to accommodate various scenarios.
>
> ## Using GitHub with ASF repositories ##
> When your project has moved, you are free to use either the ASF
> repository system (gitbox.apache.org) OR GitHub for your development
> and code pushes. To be able to use GitHub, please follow the primer
> at: https://reference.apache.org/committer/github
>
>
> We appreciate your understanding of this issue, and hope that your
> project can coordinate voluntarily moving your repositories in a
> timely manner.
>
> All settings, such as commit mail targets, issue linking, PR
> notification schemes etc will automatically be migrated to gitbox as
> well.
>
> With regards, Daniel on behalf of ASF Infra.
>
> PS:For inquiries, please reply to us...@infra.apache.org, not your
> project's dev list :-).
>


Fwd: [NOTICE] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-11 Thread Alan Gates
This includes the hive project, as we're on the older git-wip-us.  Do we
want to move sooner or later?

Alan.

---------- Forwarded message ---------
From: Daniel Gruno 
Date: Fri, Dec 7, 2018 at 8:52 AM
Subject: [NOTICE] Mandatory relocation of Apache git repositories on
git-wip-us.apache.org
To: us...@infra.apache.org 


[IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
  DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]

Hello Apache projects,

I am writing to you because you may have git repositories on the
git-wip-us server, which is slated to be decommissioned in the coming
months. All repositories will be moved to the new gitbox service which
includes direct write access on github as well as the standard ASF
commit access via gitbox.apache.org.

## Why this move? ##
The move comes as a result of retiring the git-wip service, as the
hardware it runs on is longing for retirement. In lieu of this, we
have decided to consolidate the two services (git-wip and gitbox), to
ease the management of our repository systems and future-proof the
underlying hardware. The move is fully automated, and ideally, nothing
will change in your workflow other than added features and access to
GitHub.

## Timeframe for relocation ##
Initially, we are asking that projects voluntarily request to move
their repositories to gitbox, hence this email. The voluntary
timeframe is between now and January 9th 2019, during which projects
are free to either move over to gitbox or stay put on git-wip. After
this phase, we will be requiring the remaining projects to move within
one month, after which we will move the remaining projects over.

To have your project moved in this initial phase, you will need:

- Consensus in the project (documented via the mailing list)
- File a JIRA ticket with INFRA to voluntarily move your project repos
   over to gitbox (as stated, this is highly automated and will take
   between a minute and an hour, depending on the size and number of
   your repositories)

To sum up the preliminary timeline;

- December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
   relocation
- January 9th -> February 6th: Mandated (coordinated) relocation
- February 7th: All remaining repositories are mass migrated.

This timeline may change to accommodate various scenarios.

## Using GitHub with ASF repositories ##
When your project has moved, you are free to use either the ASF
repository system (gitbox.apache.org) OR GitHub for your development
and code pushes. To be able to use GitHub, please follow the primer
at: https://reference.apache.org/committer/github


We appreciate your understanding of this issue, and hope that your
project can coordinate voluntarily moving your repositories in a
timely manner.

All settings, such as commit mail targets, issue linking, PR
notification schemes etc will automatically be migrated to gitbox as
well.

With regards, Daniel on behalf of ASF Infra.

PS:For inquiries, please reply to us...@infra.apache.org, not your
project's dev list :-).


Re: Why are TXN IDs not partitioned per database?

2018-11-20 Thread Alan Gates
History.  Originally there were only transaction ids, which were global.
Write ids for tables came later as a way to limit the amount of information
each transaction needed to track and to make it easier to replicate table
changes between Hive instances.

But even if we had put them in from the start, we'd have them span
databases, otherwise transactions couldn't span databases.  Hive has no
restrictions on queries spanning databases so we wouldn't want to restrict
transactions from doing so.
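
As a rough illustration of the distinction (a toy model only, not Hive's
actual metastore code; ToyTxnManager and its method names are made up for
this example), a single global counter hands out transaction ids while each
table keeps its own write id sequence, so one transaction can touch tables
in several databases:

```python
from collections import defaultdict


class ToyTxnManager:
    """Toy model: one global transaction id sequence, per-table write ids."""

    def __init__(self):
        # Global counter, shared by every database (hypothetical field name).
        self.next_txn_id = 1
        # One counter per (database, table) pair.
        self.next_write_id = defaultdict(lambda: 1)
        # Remember which write id a transaction got for a given table.
        self.write_ids = {}

    def open_txn(self):
        """Hand out the next global transaction id."""
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        return txn_id

    def allocate_write_id(self, txn_id, db, table):
        """Give the transaction a write id for this table, allocating once."""
        key = (txn_id, db, table)
        if key not in self.write_ids:
            self.write_ids[key] = self.next_write_id[(db, table)]
            self.next_write_id[(db, table)] += 1
        return self.write_ids[key]


mgr = ToyTxnManager()
t1 = mgr.open_txn()
t2 = mgr.open_txn()
# t2 writes to tables in two different databases within one transaction.
w_a = mgr.allocate_write_id(t2, "test1", "orders")
w_b = mgr.allocate_write_id(t2, "test2", "events")
# t1 later writes to the same table as t2 and gets the next write id there.
w_c = mgr.allocate_write_id(t1, "test1", "orders")
```

In this sketch the transaction ids stay global precisely so that t2 can span
test1 and test2, while each table only needs to track its own short sequence
of write ids.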

Alan.

On Tue, Nov 20, 2018 at 7:32 AM Granville Barnett <
granvillebarn...@gmail.com> wrote:

> Hi,
>
> I'm reading the source code of Hive 3.x and have a question regarding
> transaction IDs which form the span of a transaction: its begin ID (TXN ID)
> and commit ID (NEXT_TXN_ID at time of commit).
>
> Why is it that we have a global timeline for transactions rather than a
> timeline partitioned at the granularity of a database, kind of similar to
> how write IDs are partitioned per table but at the database scope?
>
> E.g.,
>
> NEXT_TXN_ID
> +-------+-----------+
> | DB    | NTXN_NEXT |
> +-------+-----------+
> | test1 | 23        |
> | test2 | 4         |
> +-------+-----------+
>
> Same question could also be applied to NEXT_LOCK_ID.
>
> I am just curious because it seems like partitioning the transaction (and
> lock IDs) would reduce the granularity of locking in the various
> transactional methods. For example, openTxn invocations are mutexed with
> all other openTxn invocations even if they are for transactions running in
> distinct database domains.  Similarly for openTxn mutexing with respect to
> commitTxn if there is a write-write conflict, which I would have thought
> would only be the case if they are applicable to the same database. I'm
> sure that this would have the side effect of increasing the complexity of
> other subsystems but I had to ask what the rationale was behind this.
>
> (I'm new to Hive so please forgive me if the answer is obvious.)
>
> Regards,
>
> Granville
>


Re: [VOTE] Apache Hive 2.3.4 Release Candidate 0

2018-11-01 Thread Alan Gates
+1.  Checked the signatures, did a build with a clean maven repo, did a RAT
build.

Alan.

On Wed, Oct 31, 2018 at 2:45 PM Daniel Dai  wrote:

> Apache Hive 2.3.4 Release Candidate 0 is available here:
>
> http://people.apache.org/~daijy/apache-hive-2.3.4-rc-0/
>
> Maven artifacts are available here:
>
> https://repository.apache.org/content/repositories/orgapachehive-1093
>
> Source tag for RCN is at:
>
> https://github.com/apache/hive/tree/release-2.3.4-rc0
>
> Voting will conclude in 72 hours.
>
> Hive PMC Members: Please test and vote.
>
> Thanks.
>
>
>


[jira] [Created] (HIVE-20451) Metastore client and server tarball issues

2018-08-23 Thread Alan Gates (JIRA)
Alan Gates created HIVE-20451:
-

 Summary: Metastore client and server tarball issues
 Key: HIVE-20451
 URL: https://issues.apache.org/jira/browse/HIVE-20451
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 4.0.0
Reporter: Alan Gates


With the split of the metastore into common and server there are now two sets 
of tarballs.  There are a couple of issues here.
 # It doesn't make sense to have separate source tarballs for each.  The source 
release should still be done from the standalone-metastore directory and 
include all code for the metastore.
 # The binary tarballs should have separate names.  At the moment both are 
named apache-hive-metastore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Hive 3 - HCatalog/webHcat

2018-08-17 Thread Alan Gates
Hive 3 still supports HCat and webHCat.  Hortonworks shut down command line
access in HDP for security reasons.

Alan.

On Fri, Aug 17, 2018 at 11:48 AM Sri Kumaran Thirupathy <
srikumara...@gmail.com> wrote:

> Hi All,
>
> Is HCatalog/webHCat not supported in Hive 3? If so, what is the
> alternative?
>
>
> https://community.hortonworks.com/questions/212757/why-hcatalog-and-webhcat-is-not-supported-in-hdp-3.html
>
> Thanks,
> Sri
>


Re: Can anyone knows how to build hive 0.12 version?

2018-08-17 Thread Alan Gates
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation#AdminManualInstallation-InstallingfromSourceCode(Hive0.12.0andEarlier)

Alan.

On Fri, Aug 17, 2018 at 9:53 AM Zhang, Liyun  wrote:

> Hi all:
> I am using hive-0.12 to build a package.  This version does not seem to
> use mvn.  Does anyone know how to build a package in hive-0.12, or can
> anyone provide a link?
>
>
>
> Best Regards
> ZhangLiyun/Kelly Zhang
>
>


Re: Review Request 67954: HIVE-20194: HiveMetastoreClient should use reflection to instantiate embedded HMS instance

2018-08-10 Thread Alan Gates

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67954/#review207091
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
Lines 110 (patched)
<https://reviews.apache.org/r/67954/#comment290320>

You define these two values both here and in HiveMetaStoreClientPreCatalog. 
 Wouldn't it make more sense to define them once in a shared location (like 
IMetaStoreClient maybe)?


- Alan Gates


On Aug. 9, 2018, 10:35 p.m., Alexander Kolbasov wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67954/
> ---
> 
> (Updated Aug. 9, 2018, 10:35 p.m.)
> 
> 
> Review request for hive, Alan Gates, Peter Vary, Sahil Takiar, and Vihang 
> Karajgaonkar.
> 
> 
> Bugs: HIVE-20194
> https://issues.apache.org/jira/browse/HIVE-20194
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20194: HiveMetastoreClient should use reflection to instantiate embedded 
> HMS instance
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  a53d4be03d695bf2176436967026757391531bc9 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  91c86a749c7afb06737c850e57f60820710c51f5 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
>  34055d2d4d39dc63d505a5ef95d190aa80a49d14 
> 
> 
> Diff: https://reviews.apache.org/r/67954/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Alexander Kolbasov
> 
>



Re: Update protobuf version in pom.xml

2018-08-09 Thread Alan Gates
For info on submitting the change, see
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CreatingaPatch
Note that once you have created the JIRA ticket you can also reference that
JIRA ticket in a PR and github will annotate the JIRA ticket with the PR
link.  But we still need a patch since that's how the current CI system
works.

I'll let others reply to the protoc part of the question, as I don't know
the answer.

Alan.



On Wed, Aug 8, 2018 at 11:42 PM Naresh Bhat  wrote:

> Hi,
>
> I was trying to compile the Hive master branch on AArch64 hardware.  I am
> facing a protoc issue because the pom.xml file uses version 2.5.0, which
> does not have AArch64 binaries available
> https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/ .  After
> updating it to the latest version, 3.6.1, I was able to compile the Hive
> master branch without any issues on an ARM64 machine. I have created a
> patch, which is available at:
>
> https://git.linaro.org/people/naresh.bhat/apache/hive.git/commit/?id=14410fbd6a3203a39f2503368c5e51dc6d11b432
>
> My questions are
>
> 1. Why is the old protoc version still being used in the Hive pom.xml
> files?  Can we update it to the latest available version, i.e. 3.6.1?
> 2. How should I submit the patch, i.e. through git send-email or
> via a GitHub pull request?
>
> Thanks and Regards
> -Naresh Bhat
>


Re: Question on Class Naming

2018-07-30 Thread Alan Gates
Class naming isn't very consistent across much of Hive.  I'd name it
GenericUDFTimestampAnsi or something that captures the fact that it's a UDF
and not a cast.

Alan.

On Sat, Jul 28, 2018 at 8:23 PM Shawn Weeks 
wrote:

> Hi, I’m working on a proper ansi sql to_timestamp function for a project
> and I’d like it to be done so that I can contribute it. I’m trying to
> figure out what to call the class though. Normally I’d assume
> GenericUDFTimestamp would make sense to match the existing to_date function
> but currently the timestamp casting function is called that while all the
> other casting functions are called GenericUDFToTypeName. Any ideas?
>
> Thanks
> Shawn
>


Re: [VOTE] Should we release storage-api 2.7.0 rc0?

2018-07-09 Thread Alan Gates
+1.  Did a build with a clean maven repo, checked the signature and sha
hash, ran RAT.
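
For anyone repeating the hash check, something along these lines should
work. It is a sketch only, assuming a SHA-256 digest is published in a
sidecar file whose first token is the hex digest; the file names in the
usage comment are placeholders, not the actual artifact names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so large tarballs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, sha_file):
    """Compare the computed digest with the first token of the sidecar file."""
    with open(sha_file, "r", encoding="utf-8") as f:
        published = f.read().split()[0].lower()
    return sha256_of(path) == published


# Usage (placeholder file names):
# matches_published("release.tar.gz", "release.tar.gz.sha256")
```

This only covers the hash; the signature check still needs the release
manager's public key and a tool such as gpg.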

Alan.

On Fri, Jul 6, 2018 at 2:21 PM Deepak Jaiswal 
wrote:

> Hi,
>
> I would like to make a new release of the storage-api. It contains changes
> required for Hive 3.1 release.
>
> Artifacts:
> Tag :
> https://github.com/apache/hive/releases/tag/storage-release-2.7.0-rc0
> Tar Ball : http://home.apache.org/~djaiswal/hive-storage-2.7.0/
>
> Regards,
> Deepak
>


[jira] [Created] (HIVE-20106) Backport HIVE-20060 (HiveSchemaTool and MetastoreSchemaTool refactor) to branch-3

2018-07-06 Thread Alan Gates (JIRA)
Alan Gates created HIVE-20106:
-

 Summary: Backport HIVE-20060 (HiveSchemaTool and 
MetastoreSchemaTool refactor) to branch-3
 Key: HIVE-20106
 URL: https://issues.apache.org/jira/browse/HIVE-20106
 Project: Hive
  Issue Type: Task
  Components: Beeline, Metastore
Reporter: Alan Gates
Assignee: Alan Gates






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20060) Refactor HiveSchemaTool and MetastoreSchemaTool

2018-07-02 Thread Alan Gates (JIRA)
Alan Gates created HIVE-20060:
-

 Summary: Refactor HiveSchemaTool and MetastoreSchemaTool
 Key: HIVE-20060
 URL: https://issues.apache.org/jira/browse/HIVE-20060
 Project: Hive
  Issue Type: Task
  Components: Beeline, Metastore
Reporter: Alan Gates
Assignee: Alan Gates


These two classes are 95% the same.  Now that HIVE-19711 has split 
HiveSchemaTool into multiple components it will be much easier to refactor 
these so that there is only one version of the code that each shares.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20058) Backport HIVE-19711 (HiveSchemaTool refactor) to branch-3

2018-07-02 Thread Alan Gates (JIRA)
Alan Gates created HIVE-20058:
-

 Summary: Backport HIVE-19711 (HiveSchemaTool refactor) to branch-3
 Key: HIVE-20058
 URL: https://issues.apache.org/jira/browse/HIVE-20058
 Project: Hive
  Issue Type: Task
  Components: Beeline
Reporter: Alan Gates
Assignee: Alan Gates






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Cleaning up old version in dist

2018-06-26 Thread Alan Gates
I removed Hive 2.1.1, 2.2.0, and storage-api-2.5.0.  I left 1.2.2 since
some people are still using Hive version 1.  I did remove the stable link
to 1.2.2 as we don't want users to think 1.2.2 is our last stable version.
There is a stable-2 link pointing to 2.3.3, which I left.  I assume once we
have a stable version of 3 (3.1?) we'll create a stable-3 link that points
to it.

Alan.

On Wed, Jun 13, 2018 at 1:40 PM Owen O'Malley 
wrote:

> The more accurate guidance is to keep the latest release from each branch
> that is being actively maintained. At this point, that means 2.3 and 3.0
> for Hive. I'd propose that we keep those versions of storage-api that match
> those versions of Hive. That means we should keep 2.4 and 2.6 for storage
> api.
>
> .. Owen
>
>
> On Fri, Jun 8, 2018 at 1:27 AM, Thejas Nair  wrote:
>
> > +1
> >
> > On Thu, Jun 7, 2018 at 11:13 AM, Alan Gates 
> wrote:
> > > Apache asks that we keep at most 2 current versions in dist, to
> minimize
> > > the space we take up on distribution mirrors.  Since we are running
> > > multiple lines and have a couple of separately releasable modules
> we'll
> > > have more than 2 versions there.  But we have old versions of Hive 2
> > (2.1,
> > > 2.2) and of the storage-api (2.4, 2.5).  I think we should remove
> these.
> > > That will leave us with the most up to date versions of Hive 1, 2, 3,
> the
> > > storage api, and the standalone metastore.  Note that this does not
> > affect
> > > their availability in maven central or the apache archive.
> > >
> > > Alan.
> >
>


[jira] [Created] (HIVE-19984) Backport HIVE-15976 to branch-3

2018-06-25 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19984:
-

 Summary: Backport HIVE-15976 to branch-3
 Key: HIVE-19984
 URL: https://issues.apache.org/jira/browse/HIVE-19984
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 3.1.0
Reporter: Alan Gates
Assignee: Alan Gates






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19983) Backport HIVE-19769 to branch-3

2018-06-25 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19983:
-

 Summary: Backport HIVE-19769 to branch-3
 Key: HIVE-19983
 URL: https://issues.apache.org/jira/browse/HIVE-19983
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Affects Versions: 3.1.0
Reporter: Alan Gates
Assignee: Alan Gates


This patch will be needed for other catalog related work to be backported to 
branch-3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19982) Make JDBC work with Catalogs

2018-06-25 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19982:
-

 Summary: Make JDBC work with Catalogs
 Key: HIVE-19982
 URL: https://issues.apache.org/jira/browse/HIVE-19982
 Project: Hive
  Issue Type: Sub-task
  Components: JDBC
Affects Versions: 3.0.0
Reporter: Alan Gates


Many JDBC methods include a catalog specifier in the call.  We need to update 
these to work with multiple catalogs.

Also we will need to also support the JDBC calls for catalogs.  This at least 
includes Connection: getCatalog() and setCatalog() 
DatabaseMetaData: getCatalogs(), getCatalogSeparator(), and getCatalogTerm() 
and maybe others.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Weird error in a generated thrift file

2018-06-14 Thread Alan Gates
What version of thrift do you have?

Alan.

On Thu, Jun 14, 2018 at 11:48 AM Sanchay Javeria 
wrote:

> Hi,
>
> I'm seeing a weird error in TBinaryColumn.java on line 383 inside the
> toString() method. There is a call being made to
> org.apache.thrift.TBaseHelper.toString(this.values, sb). Looking into this
> method tells that TBaseHelper.toString() assumes ByteBuffer as the first
> argument whereas this.values is of type List<ByteBuffer>.
>
> Building the code with "mvn clean install -Pthriftif -DskipTests
> -Dthrift.home=/usr/local  -Phadoop-2" says:
>
>
> hive/service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java:[383,50]
> incompatible types: java.util.List<java.nio.ByteBuffer> cannot be converted
> to java.nio.ByteBuffer [ERROR]
>
> under the module "Hive Service".
> TBinaryColumn.java is generated by TCLIService.thrift (line 378) so I don't
> know what I should do to rectify this.
>
> Any help?
>


[jira] [Created] (HIVE-19871) add_partitions does not properly handle client being configured with a non-Hive catalog.

2018-06-12 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19871:
-

 Summary: add_partitions does not properly handle client being 
configured with a non-Hive catalog.
 Key: HIVE-19871
 URL: https://issues.apache.org/jira/browse/HIVE-19871
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


If a client calls 
 {{add_partitions(List<Partition> parts, boolean ifNotExists, boolean needResults)}}
and the catalog name is set to a non-default value in the config file but unset 
in the partitions, the request to add the partition will fail with an error 
message "table not found" even when the table is valid.
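
To make the failure mode concrete, here is a minimal sketch of one plausible remedy: fill in the client's configured catalog on any partition that leaves it unset before the request is sent.  PartitionRef and withConfiguredCatalog are hypothetical names invented for this example; this is not the actual metastore Partition class and not necessarily how the fix was implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddPartitionsCatalogSketch {

    /** Hypothetical stand-in for a metastore partition; catName is null when unset. */
    static final class PartitionRef {
        final String catName;
        final String dbName;
        final String tableName;

        PartitionRef(String catName, String dbName, String tableName) {
            this.catName = catName;
            this.dbName = dbName;
            this.tableName = tableName;
        }
    }

    /** Returns copies in which an unset catalog is replaced by the configured one. */
    static List<PartitionRef> withConfiguredCatalog(List<PartitionRef> parts,
                                                    String configuredCatalog) {
        List<PartitionRef> resolved = new ArrayList<>(parts.size());
        for (PartitionRef p : parts) {
            String cat = (p.catName == null) ? configuredCatalog : p.catName;
            resolved.add(new PartitionRef(cat, p.dbName, p.tableName));
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<PartitionRef> parts = Arrays.asList(
            new PartitionRef(null, "db1", "t1"),      // catalog left unset by the caller
            new PartitionRef("other", "db1", "t1"));  // catalog set explicitly
        for (PartitionRef p : withConfiguredCatalog(parts, "spark")) {
            System.out.println(p.catName + "." + p.dbName + "." + p.tableName);
        }
    }
}
```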





Re: Getting started with Hive

2018-06-11 Thread Alan Gates
Take a look at DDLTask.describeTable.  That is the task run by the executor
to do a describe table.  The 'Hive db' argument is a handle to the
metastore client.  'DescTableDesc descTbl' contains the information from
the parser on what table to describe.

Are you doing this just to learn the system or do you want to add more
metadata that will be returned in show and describe calls?  If this is
something you plan to contribute you should file a JIRA and discuss your
potential changes there, so that you can get feedback on the design, since
adding additional metadata tables to the system is a change others will
want to review.

Alan.

On Mon, Jun 11, 2018 at 5:10 PM Sanchay Javeria 
wrote:

> Thanks Alan! I think I should've phrased my question better. Essentially
> I'm trying to return an extra field when a user describes a table. Say you
> run `desc formatted foo`; we get some information back, like `table name`,
> `database`, etc.
>
> I'm trying to return extra information about the table `foo`. So, I made a
> dummy SQL table where I, say, have the job name (fooBarJob) which populated
> table foo. Now if you run, `desc formatted foo`, you should get the jobName
> along with other fields.
>
> So basically, how exactly does a query like `desc` pull its metadata from
> the backing RDBMS?
>
> Also, thanks a lot for the easy to follow execution rundown you gave for
> creating a table. Before contributing further, I feel a simple exercise
> like this one can help me understand things clearly.
>
> Thanks,
> Sanchay
>
> On Mon, Jun 11, 2018 at 12:29 PM, Alan Gates  wrote:
>
>> First, my apologies, I thought your first question was sent to dev@hive,
>> which is the right list.  Hence I've removed dev@community from my
>> reply.  If you haven't already you should subscribe to dev@hive.
>>
>> I'm not 100% sure I understand your question, but here's a place to
>> start.  If you do "create table foo..." in SQL that will eventually end up
>> in HiveMetaStore.create_table.  This handles checking the table and
>> creating any necessary directories, and then calls RawStore.createTable,
>> which will end up in ObjectStore.createTable.  This is where the values you
>> sent in createTable get written down to the RDBMS backing the metadata.
>> I'm not sure whether that's the answer to the question you were asking.
>>
>> Alan.
>>
>> On Mon, Jun 11, 2018 at 12:17 PM Sanchay Javeria 
>> wrote:
>>
>>> Hello,
>>>
>>> Thank you. I've also CC'ed @hive.apache.org
>>>
>>> I went through the dev docs on Hive and got an understanding of the
>>> architecture and the high level overview of how a HiveQL query execution
>>> proceeds. To get a better understanding, I decided to add a new field from
>>> a SQL table when a user describes a table by tweaking the hive meta store,
>>> in addition to fields like "Database:", "OwnerType:" etc.
>>>
>>> I added a new hook to obtain a connection to a SQL server and placed a
>>> watcher under `startMetaStoreThreads()` in `HiveMetaStore.java`.
>>> I then found `getTableMetaDataInformation()` under
>>> `MetaDataFormatUtils.java` which populates the various fields like
>>> "Database", "OwnerType" etc. by calling getters on the `Table` instance.
>>>
>>> This led me to `api/Table.java`, auto-generated by the Thrift compiler,
>>> which returns private instances for the getters above. However, I'm unable
>>> to understand how these private variables in `metastore/api/Table.java`
>>> are populated. In other words, when we create a new table in Hive, where
>>> exactly is this metadata generated and populated so that it can be later
>>> fetched when describing a table?
>>>
>>> Please let me know if you need any further clarifications on the
>>> question!
>>>
>>> On Mon, Jun 11, 2018 at 12:13 PM, Alan Gates 
>>> wrote:
>>>
>>> > Yes, this is the place to ask dev questions.
>>> >
>>> > Alan.
>>> >
>>> > On Mon, Jun 11, 2018 at 12:10 PM Sanchay Javeria <javer...@illinois.edu>
>>> > wrote:
>>> >
>>> > > Hi fellow devs,
>>> > >
>>> > > I'm a computer science student at UIUC who just got started with
>>> Apache
>>> > > Hive, I'd love to contribute more towards the open JIRA tickets.
>>> > >
>>> > > I had some questions if anyone could help :) I was wondering if this
>>> > > mailing list is the right space to ask dev questions?
>>> > >
>>> > > Thank you,
>>> > > Sanchay
>>> > >
>>> >
>>>
>>
>


Re: Getting started with Hive

2018-06-11 Thread Alan Gates
First, my apologies, I thought your first question was sent to dev@hive,
which is the right list.  Hence I've removed dev@community from my reply.
If you haven't already you should subscribe to dev@hive.

I'm not 100% sure I understand your question, but here's a place to start.
If you do "create table foo..." in SQL that will eventually end up in
HiveMetaStore.create_table.  This handles checking the table and creating
any necessary directories, and then calls RawStore.createTable, which will
end up in ObjectStore.createTable.  This is where the values you sent in
createTable get written down to the RDBMS backing the metadata.  I'm not
sure whether that's the answer to the question you were asking.
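
The layering described above (handler validates, then delegates to a storage abstraction whose implementation writes to the RDBMS) can be sketched roughly as follows.  Every type here is a simplified, hypothetical stand-in: Store plays the role of RawStore, InMemoryStore the role of ObjectStore with a map in place of the database, and Handler the role of the create_table handler.  None of the signatures are the real metastore ones.

```java
import java.util.HashMap;
import java.util.Map;

public class CreateTableFlowSketch {

    /** Stand-in for the storage abstraction (the role RawStore plays above). */
    interface Store {
        void createTable(String db, String table, Map<String, String> params);
        Map<String, String> getTable(String db, String table);
    }

    /** Stand-in for the RDBMS-backed store; a map replaces the database. */
    static final class InMemoryStore implements Store {
        private final Map<String, Map<String, String>> rows = new HashMap<>();

        @Override
        public void createTable(String db, String table, Map<String, String> params) {
            rows.put(db + "." + table, new HashMap<>(params));
        }

        @Override
        public Map<String, String> getTable(String db, String table) {
            return rows.get(db + "." + table);
        }
    }

    /** Stand-in for the handler layer: validate first, then delegate to the store. */
    static final class Handler {
        private final Store store;

        Handler(Store store) {
            this.store = store;
        }

        void createTable(String db, String table, Map<String, String> params) {
            if (table == null || table.isEmpty()) {
                throw new IllegalArgumentException("table name is required");
            }
            if (store.getTable(db, table) != null) {
                throw new IllegalStateException(db + "." + table + " already exists");
            }
            // The real handler would also create any necessary directories here.
            store.createTable(db, table, params);
        }

        /** A later describe reads back what create wrote. */
        Map<String, String> describeTable(String db, String table) {
            return store.getTable(db, table);
        }
    }

    public static void main(String[] args) {
        Handler handler = new Handler(new InMemoryStore());
        Map<String, String> params = new HashMap<>();
        params.put("owner", "sanchay");
        handler.createTable("default", "foo", params);
        System.out.println(handler.describeTable("default", "foo"));
    }
}
```

The point of the sketch is only that the values a later describe returns are whatever the store persisted at create time; extra metadata would have to be written on that path or joined in at read time.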

Alan.

On Mon, Jun 11, 2018 at 12:17 PM Sanchay Javeria 
wrote:

> Hello,
>
> Thank you. I've also CC'ed @hive.apache.org
>
> I went through the dev docs on Hive and got an understanding of the
> architecture and the high level overview of how a HiveQL query execution
> proceeds. To get a better understanding, I decided to add a new field from
> a SQL table when a user describes a table by tweaking the hive meta store,
> in addition to fields like "Database:", "OwnerType:" etc.
>
> I added a new hook to obtain a connection to a SQL server and placed a
> watcher under `startMetaStoreThreads()` in `HiveMetaStore.java`.
> I then found `getTableMetaDataInformation()` under
> `MetaDataFormatUtils.java` which populates the various fields like
> "Database", "OwnerType" etc. by calling getters on the `Table` instance.
>
> This led me to `api/Table.java`, auto-generated by the Thrift compiler,
> which returns private instances for the getters above. However, I'm unable
> to understand how these private variables in `metastore/api/Table.java`
> are populated. In other words, when we create a new table in Hive, where
> exactly is this metadata generated and populated so that it can be later
> fetched when describing a table?
>
> Please let me know if you need any further clarifications on the question!
>
> On Mon, Jun 11, 2018 at 12:13 PM, Alan Gates  wrote:
>
> > Yes, this is the place to ask dev questions.
> >
> > Alan.
> >
> > On Mon, Jun 11, 2018 at 12:10 PM Sanchay Javeria 
> > wrote:
> >
> > > Hi fellow devs,
> > >
> > > I'm a computer science student at UIUC who just got started with Apache
> > > Hive, I'd love to contribute more towards the open JIRA tickets.
> > >
> > > I had some questions if anyone could help :) I was wondering if this
> > > mailing list is the right space to ask dev questions?
> > >
> > > Thank you,
> > > Sanchay
> > >
> >
>


[DISCUSS] Catalog feature in 3.x

2018-06-11 Thread Alan Gates
The base of the catalog feature made it into Hive 3, though it's not very
usable yet since it is only in the metastore.  I plan to keep working on
this feature to add it to the rest of Hive.  The (ever growing) set of
tasks for doing this is tracked in HIVE-18685.

I believe this should get pushed into branch-3 since the beginning of the
feature is in 3 and it will likely be a year or more before there's a Hive
4.

Since branch-3 currently has a number of test failures I won't start
pushing patches yet, but once it stabilizes I'd like to push this feature
into branch-3.  Any concerns?

Alan.


Cleaning up old version in dist

2018-06-07 Thread Alan Gates
Apache asks that we keep at most 2 current versions in dist, to minimize
the space we take up on distribution mirrors.  Since we are running
multiple lines and have a couple of separately releasable modules we'll
have more than 2 versions there.  But we have old versions of Hive 2 (2.1,
2.2) and of the storage-api (2.4, 2.5).  I think we should remove these.
That will leave us with the most up to date versions of Hive 1, 2, 3, the
storage api, and the standalone metastore.  Note that this does not affect
their availability in maven central or the apache archive.

Alan.


Re: [DISCUSS] Release of standalone-metastore

2018-06-07 Thread Alan Gates
I have pushed the standalone metastore src and bin tarballs and their
signatures and hashes into Hive's dist area, so they should soon be
available for download.  Congrats to all who worked on this!

As part of creating a release tag for the standalone metastore I noticed we
didn't have one for release 3.0.0, so I created a tag for that as well.

Alan.

On Tue, Jun 5, 2018 at 10:45 AM Alan Gates  wrote:

> I have put the binary and source objects up at
> https://home.apache.org/~gates/hive-standalone-metastore-3.0.0/ so
> everyone can take a look before I officially push them to dist.
>
> I don't think we need to vote on this as we have already officially
> released these objects, I'm just adding sha and gpg signatures for download
> purposes.  But, please take a look and make sure I did everything
> properly.  I'll push them to dist after a couple of days to give everyone a
> chance to look them over.
>
> Alan.
>
> On Wed, May 30, 2018 at 11:00 AM Vihang Karajgaonkar 
> wrote:
>
>> The proposal to post the source and bin to the distribution sounds good to
>> me. We can do the testing and release standalone-metastore 3.1 like you
>> suggested above.
>>
>> On Tue, May 29, 2018 at 10:49 PM, Peter Vary  wrote:
>>
>> > What do you think about adding a new profile, which adds a possibility to
>> > compile the code with one command, until we separate standalone metastore
>> > to a new project? Like -Pitests, but -Pmetastore. So "mvn clean install
>> > -Pmetastore,itests" will compile everything.
>> >
>> > Alan Gates wrote (on Wed, May 30, 2018, 0:42):
>> >
>> > > On Tue, May 29, 2018 at 3:29 PM Vihang Karajgaonkar <vih...@cloudera.com>
>> > > wrote:
>> > >
>> > > > How about cutting out a branch-3.0.1 and releasing 3.0.1 with the
>> > > > pom.xml fixed? My concern with the above approach is we haven't tested
>> > > > standalone-metastore when deployed independent of Hive.
>> > >
>> > > Actually, there is.  The tarballs for source and bin are already out
>> > > there.  If I post them on the distribution site then they'll be easier
>> > > to find.  So we can test that now.  And we can then do a 3.1 release of
>> > > the metastore whenever we want, as long as it's before a 3.1 release of
>> > > Hive.
>> > >
>> > > Alan.
>> > >
>> > >
>> > > > So we don't know if
>> > > > there is something fundamentally broken in that mode, and given that
>> > > > we don't know when 3.1 is going to be released it may remain in that
>> > > > state for a long time, which is not good. I think maybe a good approach
>> > > > now would be to test 3.0 standalone-metastore and fix any issues along
>> > > > with the pom.xml changes and do a 3.0.1 release. What do you think?
>> > > >
>> > > > Thanks,
>> > > > Vihang
>> > > >
>> > > > On Tue, May 29, 2018 at 1:57 PM, Alan Gates 
>> > > wrote:
>> > > >
>> > > > > In the thread on releasing Hive 3.0 I wrote
>> > > > > 
>> > > > > We should work on producing a standalone-metastore
>> > > > > release in the same time frame so that the schemas, etc. match. I
>> > > > > can RM that unless someone else wants to.
>> > > > > 
>> > > > > https://lists.apache.org/thread.html/307b281c3742fdf6aeb7fac3ee74a98830400b67711755572de15b80@%3Cdev.hive.apache.org%3E
>> > > > >
>> > > > > My thinking was to produce a separate metastore release, like we do
>> > > > > for storage-api.  However, I missed that I needed to do some work in
>> > > > > branch-3.0 to disconnect standalone-metastore from the pom before the
>> > > > > release (in the same way that storage-api does).  Thus when we
>> > > > > released Hive 3.0 we also released the standalone-metastore. See
>> > > > > https://search.maven.org/#search%7Cga%7C2%7Cg%3A%22org.apache.hive%22
>> > > > > So I can't release another version of standalone-me

[jira] [Created] (HIVE-19822) Make new streaming interface work with catalogs

2018-06-06 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19822:
-

 Summary: Make new streaming interface work with catalogs
 Key: HIVE-19822
 URL: https://issues.apache.org/jira/browse/HIVE-19822
 Project: Hive
  Issue Type: Sub-task
  Components: Streaming
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


The new hive-streaming module should respect catalogs.  This will not include 
hive-hcatalog-streaming; since that module is deprecated, it doesn't make sense 
to update it.





[jira] [Created] (HIVE-19806) Several tests do not properly sort their output

2018-06-05 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19806:
-

 Summary: Several tests do not properly sort their output
 Key: HIVE-19806
 URL: https://issues.apache.org/jira/browse/HIVE-19806
 Project: Hive
  Issue Type: Bug
  Components: Test
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


A number of the tests produce unsorted output that happens to come out the same 
on people's laptops and the ptest infrastructure.  But when run on a separate 
Linux box the ordering differences show up.  
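
One generic way to make such a comparison independent of row order is to sort both sides before comparing.  The sketch below illustrates only that idea; SortedOutputSketch and its methods are invented for this example and do not represent Hive's test harness or its own mechanisms for sorting query output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedOutputSketch {

    /** Returns a sorted copy so the caller's list is left untouched. */
    static List<String> sortedCopy(List<String> rows) {
        List<String> copy = new ArrayList<>(rows);
        Collections.sort(copy);
        return copy;
    }

    /** True when both lists hold the same rows, duplicates included, in any order. */
    static boolean sameRows(List<String> expected, List<String> actual) {
        return sortedCopy(expected).equals(sortedCopy(actual));
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("1\talice", "2\tbob", "3\tcarol");
        List<String> actual = Arrays.asList("3\tcarol", "1\talice", "2\tbob");
        System.out.println(expected.equals(actual));    // order-sensitive comparison
        System.out.println(sameRows(expected, actual)); // order-insensitive comparison
    }
}
```

Sorting is only appropriate when the query under test does not itself promise an order; a test of ORDER BY should keep the order-sensitive comparison.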





[jira] [Created] (HIVE-19803) Make import/export work with catalogs

2018-06-05 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19803:
-

 Summary: Make import/export work with catalogs
 Key: HIVE-19803
 URL: https://issues.apache.org/jira/browse/HIVE-19803
 Project: Hive
  Issue Type: Sub-task
  Components: Import/Export
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


Import and export will need changes to handle the catalogs.





[jira] [Created] (HIVE-19802) Modify parser to accept catalog in object names

2018-06-05 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19802:
-

 Summary: Modify parser to accept catalog in object names
 Key: HIVE-19802
 URL: https://issues.apache.org/jira/browse/HIVE-19802
 Project: Hive
  Issue Type: Sub-task
  Components: Parser
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


Objects (databases, tables, functions, etc.) should be addressable via catalog.  
That is, a database should be addressable as catalog.database or database, 
tables should be addressable as catalog.database.table or database.table or 
table, etc.





Re: [DISCUSS] Release of standalone-metastore

2018-06-05 Thread Alan Gates
I have put the binary and source objects up at
https://home.apache.org/~gates/hive-standalone-metastore-3.0.0/ so everyone
can take a look before I officially push them to dist.

I don't think we need to vote on this as we have already officially
released these objects, I'm just adding sha and gpg signatures for download
purposes.  But, please take a look and make sure I did everything
properly.  I'll push them to dist after a couple of days to give everyone a
chance to look them over.

Alan.

On Wed, May 30, 2018 at 11:00 AM Vihang Karajgaonkar 
wrote:

> The proposal to post the source and bin to the distribution sounds good to
> me. We can do the testing and release standalone-metastore 3.1 like you
> suggested above.
>
> On Tue, May 29, 2018 at 10:49 PM, Peter Vary  wrote:
>
> > What do you think about adding a new profile, which adds a possibility to
> > compile the code with one command, until we separate standalone metastore
> > to a new project? Like -Pitests, but -Pmetastore. So "mvn clean install
> > -Pmetastore,itests" will compile everything.
> >
> > Alan Gates wrote (on Wed, May 30, 2018, 0:42):
> >
> > > On Tue, May 29, 2018 at 3:29 PM Vihang Karajgaonkar <vih...@cloudera.com>
> > > wrote:
> > >
> > > > How about cutting out a branch-3.0.1 and releasing 3.0.1 with the
> > > > pom.xml fixed? My concern with the above approach is we haven't tested
> > > > standalone-metastore when deployed independent of Hive.
> > >
> > > Actually, there is.  The tarballs for source and bin are already out
> > > there.  If I post them on the distribution site then they'll be easier
> > > to find.  So we can test that now.  And we can then do a 3.1 release of
> > > the metastore whenever we want, as long as it's before a 3.1 release of
> > > Hive.
> > >
> > > Alan.
> > >
> > >
> > > > So we don't know if
> > > > there is something fundamentally broken in that mode, and given that
> > > > we don't know when 3.1 is going to be released it may remain in that
> > > > state for a long time, which is not good. I think maybe a good approach
> > > > now would be to test 3.0 standalone-metastore and fix any issues along
> > > > with the pom.xml changes and do a 3.0.1 release. What do you think?
> > > >
> > > > Thanks,
> > > > Vihang
> > > >
> > > > On Tue, May 29, 2018 at 1:57 PM, Alan Gates 
> > > wrote:
> > > >
> > > > > In the thread on releasing Hive 3.0 I wrote
> > > > > 
> > > > > We should work on producing a standalone-metastore
> > > > > release in the same time frame so that the schemas, etc. match. I
> > > > > can RM that unless someone else wants to.
> > > > > 
> > > > > https://lists.apache.org/thread.html/307b281c3742fdf6aeb7fac3ee74a98830400b67711755572de15b80@%3Cdev.hive.apache.org%3E
> > > > >
> > > > > My thinking was to produce a separate metastore release, like we do
> > > > > for storage-api.  However, I missed that I needed to do some work in
> > > > > branch-3.0 to disconnect standalone-metastore from the pom before the
> > > > > release (in the same way that storage-api does).  Thus when we
> > > > > released Hive 3.0 we also released the standalone-metastore. See
> > > > > https://search.maven.org/#search%7Cga%7C2%7Cg%3A%22org.apache.hive%22
> > > > > So I can't release another version of standalone-metastore 3.0.  Here
> > > > > is what I propose we do:
> > > > >
> > > > >
> > > > >    1. Put the src and bin tarballs for standalone-metastore in Hive's
> > > > >    distribution site.  We have already voted on these as part of 3.0
> > > > >    release process.
> > > > >    2. Like storage-api, we keep the standalone-metastore linked in the
> > > > >    pom in the master branch.  This makes life easier for developers as
> > > > >    they produce new patches.
> > > > >    3. Also like storage-api, at some future point before we release
> > > > >    Hive 3.1 I will:
> > > > >       1. Make a separate branch for standalone-metastore from branch-3
> > > > >       2. Release a standalone-metastore 3.1 from this new branch
> > > > >       3. Remove standalone-metastore from the list of sub-modules in
> > > > >       Hive's pom.xml
> > > > >       4. Make Hive depend on the released 3.1 version of the
> > > > >       standalone-metastore.
> > > > >    4. For branch-3.0, I do not propose to do the same separation as in
> > > > >    branch-3, but we can make a different choice in the future if there
> > > > >    is a reason to do so.
> > > > >
> > > > > Make sense?  Thoughts?
> > > > >
> > > > > Alan.
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (HIVE-19791) Modify TableDesc to contain the catalog

2018-06-04 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19791:
-

 Summary: Modify TableDesc to contain the catalog
 Key: HIVE-19791
 URL: https://issues.apache.org/jira/browse/HIVE-19791
 Project: Hive
  Issue Type: Sub-task
  Components: Query Planning
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


TableDesc currently only contains a table's database and tablename.  It needs 
to also have the catalog name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19769) Create dedicated objects for DB and Table names

2018-06-01 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19769:
-

 Summary: Create dedicated objects for DB and Table names
 Key: HIVE-19769
 URL: https://issues.apache.org/jira/browse/HIVE-19769
 Project: Hive
  Issue Type: Sub-task
  Components: storage-api
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


Currently table names are always strings.  Sometimes that string is just 
tablename, sometimes it is dbname.tablename.  Sometimes the code expects one or 
the other, sometimes it handles either.  This is burdensome for developers and 
error prone.  With the addition of catalog to the hierarchy, this becomes even 
worse.

I propose to add two objects, DatabaseName and TableName.  These will track 
full names of each object.  They will handle inserting default catalog and 
database names when those are not provided.  They will handle the conversions 
to and from strings.

These will need to be added to storage-api because ValidTxnList will use them.
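
A minimal sketch of what such a name object might look like is given below.  It is an illustration of the proposal only: the class name QualifiedTableNameSketch, the fromString signature, and the decision to reject empty components are assumptions made for this example, and the sketch does not handle quoted identifiers that contain dots.

```java
import java.util.Objects;

public final class QualifiedTableNameSketch {
    private final String catalog;
    private final String database;
    private final String table;

    QualifiedTableNameSketch(String catalog, String database, String table) {
        this.catalog = Objects.requireNonNull(catalog, "catalog");
        this.database = Objects.requireNonNull(database, "database");
        this.table = Objects.requireNonNull(table, "table");
    }

    /**
     * Parses "table", "db.table", or "cat.db.table", inserting the supplied
     * defaults for whichever leading components are missing.
     */
    static QualifiedTableNameSketch fromString(String name, String defaultCatalog,
                                               String defaultDatabase) {
        // A limit of -1 keeps trailing empty parts so that "db." is rejected below.
        String[] parts = name.split("\\.", -1);
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty component in name: " + name);
            }
        }
        switch (parts.length) {
            case 1:
                return new QualifiedTableNameSketch(defaultCatalog, defaultDatabase, parts[0]);
            case 2:
                return new QualifiedTableNameSketch(defaultCatalog, parts[0], parts[1]);
            case 3:
                return new QualifiedTableNameSketch(parts[0], parts[1], parts[2]);
            default:
                throw new IllegalArgumentException("Too many components in name: " + name);
        }
    }

    /** Always renders the full three-part name. */
    @Override
    public String toString() {
        return catalog + "." + database + "." + table;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof QualifiedTableNameSketch)) {
            return false;
        }
        QualifiedTableNameSketch that = (QualifiedTableNameSketch) other;
        return catalog.equals(that.catalog)
            && database.equals(that.database)
            && table.equals(that.table);
    }

    @Override
    public int hashCode() {
        return Objects.hash(catalog, database, table);
    }

    public static void main(String[] args) {
        System.out.println(fromString("t1", "hive", "default"));
        System.out.println(fromString("db1.t1", "hive", "default"));
        System.out.println(fromString("spark.db1.t1", "hive", "default"));
    }
}
```

Because every instance carries all three components, code that receives one never has to guess whether a string was meant as tablename or dbname.tablename.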





Re: [DISCUSS] Release of standalone-metastore

2018-05-29 Thread Alan Gates
On Tue, May 29, 2018 at 3:29 PM Vihang Karajgaonkar 
wrote:

> How about cutting out a branch-3.0.1 and releasing 3.0.1 with the pom.xml
> fixed? My concern with the above approach is we haven't tested
> standalone-metastore when deployed independent of Hive.

Actually, there is.  The tarballs for source and bin are already out
there.  If I post them on the distribution site then they'll be easier to
find.  So we can test that now.  And we can then do a 3.1 release of the
metastore whenever we want, as long as it's before a 3.1 release of Hive.

Alan.


> So we don't know if
> there is something fundamentally broken in that mode, and given that we
> don't know when 3.1 is going to be released it may remain in that state for
> a long time, which is not good. I think maybe a good approach now would be to
> test 3.0 standalone-metastore and fix any issues along with the pom.xml
> changes and do a 3.0.1 release. What do you think?
>
> Thanks,
> Vihang
>
> On Tue, May 29, 2018 at 1:57 PM, Alan Gates  wrote:
>
> > In the thread on releasing Hive 3.0 I wrote
> > 
> > We should work on producing a standalone-metastore
> > release in the same time frame so that the schemas, etc. match. I can RM
> > that unless someone else wants to.
> > 
> > https://lists.apache.org/thread.html/307b281c3742fdf6aeb7fac3ee74a98830400b67711755572de15b80@%3Cdev.hive.apache.org%3E
> >
> > My thinking was to produce a separate metastore release, like we do for
> > storage-api.  However, I missed that I needed to do some work in
> > branch-3.0 to disconnect standalone-metastore from the pom before the
> > release (in the same way that storage-api does).  Thus when we released
> > Hive 3.0 we also released the standalone-metastore. See
> > https://search.maven.org/#search%7Cga%7C2%7Cg%3A%22org.apache.hive%22   So
> > I can't release another version of standalone-metastore 3.0.  Here is
> > what I propose we do:
> >
> >
> >    1. Put the src and bin tarballs for standalone-metastore in Hive's
> >    distribution site.  We have already voted on these as part of 3.0
> >    release process.
> >    2. Like storage-api, we keep the standalone-metastore linked in the
> >    pom in the master branch.  This makes life easier for developers as
> >    they produce new patches.
> >    3. Also like storage-api, at some future point before we release Hive
> >    3.1 I will:
> >       1. Make a separate branch for standalone-metastore from branch-3
> >       2. Release a standalone-metastore 3.1 from this new branch
> >       3. Remove standalone-metastore from the list of sub-modules in
> >       Hive's pom.xml
> >       4. Make Hive depend on the released 3.1 version of the
> >       standalone-metastore.
> >    4. For branch-3.0, I do not propose to do the same separation as in
> >    branch-3, but we can make a different choice in the future if there
> >    is a reason to do so.
> >
> > Make sense?  Thoughts?
> >
> > Alan.
> >
>


[DISCUSS] Release of standalone-metastore

2018-05-29 Thread Alan Gates
In the thread on releasing Hive 3.0 I wrote

We should work on producing a standalone-metastore
release in the same time frame so that the schemas, etc. match. I can RM
that unless someone else wants to.

https://lists.apache.org/thread.html/307b281c3742fdf6aeb7fac3ee74a98830400b67711755572de15b80@%3Cdev.hive.apache.org%3E

My thinking was to produce a separate metastore release, like we do for
storage-api.  However, I missed that I needed to do some work in branch-3.0
to disconnect standalone-metastore from the pom before the release (in the
same way that storage-api does).  Thus when we released Hive 3.0 we also
released the standalone-metastore. See
https://search.maven.org/#search%7Cga%7C2%7Cg%3A%22org.apache.hive%22   So
I can't release another version of standalone-metastore 3.0.  Here is what
I propose we do:


   1. Put the src and bin tarballs for standalone-metastore in Hive's
   distribution site.  We have already voted on these as part of 3.0 release
   process.
   2. Like storage-api, we keep the standalone-metastore linked in the pom
   in the master branch.  This makes life easier for developers as they
   produce new patches.
   3. Also like storage-api, at some future point before we release Hive
   3.1 I will:
      1. Make a separate branch for standalone-metastore from branch-3
      2. Release a standalone-metastore 3.1 from this new branch
      3. Remove standalone-metastore from the list of sub-modules in Hive's
      pom.xml
      4. Make Hive depend on the released 3.1 version of the
      standalone-metastore.
   4. For branch-3.0, I do not propose to do the same separation as in
   branch-3, but we can make a different choice in the future if there is a
   reason to do so.

Make sense?  Thoughts?

Alan.


DB install and upgrade scripts in the brave new world of multiple release lines

2018-05-24 Thread Alan Gates
The change to have branches running with master as 4 and branch-3 for 3.x
releases is complicating our DB install and upgrade scripts.  There's a
JIRA to track the changes, and there is some discussion on that JIRA of how
best to proceed, starting with the comment
https://issues.apache.org/jira/browse/HIVE-19323?focusedCommentId=16489833&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16489833
I'm posting this here as others may be interested in chiming in.

Alan.


[jira] [Created] (HIVE-19688) Make catalogs updatable

2018-05-23 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19688:
-

 Summary: Make catalogs updatable
 Key: HIVE-19688
 URL: https://issues.apache.org/jira/browse/HIVE-19688
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates


The initial changes for catalogs did not include an ability to alter catalogs.  
We need to add that.





[jira] [Created] (HIVE-19686) schematool --createCatalog option fails when using Oracle as the RDBMS

2018-05-23 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19686:
-

 Summary: schematool  --createCatalog option fails when using 
Oracle as the RDBMS
 Key: HIVE-19686
 URL: https://issues.apache.org/jira/browse/HIVE-19686
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 3.0.1


Attempts to use the schematool --createCatalog option when the metastore is 
using Oracle result in
{code:java}
SQL Error code: 1786
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to add catalog
at org.apache.hive.beeline.HiveSchemaTool.createCatalog(HiveSchemaTool.java:941)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1459)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
Caused by: java.sql.SQLSyntaxErrorException: ORA-01786: FOR UPDATE of this query expression is not allowed

at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1059)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:522)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:587)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:30)
at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:762)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:925)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:)
at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1309)
at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:422)
at org.apache.hive.beeline.HiveSchemaTool.createCatalog(HiveSchemaTool.java:926)
... 7 more
*** schemaTool failed ***{code}





Question on branches

2018-05-22 Thread Alan Gates
I have a series of patches that fix bugs in the recently released 3.0, e.g.
HIVE-19531.  I would like these to go out in Hive 3.0.1, assuming we create
such a release at some point.  Which branch(es) should I put the patch on
in addition to master?  Or to put the question another way, what branch(es)
will Hive 3.0.1 and Hive 3.1.0 be cut from?

Alan.


Re: build broken by glassfish dependency again

2018-05-16 Thread Alan Gates
+1.  It can't be too important because I was able to get my build going
without the poms by copying the directory structure and pom.lastUpdated
files in from another repo.

Alan.

On Wed, May 16, 2018 at 5:29 PM, Sergey Shelukhin 
wrote:

> Could not transfer artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT
> from/to jvnet-nexus-snapshots
> (https://maven.java.net/content/repositories/snapshots): Failed to
> transfer file:
> https://maven.java.net/content/repositories/snapshots/org/glassfish/javax.el/3.0.1-b06-SNAPSHOT/javax.el-3.0.1-b06-SNAPSHOT.pom.
> Return code is: 402, ReasonPhrase:Payment Required.
>
>
> As far as I recall from last time this is a test only dependency.
> Unless there’s further productive feedback I’m going to look into nuking
> this dependency and disabling any tests requiring it with a follow-up jira
> to fix.
>
>


[jira] [Created] (HIVE-19576) IHMSHandler.getTable not always fetching the right catalog

2018-05-16 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19576:
-

 Summary: IHMSHandler.getTable not always fetching the right catalog
 Key: HIVE-19576
 URL: https://issues.apache.org/jira/browse/HIVE-19576
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 3.0.1


{{IHMSHandler.get_table_core(String dbName, String tableName)}} fetches the 
catalog name from the conf.  This causes issues when doing an operation where 
the catalog is known and does not match the default provided in the 
configuration file (e.g. adding a partition).  This method should be removed 
and callers forced to use {{IHMSHandler.get_table_core(String catName, String 
dbName, String tableName)}} instead since callers will know whether they have 
the catalog name or not.
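
A minimal sketch of the problem described above, assuming a toy in-memory store rather than the real metastore. The class name, the camel-case `getTableCore` methods, the `DEFAULT_CATALOG` constant, and the "spark" catalog are all hypothetical stand-ins; only the two-argument versus three-argument shape is taken from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogLookupSketch {
    // Hypothetical stand-in for the default catalog read from the configuration file.
    public static final String DEFAULT_CATALOG = "hive";

    // Toy replacement for the metastore backing store, keyed by "catalog.db.table".
    public static final Map<String, String> TABLES = new HashMap<>();

    public static String key(String catName, String dbName, String tableName) {
        return catName + "." + dbName + "." + tableName;
    }

    // Two-argument form: silently assumes the configured default catalog.
    public static String getTableCore(String dbName, String tableName) {
        return getTableCore(DEFAULT_CATALOG, dbName, tableName);
    }

    // Three-argument form: the caller states the catalog explicitly.
    public static String getTableCore(String catName, String dbName, String tableName) {
        return TABLES.get(key(catName, dbName, tableName));
    }

    public static void main(String[] args) {
        TABLES.put(key("hive", "sales", "orders"), "orders in default catalog");
        TABLES.put(key("spark", "sales", "orders"), "orders in spark catalog");

        // A caller working in the "spark" catalog gets the default catalog's table
        // from the two-argument form, and the intended one from the three-argument form.
        System.out.println(getTableCore("sales", "orders"));
        System.out.println(getTableCore("spark", "sales", "orders"));
    }
}
```

In this sketch, removing the two-argument overload would turn the silent wrong-catalog lookup into a compile error at every call site, which is the effect the proposal is after.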





[jira] [Created] (HIVE-19558) HiveAuthorizationProviderBase gets catalog name from config rather than db object

2018-05-15 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19558:
-

 Summary: HiveAuthorizationProviderBase gets catalog name from 
config rather than db object
 Key: HIVE-19558
 URL: https://issues.apache.org/jira/browse/HIVE-19558
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 3.0.1


HiveAuthorizationProviderBase.getDatabase uses just the database name to fetch 
the database, relying on getDefaultCatalog() to fetch the catalog name from the 
conf file.  This does not work when the client has passed in an object for a 
different catalog.
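
The failure mode can be sketched as follows, assuming a simplified database object that carries its own catalog name. The `Db` class, the `OWNERS` map, both lookup methods, and the "spark" catalog are hypothetical illustrations, not Hive's actual authorization API.

```java
import java.util.HashMap;
import java.util.Map;

public class DatabaseLookupSketch {
    // Hypothetical default catalog, standing in for the value read from the conf file.
    public static final String DEFAULT_CATALOG = "hive";

    // Minimal stand-in for a database object that knows which catalog it belongs to.
    public static final class Db {
        public final String catalogName;
        public final String name;

        public Db(String catalogName, String name) {
            this.catalogName = catalogName;
            this.name = name;
        }
    }

    // Toy store mapping "catalog.db" to the database owner.
    public static final Map<String, String> OWNERS = new HashMap<>();

    // Problematic pattern: only the database name is used; the catalog comes from config.
    public static String ownerByNameOnly(Db db) {
        return OWNERS.get(DEFAULT_CATALOG + "." + db.name);
    }

    // Pattern the issue points toward: take the catalog from the object passed in.
    public static String ownerByObject(Db db) {
        return OWNERS.get(db.catalogName + "." + db.name);
    }

    public static void main(String[] args) {
        OWNERS.put("spark.sales", "alice");
        Db db = new Db("spark", "sales");

        // The name-only lookup searches "hive.sales" and finds nothing.
        System.out.println(ownerByNameOnly(db));
        // The object-based lookup searches "spark.sales" and finds the owner.
        System.out.println(ownerByObject(db));
    }
}
```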





Re: [VOTE] Stricter commit guidelines

2018-05-15 Thread Alan Gates
+1.

Alan.

On Tue, May 15, 2018 at 9:12 AM, Sergio Pena 
wrote:

> +1
>
> On Tue, May 15, 2018 at 11:05 AM, Gunther Hagleitner <
> ghagleit...@hortonworks.com> wrote:
>
> > +1
> > 
> > From: Sankar Hariappan 
> > Sent: Tuesday, May 15, 2018 9:03 AM
> > To: dev@hive.apache.org
> > Subject: Re: [VOTE] Stricter commit guidelines
> >
> > +1
> >
> >
> > On 15/05/18, 9:30 PM, "Sahil Takiar"  wrote:
> >
> > >+1
> > >
> > >On Tue, May 15, 2018 at 10:56 AM, Owen O'Malley  >
> > >wrote:
> > >
> > >> +1
> > >>
> > >> On Tue, May 15, 2018 at 8:55 AM, Peter Vary 
> wrote:
> > >>
> > >> > +1 - Hoping for something like this for a long while! Thanks for
> > taking
> > >> > this up all!
> > >> >
> > >> > > On May 15, 2018, at 5:44 PM, Jesus Camacho Rodriguez <
> > >> > jcama...@apache.org> wrote:
> > >> > >
> > >> > > Forgot to mention the length of the vote in original message.
> > >> > >
> > >> > > Let's leave the vote open for a shorter period than usual, for
> > instance
> > >> > 48 hours, i.e., till Wednesday 10pm PST. Situation can only get
> worse
> > >> than
> > >> > it is now if we do not take action for a longer period.
> > >> > >
> > >> > > As Alan suggested, vote passes if there is a lazy majority (at
> > least 3
> > >> > votes, more +1s than -1s).
> > >> > >
> > >> > > Thanks,
> > >> > > Jesús
> > >> > >
> > >> > >
> > >> > > On 5/15/18, 8:37 AM, "Andrew Sherman" 
> > wrote:
> > >> > >
> > >> > >+1
> > >> > >
> > >> > >On Tue, May 15, 2018 at 2:34 AM Rui Li 
> > >> wrote:
> > >> > >
> > >> > >> +1
> > >> > >>
> > >> > >> On Tue, May 15, 2018 at 2:24 PM, Prasanth Jayachandran <
> > >> > >> pjayachand...@hortonworks.com> wrote:
> > >> > >>
> > >> > >>> +1
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> Thanks
> > >> > >>> Prasanth
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> On Mon, May 14, 2018 at 10:44 PM -0700, "Jesus Camacho
> Rodriguez"
> > <
> > >> > >>> jcama...@apache.org> wrote:
> > >> > >>>
> > >> > >>>
> > >> > >>> After work has been done to ignore most of the tests that were
> > >> failing
> > >> > >>> consistently/intermittently [1], I wanted to start this vote to
> > >> gather
> > >> > >>> support from the community to be stricter wrt committing patches
> > to
> > >> > Hive.
> > >> > >>> The committers guide [2] already specifies that a +1 should be
> > >> obtained
> > >> > >>> before committing, but there is another clause that allows
> > committing
> > >> > >> under
> > >> > >>> the presence of flaky tests (clause 4). Flaky tests are as good
> as
> > >> > having
> > >> > >>> no tests, hence I propose to remove clause 4 and enforce the +1
> > from
> > >> > >>> testing infra before committing.
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> As I see it, by enforcing that we always get a +1 from the
> testing
> > >> > infra
> > >> > >>> before committing, 1) we will have a more stable project, and 2)
> > we
> > >> > will
> > >> > >>> have another incentive as a community to create a more robust
> > testing
> > >> > >>> infra, e.g., replacing flaky tests for similar unit tests that
> are
> > >> not
> > >> > >>> flaky, trying to decrease running time for tests, etc.
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> Please, share your thoughts about this.
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> Here is my +1.
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> Thanks,
> > >> > >>>
> > > >> > >>> Jesús
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805.
> > >> > >>> mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E
> > >> > >>>
> > >> > >>> [2] https://cwiki.apache.org/confluence/display/Hive/
> > >> > >>> HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>
> > >> > >>
> > >> > >> --
> > >> > >> Best regards!
> > >> > >> Rui Li
> > >> > >>
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> > >--
> > >Sahil Takiar
> > >Software Engineer
> > >takiar.sa...@gmail.com | (510) 673-0309
> >
> >
> >
>


[jira] [Created] (HIVE-19531) TransactionalValidationListener is getting catalog name from conf instead of table object.

2018-05-14 Thread Alan Gates (JIRA)
Alan Gates created HIVE-19531:
-

 Summary: TransactionalValidationListener is getting catalog name 
from conf instead of table object.
 Key: HIVE-19531
 URL: https://issues.apache.org/jira/browse/HIVE-19531
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 3.0.0


TransactionalValidationListener.validateTableStructure gets the catalog from the 
conf file rather than taking it from the passed-in table structure.  This 
causes createTable operations to fail.




