Re: Some tests started hanging recently

2020-06-18 Thread Zoltan Haindrich

Hey Jagat!

On 6/19/20 3:19 AM, Jagat Singh wrote:

I was not expecting to hear this for my first PR :(


No worries - this could happen; I think we've bumped into some nasty 
concurrency bug...


I will also try to re-run the tests locally on my system and report back to
you.


Thank you. It took a while, but the flaky checker got stuck after 8 runs of one of the tests - while the other run (with the tez update patch reverted) finished 
successfully and ran it 100 times.

http://130.211.9.232/job/hive-flaky-check/51/
http://130.211.9.232/job/hive-flaky-check/52/
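
(For reference, a rough local approximation of what the flaky checker does is just re-running a single test class in a loop - the module, the test class and the run count below are only examples:)

cd ql   # or whichever module owns the suspect test
for i in $(seq 1 100); do
  echo "=== run $i ==="
  # generous timeout so a hang shows up as a failure instead of blocking forever
  timeout 40m mvn -q test -Dtest=TestCrudCompactorOnTez || { echo "failed or hung on run $i"; break; }
done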

I'm going to revert the tez 0.9.2 upgrade for now.

cheers,
Zoltan




Thanks,

Jagat Singh

On Fri, 19 Jun 2020 at 00:30, Zoltan Haindrich  wrote:


Hey all,

Since yesterday some tests started to hang - most frequently
TestCrudCompactorOnTez or TestMmCompactorOnTez, but I've seen a replication
test as well - so I don't think it's
limited to those 2 tests.

I was not able to figure out what has caused this - my current guess is
that somehow the tez 0.9.2 upgrade has caused it.
To validate this guess I've started the flaky checker with and without
that patch from the current state...

I've collected some jstacks from the containers running for more than 20
hours

https://termbin.com/z1eoc
https://termbin.com/2m0j
https://termbin.com/027t
https://termbin.com/1dbe

cheers,
Zoltan





Re: Some tests started hanging recently

2020-06-18 Thread Jagat Singh
Hello Zoltan,

I was not expecting to hear this for my first PR :(

I will also try to re-run the tests locally on my system and report back to
you.

Thanks,

Jagat Singh

On Fri, 19 Jun 2020 at 00:30, Zoltan Haindrich  wrote:

> Hey all,
>
> Since yesterday some tests started to hang - most frequently
> TestCrudCompactorOnTez or TestMmCompactorOnTez, but I've seen a replication
> test as well - so I don't think it's
> limited to those 2 tests.
>
> I was not able to figure out what has caused this - my current guess is
> that somehow the tez 0.9.2 upgrade has caused it.
> To validate this guess I've started the flaky checker with and without
> that patch from the current state...
>
> I've collected some jstacks from the containers running for more than 20
> hours
>
> https://termbin.com/z1eoc
> https://termbin.com/2m0j
> https://termbin.com/027t
> https://termbin.com/1dbe
>
> cheers,
> Zoltan
>


Re: Reviewers and assignees of PRs

2020-06-18 Thread Jagat Singh
Hello Zoltan,

One thing which needs improvement is updating the Hive Contributors wiki
to reflect whatever process happens on the GitHub and build-server side.

The current Confluence page is silent on what to expect when we create a PR as a
contributor: who will review it, what will the build system do, and where to look for
errors?

Based on my first PR experience: do you manually label PRs as test stable,
unstable, etc.? I am not sure whether that can be automated (if it isn't already),
along with the auto-assigning of reviewers you intend to do with this current
proposal.

I can update a few things based on what I learnt as I recently started
contributing, and I feel all of these things are missing. But there are many
things for which I don't know the answer yet, and I would appreciate it if someone
experienced updated the wiki with details covering questions like the above.

Thanks,

Jagat Singh

On Thu, 18 Jun 2020 at 20:43, Zoltan Haindrich  wrote:

> Hey Panos!
>
> On 6/18/20 11:54 AM, Panos Garefalakis wrote:
> > My only suggestion would be to make reviewing per package/label instead
> of
> > files. This will make the process a bit more clear.
>
> we could use path globs to select the files - so it could match on
> packages as well.
> I've not really used it yet, but something like:
> '**/schq/**'
>
> > I recently bumped into this GitHub action that lets you automatically
> label
> > PRs based on what paths they modify and could help us towards that goal.
> >
> > https://github.com/actions/labeler
>
> Sure; we can have that as well! They may fit different purposes.
> Actually - based on the "absence" of some labels (e.g. metastore) we may
> "skip" some tests.
>
> cheers,
> Zoltan
>
> >
> > Thoughts?
> >
> > Cheers,
> > Panagiotis
> >
> > On Thu, Jun 18, 2020 at 10:42 AM Zoltan Haindrich  wrote:
> >
> >> Hey all!
> >>
> >> I'm happy to see that (I guess) everyone is using the PR based stuff
> >> without issues - there is still some flaky stuff from time to time; but I
> >> feel that patches go in
> >> faster - and I have a feeling we have more reviews going on as well -
> >> which is awesome!
> >>
> >> I've read a bit about github "reviewers" / "assignee" stuff - because it
> >> seemed somewhat confusing...
> >> Basically both of them could be a group of users - the meaning of these
> >> fields should be filled by the community.
> >> I would like to propose using the "reviewers" field for people from
> >> whom reviews might be expected,
> >> and the assignee field to list those who should approve the change to
> >> go in (anyone may add assignees/reviewers).
> >>
> >> We sometimes forget PRs and they may become "stale"; most of them just
> >> fall through the cracks... To prevent this, the best would be if everyone
> >> self-assigned PRs which
> >> are in his/her area of interest.
> >>
> >> There are times when a given feature needs to change not closely
> >> related parts of the codebase - this is usually fine; but there are
> places
> >> which might need "more eyes"
> >> on reviews.
> >> In the past I was sometimes surprised by some interesting changes in say
> >> the thrift api / package.jdo / antlr stuff.
> >>
> >> Because the jira title may not suggest what files will be changed - I
> >> wanted to find a way to auto add some kind of notifications to PRs.
> >>
> >> Today I've found a neat solution to this [1] - which goes a little bit
> >> beyond what I anticipated - there is a small plugin which could enable
> to
> >> auto-add reviewers based on
> >> the changed files (adding a reviewer will also emit an email) - I had to
> >> fix a few small issues with it to ensure that it works/etc [2].
> >>
> >> I really like this approach because it could change the
> >> direction of things - it could mean that contributors don't
> >> necessarily need to look for reviewers.
> >> (but this seems more like sci-fi right now - let's start small and go
> >> from there...)
> >>
> >> I propose to collect some globs and reviewers in a google doc before we
> >> first commit this file into the repo - so that everyone could add things
> >> he/she is interested in.
> >>
> >> cheers,
> >> Zoltan
> >>
> >> [1]
> https://github.com/marketplace/actions/auto-assign-reviewer-by-files
> >> [2] https://github.com/kgyrtkirk/auto-assign-reviewer-by-files
> >> [3]
> >>
> https://docs.google.com/document/d/11n9acHby31rwVHfRW4zxxYukymHS-tTSYlJEghZwJaY/edit?usp=sharing
> >>
> >
>


[jira] [Created] (HIVE-23724) Hive ACID Lock conflicts not getting resolved correctly.

2020-06-18 Thread Aditya Shah (Jira)
Aditya Shah created HIVE-23724:
--

 Summary: Hive ACID Lock conflicts not getting resolved correctly.
 Key: HIVE-23724
 URL: https://issues.apache.org/jira/browse/HIVE-23724
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.1.2
Reporter: Aditya Shah
Assignee: Aditya Shah


Steps to reproduce:

1. Fire a `Drop database temp cascade`.
2. In parallel (after 1. has started but while it is still running), fire a `create table 
temp.temp_table (a int, b int) clustered by (a) into 2 buckets stored as orc 
TBLPROPERTIES ('transactional'='true')`.
3. In parallel (after 2. has started but while it is still running), fire an `insert overwrite table 
temp.temp_table values (1,2)`.

Note: the above can easily be reproduced by a unit test in testDbTxnManager.
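
(A rough way to drive the three sessions concurrently from a shell, purely for illustration - the connection URL is a placeholder and the sleeps only approximate the "after X has started but while it is still running" timing:)

beeline -u "$JDBC_URL" -e "drop database temp cascade" &
sleep 1
beeline -u "$JDBC_URL" -e "create table temp.temp_table (a int, b int) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true')" &
sleep 1
beeline -u "$JDBC_URL" -e "insert overwrite table temp.temp_table values (1,2)" &
wait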

Observation: the exclusive lock for the table in 3. is granted although the exclusive lock 
for the DB acquired in 1. is still held and the shared read lock on the DB for 2. is 
still waiting.

Cause of the issue: while acquiring a lock, if we choose to ignore a conflict 
between the desired lock and one of the existing locks, we immediately allow the 
desired lock to be acquired without checking it against all the remaining locks. 
The above-mentioned scenario hit one such ignore-conflict condition between 2. and 3. 
There could be other combinations where this may occur - for example when we request 
a lock with the same txn id. Although Hive guarantees that this scenario will not 
occur (all lock requests related to a txn are made at the same time, and the failure 
of one guarantees the failure of all), in the future we will have to be extra 
careful with it.

Resolution: whenever we ignore a conflict, we should keep checking against all the 
remaining locks and only then allow the lock to be acquired.





[jira] [Created] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-18 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-23723:


 Summary: Limit operator pushdown through LOJ
 Key: HIVE-23723
 URL: https://issues.apache.org/jira/browse/HIVE-23723
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


A Limit operator (without an order by) can be pushed through SELECTs and LEFT 
OUTER JOINs.
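
(For illustration, a query of the kind this improvement targets - table and column names are made up; since the LIMIT has no ORDER BY, a copy of it could be pushed down to the join's left input while the original LIMIT stays on top:)

beeline -u "$JDBC_URL" -e "
explain
select a.id, b.val
from t_left a
left outer join t_right b on a.id = b.id
limit 10"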





Some tests started hanging recently

2020-06-18 Thread Zoltan Haindrich

Hey all,

Since yesterday some tests started to hang - most frequently TestCrudCompactorOnTez or TestMmCompactorOnTez, but I've seen a replication test as well - so I don't think it's 
limited to those 2 tests.


I was not able to figure out what has caused this - my current guess is that 
somehow the tez 0.9.2 upgrade has caused it.
To validate this guess I've started the flaky checker with and without that 
patch from the current state...

I've collected some jstacks from the containers running for more than 20 hours

https://termbin.com/z1eoc
https://termbin.com/2m0j
https://termbin.com/027t
https://termbin.com/1dbe
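
(For reference, a minimal sketch of one way such dumps can be collected - assuming the JDK's jstack is on the PATH; 72000 seconds is the 20-hour threshold mentioned above, and this is not necessarily how these particular dumps were taken:)

for pid in $(pgrep java); do
  age=$(ps -o etimes= -p "$pid" | tr -d ' ')   # elapsed run time in seconds
  [ -n "$age" ] && [ "$age" -gt 72000 ] && jstack "$pid" > "jstack-$pid.txt"
done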

cheers,
Zoltan


Re: HIVE building on ARM

2020-06-18 Thread Stamatis Zampetakis
Hello Chinna,

The hudson-jobadmin privilege can be granted by PMC chairs.
I don't know if there is any particular policy in Hive on who should have
this privilege so I guess you should request it from Ashutosh.

Best,
Stamatis

On Thu, Jun 18, 2020 at 12:05 PM Zoltan Haindrich  wrote:

> Hey Chinna!
>
> On 6/18/20 11:43 AM, Chinna Rao Lalam wrote:
> > As you said, migrating this job to the new ci-hadoop instance looks good
> as
> > Hadoop also shares the same armN slaves.
>
> Sounds great!
>
> I am able to log in to the new ci-hadoop instance with Apache LDAP
> credentials,
> but I am not able to see the job creation option. Should I request access,
> > or is the process for creating a job different from the old Jenkins?
> > Please guide me to create the new job in the ci-hadoop instance. I will
> > migrate this job after connecting the armN slaves to the new system.
>
>
> I've also logged in - and apparently I have job-creation rights; I'm happy to
> help, but the best would be to self-service yourself :)
> I think you may be missing the "hudson-jobadmin" privilege.
> Probably Gavin (or someone on the infra team) could help you with that..
> to talk to them quickly - you can reach them on the #asfinfra channel (on
> the asf-slack).
>
> The migration effort is coordinated thru the hadoop-migrations mailing
> list (I've cc-ed that list)
> you may want to subscribe to it by sending a mail to:
> hadoop-migrations-subscr...@infra.apache.org
>
> cheers,
> Zoltan
>
>
>
> >
> > Thanks
> > Chinna
> >
> > On Wed, Jun 17, 2020 at 11:57 AM Zhenyu Zheng  >
> > wrote:
> >
> >> Hi Zoltan,
> >>
> >> Thanks a lot for the information. So it looks like one possible solution is,
> >> as you suggest, to move the current ARM2 and ARM3 (those two were donated to
> >> builds.apache.org by us) to the new ci-hadoop cluster and set up the jobs
> >> just as has been done in the current Jenkins.
> >>
> >> I will also ask our team members who work on other projects to find out what
> >> the status of other projects is.
> >>
> >> BR,
> >>
> >> On Tue, Jun 16, 2020 at 6:41 PM Zoltan Haindrich  wrote:
> >>
> >>> Hey,
> >>>
> >>> There is an effort by the Apache Infra to change the way Jenkins stuff
> is
> >>> organized; a couple months ago Gavin wrote an email about it:
> >>>
> >>>
> http://mail-archives.apache.org/mod_mbox/tez-dev/202004.mbox/%3ccan0gg1dodepzatjz9bofe-2ver7qg7h0hmvyjmsldgjr8_r...@mail.gmail.com%3E
> >>> The resources for running these jobs are coming from the H0~H21 slaves
> >>> which will be migrated to the new jenkins master eventually.
> >>>
> >>>   >> So please
> >>>   >> suggest a way which direction we can move and can you share some
> >>> details
> >>>   >> about the new ci-hadoop instance.
> >>>
> >>> Since Hadoop testing is also happening on ARM - I think the best would
> be
> >>> to also migrate the armN slaves and the Hive arm nightly over to the
> new
> >>> ci-hadoop instance.
> >>>
> >>> On 6/16/20 8:40 AM, Zhenyu Zheng wrote:
>  Thanks for the info, I wonder where the resources of ci-hadoop
> >>> and
>  hive-test-kube come from? Do they include ARM resources?
> >>>
> >>> Interesting question; the resources for Hive testing are donated by
> >>> Cloudera.
> >>> About the ARM workers I think Chinna could provide more details.
> >>> ...I've no idea who sponsors the Hxx slaves
> >>>
>  Can you provide some more information about how the new hive-test-kube
> >>> is
>  running?
> >>> It's basically a Jenkins instance which is using kubernetes pods to run
> >>> things.
> >>> The whole thing is running on a GKE cluster.
> >>> While I was working on it I collected stuff needed for it in this repo:
> >>> https://github.com/kgyrtkirk/hive-test-kube/
> >>> it should be possible to start a new deployment using that stuff
> >>>
> >>> cheers,
> >>> Zoltan
> >>>
> 
>  BR,
>  Kevin Zheng
> 
>  On Tue, Jun 16, 2020 at 12:41 PM Chinna Rao Lalam <
>  lalamchinnara...@gmail.com> wrote:
> 
> > Hi Zoltan,
> >
> > Thanks for the update.
> >
> > Current https://builds.apache.org/job/Hive-linux-ARM-trunk/ job is
> > targeting to run hive tests daily on "arm" slaves, it is using 2 arm
> > slaves.
> > To find any potential issues with "arm" and fix the issues. So please
> > suggest a way which direction we can move and can you share some
> >>> details
> > about the new ci-hadoop instance.
> >
> > Thanks,
> > Chinna
> >
> > On Mon, Jun 15, 2020 at 3:56 PM Zoltan Haindrich 
> wrote:
> >
> >> Hey all,
> >>
> >> In a ticket (INFRA-20416) Gavin asked me if we are completely off
> >> builds.apache.org - when I went over the jobs I saw that
> >> https://builds.apache.org/job/Hive-linux-ARM-trunk/ is running
> there
> >> once a day.
> >>
> >> Since builds.apache.org will be shut down sometime in the future
> >>> - we
> >> should move this job to the new ci-hadoop instance or to
> >>> hive-test-kube

Re: Reviewers and assignees of PRs

2020-06-18 Thread Zoltan Haindrich

Hey Panos!

On 6/18/20 11:54 AM, Panos Garefalakis wrote:

My only suggestion would be to make reviewing per package/label instead of
files. This will make the process a bit more clear.


we could use path globs to select the files - so it could match on packages as 
well.
I've not really used it yet, but something like:
'**/schq/**'
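
(Purely to illustrate the matching idea - the real action reads its glob-to-reviewer mapping from a config file committed to the repo, and the globs and reviewer handles below are made-up examples:)

declare -A reviewers=( ["**/schq/**"]="kgyrtkirk" ["standalone-metastore/**"]="some-metastore-reviewer" )
git diff --name-only origin/master...HEAD | while read -r f; do
  for glob in "${!reviewers[@]}"; do
    # bash pattern match: '*' also crosses '/' here, so the '**' globs behave as expected
    [[ $f == $glob ]] && echo "would add reviewer ${reviewers[$glob]} ($glob matched $f)"
  done
done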


I recently bumped into this GitHub action that lets you automatically label
PRs based on what paths they modify and could help us towards that goal.

https://github.com/actions/labeler


Sure; we can have that as well! They may fit different purposes.
Actually - based on the "absence" of some labels (e.g. metastore) we may "skip" 
some tests.

cheers,
Zoltan



Thoughts?

Cheers,
Panagiotis

On Thu, Jun 18, 2020 at 10:42 AM Zoltan Haindrich  wrote:


Hey all!

I'm happy to see that (I guess) everyone is using the PR based stuff
without issues - there is still some flaky stuff from time to time; but I
feel that patches go in
faster - and I have a feeling we have more reviews going on as well -
which is awesome!

I've read a bit about github "reviewers" / "assignee" stuff - because it
seemed somewhat confusing...
Basically both of them could be a group of users - the meaning of these
fields should be filled by the community.
I would like to propose using the "reviewers" field for people from
whom reviews might be expected,
and the assignee field to list those who should approve the change to
go in (anyone may add assignees/reviewers).

We sometimes forget PRs and they may become "stale"; most of them just
fall through the cracks... To prevent this, the best would be if everyone
self-assigned PRs which
are in his/her area of interest.

There are times when a given feature needs to change not closely
related parts of the codebase - this is usually fine; but there are places
which might need "more eyes"
on reviews.
In the past I was sometimes surprised by some interesting changes in say
the thrift api / package.jdo / antlr stuff.

Because the jira title may not suggest what files will be changed - I
wanted to find a way to auto add some kind of notifications to PRs.

Today I've found a neat solution to this [1] - which goes a little bit
beyond what I anticipated - there is a small plugin which could enable to
auto-add reviewers based on
the changed files (adding a reviewer will also emit an email) - I had to
fix a few small issues with it to ensure that it works/etc [2].

I really like this approach because it could change the
direction of things - it could mean that contributors don't
necessarily need to look for reviewers.
(but this seems more like sci-fi right now - let's start small and go
from there...)

I propose to collect some globs and reviewers in a google doc before we
first commit this file into the repo - so that everyone could add things
he/she is interested in.

cheers,
Zoltan

[1] https://github.com/marketplace/actions/auto-assign-reviewer-by-files
[2] https://github.com/kgyrtkirk/auto-assign-reviewer-by-files
[3]
https://docs.google.com/document/d/11n9acHby31rwVHfRW4zxxYukymHS-tTSYlJEghZwJaY/edit?usp=sharing





Re: HIVE building on ARM

2020-06-18 Thread Zoltan Haindrich

Hey Chinna!

On 6/18/20 11:43 AM, Chinna Rao Lalam wrote:

As you said, migrating this job to the new ci-hadoop instance looks good as
Hadoop also shares the same armN slaves.


Sounds great!


I am able to log in to the new ci-hadoop instance with Apache LDAP credentials,
but I am not able to see the job creation option. Should I request access,
or is the process for creating a job different from the old Jenkins?
Please guide me to create the new job in the ci-hadoop instance. I will
migrate this job after connecting the armN slaves to the new system.



I've also logged in - and apparently I have job-creation rights; I'm happy to help, 
but the best would be to self-service yourself :)
I think you may be missing the "hudson-jobadmin" privilege.
Probably Gavin (or someone on the infra team) could help you with that..
to talk to them quickly - you can reach them on the #asfinfra channel (on the 
asf-slack).

The migration effort is coordinated thru the hadoop-migrations mailing list 
(I've cc-ed that list)
you may want to subscribe to it by sending a mail to: 
hadoop-migrations-subscr...@infra.apache.org

cheers,
Zoltan





Thanks
Chinna

On Wed, Jun 17, 2020 at 11:57 AM Zhenyu Zheng 
wrote:


Hi Zoltan,

Thanks a lot for the information. So it looks like one possible solution is,
as you suggest, to move the current ARM2 and ARM3 (those two were donated to
builds.apache.org by us) to the new ci-hadoop cluster and set up the jobs
just as has been done in the current Jenkins.

I will also ask our team members who work on other projects to find out what
the status of other projects is.

BR,

On Tue, Jun 16, 2020 at 6:41 PM Zoltan Haindrich  wrote:


Hey,

There is an effort by the Apache Infra to change the way Jenkins stuff is
organized; a couple months ago Gavin wrote an email about it:

http://mail-archives.apache.org/mod_mbox/tez-dev/202004.mbox/%3ccan0gg1dodepzatjz9bofe-2ver7qg7h0hmvyjmsldgjr8_r...@mail.gmail.com%3E
The resources for running these jobs are coming from the H0~H21 slaves
which will be migrated to the new jenkins master eventually.

  >> So please
  >> suggest a way which direction we can move and can you share some
details
  >> about the new ci-hadoop instance.

Since Hadoop testing is also happening on ARM - I think the best would be
to also migrate the armN slaves and the Hive arm nightly over to the new
ci-hadoop instance.

On 6/16/20 8:40 AM, Zhenyu Zheng wrote:

Thanks for the info, I wonder where the resources of ci-hadoop

and

hive-test-kube come from? Do they include ARM resources?


Interesting question; the resources for Hive testing are donated by
Cloudera.
About the ARM workers I think Chinna could provide more details.
...I've no idea who sponsors the Hxx slaves


Can you provide some more information about how the new hive-test-kube

is

running?

It's basically a Jenkins instance which is using kubernetes pods to run
things.
The whole thing is running on a GKE cluster.
While I was working on it I collected stuff needed for it in this repo:
https://github.com/kgyrtkirk/hive-test-kube/
it should be possible to start a new deployment using that stuff

cheers,
Zoltan



BR,
Kevin Zheng

On Tue, Jun 16, 2020 at 12:41 PM Chinna Rao Lalam <
lalamchinnara...@gmail.com> wrote:


Hi Zoltan,

Thanks for the update.

Current https://builds.apache.org/job/Hive-linux-ARM-trunk/ job is
targeting to run hive tests daily on "arm" slaves, it is using 2 arm
slaves.
To find any potential issues with "arm" and fix the issues. So please
suggest a way which direction we can move and can you share some

details

about the new ci-hadoop instance.

Thanks,
Chinna

On Mon, Jun 15, 2020 at 3:56 PM Zoltan Haindrich  wrote:


Hey all,

In a ticket (INFRA-20416) Gavin asked me if we are completely off
builds.apache.org - when I went over the jobs I saw that
https://builds.apache.org/job/Hive-linux-ARM-trunk/ is running there
once a day.

Since builds.apache.org will be shut down sometime in the future

- we

should move this job to the new ci-hadoop instance or to

hive-test-kube.

The key feature of the job is that it runs the test on the "armX"

slaves;

which are statically configured on b.a.o.
Not sure which way to go - but we will have to move in some direction.

cheers,
Zoltan


On 3/13/20 7:22 AM, Zhenyu Zheng wrote:

Hi Chinna,

Thanks a lot for the reply, I uploaded a patch and also a github PR

for

https://issues.apache.org/jira/browse/HIVE-21939 .
In the patch, I bumped the protobuf used in standalone-metadata to

2.6.1

and added a new profile, this profile will identify
the hardware architecture and if it is Aarch64, it will override the
protobuf group.id and package to com.github.os72 which
includes ARM support. For X86 platform, Hive will still download the
protobuf packages from org.google repo. I think with
this method, we can keep the impact on existing x86 users to a
minimum. I hope this could be an acceptable short-term
solution.

I've manually tested on my machine and the github PR t

Re: Reviewers and assignees of PRs

2020-06-18 Thread Panos Garefalakis
Hey Zoltan,

Thanks for doing this! This is definitely a step in the right
direction.

My only suggestion would be to make reviewing per package/label instead of
files. This will make the process a bit more clear.
I recently bumped into this GitHub action that lets you automatically label
PRs based on what paths they modify and could help us towards that goal.

https://github.com/actions/labeler

Thoughts?

Cheers,
Panagiotis

On Thu, Jun 18, 2020 at 10:42 AM Zoltan Haindrich  wrote:

> Hey all!
>
> I'm happy to see that (I guess) everyone is using the PR based stuff
> without issues - there is still some flaky stuff from time to time; but I
> feel that patches go in
> faster - and I have a feeling we have more reviews going on as well -
> which is awesome!
>
> I've read a bit about github "reviewers" / "assignee" stuff - because it
> seemed somewhat confusing...
> Basically both of them could be a group of users - the meaning of these
> fields should be filled by the community.
> I would like to propose using the "reviewers" field for people from
> whom reviews might be expected,
> and the assignee field to list those who should approve the change to
> go in (anyone may add assignees/reviewers).
>
> We sometimes forget PRs and they may become "stale"; most of them just
> fall through the cracks... To prevent this, the best would be if everyone
> self-assigned PRs which
> are in his/her area of interest.
>
> There are times when a given feature needs to change not closely
> related parts of the codebase - this is usually fine; but there are places
> which might need "more eyes"
> on reviews.
> In the past I was sometimes surprised by some interesting changes in say
> the thrift api / package.jdo / antlr stuff.
>
> Because the jira title may not suggest what files will be changed - I
> wanted to find a way to auto add some kind of notifications to PRs.
>
> Today I've found a neat solution to this [1] - which goes a little bit
> beyond what I anticipated - there is a small plugin which could enable to
> auto-add reviewers based on
> the changed files (adding a reviewer will also emit an email) - I had to
> fix a few small issues with it to ensure that it works/etc [2].
>
> I really like this approach because it could change the
> direction of things - it could mean that contributors don't
> necessarily need to look for reviewers.
> (but this seems more like sci-fi right now - let's start small and go
> from there...)
>
> I propose to collect some globs and reviewers in a google doc before we
> first commit this file into the repo - so that everyone could add things
> he/she is interested in.
>
> cheers,
> Zoltan
>
> [1] https://github.com/marketplace/actions/auto-assign-reviewer-by-files
> [2] https://github.com/kgyrtkirk/auto-assign-reviewer-by-files
> [3]
> https://docs.google.com/document/d/11n9acHby31rwVHfRW4zxxYukymHS-tTSYlJEghZwJaY/edit?usp=sharing
>


Re: HCatalog tests create test output inside source code folders fails rat

2020-06-18 Thread Zoltan Haindrich

Hey Jagat!


On 6/18/20 6:20 AM, Jagat Singh wrote:

I can raise a PR for the annoyance I faced.

Cool, that would be great!


I am not sure what the best end action is after checking whether
the working tree is clean or not - do you see us just displaying the
message in Gradle, or actually doing something with those files?


I think anything that works will suffice :D
Probably a quick check if git status reports back that the WS is clean.
I think you could add something like this after the "Test" stage in the 
Jenkinsfile:

stage('ws-check') {
sh '''#!/bin/bash -e
N=`git status --porcelain | tee >(cat >&2) | wc -l`
# succeed when the worktree is clean; fail the stage if git reported any untracked/modified files
[ $N -eq 0 ] || { echo "there are untracked files in the workspace?"; exit 1; }
'''
}

cheers,
Zoltan



Thanks in advance,

Jagat Singh

On Wed, 17 Jun 2020 at 18:52, Zoltan Haindrich  wrote:


Hey Jagat!

Yeah; this looks pretty annoying...I think these are some ancient tests; I
don't think those files should be there; this should be fixed.
Could you file a jira to fix it?
I think after running the tests we might want to also add a check that the
worktree is clean.

cheers,
Zoltan


On 6/15/20 6:55 AM, Jagat Singh wrote:

Hello all,

Currently, this line makes the test output data be produced inside
folders which are not covered by the rat exclude rules. This makes the rat
checks fail due to the absence of license files. Is this intentional or should
it be fixed? Ideally, the test output data should not stay inside the
current folder structure and should reside in some standard temporary
folder. The folder mapred/testHCatMapReduceOutput gets created under
hcatalog/core at the moment.



https://github.com/apache/hive/blob/3ab174d82ffc2bd27432c0b04433be3bd7db5c6a/hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/HCatMapReduceTest.java#L403



Path path = new Path(fs.getWorkingDirectory(),
"mapred/testHCatMapReduceOutput");

/home/jj/dev/code/open/hive/hcatalog/core/mapred
├── testHCatMapReduceInput
└── testHCatMapReduceOutput
  ├── part-m-0
  ├── part-m-1
  ├── part-m-2
  ├── part-m-3
  ├── part-m-4
  └── _SUCCESS

1 directory, 7 files

Thanks for reading and in advance thanks for your reply.

Regards,

Jagat Singh







Re: HIVE building on ARM

2020-06-18 Thread Chinna Rao Lalam
Hi Zoltan,

As you said, migrating this job to the new ci-hadoop instance looks good as
Hadoop also shares the same armN slaves.
I am able to log in to the new ci-hadoop instance with Apache LDAP credentials,
but I am not able to see the job creation option. Should I request access,
or is the process for creating a job different from the old Jenkins?
Please guide me to create the new job in the ci-hadoop instance. I will
migrate this job after connecting the armN slaves to the new system.

Thanks
Chinna

On Wed, Jun 17, 2020 at 11:57 AM Zhenyu Zheng 
wrote:

> Hi Zoltan,
>
> Thanks a lot for the information. So it looks like one possible solution is,
> as you suggest, to move the current ARM2 and ARM3 (those two were donated to
> builds.apache.org by us) to the new ci-hadoop cluster and set up the jobs
> just as has been done in the current Jenkins.
>
> I will also ask our team members who work on other projects to find out what
> the status of other projects is.
>
> BR,
>
> On Tue, Jun 16, 2020 at 6:41 PM Zoltan Haindrich  wrote:
>
>> Hey,
>>
>> There is an effort by the Apache Infra to change the way Jenkins stuff is
>> organized; a couple months ago Gavin wrote an email about it:
>>
>> http://mail-archives.apache.org/mod_mbox/tez-dev/202004.mbox/%3ccan0gg1dodepzatjz9bofe-2ver7qg7h0hmvyjmsldgjr8_r...@mail.gmail.com%3E
>> The resources for running these jobs are coming from the H0~H21 slaves
>> which will be migrated to the new jenkins master eventually.
>>
>>  >> So please
>>  >> suggest a way which direction we can move and can you share some
>> details
>>  >> about the new ci-hadoop instance.
>>
>> Since Hadoop testing is also happening on ARM - I think the best would be
>> to also migrate the armN slaves and the Hive arm nightly over to the new
>> ci-hadoop instance.
>>
>> On 6/16/20 8:40 AM, Zhenyu Zheng wrote:
>> > Thanks for the info, I wonder where the resources of ci-hadoop
>> and
>> > hive-test-kube come from? Do they include ARM resources?
>>
>> Interesting question; the resources for Hive testing are donated by
>> Cloudera.
>> About the ARM workers I think Chinna could provide more details.
>> ...I've no idea who sponsors the Hxx slaves
>>
>> > Can you provide some more information about how the new hive-test-kube
>> is
>> > running?
>> It's basically a Jenkins instance which is using kubernetes pods to run
>> things.
>> The whole thing is running on a GKE cluster.
>> While I was working on it I collected stuff needed for it in this repo:
>> https://github.com/kgyrtkirk/hive-test-kube/
>> it should be possible to start a new deployment using that stuff
>>
>> cheers,
>> Zoltan
>>
>> >
>> > BR,
>> > Kevin Zheng
>> >
>> > On Tue, Jun 16, 2020 at 12:41 PM Chinna Rao Lalam <
>> > lalamchinnara...@gmail.com> wrote:
>> >
>> >> Hi Zoltan,
>> >>
>> >> Thanks for the update.
>> >>
>> >> Current https://builds.apache.org/job/Hive-linux-ARM-trunk/ job is
>> >> targeting to run hive tests daily on "arm" slaves, it is using 2 arm
>> >> slaves.
>> >> To find any potential issues with "arm" and fix the issues. So please
>> >> suggest a way which direction we can move and can you share some
>> details
>> >> about the new ci-hadoop instance.
>> >>
>> >> Thanks,
>> >> Chinna
>> >>
>> >> On Mon, Jun 15, 2020 at 3:56 PM Zoltan Haindrich  wrote:
>> >>
>> >>> Hey all,
>> >>>
>> >>> In a ticket (INFRA-20416) Gavin asked me if we are completely off
>> >>> builds.apache.org - when I went over the jobs I saw that
>> >>> https://builds.apache.org/job/Hive-linux-ARM-trunk/ is running there
>> >>> once a day.
>> >>>
>> >>> Since builds.apache.org will be shut down sometime in the future
>> - we
>> >>> should move this job to the new ci-hadoop instance or to
>> hive-test-kube.
>> >>> The key feature of the job is that it runs the test on the "armX"
>> slaves;
>> >>> which are statically configured on b.a.o.
>> >>> Not sure which way to go - but we will have to move in some direction.
>> >>>
>> >>> cheers,
>> >>> Zoltan
>> >>>
>> >>>
>> >>> On 3/13/20 7:22 AM, Zhenyu Zheng wrote:
>>  Hi Chinna,
>> 
>>  Thanks a lot for the reply, I uploaded a patch and also a github PR
>> for
>>  https://issues.apache.org/jira/browse/HIVE-21939 .
>>  In the patch, I bumped the protobuf used in standalone-metadata to
>> 2.6.1
>>  and added a new profile, this profile will identify
>>  the hardware architecture and if it is Aarch64, it will override the
>>  protobuf group.id and package to com.github.os72 which
>>  includes ARM support. For X86 platform, Hive will still download the
>>  protobuf packages from org.google repo. I think with
>>  this method, we can keep the impact on existing x86 users to a
>>  minimum. I hope this could be an acceptable short-term
>>  solution.
>> 
>>  I've manually tested on my machine and the github PR travis CI test
>> has
>>  already passed, so the build process is OK, so let's
>>  wait for the full test result from builds.apache.org.

Reviewers and assignees of PRs

2020-06-18 Thread Zoltan Haindrich

Hey all!

I'm happy to see that (I guess) everyone is using the PR-based stuff without issues - there is still some flaky stuff from time to time; but I feel that patches go in 
faster - and I have a feeling we have more reviews going on as well - which is awesome!


I've read a bit about github "reviewers" / "assignee" stuff - because it seemed 
somewhat confusing...
Basically both of them could be a group of users - the meaning of these fields 
should be filled by the community.
I would like to propose using the "reviewers" field for people from whom 
reviews might be expected,
and the assignee field to list those who should approve the change to go in 
(anyone may add assignees/reviewers).

We sometimes forget PRs and they may become "stale"; most of them just fall through the cracks... To prevent this, the best would be if everyone self-assigned PRs which 
are in his/her area of interest.


There are times when a given feature needs to change not closely related parts of the codebase - this is usually fine; but there are places which might need "more eyes" 
on reviews.

In the past I was sometimes surprised by some interesting changes in say the 
thrift api / package.jdo / antlr stuff.

Because the jira title may not suggest what files will be changed - I wanted to 
find a way to auto add some kind of notifications to PRs.

Today I've found a neat solution to this [1] - which goes a little bit beyond what I anticipated - there is a small plugin which can auto-add reviewers based on 
the changed files (adding a reviewer will also emit an email) - I had to fix a few small issues with it to ensure that it works etc. [2].


I really like this approach because it could change the direction of things - it could mean that contributors don't necessarily need to look for reviewers. 
(but this seems more like sci-fi right now - let's start small and go from there...)


I propose to collect some globs and reviewers in a google doc before we first 
commit this file into the repo - so that everyone could add things he/she is 
interested in.

cheers,
Zoltan

[1] https://github.com/marketplace/actions/auto-assign-reviewer-by-files
[2] https://github.com/kgyrtkirk/auto-assign-reviewer-by-files
[3] 
https://docs.google.com/document/d/11n9acHby31rwVHfRW4zxxYukymHS-tTSYlJEghZwJaY/edit?usp=sharing