[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-02-27 Thread Aron Hamvas (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046676#comment-17046676
 ] 

Aron Hamvas commented on HIVE-22942:


Good!
We have already been brainstorming about this a bit with [~b.maidics] and 
[~zchovan], and I heard rumors that [~pvary] had ideas about this last year. 
It might be good to involve them, as they all seemed very interested.

The (informal) discussions focused on two major topics:
1. Execution engine. E.g., moving to JUnit 5 is probably not a huge effort, and 
it supports parallel execution, which could speed up test runs.
2. Rewriting the PTest framework or replacing it with a more general-purpose 
alternative.

> Replace PTest with an alternative
> -
>
> Key: HIVE-22942
> URL: https://issues.apache.org/jira/browse/HIVE-22942
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> I never opened a jira about this...but it might actually help collect ideas 
> and get things moving sooner rather than later :D
> Right now we maintain the ptest2 project inside Hive to be able to run Hive 
> tests in a distributed fashion...the drawback of this solution is that we are 
> putting a lot of effort into maintaining a distributed test execution framework...
> I think it would be better if we could find an off-the-shelf solution for the 
> task and migrate to that instead of putting more effort into the ptest 
> framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-02-27 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046730#comment-17046730
 ] 

Zoltan Haindrich commented on HIVE-22942:
-

How it works right now:
* we run a [job on the ASF jenkins instance|https://builds.apache.org/job/PreCommit-HIVE-Build/] which logs into 
some cloud instance to launch the ptest execution
* ptest uses a predefined number of executors (16?)
* the tests are batched by custom logic into ~200 batches
* every executor runs 2 batches at a time
* there are some specially tailored features, like a batch-level timeout and a 
way to run something in "isolation"
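
To get a feel for the numbers above, here is a rough sketch in Python. It is not the ptest batching logic (that is custom and not shown in this thread); the fixed batch size, the test names, and the executor count of 16 are all assumptions made only for illustration.

```python
import math


def make_batches(tests, batch_size):
    """Chunk an ordered list of test names into fixed-size batches (hypothetical rule)."""
    return [tests[i:i + batch_size] for i in range(0, len(tests), batch_size)]


def waves_needed(num_batches, executors=16, batches_per_executor=2):
    """Scheduling rounds needed if every executor slot takes one batch per round."""
    slots = executors * batches_per_executor
    return math.ceil(num_batches / slots)


# 1000 made-up test classes, 5 per batch, gives roughly the ~200 batches mentioned.
tests = [f"Test{i:04d}" for i in range(1000)]
batches = make_batches(tests, batch_size=5)
print(len(batches), waves_needed(len(batches)))
```

With 16 executors running 2 batches each there are 32 slots, so ~200 batches take several rounds; one slow batch in any round holds up its slot.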

Right now I think the following would be the most promising:
* drop in something else to make use of the [parallel-test-executor plugin for 
jenkins|https://plugins.jenkins.io/parallel-test-executor/]
* it basically works by scanning the last result to build roughly equally 
sized test groups, and then runs those...however, it cannot work if there are 
testcases which run longer than the bucket size; this could 
probably be addressed by adding some logic to split the larger cases into 
~30m parts
* creating a job which utilizes the plugin is quite straightforward; adding 
all the executors as slaves to a jenkins would let it utilize the same 
compute power
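
The problem with oversized testcases can be illustrated with a small sketch. This is a generic greedy longest-first partition written in Python, not the plugin's actual algorithm; the test names and durations are invented.

```python
import heapq


def split_by_duration(durations, n_groups):
    """Greedy partition of {test: seconds} into n_groups, longest test first."""
    heap = [(0.0, i) for i in range(n_groups)]  # (accumulated seconds, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(n_groups)]
    # Sort by descending duration, then by name so the result is deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)  # currently lightest group
        groups[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    totals = [sum(durations[n] for n in g) for g in groups]
    return groups, totals


# Hypothetical timings from a previous run, in seconds.
last_run = {"TestCliDriver": 5400, "TestA": 600, "TestB": 600, "TestC": 300, "TestD": 300}
groups, totals = split_by_duration(last_run, n_groups=3)
for g, t in zip(groups, totals):
    print(t, g)
```

One group ends up holding only the 90-minute class while the others finish in 15 minutes, which is why splitting the larger cases into ~30m parts matters before any balancing can help.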



[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-02-28 Thread Andrew Sherman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047974#comment-17047974
 ] 

Andrew Sherman commented on HIVE-22942:
---

See also HIVE-19571



[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-03-10 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055864#comment-17055864
 ] 

Zoltan Haindrich commented on HIVE-22942:
-

Hey All,

I think the best option would be to replace ptest with something 
that is not maintained by the Hive community. Moving to junit5 would be cool, 
but it might be challenging...parallel execution of tests within the 
same machine tends to uncover further issues, since we don't expect 2 tests 
of the same kind to be executed at the same time...and I don't think a 
single machine can execute all of them in one place. I think running 
batches in an isolated environment on 1 thread might be more robust and 
reliable, so that we will actually be able to repro issues.

I've opened a PR with a working prototype; it isn't complete, but it's able to 
do the following:
* builds upon some jenkins plugins; the job itself is defined as a 
Jenkinsfile
* uses docker images executed on a kubernetes cluster to provide 
reproducibility, so anyone will be more likely to be able to repro test runs 
by using docker
* to make the parallel test executor plugin "happy", I needed to find a way 
to reduce the max testclass execution time below ~30 minutes
** as a first approach I analyzed test execution times based on the 
actual testcase times; it's possible, but defining the ranges and maintaining 
them long term might be interesting, to say the least
** then I compared how "well" a naive approach would do...and I concluded 
that with over twice as many splits the result is acceptable, so I went this 
way; it's a cleaner way to do it
** I did not want to disrupt existing usages of testing, so I came up with the 
following way to declare further classes for qtests over 30 minutes; let's go 
with TestCliDriver for now:
*** in case a special flag is enabled (qsplits), the TestCliDriver is split into 
a number of parts; the "split" classes differ only in the package name, so 
"-Dtest=TestCliDriver" will still work to run the testcase
*** there is some shell script / java reflection stuff which actually does the 
splitting of the test parameter list into smaller pieces
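
The idea of cutting one parameter list into smaller pieces can be sketched as follows. This is only a Python illustration of a naive split; the prototype does this with shell script and java reflection, and the file names, split count, and round-robin rule here are assumptions.

```python
def split_parameters(params, n_splits):
    """Deal parameters round-robin into n_splits lists; nothing is lost or duplicated."""
    return [params[i::n_splits] for i in range(n_splits)]


# Hypothetical qfile names standing in for the TestCliDriver parameter list.
qfiles = [f"query_{i:03d}.q" for i in range(10)]
parts = split_parameters(qfiles, 4)
for idx, part in enumerate(parts):
    print(f"split{idx}: {part}")
```

A naive split like this ignores run times, which is presumably why more splits are needed than with a duration-based approach to get every piece under the limit.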

Currently I think the replacement layout will be:
* a kubernetes cluster somewhere (gce/gke)
* a jenkins running inside the kubernetes cluster
* a local artifact caching instance, added to reduce outside communication
* it would be easier to tie the job into github PRs and live with that instead 
of retaining the run-a-patch approach
* as for running multiple ptests: it will be easily possible, as the limit will 
be the number of pods the jenkins may launch

Things that still need investigation/etc:
* there are a bunch of failing tests...I guess most of them have some env 
issue in the background
* there should be a timeout on executing a set of tests; the ptest env uses a 
"timeout" on the maven command - I can just throw in the timeout plugin, but 
timeouts should be fixed...they are a sign of big problems like deadlocks/etc
* no support for "isolated" tests - this should be rethought
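
For the timeout point, a minimal Python sketch of putting a hard limit around a test command is shown below. It is neither the ptest mechanism nor the jenkins timeout plugin; the helper name is hypothetical and a sleeping Python process stands in for a hanging maven run.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s seconds."""
    try:
        # subprocess.run kills the child when the timeout expires, then raises.
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


# Stand-in for a test run that hangs (e.g. on a deadlock).
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
result = run_with_timeout(slow, timeout_s=1)
print("timed out" if result is None else f"exit code {result}")
```

A wrapper like this only bounds the damage; a run that hits the limit still needs to be investigated rather than retried.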


> Replace PTest with an alternative
> -
>
> Key: HIVE-22942
> URL: https://issues.apache.org/jira/browse/HIVE-22942
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h


[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-03-10 Thread Zoltan Chovan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055892#comment-17055892
 ] 

Zoltan Chovan commented on HIVE-22942:
--

So far it sounds great :) What happens when a build fails for a PR? If there 
are flaky tests, we would need some way to re-trigger the run. Would it be 
possible to only execute the failed tests for the same commit?
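
As a rough illustration of what re-running only the failures could look like, here is a Python sketch that reads a JUnit-style XML report and builds a "-Dtest=" argument. The sample report and class names are invented, and whether the new job would expose reports this way is an assumption.

```python
import xml.etree.ElementTree as ET


def failed_classes(report_xml):
    """Return sorted class names that have at least one failed or errored testcase."""
    root = ET.fromstring(report_xml)
    failed = set()
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.add(case.get("classname"))
    return sorted(failed)


# Made-up report in the usual JUnit XML layout.
SAMPLE = """<testsuite name="example">
  <testcase classname="org.example.TestA" name="testOk"/>
  <testcase classname="org.example.TestB" name="testBroken"><failure message="boom"/></testcase>
  <testcase classname="org.example.TestC" name="testErr"><error message="npe"/></testcase>
</testsuite>"""

classes = failed_classes(SAMPLE)
rerun_arg = "-Dtest=" + ",".join(c.rsplit(".", 1)[-1] for c in classes)
print(rerun_arg)
```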

If we are already using k8s, do you think it would be feasible to introduce 
multiple types of backend dbs for certain tests? We have a lot of directsql 
around TxnHandler and HMS that would greatly benefit from that. The backend db 
could be spun up on demand just for tests that are flagged, maybe.



[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-03-10 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055960#comment-17055960
 ] 

Zoltan Haindrich commented on HIVE-22942:
-

bq.   If there are flaky tests, we would need some way to re-trigger the run. 
Would it be possible to only execute the failed tests for the same commit?

That could possibly be done; however, I'm not sure that supporting flaky-rerun 
would make things better. The thing is that if there is a feature like that, 
we also accept the fact that we have flaky tests - they should be fixed... I 
hope that removing the ptest framework and running 1 set of tests in a single 
container will increase the probability of reproducing test failures.

bq.  If we already using k8s, do you think it would be feasible to introduce 
multiple types of backend dbs for certain tests?

Of course! That would be super useful - I suspect it would not be that hard to 
add something for that after we have this stuff in place - so one step at a 
time... :)
Stabilizing this stuff will take a bit of time...because this approach changes 
some things for the tests themselves - they fail because of already existing 
issues (e.g. HIVE-23003) which were tolerated by the existing setup.
If you would like to experiment with it: I think we can either integrate some kind 
of smoke testing into the main job - or we could have another job to do that.




[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-05-25 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116133#comment-17116133
 ] 

Hive QA commented on HIVE-22942:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/13003951/HIVE-22942.01.patch

{color:green}SUCCESS:{color} +1 due to 24 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 17230 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/22607/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/22607/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-22607/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 13003951 - PreCommit-HIVE-Build

> Replace PTest with an alternative
> -
>
> Key: HIVE-22942
> URL: https://issues.apache.org/jira/browse/HIVE-22942
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22942.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> some info/etc about how it compares to existing one:
> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit#heading=h.e51vlxui3e6n





[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-05-25 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116152#comment-17116152
 ] 

Hive QA commented on HIVE-22942:


| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| 0 | mvndep | 1m 7s | Maven dependency ordering for branch |
| +1 | mvninstall | 10m 15s | master passed |
| +1 | compile | 10m 47s | master passed |
| +1 | checkstyle | 4m 34s | master passed |
| 0 | findbugs | 0m 30s | llap-client in master has 27 extant Findbugs warnings. |
| -1 | findbugs | 0m 20s | metastore-server in master failed. |
| 0 | findbugs | 4m 38s | ql in master has 1524 extant Findbugs warnings. |
| 0 | findbugs | 0m 37s | itests/hive-unit in master has 2 extant Findbugs warnings. |
| +1 | javadoc | 9m 8s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 13m 27s | the patch passed |
| +1 | compile | 15m 35s | the patch passed |
| +1 | javac | 15m 35s | the patch passed |
| +1 | checkstyle | 0m 13s | llap-client: The patch generated 0 new + 5 unchanged - 1 fixed = 5 total (was 6) |
| +1 | checkstyle | 0m 26s | The patch metastore-server passed checkstyle |
| +1 | checkstyle | 2m 8s | The patch ql passed checkstyle |
| +1 | checkstyle | 0m 28s | The patch kafka-handler passed checkstyle |
| -1 | checkstyle | 3m 44s | root: The patch generated 1 new + 238 unchanged - 2 fixed = 239 total (was 240) |
| +1 | checkstyle | 0m 37s | The patch hcatalog-unit passed checkstyle |
| -1 | checkstyle | 0m 49s | itests/hive-unit: The patch generated 1 new + 215 unchanged - 1 fixed = 216 total (was 216) |
| +1 | checkstyle | 0m 29s | The patch qtest passed checkstyle |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 3s | The patch has no ill-formed XML file. |
| -1 | findbugs | 1m 3s | patch/llap-client cannot run setBugDatabaseInfo from findbugs |
| -1 | findbugs | 0m 49s | metastore-server in the patch failed. |
| -1 | findbugs | 10m 20s | patch/ql cannot run setBugDatabaseInfo from findbugs |
| -1 | findbugs | 0m 48s | patch/kafka-handler cannot run setBugDatabaseInfo from findbugs |
| -1 | findbugs | 0m 45s | patch/itests/hive-unit cannot run setBugDatabaseInfo from f

[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-05-27 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117953#comment-17117953
 ] 

Zoltan Haindrich commented on HIVE-22942:
-

[~jcamachorodriguez]: could you please take a look?



[jira] [Commented] (HIVE-22942) Replace PTest with an alternative

2020-05-28 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119033#comment-17119033
 ] 

Jesus Camacho Rodriguez commented on HIVE-22942:


+1

> Replace PTest with an alternative
> -
>
> Key: HIVE-22942
> URL: https://issues.apache.org/jira/browse/HIVE-22942
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22942.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h