[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046676#comment-17046676 ] Aron Hamvas commented on HIVE-22942: Good! We have already been brainstorming about this a bit with [~b.maidics] and [~zchovan], and I heard rumors that [~pvary] had ideas about this last year. Might be good to involve those guys as they all seemed heavily interested. The (informal) discussions were focused on two major topics: 1. Execution engine. E.g. moving to JUnit 5 is probably not a huge effort and it supports parallel execution that could speed up execution. 2. Rewriting Ptest framework or replacing it with a more general purpose alternative. > Replace PTest with an alternative > - > > Key: HIVE-22942 > URL: https://issues.apache.org/jira/browse/HIVE-22942 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > I never opened a jira about this...but it might actually help collect ideas > and actually start going somewhere sooner than later :D > Right now we maintain the ptest2 project inside Hive to be able to run Hive > tests in a distributed fashion...the backstab of this solution is that we are > putting much effort into maintaining a distributed test execution framework... > I think it would be better if we could find an off the shelf solution for the > task and migrate to that instead of putting more efforts into the ptest > framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046730#comment-17046730 ] Zoltan Haindrich commented on HIVE-22942: - How it works right now: * we run a [job on the ASF jenkins| instancehttps://builds.apache.org/job/PreCommit-HIVE-Build/] which logs into some cloud instance to launch the ptest execution * the ptest uses a predefined number of executors(16?) * the tests are batched by a custom logic into ~200 batches * every executor runs 2 batches at a time * there are some specially tailored features; like timeout at batch level and a way to run something in "isolation" Right now I think the following would be the most promising: * drop in something else for make use of the [parallel-test-executor plugin for jenkins|https://plugins.jenkins.io/parallel-test-executor/] * it basically works by scanning the last result and it makes around equally sized test groups - and runs that...however it is unable to work if there are testcases which run for more time than the bucket size this could be probably explored by shoveling in some logic to split the larger cases into ~30m parts * creating a job which utilizes the plugin is quite straight forward; so adding all the executors as slaves to a jenkins will be able to utilize the same compute power > Replace PTest with an alternative > - > > Key: HIVE-22942 > URL: https://issues.apache.org/jira/browse/HIVE-22942 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > I never opened a jira about this...but it might actually help collect ideas > and actually start going somewhere sooner than later :D > Right now we maintain the ptest2 project inside Hive to be able to run Hive > tests in a distributed fashion...the backstab of this solution is that we are > putting much effort into maintaining a distributed test execution framework... > I think it would be better if we could find an off the shelf solution for the > task and migrate to that instead of putting more efforts into the ptest > framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047974#comment-17047974 ] Andrew Sherman commented on HIVE-22942: --- See also HIVE-19571 > Replace PTest with an alternative > - > > Key: HIVE-22942 > URL: https://issues.apache.org/jira/browse/HIVE-22942 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > I never opened a jira about this...but it might actually help collect ideas > and actually start going somewhere sooner than later :D > Right now we maintain the ptest2 project inside Hive to be able to run Hive > tests in a distributed fashion...the backstab of this solution is that we are > putting much effort into maintaining a distributed test execution framework... > I think it would be better if we could find an off the shelf solution for the > task and migrate to that instead of putting more efforts into the ptest > framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055864#comment-17055864 ] Zoltan Haindrich commented on HIVE-22942: - Hey All, I think the best would be to replace the ptest thing with something else - which is not maintained by the Hive community; moving to junit5 would be cool; but it might be challenging to do...the arallel execution of tests within the same machine tend to uncover further issues when we don't expect 2 pieces of the same kind of test to be executed at the same time...and I don't think we can have a single machine to execute all of them in one place - I think running batches in isolated environment on 1 thread might be more robust - and reliable; so that we can actually will be able to repro the issue. I've opened a PR with a working prototype; it isn't complete - but it's able to do the following: * builds upon some jenkins plugins; and the job itself is defined as a Jenkinsfile * uses docker images executed on a kubernetes cluster to provide reproducibility - so anyone will be more likely to be able to repro runs of the tests by using docker * to make the parrallel test executor plugin "happy" - I needed to find a way to reduce the max testclass execution time belove ~30 minutest ** as a first approach I went on and analyzed test execution times based on the actual testcase timesits possible; but defining the ranges and maintaining them long term might be intersting at least ** then I compared how "well" a naive approach would compare...and I concluded that going over twice as many splits the result is acceptableso I went this way its a cleaner way to do it.. ** I wanted to not disrupt existing usages of testing so I came up with the following way to declare further classes for qtest over 30minutes ; let's go with TestCliDriver for now: *** in case a special flag is enables (qsplits) the TestCliDriver is split into a number of parts; the "split" classes are differ only in the package name; so a "-Dtest=TestCliDriver" will still work to run the testcase *** there is some shell script / java reflection stuff which actually does the splitting of the test parameter list into smaller pieces currently I think the replacement layout will be: * a kubernetes cluster somewhere (gce/gke) * a jenkins running inside the kubernetes cluster * a local artifact caching instance is added to reduce outside comm * it would be easier to tie the job into github PRs and live with that instead retaining the run-a-patch approach * as for running multiple ptest; it will be easily possible as the limit will be the number of pods the jenkins may launch; things that are still need investigations/etc: * there are a bunch of failing tests ... I guess most of them has some env issue in the background * there should be a timeout on executing a set of tests; the ptest env uses a "timeout" on the maven command - I can just throw in the timeout plugin; but timeouts should be fixedthey are a sign of big problems like deadlocks/etc * no support for "isolated" tests - this should be rethinked > Replace PTest with an alternative > - > > Key: HIVE-22942 > URL: https://issues.apache.org/jira/browse/HIVE-22942 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I never opened a jira about this...but it might actually help collect ideas > and actually start going somewhere sooner than later :D > Right now we maintain the ptest2 project inside Hive to be able to run Hive > tests in a distributed fashion...the backstab of this solution is that we are > putting much effort into maintaining a distributed test execution framework... > I think it would be better if we could find an off the shelf solution for the > task and migrate to that instead of putting more efforts into the ptest > framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055892#comment-17055892 ] Zoltan Chovan commented on HIVE-22942: -- So far it sounds great :) What happens when a build fails for a PR? If there are flaky tests, we would need some way to re-trigger the run. Would it be possible to only execute the failed tests for the same commit? If we already using k8s, do you think it would be feasible to introduce multiple types of backend dbs for certain tests? We have a lot of directsql around TxnHandler and HMS, that would greatly benefit from that. The backend db could be spun up on-demand just for some tests that are flagged maybe. > Replace PTest with an alternative > - > > Key: HIVE-22942 > URL: https://issues.apache.org/jira/browse/HIVE-22942 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I never opened a jira about this...but it might actually help collect ideas > and actually start going somewhere sooner than later :D > Right now we maintain the ptest2 project inside Hive to be able to run Hive > tests in a distributed fashion...the backstab of this solution is that we are > putting much effort into maintaining a distributed test execution framework... > I think it would be better if we could find an off the shelf solution for the > task and migrate to that instead of putting more efforts into the ptest > framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055960#comment-17055960 ] Zoltan Haindrich commented on HIVE-22942: - bq. If there are flaky tests, we would need some way to re-trigger the run. Would it be possible to only execute the failed tests for the same commit? that could be possibly done; however I'm not sure if supporting flaky-rerun would make things better. The thing is that if there is a feature like that - we also accept the fact that we have flaky tests - they should be fixed... - I hope that removing the ptest framework and running 1 set of tests in a single container will increase the probability to repro test failures. bq. If we already using k8s, do you think it would be feasible to introduce multiple types of backend dbs for certain tests? Of course! That would be super usefull - I suspect it would be not that hard to add something for that after we have this stuff in place - so one step at a time... :) Stabilizing this stuff will take a bit of time...because this approach changes some things for the tests themselfs - they do fail because of already existing issues (ex: HIVE-23003) which was tolerated by the existing setup. If you would like to experiment with: I think we can either integrate some kind of smoke testing into the main job - or we could have another one to do that stuff. > Replace PTest with an alternative > - > > Key: HIVE-22942 > URL: https://issues.apache.org/jira/browse/HIVE-22942 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I never opened a jira about this...but it might actually help collect ideas > and actually start going somewhere sooner than later :D > Right now we maintain the ptest2 project inside Hive to be able to run Hive > tests in a distributed fashion...the backstab of this solution is that we are > putting much effort into maintaining a distributed test execution framework... > I think it would be better if we could find an off the shelf solution for the > task and migrate to that instead of putting more efforts into the ptest > framework -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116133#comment-17116133 ] Hive QA commented on HIVE-22942: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/13003951/HIVE-22942.01.patch {color:green}SUCCESS:{color} +1 due to 24 test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 17230 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/22607/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/22607/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-22607/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.YetusPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 13003951 - PreCommit-HIVE-Build > Replace PTest with an alternative > - > > Key: HIVE-22942 > URL: https://issues.apache.org/jira/browse/HIVE-22942 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22942.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > I never opened a jira about this...but it might actually help collect ideas > and actually start going somewhere sooner than later :D > Right now we maintain the ptest2 project inside Hive to be able to run Hive > tests in a distributed fashion...the backstab of this solution is that we are > putting much effort into maintaining a distributed test execution framework... > I think it would be better if we could find an off the shelf solution for the > task and migrate to that instead of putting more efforts into the ptest > framework > some info/etc about how it compares to existing one: > https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit#heading=h.e51vlxui3e6n -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116152#comment-17116152 ] Hive QA commented on HIVE-22942: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 7s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 15s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 47s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 34s{color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} | {color:blue} llap-client in master has 27 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 20s{color} | {color:red} metastore-server in master failed. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 38s{color} | {color:blue} ql in master has 1524 extant Findbugs warnings. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 37s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 9m 8s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} llap-client: The patch generated 0 new + 5 unchanged - 1 fixed = 5 total (was 6) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} The patch metastore-server passed checkstyle {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 8s{color} | {color:green} The patch ql passed checkstyle {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} The patch kafka-handler passed checkstyle {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 3m 44s{color} | {color:red} root: The patch generated 1 new + 238 unchanged - 2 fixed = 239 total (was 240) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} The patch hcatalog-unit passed checkstyle {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 49s{color} | {color:red} itests/hive-unit: The patch generated 1 new + 215 unchanged - 1 fixed = 216 total (was 216) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} The patch qtest passed checkstyle {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 3s{color} | {color:red} patch/llap-client cannot run setBugDatabaseInfo from findbugs {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 49s{color} | {color:red} metastore-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 10m 20s{color} | {color:red} patch/ql cannot run setBugDatabaseInfo from findbugs {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 48s{color} | {color:red} patch/kafka-handler cannot run setBugDatabaseInfo from findbugs {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 45s{color} | {color:red} patch/itests/hive-unit cannot run setBugDatabaseInfo from f
[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117953#comment-17117953 ] Zoltan Haindrich commented on HIVE-22942: - [~jcamachorodriguez]: could you please take a look? > Replace PTest with an alternative > - > > Key: HIVE-22942 > URL: https://issues.apache.org/jira/browse/HIVE-22942 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22942.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > I never opened a jira about this...but it might actually help collect ideas > and actually start going somewhere sooner than later :D > Right now we maintain the ptest2 project inside Hive to be able to run Hive > tests in a distributed fashion...the backstab of this solution is that we are > putting much effort into maintaining a distributed test execution framework... > I think it would be better if we could find an off the shelf solution for the > task and migrate to that instead of putting more efforts into the ptest > framework > some info/etc about how it compares to existing one: > https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit#heading=h.e51vlxui3e6n -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22942) Replace PTest with an alternative
[ https://issues.apache.org/jira/browse/HIVE-22942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119033#comment-17119033 ] Jesus Camacho Rodriguez commented on HIVE-22942: +1 > Replace PTest with an alternative > - > > Key: HIVE-22942 > URL: https://issues.apache.org/jira/browse/HIVE-22942 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22942.01.patch > > Time Spent: 50m > Remaining Estimate: 0h > > I never opened a jira about this...but it might actually help collect ideas > and actually start going somewhere sooner than later :D > Right now we maintain the ptest2 project inside Hive to be able to run Hive > tests in a distributed fashion...the backstab of this solution is that we are > putting much effort into maintaining a distributed test execution framework... > I think it would be better if we could find an off the shelf solution for the > task and migrate to that instead of putting more efforts into the ptest > framework > some info/etc about how it compares to existing one: > https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit#heading=h.e51vlxui3e6n -- This message was sent by Atlassian Jira (v8.3.4#803005)