[ https://issues.apache.org/jira/browse/HIVE-16749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066649#comment-16066649 ]
Allen Wittenauer commented on HIVE-16749:
-----------------------------------------

FYI, typically people let Yetus execute the docker commands itself. Two big reasons for this:

* Yetus will run docker build, which means that the Dockerfile can be modified on the fly as necessary. Yetus will detect if the Dockerfile has changed and rebuild as necessary--including as part of the patch being tested!
* The patchdir and basedir will be available after the container exits, which means logs and such are available post-build. This is very useful to have access to in case of failures.

Anyway, cutting back the extra bits, given a directory structure of:

artifacts dir for Jenkins here: ${WORKSPACE}/artifacts
git checkout to here: ${WORKSPACE}/source

You just need the following extra lines on the command line:

{code}
--patch-dir=${WORKSPACE}/artifacts \
--basedir=${WORKSPACE}/source \
--docker \
--dockerfile=${WORKSPACE}/dev-support/docker/Dockerfile \
{code}

Be aware that the Dockerfile needs to have *everything* that Yetus will need to do its work. E.g., if the pylint test is enabled, then python with all the pre-req pylint eggs needs to be installed too. You can see the default/example one that Yetus uses here:

https://github.com/apache/yetus/blob/405cd9fa6e4f6240690bbba1bad6d054a4241214/precommit/test-patch-docker/Dockerfile

If you have an existing Dockerfile that has some extra stuff you don't want executed as part of the Yetus run, and you can separate that out to the bottom of the file, you can use it too. See Hadoop's as an example:

https://github.com/apache/hadoop/blob/ee243e5289212aa2912d191035802ea023367e19/dev-support/docker/Dockerfile

The {{{# YETUS CUT HERE}}} line acts as a guard.

I also HIGHLY recommend using the {{{--mvn-custom-repos}}} and {{{--jenkins}}} flags where more than one maven run is happening on a Jenkins instance.
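
As a rough illustration, the flags mentioned so far could be collected in a Jenkins shell step along these lines. This is only a sketch: the {{YETUS_CMD}} variable, its {{test-patch}} default, and the {{/tmp}} fallback for {{WORKSPACE}} are assumptions for illustration, and the script only prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch only: assembles the extra Yetus arguments discussed above.

# WORKSPACE is normally provided by Jenkins; the fallback is an assumption
# so the sketch can be tried outside Jenkins.
WORKSPACE="${WORKSPACE:-/tmp/yetus-workspace}"

# Hypothetical placeholder for however Yetus is launched on your node.
YETUS_CMD="${YETUS_CMD:-test-patch}"

YETUS_ARGS=(
  "--patch-dir=${WORKSPACE}/artifacts"   # logs and reports survive the container
  "--basedir=${WORKSPACE}/source"        # the git checkout under test
  "--docker"                             # let Yetus drive docker itself
  "--dockerfile=${WORKSPACE}/dev-support/docker/Dockerfile"
  "--mvn-custom-repos"                   # separate Maven caches
  "--jenkins"
)

# Dry run: show the command line that would be executed.
printf '%s' "${YETUS_CMD}"
printf ' %s' "${YETUS_ARGS[@]}"
printf '\n'
```

Any project-specific options would be appended to the same array before the real invocation.
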
Maven does *zero* locking of its cache, which means that simultaneous runs will stomp all over each other and produce wildly inaccurate results. Those flags will guarantee on Jenkins that different executors will use different .m2 caches for themselves as well as for different branches. The very first run on a node will take a while as it does the mass download, but after that it's pretty quick. We saw unit test failure counts drop significantly after doing that in Hadoop.

One other thing: you don't need to run patch. You can monkey patch individual functions inside the hive personality file. It's loaded last, which means it can overwrite other functions... :)

> Run YETUS in Docker container
> -----------------------------
>
>                 Key: HIVE-16749
>                 URL: https://issues.apache.org/jira/browse/HIVE-16749
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Peter Vary
>            Assignee: Zoltan Haindrich
>         Attachments: HIVE-16749.1.patch
>
>
> Think about the pros and cons of running YETUS in a docker container:
> - Resources
> - Usage complexity
> - Yetus version changes
> - Findbugs
> - etc.
> If worthwhile run YETUS in a docker container

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)