[ https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=830015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830015 ]
ASF GitHub Bot logged work on HIVE-26400:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 30/Nov/22 10:21
Start Date: 30/Nov/22 10:21
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r1035785789

##########
dev-support/docker/README.md:
##########
@@ -0,0 +1,93 @@
+### Introduction
+
+---
+Run Apache Hive inside docker container in pseudo-distributed mode, with MySQL as its back database.
+Provide the following
+- Quick-start/Debugging/Prepare a test env for Hive
+- Images can be used as the basis for the Kubernetes operator
+
+### Overview
+
+---
+#### Files
+- docker-compose.yml: Docker compose file
+- Dockerfile-*, scripts/docker-entrypoint.sh: Instructions of the image.
+- conf/hiveserver2-site.xml: Configuration for HiveServer2
+- conf/metastore-site.xml: Configuration for Hive Metastore
+- build.sh Scripts for build images
+
+### Quickstart
+
+---
+#### Build images
+Hive relies on Hadoop, Tez and MySQL to work correctly. Up to now, there are so many versions that these dependents have been released, including Hive itself,
+providing a way to build Hive against a specified version of the dependent sounds reasonable. There are some build args for this purpose, as listed below:
+```shell
+--hadoop <hadoop version>
+--tez <tez version>
+--hive <hive version>
+```
+If the version is not provided, then it will read the version from the properties in project top `pom.xml`,
+that is, `project.version` for Hive, `hadoop.version` for Hadoop and `tez.version` for Tez. For example:
+
+```shell
+./build.sh --hive 3.1.3
+```
+The command will pull the tarballs of Hive 3.1.3, Hadoop `hadoop.version` and Tez `tez.version` from apache repository
+to build the target image.
+
+```shell
+./build.sh --hadoop 3.1.0 --tez 0.10.1
+```
+The above command does not specify the Hive version, it will use the local `apache-hive-${project.version}-bin.tar.gz`,
+together with Hadoop 3.1.0 and Tez 0.10.1 to build the target image.
+
+#### Run services
+
+- Launch a single standalone Metastore
+
+If you just want to play around with Metastore, run:
+```shell
+docker run --name metastore-standalone apachehive/metastore:$HIVE_VERSION
+```
+
+- Launch a single standalone HiveServer2 for a quick start
+
+The HiveServer2 will be started with an embedded Metastore by initiating:
+```shell
+docker run --name hiveserver2-standalone apachehive/hiveserver2:$HIVE_VERSION
+```
+The data of the HiveServer2 would be lost between container restarts.
+
+- Launch a cluster with HiveServer2, Metastore and MySQL as its back database.
+
+To save data between container restarts, Volumes is used to persist data generated by and used by Hive. Just by executing:
+```shell
+export HIVESERVER2_IMAGE=apachehive/hiveserver2:$HIVE_VERSION
+export METASTORE_IMAGE=apachehive/metastore:$HIVE_VERSION
+docker network create hive && docker-compose -f docker-compose.yml up -d
+```
+
+#### Usage
+
+---
+- Show the containers
+```shell
+docker ps
+```
+- Check HiveServer2 web ui
+  - Accessed on browser at http://localhost:10002/
+- Run Beeline:
+```shell
+docker exec -it hiveserver2 beeline -u 'jdbc:hive2://hiveserver2:10000/'
+```
+- Test running some queries
+```sql
+show tables;
+create table hive_example(a string, b int) partitioned by(c int);
+alter table hive_example add partition(c=1);
+insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3);
+select count(distinct a) from hive_example;
+set hive.execution.engine=tez;

Review Comment:
   ack

Issue Time Tracking
-------------------

Worklog Id: (was: 830015)
Time Spent: 4h 40m (was: 4.5h)

> Provide docker images for Hive
> ------------------------------
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
> Issue Type: Improvement
> Components: Build Infrastructure
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Blocker
> Labels: hive-4.0.0-must, pull-request-available
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> Make Apache Hive be able to run inside docker container in pseudo-distributed
> mode, with MySQL/Derby as its back database, provide the following:
> * Quick-start/Debugging/Prepare a test env for Hive;
> * Tools to build target image with specified version of Hive and its
> dependencies;
> * Images can be used as the basis for the Kubernetes operator.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)