lcs559 opened a new issue #3113:
URL: https://github.com/apache/iceberg/issues/3113
1. **Environment**
```shell
hdfs: 3.1.1.3.1
hive: 3.1.0
iceberg: master branch build(2021/9/14)
```
2. **Log in to Hive via Beeline**
```shell
# beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/3.1.5.0-152/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/3.1.5.0-152/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type
[org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to
jdbc:hive2://dev.bdp.mgmt01:2181,dev.bdp.mgmt02:2181,dev.bdp.mgmt03:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Enter username for
jdbc:hive2://dev.bdp.mgmt01:2181,dev.bdp.mgmt02:2181,dev.bdp.mgmt03:2181/default:
hive
Enter password for
jdbc:hive2://dev.bdp.mgmt01:2181,dev.bdp.mgmt02:2181,dev.bdp.mgmt03:2181/default:
21/09/14 14:12:04 [main]: INFO jdbc.HiveConnection: Connected to
dev.bdp.mgmt01:10000
Connected to: Apache Hive (version 3.1.0.3.1.5.0-152)
Driver: Hive JDBC (version 3.1.0.3.1.5.0-152)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0.3.1.5.0-152 by Apache Hive
0: jdbc:hive2://dev.bdp.mgmt01:2181,dev.bdp.m>
```
3. **Add `iceberg-hive-runtime-5f90476.jar`**
```sql
> add jar hdfs://bdptest/user/hive/lib/iceberg-hive-runtime-5f90476.jar;
INFO : Added
[/tmp/0be97055-c189-4e8d-ad33-54a50ac828bd_resources/iceberg-hive-runtime-5f90476.jar]
to class path
INFO : Added resources:
[hdfs://bdptest/user/hive/lib/iceberg-hive-runtime-5f90476.jar]
No rows affected (0.263 seconds)
```
4. **Create the Iceberg table**
```sql
> CREATE TABLE iceberg.t5 (
. . . . . . . . . . . . . . . . . . . . . . .> id bigint,
. . . . . . . . . . . . . . . . . . . . . . .> name string
. . . . . . . . . . . . . . . . . . . . . . .> ) PARTITIONED BY (
. . . . . . . . . . . . . . . . . . . . . . .> dept string
. . . . . . . . . . . . . . . . . . . . . . .> ) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
INFO : Compiling
command(queryId=hive_20210914141346_272fe204-b77a-4576-9cca-219d9442daf8):
CREATE TABLE iceberg.t5 (
id bigint,
name string
) PARTITIONED BY (
dept string
) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling
command(queryId=hive_20210914141346_272fe204-b77a-4576-9cca-219d9442daf8); Time
taken: 0.132 seconds
INFO : Executing
command(queryId=hive_20210914141346_272fe204-b77a-4576-9cca-219d9442daf8):
CREATE TABLE iceberg.t5 (
id bigint,
name string
) PARTITIONED BY (
dept string
) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=hive_20210914141346_272fe204-b77a-4576-9cca-219d9442daf8); Time
taken: 2.056 seconds
INFO : OK
No rows affected (2.222 seconds)
```
5. **Insert data**
```sql
> insert into iceberg.t5 values(1,'t1','d1');
INFO : Compiling
command(queryId=hive_20210914141536_be24fd62-5922-472d-b1df-24c2dfa8e383):
insert into iceberg.t5 values(1,'t1','d1')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema:
Schema(fieldSchemas:[FieldSchema(name:_col0, type:bigint, comment:null),
FieldSchema(name:_col1, type:string, comment:null), FieldSchema(name:_col2,
type:string, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20210914141536_be24fd62-5922-472d-b1df-24c2dfa8e383); Time
taken: 0.294 seconds
INFO : Executing
command(queryId=hive_20210914141536_be24fd62-5922-472d-b1df-24c2dfa8e383):
insert into iceberg.t5 values(1,'t1','d1')
INFO : Query ID =
hive_20210914141536_be24fd62-5922-472d-b1df-24c2dfa8e383
INFO : Total jobs = 1
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Starting task [Stage-1:DDL] in serial mode
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-2:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId:
hive_20210914141536_be24fd62-5922-472d-b1df-24c2dfa8e383
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into iceberg.t5 values(1,'t1','d1') (Stage-2)
INFO : Status: Running (Executing on YARN cluster with App id
application_1631502306736_0204)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 6.48 s
----------------------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 6.43 seconds
INFO :
INFO : Query Execution Summary
INFO :
----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO :
----------------------------------------------------------------------------------------------
INFO : Compile Query 0.29s
INFO : Prepare Plan 6.28s
INFO : Get Query Coordinator (AM) 0.00s
INFO : Submit Plan 0.67s
INFO : Start DAG 0.69s
INFO : Run DAG 6.43s
INFO :
----------------------------------------------------------------------------------------------
INFO :
INFO : Task Execution Summary
INFO :
----------------------------------------------------------------------------------------------
INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms)
INPUT_RECORDS OUTPUT_RECORDS
INFO :
----------------------------------------------------------------------------------------------
INFO : Map 1 3314.00 4,800 179
3 0
INFO :
----------------------------------------------------------------------------------------------
INFO :
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : NUM_SUCCEEDED_TASKS: 1
INFO : TOTAL_LAUNCHED_TASKS: 1
INFO : RACK_LOCAL_TASKS: 1
INFO : AM_CPU_MILLISECONDS: 2090
INFO : AM_GC_TIME_MILLIS: 0
INFO : File System Counters:
INFO : HDFS_BYTES_WRITTEN: 896
INFO : HDFS_WRITE_OPS: 1
INFO : HDFS_OP_CREATE: 1
INFO : org.apache.tez.common.counters.TaskCounter:
INFO : GC_TIME_MILLIS: 179
INFO : TASK_DURATION_MILLIS: 3507
INFO : CPU_MILLISECONDS: 4800
INFO : PHYSICAL_MEMORY_BYTES: 260046848
INFO : VIRTUAL_MEMORY_BYTES: 4520706048
INFO : COMMITTED_HEAP_BYTES: 260046848
INFO : INPUT_RECORDS_PROCESSED: 4
INFO : INPUT_SPLIT_LENGTH_BYTES: 1
INFO : OUTPUT_RECORDS: 0
INFO : HIVE:
INFO : CREATED_FILES: 1
INFO : DESERIALIZE_ERRORS: 0
INFO : RECORDS_IN_Map_1: 3
INFO : RECORDS_OUT_1_iceberg.t5: 1
INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 0
INFO : RECORDS_OUT_OPERATOR_FS_5: 1
INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
INFO : RECORDS_OUT_OPERATOR_SEL_1: 1
INFO : RECORDS_OUT_OPERATOR_SEL_3: 1
INFO : RECORDS_OUT_OPERATOR_TS_0: 1
INFO : RECORDS_OUT_OPERATOR_UDTF_2: 1
INFO : TaskCounter_Map_1_INPUT__dummy_table:
INFO : INPUT_RECORDS_PROCESSED: 4
INFO : INPUT_SPLIT_LENGTH_BYTES: 1
INFO : TaskCounter_Map_1_OUTPUT_out_Map_1:
INFO : OUTPUT_RECORDS: 0
INFO : Starting task [Stage-4:DDL] in serial mode
INFO : Completed executing
command(queryId=hive_20210914141536_be24fd62-5922-472d-b1df-24c2dfa8e383); Time
taken: 14.075 seconds
INFO : OK
No rows affected (14.431 seconds)
```
6. **Query the Iceberg table from Hive**
```sql
> select * from iceberg.t5;
INFO : Compiling
command(queryId=hive_20210914141630_3f4534a0-659e-4100-af91-f7c687d75245):
select * from iceberg.t5
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema:
Schema(fieldSchemas:[FieldSchema(name:t5.id, type:bigint, comment:null),
FieldSchema(name:t5.name, type:string, comment:null), FieldSchema(name:t5.dept,
type:string, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=hive_20210914141630_3f4534a0-659e-4100-af91-f7c687d75245); Time
taken: 0.136 seconds
INFO : Executing
command(queryId=hive_20210914141630_3f4534a0-659e-4100-af91-f7c687d75245):
select * from iceberg.t5
INFO : Completed executing
command(queryId=hive_20210914141630_3f4534a0-659e-4100-af91-f7c687d75245); Time
taken: 0.005 seconds
INFO : OK
+--------+----------+----------+
| t5.id | t5.name | t5.dept |
+--------+----------+----------+
+--------+----------+----------+
No rows selected (0.202 seconds)
```
7. **Run `show create table`**
```sql
> show create table iceberg.t5;
INFO : Compiling
command(queryId=hive_20210914141931_a48f0a14-4d07-4867-aaee-4db4ece1d05f): show
create table iceberg.t5
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema:
Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from
deserializer)], properties:null)
INFO : Completed compiling
command(queryId=hive_20210914141931_a48f0a14-4d07-4867-aaee-4db4ece1d05f); Time
taken: 0.027 seconds
INFO : Executing
command(queryId=hive_20210914141931_a48f0a14-4d07-4867-aaee-4db4ece1d05f): show
create table iceberg.t5
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing
command(queryId=hive_20210914141931_a48f0a14-4d07-4867-aaee-4db4ece1d05f); Time
taken: 0.036 seconds
INFO : OK
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `iceberg.t5`( |
| `id` bigint COMMENT 'from deserializer', |
| `name` string COMMENT 'from deserializer', |
| `dept` string COMMENT 'from deserializer') |
| ROW FORMAT SERDE |
| 'org.apache.iceberg.mr.hive.HiveIcebergSerDe' |
| STORED BY |
| 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' |
| |
| LOCATION |
| 'hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5' |
| TBLPROPERTIES ( |
| 'TRANSLATED_TO_EXTERNAL'='TRUE', |
| 'bucketing_version'='2', |
| 'engine.hive.enabled'='true', |
| 'external.table.purge'='TRUE', |
| 'last_modified_by'='hive', |
| 'last_modified_time'='1631600137', |
| 'metadata_location'='hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/metadata/00000-6a199018-22ec-4f82-8434-5672bd26b7c4.metadata.json', |
| 'table_type'='ICEBERG', |
| 'transient_lastDdlTime'='1631600137') |
+----------------------------------------------------+
21 rows selected (0.104 seconds)
```
8. **Check the files of the table**
```shell
# sudo -u hdfs hadoop fs -ls hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5
Found 2 items
drwxrwx---+ - hive hadoop 0 2021-09-14 14:15 hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/data
drwxrwx---+ - hive hadoop 0 2021-09-14 14:13 hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/metadata
# sudo -u hdfs hadoop fs -ls hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/metadata
Found 1 items
-rw-rw----+ 3 hive hadoop 1731 2021-09-14 14:13 hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/metadata/00000-6a199018-22ec-4f82-8434-5672bd26b7c4.metadata.json
# sudo -u hdfs hadoop fs -ls hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/data
Found 1 items
drwxrwx---+ - hive hadoop 0 2021-09-14 14:15 hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/data/dept=d1
# sudo -u hdfs hadoop fs -ls hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/data/dept=d1
Found 1 items
-rw-rw----+ 3 hive hadoop 896 2021-09-14 14:15 hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/data/dept=d1/00000-0-hive_20210914141536_be24fd62-5922-472d-b1df-24c2dfa8e383-job_1631502306736_0204-00001.parquet
```
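The listing shows a Parquet file under `data/dept=d1` but only a single metadata file, with the `00000` prefix, created at 14:13 together with the table. As a rough aid for checking listings like this, the sketch below extracts the numeric prefix from metadata file names and reports the highest one. It assumes the `<version>-<uuid>.metadata.json` naming visible in the listing; the function name `latest_metadata_version` is hypothetical and not part of Iceberg or Hive.

```python
import re

# Assumed naming, as seen in the listing: 00000-<uuid>.metadata.json
_METADATA_RE = re.compile(r"(?:^|/)(\d+)-[0-9a-f-]+\.metadata\.json$")


def latest_metadata_version(paths):
    """Return the highest numeric prefix among metadata file paths, or None."""
    versions = []
    for path in paths:
        match = _METADATA_RE.search(path)
        if match:
            versions.append(int(match.group(1)))
    return max(versions) if versions else None


# Path copied from the metadata directory listing in step 8.
listing = [
    "hdfs://bdptest/warehouse/tablespace/managed/hive/iceberg.db/t5/metadata/"
    "00000-6a199018-22ec-4f82-8434-5672bd26b7c4.metadata.json",
]
print(latest_metadata_version(listing))  # prints 0
```

A result of `0` here would be consistent with no new table metadata having been written after the table was created, which may be why the `select` in step 6 returns no rows. That is an interpretation of the listing, not a confirmed diagnosis.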
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]