zhangdove opened a new issue #1831:
URL: https://github.com/apache/iceberg/issues/1831
1. Environment
```
spark: 3.0.0
hive: 2.3.7
iceberg: 0.10.0
```
2. SparkSession configuration
```scala
val spark = SparkSession
  .builder()
  .master("local[2]")
  .appName("IcebergAPI")
  .config("spark.sql.catalog.hive_prod", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hive_prod.type", "hive")
  .config("spark.sql.catalog.hive_prod.uri", "thrift://localhost:9083")
  .enableHiveSupport()
  .getOrCreate()
```
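As a quick sanity check of the configuration above, the catalog can be queried from Spark SQL once the session is up (a minimal sketch; `hive_prod` is the catalog name configured above, and the `db` namespace is created in the next step):

```scala
// List the namespaces visible through the configured Iceberg catalog
spark.sql("SHOW NAMESPACES IN hive_prod").show(false)
```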
3. Create database `db` by hive client
```
➜ bin ./beeline
beeline> !connect jdbc:hive2://localhost:10000 hive hive
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.3.7)
Driver: Hive JDBC (version 2.3.7)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> create database db;
No rows affected (0.105 seconds)
```
4. Create iceberg table by HiveCatalog using Spark (Link:
https://iceberg.apache.org/hive/#using-hive-catalog)
```scala
import java.util.{ArrayList, List}

import org.apache.iceberg.{PartitionSpec, Schema}
import org.apache.iceberg.catalog.{Namespace, TableIdentifier}
import org.apache.iceberg.hive.HiveCatalog
import org.apache.iceberg.types.Types
import org.apache.spark.sql.SparkSession

def createByHiveCatalog(spark: SparkSession): Unit = {
  val hadoopConfiguration = spark.sparkContext.hadoopConfiguration
  // iceberg.engine.hive.enabled=true
  hadoopConfiguration.set(org.apache.iceberg.hadoop.ConfigProperties.ENGINE_HIVE_ENABLED, "true")
  val hiveCatalog = new HiveCatalog(hadoopConfiguration)
  val nameSpace = Namespace.of("db")
  val tableIdentifier: TableIdentifier = TableIdentifier.of(nameSpace, "tb")
  val columns: List[Types.NestedField] = new ArrayList[Types.NestedField]
  columns.add(Types.NestedField.of(1, true, "id", Types.IntegerType.get, "id doc"))
  columns.add(Types.NestedField.of(2, true, "ts", Types.TimestampType.withZone(), "ts doc"))
  val schema: Schema = new Schema(columns)
  val partition = PartitionSpec.builderFor(schema).year("ts").build()
  hiveCatalog.createTable(tableIdentifier, schema, partition)
}
```
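For comparison, the same table could presumably be created through the configured catalog with Spark SQL DDL instead of the Java API (a sketch; the `years(ts)` partition transform mirrors the `PartitionSpec` builder call above):

```sql
CREATE TABLE hive_prod.db.tb (
  id int COMMENT 'id doc',
  ts timestamp COMMENT 'ts doc')
USING iceberg
PARTITIONED BY (years(ts));
```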
5. Query iceberg table by hive client
```hive
0: jdbc:hive2://localhost:10000> add jar
/Users/dovezhang/software/idea/github/iceberg/hive-runtime/build/libs/iceberg-hive-runtime-0.10.0.jar;
No rows affected (0.043 seconds)
0: jdbc:hive2://localhost:10000> set iceberg.mr.catalog=hive;
No rows affected (0.003 seconds)
0: jdbc:hive2://localhost:10000> select * from db.tb;
+--------+--------+
| tb.id | tb.ts |
+--------+--------+
+--------+--------+
No rows selected (1.166 seconds)
```
6. Write data by HiveCatalog using Spark
```scala
import java.sql.Timestamp

import org.apache.spark.sql.functions

case class dbtb(id: Int, time: Timestamp)

def writeDataToIcebergHive(spark: SparkSession): Unit = {
  val seq = Seq(
    dbtb(1, Timestamp.valueOf("2020-07-06 13:40:00")),
    dbtb(2, Timestamp.valueOf("2020-07-06 14:30:00")),
    dbtb(3, Timestamp.valueOf("2020-07-06 15:20:00")))
  val df = spark.createDataFrame(seq).toDF("id", "ts")
  df.writeTo("hive_prod.db.tb").overwrite(functions.lit(true))
}
```
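One way to confirm the overwrite actually committed is to inspect the table's snapshot history through Iceberg's metadata tables (a sketch; assumes the metadata tables are reachable through the same `hive_prod` catalog):

```scala
// Each successful commit, including the overwrite above, appears as one row here
spark.sql("SELECT snapshot_id, operation, summary FROM hive_prod.db.tb.snapshots").show(false)
```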
7. Query iceberg table by hive client again
```
0: jdbc:hive2://localhost:10000> select * from db.tb;
+--------+--------+
| tb.id | tb.ts |
+--------+--------+
+--------+--------+
No rows selected (0.152 seconds)
```
After writing the data, no rows are returned via the Hive client.
8. Query iceberg table by hive catalog using Spark
```scala
def readIcebergByHiveCatalog(spark: SparkSession): Unit = {
  spark.sql("select * from hive_prod.db.tb").show(false)
}
```
Result
```
+---+-------------------+
|id |ts |
+---+-------------------+
|1 |2020-07-06 13:40:00|
|2 |2020-07-06 14:30:00|
|3 |2020-07-06 15:20:00|
+---+-------------------+
```
9. Check the table's data directory for data files
```
➜ bin hdfs dfs -ls /usr/hive/warehouse/db.db/tb/data/ts_year=2020
20/11/26 15:16:51 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 dovezhang supergroup 656 2020-11-26 15:11
/usr/hive/warehouse/db.db/tb/data/ts_year=2020/00000-0-2b98be41-8347-4a8c-a986-d28878ab7a67-00001.parquet
-rw-r--r-- 1 dovezhang supergroup 664 2020-11-26 15:11
/usr/hive/warehouse/db.db/tb/data/ts_year=2020/00001-1-b192e846-5a6a-4ee9-b31a-7e5fcf813b88-00001.parquet
```
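Since the data files clearly exist on HDFS, one further diagnostic is to load the table through the same HiveCatalog and print its current snapshot and location (a sketch; if the Hive metastore entry does not point at the latest table metadata, a Hive reader would see an empty table even though the files are there):

```scala
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.hive.HiveCatalog

val catalog = new HiveCatalog(spark.sparkContext.hadoopConfiguration)
val table = catalog.loadTable(TableIdentifier.of("db", "tb"))
// The current snapshot should reflect the overwrite that produced the two parquet files
println(table.currentSnapshot())
println(table.location())
```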
I am not sure why the Hive client cannot see the data after Spark has created
the table and written to it. Does anyone know why?