rahul-ghiware opened a new issue, #404:
URL: https://github.com/apache/incubator-xtable/issues/404
Using Spark 3.4.0, Scala 2.12 and Iceberg spark runtime 1.4.2
- Created an Iceberg table in the /tmp folder
```
rghiware ~ $ cd /tmp
rghiware /tmp $ cd iceberg-warehouse/people
rghiware iceberg-warehouse/people $ ls
data metadata
rghiware iceberg-warehouse/people $ cd data
rghiware people/data $ ls
00000-3-4117ce4f-ff56-410b-a248-c9ed512903c8-00001.parquet
```
- Created a yaml file (`my_dataset_config.yaml`) with Iceberg as the source and Hudi and Delta as the targets
```
sourceFormat: ICEBERG
targetFormats:
  - HUDI
  - DELTA
datasets:
  -
    tableBasePath: file:///tmp/iceberg-warehouse/people
    tableDataPath: file:///tmp/iceberg-warehouse/people/data
    tableName: people
```
- Ran the XTable sync utilities jar locally
```
java -jar ./utilities-0.1.0-beta1-bundled.jar -d my_dataset_config.yaml
```
- Confirmed that the `.hoodie` and `_delta_log` folders were created under the table's data folder
```
rghiware /tmp $ cd iceberg-warehouse/people/data
rghiware people/data $ ls -altr
total 16
-rw-r--r--@  1 rghiware wheel   24 Mar 29 09:49 .00000-3-4117ce4f-ff56-410b-a248-c9ed512903c8-00001.parquet.crc
-rw-r--r--@  1 rghiware wheel 1618 Mar 29 09:49 00000-3-4117ce4f-ff56-410b-a248-c9ed512903c8-00001.parquet
drwxr-xr-x@ 4 rghiware wheel 128 Mar 29 09:49 ..
drwxr-xr-x@ 6 rghiware wheel 192 Mar 29 09:54 .
drwxr-xr-x@ 15 rghiware wheel 480 Mar 29 09:55 .hoodie
drwxr-xr-x@ 4 rghiware wheel 128 Mar 29 09:55 _delta_log
```
- Able to load the data with PySpark using the Delta format
```
pyspark \
--packages io.delta:delta-core_2.12:2.4.0 \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension"
```
```
>>> df = spark.read.format("delta").load("/tmp/iceberg-warehouse/people/data")
>>> df.printSchema()
root
|-- id: integer (nullable = true)
|-- name: string (nullable = true)
|-- age: integer (nullable = true)
|-- city: string (nullable = true)
|-- create_ts: string (nullable = true)
>>> df.show()
24/03/29 10:27:32 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
+---+-------+---+----+-------------------+
| id| name|age|city| create_ts|
+---+-------+---+----+-------------------+
| 6|Charlie| 31| DFW|2023-08-29 00:00:00|
| 1| John| 25| NYC|2023-09-28 00:00:00|
| 4| Andrew| 40| NYC|2023-10-28 00:00:00|
| 3|Michael| 35| ORD|2023-09-28 00:00:00|
| 5| Bob| 28| SEA|2023-09-23 00:00:00|
| 2| Emily| 30| SFO|2023-09-28 00:00:00|
+---+-------+---+----+-------------------+
>>>
```
- However, unable to load the same data with PySpark using the Hudi format
```
pyspark \
--packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0 \
--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
--conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension"
```
```
>>> df = spark.read.format("hudi").load("/tmp/iceberg-warehouse/people/data")
24/03/29 10:30:30 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
24/03/29 10:30:30 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
>>> df.printSchema()
root
|-- id: integer (nullable = true)
|-- name: string (nullable = true)
|-- age: integer (nullable = true)
|-- city: string (nullable = true)
|-- create_ts: string (nullable = true)
>>> df.show()
+---+----+---+----+---------+
| id|name|age|city|create_ts|
+---+----+---+----+---------+
+---+----+---+----+---------+
>>>
```
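For what it's worth, one way to narrow this down is to check whether the sync actually wrote any completed commits to the Hudi timeline under `.hoodie` (an empty `df.show()` with a valid schema can mean the timeline has no completed instants). A minimal stdlib-only sketch, with the table path assumed from the listing above:

```python
import os

def list_hudi_timeline(table_path):
    """Return the timeline files (e.g. *.commit, *.deltacommit,
    hoodie.properties) directly under <table_path>/.hoodie,
    skipping subdirectories such as 'archived'."""
    timeline_dir = os.path.join(table_path, ".hoodie")
    return sorted(
        name for name in os.listdir(timeline_dir)
        if os.path.isfile(os.path.join(timeline_dir, name))
    )

# Path assumed from the directory listing earlier in this report:
# print(list_hudi_timeline("/tmp/iceberg-warehouse/people/data"))
```

If the listing shows `*.commit` files, the table was synced and the problem is more likely on the read side (e.g. catalog or metadata configuration); if there are no completed commit files, the sync itself did not finish.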
I'm not able to figure out whether I'm missing something here or whether this is an issue with the XTable jar.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]