Fokko commented on code in PR #9935:
URL: https://github.com/apache/iceberg/pull/9935#discussion_r1521056978
##########
1.5.0/docs/spark-getting-started.md:
##########
@@ -123,18 +123,78 @@ SELECT * FROM local.db.table.snapshots;
+-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+
```
-[DataFrame reads](spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`:
+[DataFrame reads](../spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`:
```scala
val df = spark.table("local.db.table")
df.count()
```
+### Type compatibility
+
+Spark and Iceberg support different sets of types. Iceberg performs type conversion automatically, but not for all combinations,
+so you may want to understand how Iceberg converts types before designing the column types of your tables.
+
+#### Spark type to Iceberg type
+
+This type conversion table describes how Spark types are converted to Iceberg types. The conversion applies both when creating an Iceberg table and when writing to an Iceberg table via Spark.
+
+| Spark | Iceberg | Notes |
+|-----------------|----------------------------|-------|
+| boolean | boolean | |
+| short | integer | |
+| byte | integer | |
+| integer | integer | |
+| long | long | |
+| float | float | |
+| double | double | |
+| date | date | |
+| timestamp | timestamp with timezone | |
+| timestamp_ntz | timestamp without timezone | |
Review Comment:
OCD
```suggestion
| timestamp_ntz | timestamp without timezone | |
```
##########
1.5.0/docs/spark-getting-started.md:
##########
@@ -123,18 +123,78 @@ SELECT * FROM local.db.table.snapshots;
+-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+
```
-[DataFrame reads](spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`:
+[DataFrame reads](../spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`:
```scala
val df = spark.table("local.db.table")
df.count()
```
+### Type compatibility
+
+Spark and Iceberg support different sets of types. Iceberg performs type conversion automatically, but not for all combinations,
+so you may want to understand how Iceberg converts types before designing the column types of your tables.
+
+#### Spark type to Iceberg type
+
+This type conversion table describes how Spark types are converted to Iceberg types. The conversion applies both when creating an Iceberg table and when writing to an Iceberg table via Spark.
+
+| Spark | Iceberg | Notes |
+|-----------------|----------------------------|-------|
+| boolean | boolean | |
+| short | integer | |
+| byte | integer | |
+| integer | integer | |
+| long | long | |
+| float | float | |
+| double | double | |
+| date | date | |
+| timestamp | timestamp with timezone | |
+| timestamp_ntz | timestamp without timezone | |
+| char | string | |
+| varchar | string | |
+| string | string | |
+| binary | binary | |
+| decimal | decimal | |
+| struct | struct | |
+| array | list | |
+| map | map | |
+
+!!! info
+    The table describes the conversion applied when creating a table. Writes support a broader set of conversions:
+
+    * Iceberg numeric types (`integer`, `long`, `float`, `double`, `decimal`) support promotion during writes. For example, you can write Spark types `short`, `byte`, `integer`, and `long` to the Iceberg type `long`.
+    * You can write to the Iceberg `fixed` type using the Spark `binary` type. Note that the length is asserted on write.
+
+#### Iceberg type to Spark type
+
+This type conversion table describes how Iceberg types are converted to Spark types. The conversion applies when reading from an Iceberg table via Spark.
+
+| Iceberg | Spark | Notes |
+|----------------------------|-------------------------|---------------|
+| boolean | boolean | |
+| integer | integer | |
+| long | long | |
+| float | float | |
+| double | double | |
+| date | date | |
+| time | | Not supported |
+| timestamp with timezone | timestamp | |
+| timestamp without timezone | timestamp_ntz | |
Review Comment:
```suggestion
| timestamp without timezone | timestamp_ntz | |
```