Re: [PR] Add doc fixes for 1.5.0 [iceberg]

via GitHub Tue, 12 Mar 2024 01:40:13 -0700


Fokko commented on code in PR #9935:
URL: https://github.com/apache/iceberg/pull/9935#discussion_r1521056978



##########
1.5.0/docs/spark-getting-started.md:
##########
@@ -123,18 +123,78 @@ SELECT * FROM local.db.table.snapshots;
 +-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+
 ```
 
-[DataFrame reads](spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`:
+[DataFrame reads](../spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`:
 
 ```scala
 val df = spark.table("local.db.table")
 df.count()
 ```
 
+### Type compatibility
+
+Spark and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations,
+so you may want to understand the type conversion in Iceberg in prior to design the types of columns in your tables.
+
+#### Spark type to Iceberg type
+
+This type conversion table describes how Spark types are converted to the Iceberg types. The conversion applies on both creating Iceberg table and writing to Iceberg table via Spark.
+
+| Spark           | Iceberg                    | Notes |
+|-----------------|----------------------------|-------|
+| boolean         | boolean                    |       |
+| short           | integer                    |       |
+| byte            | integer                    |       |
+| integer         | integer                    |       |
+| long            | long                       |       |
+| float           | float                      |       |
+| double          | double                     |       |
+| date            | date                       |       |
+| timestamp       | timestamp with timezone    |       |
+| timestamp_ntz    | timestamp without timezone |       |

Review Comment:
   OCD
   ```suggestion
   | timestamp_ntz   | timestamp without timezone |       |
   ```



##########
1.5.0/docs/spark-getting-started.md:
##########
@@ -123,18 +123,78 @@ SELECT * FROM local.db.table.snapshots;
 +-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+
 ```
 
-[DataFrame reads](spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`:
+[DataFrame reads](../spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`:
 
 ```scala
 val df = spark.table("local.db.table")
 df.count()
 ```
 
+### Type compatibility
+
+Spark and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations,
+so you may want to understand the type conversion in Iceberg in prior to design the types of columns in your tables.
+
+#### Spark type to Iceberg type
+
+This type conversion table describes how Spark types are converted to the Iceberg types. The conversion applies on both creating Iceberg table and writing to Iceberg table via Spark.
+
+| Spark           | Iceberg                    | Notes |
+|-----------------|----------------------------|-------|
+| boolean         | boolean                    |       |
+| short           | integer                    |       |
+| byte            | integer                    |       |
+| integer         | integer                    |       |
+| long            | long                       |       |
+| float           | float                      |       |
+| double          | double                     |       |
+| date            | date                       |       |
+| timestamp       | timestamp with timezone    |       |
+| timestamp_ntz    | timestamp without timezone |       |
+| char            | string                     |       |
+| varchar         | string                     |       |
+| string          | string                     |       |
+| binary          | binary                     |       |
+| decimal         | decimal                    |       |
+| struct          | struct                     |       |
+| array           | list                       |       |
+| map             | map                        |       |
+
+!!! info
+    The table is based on representing conversion during creating table. In fact, broader supports are applied on write. Here're some points on write:
+
+    * Iceberg numeric types (`integer`, `long`, `float`, `double`, `decimal`) support promotion during writes. e.g. You can write Spark types `short`, `byte`, `integer`, `long` to Iceberg type `long`.
+    * You can write to Iceberg `fixed` type using Spark `binary` type. Note that assertion on the length will be performed.
+
+#### Iceberg type to Spark type
+
+This type conversion table describes how Iceberg types are converted to the Spark types. The conversion applies on reading from Iceberg table via Spark.
+
+| Iceberg                    | Spark                   | Note          |
+|----------------------------|-------------------------|---------------|
+| boolean                    | boolean                 |               |
+| integer                    | integer                 |               |
+| long                       | long                    |               |
+| float                      | float                   |               |
+| double                     | double                  |               |
+| date                       | date                    |               |
+| time                       |                         | Not supported |
+| timestamp with timezone    | timestamp               |               |
+| timestamp without timezone | timestamp_ntz            |               |

Review Comment:
   ```suggestion
   | timestamp without timezone | timestamp_ntz           |               |
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

