This is an automated email from the ASF dual-hosted git repository.
blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/master by this push:
new 2c638d40b Spec: Clarify truncate transform for strings is based on code points (#4937)
2c638d40b is described below
commit 2c638d40bcc2200666280ca19fc30baaaedcfefd
Author: emkornfield <[email protected]>
AuthorDate: Sun Jun 5 11:21:34 2022 -0700
Spec: Clarify truncate transform for strings is based on code points (#4937)
---
format/spec.md | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/format/spec.md b/format/spec.md
index 9b64d49a9..41b22ff5a 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -343,12 +343,13 @@ For hash function details by type, see Appendix B.
| **`int`** | `W`, width | `v - (v % W)` remainders must be positive [1] | `W=10`: `1` → `0`, `-1` → `-10` |
| **`long`** | `W`, width | `v - (v % W)` remainders must be positive [1] | `W=10`: `1` → `0`, `-1` → `-10` |
| **`decimal`** | `W`, width (no scale) | `scaled_W = decimal(W, scale(v))` `v - (v % scaled_W)` [1, 2] | `W=50`, `s=2`: `10.65` → `10.50` |
-| **`string`** | `L`, length | Substring of length `L`: `v.substring(0, L)` | `L=3`: `iceberg` → `ice` |
+| **`string`** | `L`, length | Substring of length `L`: `v.substring(0, L)` [3] | `L=3`: `iceberg` → `ice` |
Notes:
1. The remainder, `v % W`, must be positive. For languages where `%` can produce negative values, the correct truncate function is: `v - (((v % W) + W) % W)`
2. The width, `W`, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters.
+3. Strings are truncated to a valid UTF-8 string with no more than `L` code points.
#### Partition Evolution
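
The truncate rules in the table and notes of this diff can be illustrated with a minimal Python sketch. It is not Iceberg's implementation: the function names `truncate_int`, `truncate_decimal`, and `truncate_string` are hypothetical, and the string case assumes the input is already a valid Unicode string, relying on the fact that Python `str` slicing counts code points rather than bytes or UTF-16 code units.

```python
from decimal import Decimal


def truncate_int(v: int, w: int) -> int:
    """Truncate v down to a multiple of the width w (hypothetical helper)."""
    if w <= 0:
        raise ValueError("width must be positive")
    # Portable form from note 1: normalize the remainder so it is never
    # negative, even in languages where % follows the sign of the dividend.
    remainder = ((v % w) + w) % w
    return v - remainder


def truncate_decimal(v: Decimal, w: int) -> Decimal:
    """Truncate a decimal using the width w at the scale of v (note 2)."""
    if w <= 0:
        raise ValueError("width must be positive")
    # scaled_W = decimal(W, scale(v)): reuse the exponent of v, so W=50 with
    # scale 2 becomes 0.50.
    scaled_w = Decimal(w).scaleb(v.as_tuple().exponent)
    # Decimal's % takes the sign of the dividend, so normalize as in note 1.
    remainder = ((v % scaled_w) + scaled_w) % scaled_w
    return v - remainder


def truncate_string(v: str, length: int) -> str:
    """Keep at most `length` code points of v (note 3)."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Python strings are sequences of code points, so slicing never splits a
    # multi-byte UTF-8 sequence or a UTF-16 surrogate pair.
    return v[:length]


if __name__ == "__main__":
    print(truncate_int(1, 10), truncate_int(-1, 10))
    print(truncate_decimal(Decimal("10.65"), 50))
    print(truncate_string("iceberg", 3))
    # U+1F9CA is one code point but four UTF-8 bytes.
    print(truncate_string("\U0001F9CAberg", 2).encode("utf-8"))
```

In a language whose strings are indexed by UTF-16 code units (Java, for example), a plain `substring(0, L)` could cut a supplementary character in half, which is the ambiguity the added note 3 addresses.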