scovich commented on code in PR #9276:
URL: https://github.com/apache/arrow-rs/pull/9276#discussion_r2742959809
##########
parquet-variant/src/path.rs:
##########
@@ -69,11 +69,21 @@ use crate::utils::parse_path;
/// # use parquet_variant::{VariantPath, VariantPathElement};
/// /// You can access the paths using slices
/// // access the field "foo" and then the first element in a variant list
value
-/// let path = VariantPath::from("foo")
+/// let path = VariantPath::try_from("foo").unwrap()
/// .join("bar")
/// .join("baz");
/// assert_eq!(path[1], VariantPathElement::field("bar"));
/// ```
+///
+/// # Example: Accessing filed with bracket
Review Comment:
```suggestion
/// # Example: Accessing field with bracket
```
##########
parquet-variant/src/path.rs:
##########
@@ -69,11 +69,21 @@ use crate::utils::parse_path;
/// # use parquet_variant::{VariantPath, VariantPathElement};
/// /// You can access the paths using slices
/// // access the field "foo" and then the first element in a variant list
value
-/// let path = VariantPath::from("foo")
+/// let path = VariantPath::try_from("foo").unwrap()
/// .join("bar")
/// .join("baz");
/// assert_eq!(path[1], VariantPathElement::field("bar"));
/// ```
+///
+/// # Example: Accessing filed with bracket
+/// ```
+/// # use parquet_variant::{VariantPath, VariantPathElement};
+/// let path = VariantPath::try_from("a[b.c].d[2]").unwrap();
Review Comment:
Questions:
1. How does one distinguish between an array index `2` and a field that
happens to be named `"2"`?
2. What are the rules for brackets vs. periods? Is `.foo` shorthand for
`['foo']`?
If we follow the [JSONPath
spec](https://www.rfc-editor.org/rfc/rfc9535#name-summary) then:
1. Array indexing is `[2]` and field access is `['2']`
2. `.foo` _IS_ shorthand for `['foo']`, with [specific
rules](https://www.rfc-editor.org/rfc/rfc9535#name-syntax-8) for what
characters can appear in the shorthand form (the regexp
`[_a-zA-Z][_a-zA-Z0-9]*` matches a subset of the allowable shorthand names)
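
To make those two rules concrete, here is a minimal sketch of a parser for that JSONPath-style subset. `Segment`, `parse_segments`, and `read_name` are hypothetical names for illustration only, not the `parquet-variant` API, and the sketch assumes quoted names contain no `]` or escape sequences. It also rejects unquoted, non-numeric bracket contents, which may differ from what this PR accepts.

```rust
/// One step in a path: a named field or a list index.
/// (Hypothetical type for illustration; not the parquet-variant API.)
#[derive(Debug, PartialEq)]
enum Segment {
    Field(String),
    Index(usize),
}

/// Read a shorthand name matching `[_a-zA-Z][_a-zA-Z0-9]*` starting at
/// byte offset `start`. Returns the name and the offset just past it.
fn read_name(path: &str, start: usize) -> Result<(String, usize), String> {
    let bytes = path.as_bytes();
    let is_start = |b: u8| b == b'_' || b.is_ascii_alphabetic();
    let is_rest = |b: u8| b == b'_' || b.is_ascii_alphanumeric();
    if !bytes.get(start).copied().is_some_and(is_start) {
        return Err(format!("expected a field name at byte {start}"));
    }
    let mut end = start + 1;
    while end < bytes.len() && is_rest(bytes[end]) {
        end += 1;
    }
    Ok((path[start..end].to_string(), end))
}

/// Parse a small JSONPath-like subset:
///   `.name`    shorthand field access
///   `['name']` quoted field access (no escapes, no `]` inside the quotes)
///   `[2]`      array index
/// A leading bare name is accepted as a field, e.g. `a[2]`.
fn parse_segments(path: &str) -> Result<Vec<Segment>, String> {
    let bytes = path.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'[' => {
                let close = path[i..]
                    .find(']')
                    .map(|p| i + p)
                    .ok_or("unclosed '['")?;
                let inner = &path[i + 1..close];
                let quoted =
                    inner.len() >= 2 && inner.starts_with('\'') && inner.ends_with('\'');
                if quoted {
                    // ['2'] is a field that happens to be named "2".
                    out.push(Segment::Field(inner[1..inner.len() - 1].to_string()));
                } else if let Ok(n) = inner.parse::<usize>() {
                    // [2] is an array index.
                    out.push(Segment::Index(n));
                } else {
                    return Err(format!("expected an index or quoted name, got {inner:?}"));
                }
                i = close + 1;
            }
            b'.' => {
                // `.foo` is shorthand for `['foo']`.
                let (name, next) = read_name(path, i + 1)?;
                out.push(Segment::Field(name));
                i = next;
            }
            _ if i == 0 => {
                let (name, next) = read_name(path, 0)?;
                out.push(Segment::Field(name));
                i = next;
            }
            _ => return Err(format!("unexpected character at byte {i}")),
        }
    }
    Ok(out)
}
```

With these rules, `a['2'][2].b` parses as field `a`, field `2`, index `2`, field `b`, so the quoting alone settles question 1.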
##########
parquet-variant/src/path.rs:
##########
@@ -112,9 +128,11 @@ impl<'a> From<Vec<VariantPathElement<'a>>> for
VariantPath<'a> {
}
/// Create from &str with support for dot notation
-impl<'a> From<&'a str> for VariantPath<'a> {
Review Comment:
Agree this should disappear (arguably shouldn't have ever existed in the
first place, but hindsight mumble mumble).
##########
parquet-variant/src/path.rs:
##########
@@ -103,6 +113,12 @@ impl<'a> VariantPath<'a> {
pub fn is_empty(&self) -> bool {
self.0.is_empty()
}
+
+ /// Parses a path string, panics on invalid input.
+ /// Only use for tests for known-valid input.
+ pub fn from_str_unchecked(s: &'a str) -> Self {
Review Comment:
There is definitely mixed messaging on what "unchecked" and "checked" mean
in Rust. For example, we have a contradiction of sorts in primitive integer
operations:
* `+` panics on overflow
* [checked_add](https://doc.rust-lang.org/std/primitive.i32.html#method.checked_add)
returns None on overflow
* [unchecked_add](https://doc.rust-lang.org/std/primitive.i32.html#method.unchecked_add)
is `unsafe` and overflow is UB.
String operations are similar:
* [as_ascii](https://doc.rust-lang.org/std/primitive.str.html#method.as_ascii)
returns None on failure
* [as_ascii_unchecked](https://doc.rust-lang.org/std/primitive.str.html#method.as_ascii_unchecked)
is UB on invalid input
* [split_at](https://doc.rust-lang.org/std/primitive.str.html#method.split_at)
panics on failure
* [split_at_checked](https://doc.rust-lang.org/std/primitive.str.html#method.split_at_checked)
returns None on failure.
Overall, it seems like:
* `foo_checked` is a fallible version of `foo`
  * only in cases where `foo` is infallible and panics on invalid input
  * if `foo` is already fallible, no `foo_checked` will exist
* `foo_unchecked` is an unsafe version of `foo`
  * invalid input is UB
  * `foo` may be fallible or may panic
So yes, this method should be called just `from_str` with a doc comment that
it panics (use `TryFrom` if you want to handle errors gracefully).
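
A rough sketch of that shape, for illustration only: `DottedPath` is a hypothetical stand-in (not the real `VariantPath`) that splits on `.` and rejects empty segments. The `std_examples` function exercises standard-library calls that follow the same conventions, using `from_utf8` / `from_utf8_unchecked` as the fallible/unsafe pair.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a parsed path; not the parquet-variant type.
#[derive(Debug, PartialEq)]
struct DottedPath(Vec<String>);

/// Fallible construction: callers decide how to handle bad input.
impl TryFrom<&str> for DottedPath {
    type Error = String;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        // Reject "", "a..b", ".a", "a." (any empty segment).
        if s.split('.').any(str::is_empty) {
            return Err(format!("invalid path: {s:?}"));
        }
        Ok(DottedPath(s.split('.').map(String::from).collect()))
    }
}

impl DottedPath {
    /// Parses a path string.
    ///
    /// # Panics
    /// Panics on invalid input. Use `TryFrom` to handle errors gracefully.
    fn from_str(s: &str) -> Self {
        Self::try_from(s).expect("invalid path")
    }
}

/// Standard-library calls that follow the naming conventions above.
fn std_examples() {
    // `+` panics on overflow in debug builds; `checked_add` returns None.
    assert_eq!(i32::MAX.checked_add(1), None);
    assert_eq!(1_i32.checked_add(2), Some(3));

    // `from_utf8` is already fallible, so there is no `from_utf8_checked`.
    assert!(std::str::from_utf8(&[0xff]).is_err());

    // `from_utf8_unchecked` is `unsafe`: invalid input would be UB, so it is
    // only called here with bytes known to be valid UTF-8.
    let s = unsafe { std::str::from_utf8_unchecked(b"ok") };
    assert_eq!(s, "ok");
}
```

Nothing here is `unsafe` from the caller's point of view and bad input is a panic rather than UB, so an `_unchecked` suffix would promise the wrong thing.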