scovich commented on code in PR #9276:
URL: https://github.com/apache/arrow-rs/pull/9276#discussion_r2742959809


##########
parquet-variant/src/path.rs:
##########
@@ -69,11 +69,21 @@ use crate::utils::parse_path;
 /// # use parquet_variant::{VariantPath, VariantPathElement};
 /// /// You can access the paths using slices
 /// // access the field "foo" and then the first element in a variant list value
-/// let path = VariantPath::from("foo")
+/// let path = VariantPath::try_from("foo").unwrap()
 ///   .join("bar")
 ///   .join("baz");
 /// assert_eq!(path[1], VariantPathElement::field("bar"));
 /// ```
+///
+/// # Example: Accessing filed with bracket

Review Comment:
   ```suggestion
   /// # Example: Accessing field with bracket
   ```



##########
parquet-variant/src/path.rs:
##########
@@ -69,11 +69,21 @@ use crate::utils::parse_path;
 /// # use parquet_variant::{VariantPath, VariantPathElement};
 /// /// You can access the paths using slices
 /// // access the field "foo" and then the first element in a variant list value
-/// let path = VariantPath::from("foo")
+/// let path = VariantPath::try_from("foo").unwrap()
 ///   .join("bar")
 ///   .join("baz");
 /// assert_eq!(path[1], VariantPathElement::field("bar"));
 /// ```
+///
+/// # Example: Accessing filed with bracket
+/// ```
+/// # use parquet_variant::{VariantPath, VariantPathElement};
+/// let path = VariantPath::try_from("a[b.c].d[2]").unwrap();

Review Comment:
   Questions:
   1. How does one distinguish between an array index `2` and a field that happens to be named `"2"`?
   2. What are the rules for brackets vs. periods? Is `.foo` shorthand for `['foo']`?
   
   If we follow the [JSONPath spec](https://www.rfc-editor.org/rfc/rfc9535#name-summary), then:
   1. Array indexing is `[2]` and field access is `['2']`.
   2. `.foo` _IS_ shorthand for `['foo']`, with [specific rules](https://www.rfc-editor.org/rfc/rfc9535#name-syntax-8) for which characters can appear in the shorthand form (the regexp `[_a-zA-Z][_a-zA-Z0-9]*` matches a subset of the allowable shorthand names).
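   
   A minimal Rust sketch of these rules, assuming a hypothetical `Element` enum and `parse` function (neither is the `parquet_variant` API). It skips the `$` root, double-quoted names, escapes, negative indices and non-ASCII shorthand names, and, like the PR's examples, lets a path start with a bare name. Under these rules a field literally named `b.c` has to be quoted, e.g. `a['b.c'].d[2]`.
   
   ```rust
   /// Hypothetical path element for this sketch; not the `parquet_variant` API.
   #[derive(Debug, PartialEq)]
   pub enum Element {
       Field(String),
       Index(usize),
   }
   
   /// True for the ASCII subset of RFC 9535 member-name shorthand:
   /// `[_a-zA-Z][_a-zA-Z0-9]*`.
   fn is_shorthand(name: &str) -> bool {
       let mut chars = name.chars();
       matches!(chars.next(), Some(c) if c == '_' || c.is_ascii_alphabetic())
           && chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
   }
   
   /// Parses `.name`, `['name']` and `[index]` segments. Only single-quoted
   /// names without escapes are supported in this sketch.
   pub fn parse(path: &str) -> Result<Vec<Element>, String> {
       let mut out = Vec::new();
       let mut rest = path;
       while !rest.is_empty() {
           if let Some(after) = rest.strip_prefix("['") {
               // Quoted member name: always a field, even if it looks numeric.
               let end = after.find("']").ok_or("unterminated ['...']")?;
               out.push(Element::Field(after[..end].to_string()));
               rest = &after[end + 2..];
           } else if let Some(after) = rest.strip_prefix('[') {
               // Unquoted bracket content must be a non-negative array index.
               let end = after.find(']').ok_or("unterminated [...]")?;
               let index = after[..end]
                   .parse::<usize>()
                   .map_err(|_| format!("not an array index: {:?}", &after[..end]))?;
               out.push(Element::Index(index));
               rest = &after[end + 1..];
           } else {
               // Shorthand `.name`; only the very first segment may omit the dot.
               let body = match rest.strip_prefix('.') {
                   Some(body) => body,
                   None if out.is_empty() => rest,
                   None => return Err(format!("expected '.' or '[' at {:?}", rest)),
               };
               let len = body.find(|c: char| c == '.' || c == '[').unwrap_or(body.len());
               let name = &body[..len];
               if !is_shorthand(name) {
                   return Err(format!("invalid shorthand name: {:?}", name));
               }
               out.push(Element::Field(name.to_string()));
               rest = &body[len..];
           }
       }
       Ok(out)
   }
   ```
   
   With this scheme `a[2]` yields an index while `a['2']` yields a field named `"2"`, so question 1 is answered purely by syntax.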
   
   



##########
parquet-variant/src/path.rs:
##########
@@ -112,9 +128,11 @@ impl<'a> From<Vec<VariantPathElement<'a>>> for VariantPath<'a> {
 }
 
 /// Create from &str with support for dot notation
-impl<'a> From<&'a str> for VariantPath<'a> {

Review Comment:
   Agree this should disappear (arguably shouldn't have ever existed in the first place, but hindsight mumble mumble).



##########
parquet-variant/src/path.rs:
##########
@@ -103,6 +113,12 @@ impl<'a> VariantPath<'a> {
     pub fn is_empty(&self) -> bool {
         self.0.is_empty()
     }
+
+    /// Parses a path string, panics on invalid input.
+    /// Only use for tests for known-valid input.
+    pub fn from_str_unchecked(s: &'a str) -> Self {

Review Comment:
   There is definitely mixed messaging on what "unchecked" and "checked" mean in Rust. For example, we have a contradiction of sorts in primitive integer operations:
   * `+` panics on overflow (when overflow checks are enabled, which is the default in debug builds; release builds wrap by default)
   * [checked_add](https://doc.rust-lang.org/std/primitive.i32.html#method.checked_add) returns None on overflow
   * [unchecked_add](https://doc.rust-lang.org/std/primitive.i32.html#method.unchecked_add) is `unsafe` and overflow is UB.
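   
   A small sketch of the integer contrast using only std methods (`checked_add`, plus `unchecked_add`, which needs Rust 1.79 or later); the helper names `checked_sum` and `widened_add` are hypothetical.
   
   ```rust
   /// Sums `values`, returning `None` on overflow instead of panicking.
   /// (A plain `+` fold would panic here when overflow checks are enabled.)
   pub fn checked_sum(values: &[i32]) -> Option<i32> {
       values.iter().try_fold(0i32, |acc, &v| acc.checked_add(v))
   }
   
   /// Adds two bytes via `unchecked_add`, which is `unsafe` because overflow is UB.
   pub fn widened_add(a: u8, b: u8) -> u16 {
       // SAFETY: 255 + 255 = 510 fits in a u16, so this addition cannot overflow.
       unsafe { u16::from(a).unchecked_add(u16::from(b)) }
   }
   ```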
   
   String operations are similar:
   * [as_ascii](https://doc.rust-lang.org/std/primitive.str.html#method.as_ascii) returns None on failure
   * [as_ascii_unchecked](https://doc.rust-lang.org/std/primitive.str.html#method.as_ascii_unchecked) is UB on invalid input
   * [split_at](https://doc.rust-lang.org/std/primitive.str.html#method.split_at) panics on failure
   * [split_at_checked](https://doc.rust-lang.org/std/primitive.str.html#method.split_at_checked) returns None on failure.
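   
   For the string case, a hypothetical `byte_prefix` helper built on `str::split_at_checked` (stable since Rust 1.80) shows the fallible twin of the panicking `split_at`.
   
   ```rust
   /// Returns the first `n` bytes of `s`, or `None` if `n` is past the end or
   /// falls inside a multi-byte character (where `s.split_at(n)` would panic).
   pub fn byte_prefix(s: &str, n: usize) -> Option<&str> {
       s.split_at_checked(n).map(|(head, _tail)| head)
   }
   ```
   
   For example, in `"héllo"` the `é` occupies bytes 1 and 2, so a split at byte 2 returns `None` instead of panicking.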
   
   Overall, it seems like:
   * `foo_checked` is a fallible version of `foo`
     * only in cases where `foo` is infallible and panics on invalid input
     * if `foo` is already fallible, no `foo_checked` will exist
   * `foo_unchecked` is an unsafe version of `foo`
     * invalid input is UB
     * `foo` may be fallible or may panic
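   
   Applied to a hypothetical API, that convention would yield a trio like the one below (`byte_at`, `byte_at_checked` and `byte_at_unchecked` are illustrative names, not existing functions).
   
   ```rust
   /// Panics if `i` is out of bounds, like slice indexing.
   pub fn byte_at(bytes: &[u8], i: usize) -> u8 {
       bytes[i]
   }
   
   /// Fallible twin of `byte_at`: returns `None` instead of panicking.
   pub fn byte_at_checked(bytes: &[u8], i: usize) -> Option<u8> {
       bytes.get(i).copied()
   }
   
   /// Unsafe twin of `byte_at`: performs no bounds check at all.
   ///
   /// # Safety
   /// `i` must be less than `bytes.len()`; otherwise behavior is undefined.
   pub unsafe fn byte_at_unchecked(bytes: &[u8], i: usize) -> u8 {
       // SAFETY: the caller guarantees `i < bytes.len()`.
       unsafe { *bytes.get_unchecked(i) }
   }
   ```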
   
   So yes, this method should be called just `from_str`, with a doc comment stating that it panics (use `TryFrom` if you want to handle errors gracefully).
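   
   A minimal sketch of that recommendation on a hypothetical `DottedPath` type (not `VariantPath`, and with deliberately simplistic dot-only parsing): `TryFrom` carries the error, and the inherent `from_str` just delegates to it and documents its panic.
   
   ```rust
   use std::convert::TryFrom;
   
   /// Hypothetical dotted-path type for this sketch only.
   #[derive(Debug, PartialEq)]
   pub struct DottedPath<'a>(Vec<&'a str>);
   
   impl<'a> TryFrom<&'a str> for DottedPath<'a> {
       type Error = String;
   
       /// Graceful parsing: empty segments such as in `"a..b"` are reported as errors.
       fn try_from(s: &'a str) -> Result<Self, Self::Error> {
           let parts: Vec<&'a str> = s.split('.').collect();
           if parts.iter().any(|p| p.is_empty()) {
               return Err(format!("empty segment in path {:?}", s));
           }
           Ok(DottedPath(parts))
       }
   }
   
   impl<'a> DottedPath<'a> {
       /// Parses `s`.
       ///
       /// # Panics
       /// Panics if `s` is not a valid path; use `TryFrom` to handle errors gracefully.
       pub fn from_str(s: &'a str) -> Self {
           Self::try_from(s).unwrap_or_else(|e| panic!("invalid path: {}", e))
       }
   }
   ```
   
   Delegating keeps a single parser; `from_str` only decides what to do with the error.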


