This is an automated email from the ASF dual-hosted git repository.
mbrobbel pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git
The following commit(s) were added to refs/heads/main by this push:
new cdbbbf7509 Improve `Display` for `DataType` and `Field` (#8290)
cdbbbf7509 is described below
commit cdbbbf7509d617552158c633c02f46dcf2eea766
Author: Emil Ernerfeldt <[email protected]>
AuthorDate: Tue Sep 23 16:35:41 2025 +0200
Improve `Display` for `DataType` and `Field` (#8290)
This is part of an attempt to improve the error reporting of `arrow-rs`,
`datafusion`, and other third-party crates.
I believe that error messages should be as readable as possible: aim for
`rustc` rather than `gcc`.
Here's an example of how this PR improves some existing error messages:
Before:
> Casting from Map(Field { name: "entries", data_type: Struct([Field {
name: "key", data_type: Utf8, nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} }, Field { name: "value",
data_type: Interval(DayTime), nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} }, false) to Map(Field { name:
"entries", data_type: Struct([Field { name: "key", data_type: Utf8,
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "value", data_type: Duration(Second), nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }, true) not supported
After:
> Casting from Map(Field { "entries": Struct(key Utf8, value nullable
Interval(DayTime)) }, false) to Map(Field { "entries": Struct(key Utf8,
value Duration(Second)) }, true) not supported
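The core idea behind the improvement can be sketched with a minimal, self-contained example (this is not the actual arrow-rs code; the `Field` struct here is a simplified stand-in): a hand-written `Display` stays compact and readable for error messages, while the derived `Debug` spells out every member.

```rust
use std::fmt;

// Simplified stand-in for arrow's Field, for illustration only.
#[derive(Debug)]
struct Field {
    name: String,
    nullable: bool,
    data_type: String,
}

impl fmt::Display for Field {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only print `nullable` when it is true, keeping the common case short.
        if self.nullable {
            write!(f, "{} nullable {}", self.name, self.data_type)
        } else {
            write!(f, "{} {}", self.name, self.data_type)
        }
    }
}

fn main() {
    let field = Field {
        name: "value".to_string(),
        nullable: true,
        data_type: "Interval(DayTime)".to_string(),
    };
    // Derived Debug: verbose, every struct member spelled out.
    println!("{field:?}");
    // Custom Display: compact and error-message friendly.
    println!("{field}"); // value nullable Interval(DayTime)
}
```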
# Which issue does this PR close?
- Closes #7048
- Continues and closes #7051
- Continues https://github.com/apache/arrow-rs/pull/7469
- More improvements coming in
https://github.com/apache/arrow-rs/pull/8291
- Sibling PR: https://github.com/apache/datafusion/pull/17565
- Part of https://github.com/apache/arrow-rs/issues/8351
# Rationale for this change
`DataType`s are often shown in error messages. Making these messages
readable is _very important_.
# What changes are included in this PR?
## ~Unify `Debug` and `Display`~
~The `Display` and `Debug` of `DataType` are now the SAME.~
~Why? Both are frequently used in error messages (both in arrow, and
`datafusion`), and both benefit from being readable yet reversible.~
Reverted based on PR feedback. I will try to improve the `Debug`
formatting in a future PR, with clever use of
https://doc.rust-lang.org/std/fmt/struct.Formatter.html#method.debug_struct
## Improve `Display` of lists
Improve the `Display` formatting of
* `DataType::List`
* `DataType::LargeList`
* `DataType::FixedSizeList`
**Before**: `List(Field { name: \"item\", data_type: Int32, nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {} })`
**After**: `List(nullable Int32)`
**Before**: `FixedSizeList(Field { name: \"item\", data_type: Int32,
nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, 5)`
**After**: `FixedSizeList(5 x Int32)`
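The list formatting style above can be sketched as follows (a simplified illustration, not the real `arrow-schema` enum; the actual implementation lives in `arrow-schema/src/datatype_display.rs` and handles many more variants):

```rust
use std::fmt;

// Toy subset of a DataType enum, for illustration only.
enum DataType {
    Int32,
    List { nullable: bool, item: Box<DataType> },
    FixedSizeList { size: i32, item: Box<DataType> },
}

impl fmt::Display for DataType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DataType::Int32 => write!(f, "Int32"),
            DataType::List { nullable, item } => {
                // Mention nullability only when it carries information.
                if *nullable {
                    write!(f, "List(nullable {item})")
                } else {
                    write!(f, "List({item})")
                }
            }
            DataType::FixedSizeList { size, item } => {
                write!(f, "FixedSizeList({size} x {item})")
            }
        }
    }
}

fn main() {
    let list = DataType::List { nullable: true, item: Box::new(DataType::Int32) };
    println!("{list}"); // List(nullable Int32)
    let fsl = DataType::FixedSizeList { size: 5, item: Box::new(DataType::Int32) };
    println!("{fsl}"); // FixedSizeList(5 x Int32)
}
```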
### Better formatting of `DataType::Struct`
The formatting of `Struct` is now reversible, including nullability and
metadata.
- Continues https://github.com/apache/arrow-rs/pull/7469
### ~Improve `Debug` format of `Field`~
~Best understood with this diff for an existing test:~
<img width="1140" height="499" alt="Screenshot 2025-09-07 at 18 30 44"
src="https://github.com/user-attachments/assets/794b4de9-8459-4ee7-82d2-8f5ae248614c"
/>
**EDIT**: reverted
# Are these changes tested?
Yes - new tests cover them
# Are there any user-facing changes?
`Display/to_string` has changed, and so this is a **BREAKING CHANGE**.
Care has been taken that the formatting contains all necessary
information (i.e. is reversible), though the actual `FromStr`
implementation is still not written (it is missing on `main`, and
missing in this PR - so no change).
----
Let me know if I went too far… or not far enough 😆
---------
Co-authored-by: irenjj <[email protected]>
---
arrow-array/src/array/fixed_size_list_array.rs | 2 +-
arrow-array/src/array/mod.rs | 6 +-
arrow-array/src/array/primitive_array.rs | 6 +-
arrow-array/src/builder/mod.rs | 8 +-
arrow-array/src/builder/struct_builder.rs | 6 +-
arrow-array/src/ffi.rs | 22 +--
arrow-array/src/record_batch.rs | 4 +-
arrow-cast/src/cast/dictionary.rs | 4 +-
arrow-cast/src/cast/mod.rs | 52 ++---
arrow-csv/src/reader/mod.rs | 2 +-
arrow-data/src/transform/run.rs | 4 +-
arrow-integration-test/src/lib.rs | 8 +-
arrow-json/src/lib.rs | 2 +-
arrow-ord/src/sort.rs | 2 +-
arrow-row/src/list.rs | 2 +-
arrow-schema/src/datatype.rs | 24 +--
arrow-schema/src/datatype_display.rs | 247 ++++++++++++++++++++++++
arrow-schema/src/datatype_parse.rs | 19 +-
arrow-schema/src/field.rs | 35 +++-
arrow-schema/src/lib.rs | 1 +
arrow-schema/src/schema.rs | 15 +-
arrow/src/util/data_gen.rs | 6 +-
parquet-variant-compute/src/arrow_to_variant.rs | 4 +-
parquet-variant-compute/src/variant_array.rs | 2 +-
parquet/benches/arrow_reader_row_filter.rs | 2 +-
parquet/src/arrow/arrow_reader/mod.rs | 10 +-
parquet/src/arrow/arrow_writer/mod.rs | 2 +-
parquet/src/arrow/buffer/view_buffer.rs | 2 +-
parquet/src/basic.rs | 4 +-
29 files changed, 387 insertions(+), 116 deletions(-)
diff --git a/arrow-array/src/array/fixed_size_list_array.rs
b/arrow-array/src/array/fixed_size_list_array.rs
index 4a338591e5..1214708710 100644
--- a/arrow-array/src/array/fixed_size_list_array.rs
+++ b/arrow-array/src/array/fixed_size_list_array.rs
@@ -350,7 +350,7 @@ impl From<ArrayData> for FixedSizeListArray {
let value_length = match data.data_type() {
DataType::FixedSizeList(_, len) => *len,
data_type => {
- panic!("FixedSizeListArray data should contain a FixedSizeList
data type, got {data_type:?}")
+ panic!("FixedSizeListArray data should contain a FixedSizeList
data type, got {data_type}")
}
};
diff --git a/arrow-array/src/array/mod.rs b/arrow-array/src/array/mod.rs
index 5fdfb9fb22..b5ba32745a 100644
--- a/arrow-array/src/array/mod.rs
+++ b/arrow-array/src/array/mod.rs
@@ -824,20 +824,20 @@ pub fn make_array(data: ArrayData) -> ArrayRef {
DataType::UInt16 =>
Arc::new(DictionaryArray::<UInt16Type>::from(data)) as ArrayRef,
DataType::UInt32 =>
Arc::new(DictionaryArray::<UInt32Type>::from(data)) as ArrayRef,
DataType::UInt64 =>
Arc::new(DictionaryArray::<UInt64Type>::from(data)) as ArrayRef,
- dt => panic!("Unexpected dictionary key type {dt:?}"),
+ dt => panic!("Unexpected dictionary key type {dt}"),
},
DataType::RunEndEncoded(ref run_ends_type, _) => match
run_ends_type.data_type() {
DataType::Int16 => Arc::new(RunArray::<Int16Type>::from(data)) as
ArrayRef,
DataType::Int32 => Arc::new(RunArray::<Int32Type>::from(data)) as
ArrayRef,
DataType::Int64 => Arc::new(RunArray::<Int64Type>::from(data)) as
ArrayRef,
- dt => panic!("Unexpected data type for run_ends array {dt:?}"),
+ dt => panic!("Unexpected data type for run_ends array {dt}"),
},
DataType::Null => Arc::new(NullArray::from(data)) as ArrayRef,
DataType::Decimal32(_, _) => Arc::new(Decimal32Array::from(data)) as
ArrayRef,
DataType::Decimal64(_, _) => Arc::new(Decimal64Array::from(data)) as
ArrayRef,
DataType::Decimal128(_, _) => Arc::new(Decimal128Array::from(data)) as
ArrayRef,
DataType::Decimal256(_, _) => Arc::new(Decimal256Array::from(data)) as
ArrayRef,
- dt => panic!("Unexpected data type {dt:?}"),
+ dt => panic!("Unexpected data type {dt}"),
}
}
diff --git a/arrow-array/src/array/primitive_array.rs
b/arrow-array/src/array/primitive_array.rs
index 42594e7a12..d23f413152 100644
--- a/arrow-array/src/array/primitive_array.rs
+++ b/arrow-array/src/array/primitive_array.rs
@@ -1290,7 +1290,7 @@ impl<T: ArrowPrimitiveType> std::fmt::Debug for
PrimitiveArray<T> {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
let data_type = self.data_type();
- write!(f, "PrimitiveArray<{data_type:?}>\n[\n")?;
+ write!(f, "PrimitiveArray<{data_type}>\n[\n")?;
print_long_array(self, f, |array, index, f| match data_type {
DataType::Date32 | DataType::Date64 => {
let v = self.value(index).to_i64().unwrap();
@@ -1299,7 +1299,7 @@ impl<T: ArrowPrimitiveType> std::fmt::Debug for
PrimitiveArray<T> {
None => {
write!(
f,
- "Cast error: Failed to convert {v} to temporal for
{data_type:?}"
+ "Cast error: Failed to convert {v} to temporal for
{data_type}"
)
}
}
@@ -1311,7 +1311,7 @@ impl<T: ArrowPrimitiveType> std::fmt::Debug for
PrimitiveArray<T> {
None => {
write!(
f,
- "Cast error: Failed to convert {v} to temporal for
{data_type:?}"
+ "Cast error: Failed to convert {v} to temporal for
{data_type}"
)
}
}
diff --git a/arrow-array/src/builder/mod.rs b/arrow-array/src/builder/mod.rs
index ea9c98f9b6..91e29957fc 100644
--- a/arrow-array/src/builder/mod.rs
+++ b/arrow-array/src/builder/mod.rs
@@ -567,7 +567,7 @@ pub fn make_builder(datatype: &DataType, capacity: usize)
-> Box<dyn ArrayBuilde
.with_values_field(fields[1].clone()),
)
}
- t => panic!("The field of Map data type {t:?} should have a child
Struct field"),
+ t => panic!("The field of Map data type {t} should have a child
Struct field"),
},
DataType::Struct(fields) =>
Box::new(StructBuilder::from_fields(fields.clone(), capacity)),
t @ DataType::Dictionary(key_type, value_type) => {
@@ -594,7 +594,7 @@ pub fn make_builder(datatype: &DataType, capacity: usize)
-> Box<dyn ArrayBuilde
LargeBinaryDictionaryBuilder::with_capacity(capacity, 256, 1024);
Box::new(dict_builder)
}
- t => panic!("Dictionary value type {t:?} is not
currently supported"),
+ t => panic!("Dictionary value type {t} is not
currently supported"),
}
};
}
@@ -604,10 +604,10 @@ pub fn make_builder(datatype: &DataType, capacity: usize)
-> Box<dyn ArrayBuilde
DataType::Int32 => dict_builder!(Int32Type),
DataType::Int64 => dict_builder!(Int64Type),
_ => {
- panic!("Data type {t:?} with key type {key_type:?} is not
currently supported")
+ panic!("Data type {t} with key type {key_type} is not
currently supported")
}
}
}
- t => panic!("Data type {t:?} is not currently supported"),
+ t => panic!("Data type {t} is not currently supported"),
}
}
diff --git a/arrow-array/src/builder/struct_builder.rs
b/arrow-array/src/builder/struct_builder.rs
index 3afee5863f..d5109ec192 100644
--- a/arrow-array/src/builder/struct_builder.rs
+++ b/arrow-array/src/builder/struct_builder.rs
@@ -62,7 +62,7 @@ use std::sync::Arc;
///
/// // We can't obtain the ListBuilder<StructBuilder> with the expected
generic types, because under the hood
/// // the StructBuilder was returned as a Box<dyn ArrayBuilder> and passed
as such to the ListBuilder constructor
-///
+///
/// // This panics in runtime, even though we know that the builder is a
ListBuilder<StructBuilder>.
/// // let sb = col_struct_builder
/// // .field_builder::<ListBuilder<StructBuilder>>(0)
@@ -267,7 +267,7 @@ impl StructBuilder {
let schema = builder.finish();
panic!("{}", format!(
- "StructBuilder ({:?}) and field_builder with index {}
({:?}) are of unequal lengths: ({} != {}).",
+ "StructBuilder ({}) and field_builder with index {} ({})
are of unequal lengths: ({} != {}).",
schema,
idx,
self.fields[idx].data_type(),
@@ -648,7 +648,7 @@ mod tests {
#[test]
#[should_panic(
- expected = "StructBuilder (Schema { fields: [Field { name: \"f1\",
data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false,
metadata: {} }, Field { name: \"f2\", data_type: Boolean, nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }) and
field_builder with index 1 (Boolean) are of unequal lengths: (2 != 1)."
+ expected = "StructBuilder (Field { \"f1\": Int32 }, Field { \"f2\":
Boolean }) and field_builder with index 1 (Boolean) are of unequal lengths: (2
!= 1)."
)]
fn test_struct_array_builder_unequal_field_builders_lengths() {
let mut int_builder = Int32Builder::with_capacity(10);
diff --git a/arrow-array/src/ffi.rs b/arrow-array/src/ffi.rs
index 83eaa3d654..218f729434 100644
--- a/arrow-array/src/ffi.rs
+++ b/arrow-array/src/ffi.rs
@@ -146,11 +146,11 @@ fn bit_width(data_type: &DataType, i: usize) ->
Result<usize> {
if let Some(primitive) = data_type.primitive_width() {
return match i {
0 => Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" doesn't expect buffer at index
0. Please verify that the C data interface is correctly implemented."
+ "The datatype \"{data_type}\" doesn't expect buffer at index
0. Please verify that the C data interface is correctly implemented."
))),
1 => Ok(primitive * 8),
i => Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" expects 2 buffers, but
requested {i}. Please verify that the C data interface is correctly
implemented."
+ "The datatype \"{data_type}\" expects 2 buffers, but requested
{i}. Please verify that the C data interface is correctly implemented."
))),
};
}
@@ -159,7 +159,7 @@ fn bit_width(data_type: &DataType, i: usize) ->
Result<usize> {
(DataType::Boolean, 1) => 1,
(DataType::Boolean, _) => {
return Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" expects 2 buffers, but
requested {i}. Please verify that the C data interface is correctly
implemented."
+ "The datatype \"{data_type}\" expects 2 buffers, but requested
{i}. Please verify that the C data interface is correctly implemented."
)))
}
(DataType::FixedSizeBinary(num_bytes), 1) => *num_bytes as usize *
u8::BITS as usize,
@@ -169,7 +169,7 @@ fn bit_width(data_type: &DataType, i: usize) ->
Result<usize> {
},
(DataType::FixedSizeBinary(_), _) | (DataType::FixedSizeList(_, _), _)
=> {
return Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" expects 2 buffers, but
requested {i}. Please verify that the C data interface is correctly
implemented."
+ "The datatype \"{data_type}\" expects 2 buffers, but requested
{i}. Please verify that the C data interface is correctly implemented."
)))
},
// Variable-size list and map have one i32 buffer.
@@ -179,12 +179,12 @@ fn bit_width(data_type: &DataType, i: usize) ->
Result<usize> {
(DataType::Utf8, 2) | (DataType::Binary, 2) => u8::BITS as _,
(DataType::List(_), _) | (DataType::Map(_, _), _) => {
return Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" expects 2 buffers, but
requested {i}. Please verify that the C data interface is correctly
implemented."
+ "The datatype \"{data_type}\" expects 2 buffers, but requested
{i}. Please verify that the C data interface is correctly implemented."
)))
}
(DataType::Utf8, _) | (DataType::Binary, _) => {
return Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" expects 3 buffers, but
requested {i}. Please verify that the C data interface is correctly
implemented."
+ "The datatype \"{data_type}\" expects 3 buffers, but requested
{i}. Please verify that the C data interface is correctly implemented."
)))
}
// Variable-sized binaries: have two buffers.
@@ -193,7 +193,7 @@ fn bit_width(data_type: &DataType, i: usize) ->
Result<usize> {
(DataType::LargeUtf8, 2) | (DataType::LargeBinary, 2) |
(DataType::LargeList(_), 2)=> u8::BITS as _,
(DataType::LargeUtf8, _) | (DataType::LargeBinary, _) |
(DataType::LargeList(_), _)=> {
return Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" expects 3 buffers, but
requested {i}. Please verify that the C data interface is correctly
implemented."
+ "The datatype \"{data_type}\" expects 3 buffers, but requested
{i}. Please verify that the C data interface is correctly implemented."
)))
}
// Variable-sized views: have 3 or more buffers.
@@ -209,24 +209,24 @@ fn bit_width(data_type: &DataType, i: usize) ->
Result<usize> {
(DataType::Union(_, UnionMode::Dense), 1) => i32::BITS as _,
(DataType::Union(_, UnionMode::Sparse), _) => {
return Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" expects 1 buffer, but
requested {i}. Please verify that the C data interface is correctly
implemented."
+ "The datatype \"{data_type}\" expects 1 buffer, but requested
{i}. Please verify that the C data interface is correctly implemented."
)))
}
(DataType::Union(_, UnionMode::Dense), _) => {
return Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" expects 2 buffer, but
requested {i}. Please verify that the C data interface is correctly
implemented."
+ "The datatype \"{data_type}\" expects 2 buffer, but requested
{i}. Please verify that the C data interface is correctly implemented."
)))
}
(_, 0) => {
// We don't call this `bit_width` to compute buffer length for
null buffer. If any types that don't have null buffer like
// UnionArray, they should be handled above.
return Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" doesn't expect buffer at index
0. Please verify that the C data interface is correctly implemented."
+ "The datatype \"{data_type}\" doesn't expect buffer at index
0. Please verify that the C data interface is correctly implemented."
)))
}
_ => {
return Err(ArrowError::CDataInterface(format!(
- "The datatype \"{data_type:?}\" is still not supported in Rust
implementation"
+ "The datatype \"{data_type}\" is still not supported in Rust
implementation"
)))
}
})
diff --git a/arrow-array/src/record_batch.rs b/arrow-array/src/record_batch.rs
index c1023b7390..aeeafe5dd9 100644
--- a/arrow-array/src/record_batch.rs
+++ b/arrow-array/src/record_batch.rs
@@ -360,7 +360,7 @@ impl RecordBatch {
if let Some((i, (col_type, field_type))) = not_match {
return Err(ArrowError::InvalidArgumentError(format!(
- "column types must match schema types, expected {field_type:?}
but found {col_type:?} at column index {i}")));
+ "column types must match schema types, expected {field_type}
but found {col_type} at column index {i}")));
}
Ok(RecordBatch {
@@ -422,7 +422,7 @@ impl RecordBatch {
/// // Insert a key-value pair into the metadata
/// batch.schema_metadata_mut().insert("key".into(), "value".into());
/// assert_eq!(batch.schema().metadata().get("key"),
Some(&String::from("value")));
- /// ```
+ /// ```
pub fn schema_metadata_mut(&mut self) -> &mut
std::collections::HashMap<String, String> {
let schema = Arc::make_mut(&mut self.schema);
&mut schema.metadata
diff --git a/arrow-cast/src/cast/dictionary.rs
b/arrow-cast/src/cast/dictionary.rs
index 43a67a7d9a..c213ac2662 100644
--- a/arrow-cast/src/cast/dictionary.rs
+++ b/arrow-cast/src/cast/dictionary.rs
@@ -78,7 +78,7 @@ pub(crate) fn dictionary_cast<K: ArrowDictionaryKeyType>(
UInt64 => Arc::new(DictionaryArray::<UInt64Type>::from(data)),
_ => {
return Err(ArrowError::CastError(format!(
- "Unsupported type {to_index_type:?} for dictionary
index"
+ "Unsupported type {to_index_type} for dictionary index"
)));
}
};
@@ -313,7 +313,7 @@ pub(crate) fn cast_to_dictionary<K: ArrowDictionaryKeyType>(
pack_byte_to_fixed_size_dictionary::<K>(array, cast_options,
byte_size)
}
_ => Err(ArrowError::CastError(format!(
- "Unsupported output type for dictionary packing:
{dict_value_type:?}"
+ "Unsupported output type for dictionary packing: {dict_value_type}"
))),
}
}
diff --git a/arrow-cast/src/cast/mod.rs b/arrow-cast/src/cast/mod.rs
index 72b2de99bd..fd43fefe62 100644
--- a/arrow-cast/src/cast/mod.rs
+++ b/arrow-cast/src/cast/mod.rs
@@ -798,7 +798,7 @@ pub fn cast_with_options(
UInt32 => dictionary_cast::<UInt32Type>(array, to_type,
cast_options),
UInt64 => dictionary_cast::<UInt64Type>(array, to_type,
cast_options),
_ => Err(ArrowError::CastError(format!(
- "Casting from dictionary type {from_type:?} to {to_type:?} not
supported",
+ "Casting from dictionary type {from_type} to {to_type} not
supported",
))),
},
(_, Dictionary(index_type, value_type)) => match **index_type {
@@ -811,7 +811,7 @@ pub fn cast_with_options(
UInt32 => cast_to_dictionary::<UInt32Type>(array, value_type,
cast_options),
UInt64 => cast_to_dictionary::<UInt64Type>(array, value_type,
cast_options),
_ => Err(ArrowError::CastError(format!(
- "Casting from type {from_type:?} to dictionary type
{to_type:?} not supported",
+ "Casting from type {from_type} to dictionary type {to_type}
not supported",
))),
},
(List(_), List(to)) => cast_list_values::<i32>(array, to,
cast_options),
@@ -1143,10 +1143,10 @@ pub fn cast_with_options(
Ok(Arc::new(array) as ArrayRef)
}
(Struct(_), _) => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported"
+ "Casting from {from_type} to {to_type} not supported"
))),
(_, Struct(_)) => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported"
+ "Casting from {from_type} to {to_type} not supported"
))),
(_, Boolean) => match from_type {
UInt8 => cast_numeric_to_bool::<UInt8Type>(array),
@@ -1164,7 +1164,7 @@ pub fn cast_with_options(
Utf8 => cast_utf8_to_boolean::<i32>(array, cast_options),
LargeUtf8 => cast_utf8_to_boolean::<i64>(array, cast_options),
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
},
(Boolean, _) => match to_type {
@@ -1183,7 +1183,7 @@ pub fn cast_with_options(
Utf8 => value_to_string::<i32>(array, cast_options),
LargeUtf8 => value_to_string::<i64>(array, cast_options),
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
},
(Utf8, _) => match to_type {
@@ -1245,7 +1245,7 @@ pub fn cast_with_options(
cast_string_to_month_day_nano_interval::<i32>(array,
cast_options)
}
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
},
(Utf8View, _) => match to_type {
@@ -1296,7 +1296,7 @@ pub fn cast_with_options(
cast_view_to_month_day_nano_interval(array, cast_options)
}
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
},
(LargeUtf8, _) => match to_type {
@@ -1362,7 +1362,7 @@ pub fn cast_with_options(
cast_string_to_month_day_nano_interval::<i64>(array,
cast_options)
}
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
},
(Binary, _) => match to_type {
@@ -1380,7 +1380,7 @@ pub fn cast_with_options(
cast_binary_to_string::<i32>(array,
cast_options)?.as_string::<i32>(),
))),
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
},
(LargeBinary, _) => match to_type {
@@ -1399,7 +1399,7 @@ pub fn cast_with_options(
Ok(Arc::new(StringViewArray::from(array.as_string::<i64>())))
}
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
},
(FixedSizeBinary(size), _) => match to_type {
@@ -1407,7 +1407,7 @@ pub fn cast_with_options(
LargeBinary => cast_fixed_size_binary_to_binary::<i64>(array,
*size),
BinaryView => cast_fixed_size_binary_to_binary_view(array, *size),
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
},
(BinaryView, Binary) => cast_view_to_byte::<BinaryViewType,
GenericBinaryType<i32>>(array),
@@ -1426,7 +1426,7 @@ pub fn cast_with_options(
Ok(Arc::new(array.as_binary_view().clone().to_string_view()?) as
ArrayRef)
}
(BinaryView, _) => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
(from_type, Utf8View) if from_type.is_primitive() => {
value_to_string_view(array, cast_options)
@@ -2160,7 +2160,7 @@ pub fn cast_with_options(
cast_reinterpret_arrays::<Int32Type, IntervalYearMonthType>(array)
}
(_, _) => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported",
+ "Casting from {from_type} to {to_type} not supported",
))),
}
}
@@ -2201,7 +2201,7 @@ where
LargeUtf8 => value_to_string::<i64>(array, cast_options),
Null => Ok(new_null_array(to_type, array.len())),
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported"
+ "Casting from {from_type} to {to_type} not supported"
))),
}
}
@@ -2304,7 +2304,7 @@ where
LargeUtf8 => cast_string_to_decimal::<D, i64>(array, *precision,
*scale, cast_options),
Null => Ok(new_null_array(to_type, array.len())),
_ => Err(ArrowError::CastError(format!(
- "Casting from {from_type:?} to {to_type:?} not supported"
+ "Casting from {from_type} to {to_type} not supported"
))),
}
}
@@ -8648,8 +8648,12 @@ mod tests {
let new_array_result = cast(&array, &new_type.clone());
assert!(!can_cast_types(array.data_type(), &new_type));
- assert!(
- matches!(new_array_result, Err(ArrowError::CastError(t)) if t ==
r#"Casting from Map(Field { name: "entries", data_type: Struct([Field { name:
"key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false,
metadata: {} }, Field { name: "value", data_type: Utf8, nullable: true,
dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id:
0, dict_is_ordered: false, metadata: {} }, false) to Map(Field { name:
"entries", data_type: Struct([Field { [...]
+ let Err(ArrowError::CastError(t)) = new_array_result else {
+ panic!();
+ };
+ assert_eq!(
+ t,
+ r#"Casting from Map(Field { "entries": Struct(key Utf8, value
nullable Utf8) }, false) to Map(Field { "entries": Struct(key Utf8, value Utf8)
}, true) not supported"#
);
}
@@ -8695,8 +8699,12 @@ mod tests {
let new_array_result = cast(&array, &new_type.clone());
assert!(!can_cast_types(array.data_type(), &new_type));
- assert!(
- matches!(new_array_result, Err(ArrowError::CastError(t)) if t ==
r#"Casting from Map(Field { name: "entries", data_type: Struct([Field { name:
"key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false,
metadata: {} }, Field { name: "value", data_type: Interval(DayTime), nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false,
dict_id: 0, dict_is_ordered: false, metadata: {} }, false) to Map(Field { name:
"entries", data_type: St [...]
+ let Err(ArrowError::CastError(t)) = new_array_result else {
+ panic!();
+ };
+ assert_eq!(
+ t,
+ r#"Casting from Map(Field { "entries": Struct(key Utf8, value
nullable Interval(DayTime)) }, false) to Map(Field { "entries": Struct(key
Utf8, value Duration(Second)) }, true) not supported"#
);
}
@@ -10788,7 +10796,7 @@ mod tests {
let to_type = DataType::Utf8;
let result = cast(&struct_array, &to_type);
assert_eq!(
- r#"Cast error: Casting from Struct([Field { name: "a", data_type:
Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }])
to Utf8 not supported"#,
+ r#"Cast error: Casting from Struct(a Boolean) to Utf8 not
supported"#,
result.unwrap_err().to_string()
);
}
@@ -10799,7 +10807,7 @@ mod tests {
let to_type = DataType::Struct(vec![Field::new("a", DataType::Boolean,
false)].into());
let result = cast(&array, &to_type);
assert_eq!(
- r#"Cast error: Casting from Utf8 to Struct([Field { name: "a",
data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false,
metadata: {} }]) not supported"#,
+ r#"Cast error: Casting from Utf8 to Struct(a Boolean) not
supported"#,
result.unwrap_err().to_string()
);
}
diff --git a/arrow-csv/src/reader/mod.rs b/arrow-csv/src/reader/mod.rs
index 7b69df51b5..d1fc4eb350 100644
--- a/arrow-csv/src/reader/mod.rs
+++ b/arrow-csv/src/reader/mod.rs
@@ -860,7 +860,7 @@ fn parse(
.collect::<DictionaryArray<UInt64Type>>(),
) as ArrayRef),
_ => Err(ArrowError::ParseError(format!(
- "Unsupported dictionary key type {key_type:?}"
+ "Unsupported dictionary key type {key_type}"
))),
}
}
diff --git a/arrow-data/src/transform/run.rs b/arrow-data/src/transform/run.rs
index f962a50098..af0b9e640c 100644
--- a/arrow-data/src/transform/run.rs
+++ b/arrow-data/src/transform/run.rs
@@ -75,7 +75,7 @@ pub fn extend_nulls(mutable: &mut _MutableArrayData, len:
usize) {
DataType::Int16 => extend_nulls_impl!(i16),
DataType::Int32 => extend_nulls_impl!(i32),
DataType::Int64 => extend_nulls_impl!(i64),
- _ => panic!("Invalid run end type for RunEndEncoded array:
{run_end_type:?}"),
+ _ => panic!("Invalid run end type for RunEndEncoded array:
{run_end_type}"),
};
mutable.child_data[0].data.len += 1;
@@ -225,7 +225,7 @@ pub fn build_extend(array: &ArrayData) -> Extend<'_> {
DataType::Int16 => build_and_process_impl!(i16),
DataType::Int32 => build_and_process_impl!(i32),
DataType::Int64 => build_and_process_impl!(i64),
- _ => panic!("Invalid run end type for RunEndEncoded array:
{dest_run_end_type:?}",),
+ _ => panic!("Invalid run end type for RunEndEncoded array:
{dest_run_end_type}",),
}
},
)
diff --git a/arrow-integration-test/src/lib.rs
b/arrow-integration-test/src/lib.rs
index 177a1c47f3..1f4c4bd4bd 100644
--- a/arrow-integration-test/src/lib.rs
+++ b/arrow-integration-test/src/lib.rs
@@ -794,13 +794,13 @@ pub fn array_from_json(
DataType::Dictionary(key_type, value_type) => {
#[allow(deprecated)]
let dict_id = field.dict_id().ok_or_else(|| {
- ArrowError::JsonError(format!("Unable to find dict_id for
field {field:?}"))
+ ArrowError::JsonError(format!("Unable to find dict_id for
field {field}"))
})?;
// find dictionary
let dictionary = dictionaries
.ok_or_else(|| {
ArrowError::JsonError(format!(
- "Unable to find any dictionaries for field {field:?}"
+ "Unable to find any dictionaries for field {field}"
))
})?
.get(&dict_id);
@@ -814,7 +814,7 @@ pub fn array_from_json(
dictionaries,
),
None => Err(ArrowError::JsonError(format!(
- "Unable to find dictionary for field {field:?}"
+ "Unable to find dictionary for field {field}"
))),
}
}
@@ -946,7 +946,7 @@ pub fn array_from_json(
Ok(Arc::new(array))
}
t => Err(ArrowError::JsonError(format!(
- "data type {t:?} not supported"
+ "data type {t} not supported"
))),
}
}
diff --git a/arrow-json/src/lib.rs b/arrow-json/src/lib.rs
index 6d7ab4400b..5a5430fef9 100644
--- a/arrow-json/src/lib.rs
+++ b/arrow-json/src/lib.rs
@@ -87,7 +87,7 @@ use serde_json::{Number, Value};
///
/// This enum controls which form(s) the Reader will accept and which form the
/// Writer will produce. For example, if the RecordBatch Schema is
-/// `[("a", Int32), ("r", Struct([("b", Boolean), ("c", Utf8)]))]`
+/// `[("a", Int32), ("r", Struct(b Boolean, c Utf8))]`
/// then a Reader with [`StructMode::ObjectOnly`] would read rows of the form
/// `{"a": 1, "r": {"b": true, "c": "cat"}}` while with
['StructMode::ListOnly']
/// would read rows of the form `[1, [true, "cat"]]`. A Writer would produce
diff --git a/arrow-ord/src/sort.rs b/arrow-ord/src/sort.rs
index 21e8d18593..bbf6391a39 100644
--- a/arrow-ord/src/sort.rs
+++ b/arrow-ord/src/sort.rs
@@ -304,7 +304,7 @@ pub fn sort_to_indices(
},
t => {
return Err(ArrowError::ComputeError(format!(
- "Sort not supported for data type {t:?}"
+ "Sort not supported for data type {t}"
)));
}
})
diff --git a/arrow-row/src/list.rs b/arrow-row/src/list.rs
index 72d93d2f4b..43b4e3b4f2 100644
--- a/arrow-row/src/list.rs
+++ b/arrow-row/src/list.rs
@@ -278,7 +278,7 @@ pub unsafe fn decode_fixed_size_list(
DataType::FixedSizeList(element_field, _) => element_field.data_type(),
_ => {
return Err(ArrowError::InvalidArgumentError(format!(
- "Expected FixedSizeListArray, found: {list_type:?}",
+ "Expected FixedSizeListArray, found: {list_type}",
)))
}
};
diff --git a/arrow-schema/src/datatype.rs b/arrow-schema/src/datatype.rs
index 08b3b4cd3c..32bce33474 100644
--- a/arrow-schema/src/datatype.rs
+++ b/arrow-schema/src/datatype.rs
@@ -15,7 +15,6 @@
// specific language governing permissions and limitations
// under the License.
-use std::fmt;
use std::str::FromStr;
use std::sync::Arc;
@@ -92,7 +91,7 @@ use crate::{ArrowError, Field, FieldRef, Fields, UnionFields};
///
/// [`Schema.fbs`]: https://github.com/apache/arrow/blob/main/format/Schema.fbs
/// [the physical memory layout of Apache Arrow]: https://arrow.apache.org/docs/format/Columnar.html#physical-memory-layout
-#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
+#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub enum DataType {
/// Null type
@@ -484,27 +483,6 @@ pub enum UnionMode {
Dense,
}
-impl fmt::Display for DataType {
- fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
- match &self {
- DataType::Struct(fields) => {
- write!(f, "Struct(")?;
- if !fields.is_empty() {
- let fields_str = fields
- .iter()
- .map(|f| format!("{} {}", f.name(), f.data_type()))
- .collect::<Vec<_>>()
- .join(", ");
- write!(f, "{fields_str}")?;
- }
- write!(f, ")")?;
- Ok(())
- }
- _ => write!(f, "{self:?}"),
- }
- }
-}
-
/// Parses `str` into a `DataType`.
///
/// This is the reverse of [`DataType`]'s `Display`
diff --git a/arrow-schema/src/datatype_display.rs b/arrow-schema/src/datatype_display.rs
new file mode 100644
index 0000000000..e1bd86cba0
--- /dev/null
+++ b/arrow-schema/src/datatype_display.rs
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::{collections::HashMap, fmt};
+
+use crate::DataType;
+
+impl fmt::Display for DataType {
+ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+ fn format_metadata(metadata: &HashMap<String, String>) -> String {
+ if metadata.is_empty() {
+ String::new()
+ } else {
+ format!(", metadata: {metadata:?}")
+ }
+ }
+
+        // A lot of these can still be improved.
+        // _Some_ of these can be parsed with `FromStr`, but not all (YET!).
+        // The goal is that the formatting should always be
+        // * Terse and readable
+        // * Reversible (contain all necessary information to reverse it perfectly)
+
+ match &self {
+ Self::Null => write!(f, "Null"),
+ Self::Boolean => write!(f, "Boolean"),
+ Self::Int8 => write!(f, "Int8"),
+ Self::Int16 => write!(f, "Int16"),
+ Self::Int32 => write!(f, "Int32"),
+ Self::Int64 => write!(f, "Int64"),
+ Self::UInt8 => write!(f, "UInt8"),
+ Self::UInt16 => write!(f, "UInt16"),
+ Self::UInt32 => write!(f, "UInt32"),
+ Self::UInt64 => write!(f, "UInt64"),
+ Self::Float16 => write!(f, "Float16"),
+ Self::Float32 => write!(f, "Float32"),
+ Self::Float64 => write!(f, "Float64"),
+ Self::Timestamp(time_unit, timezone) => {
+ write!(f, "Timestamp({time_unit:?}, {timezone:?})")
+ }
+ Self::Date32 => write!(f, "Date32"),
+ Self::Date64 => write!(f, "Date64"),
+ Self::Time32(time_unit) => write!(f, "Time32({time_unit:?})"),
+ Self::Time64(time_unit) => write!(f, "Time64({time_unit:?})"),
+ Self::Duration(time_unit) => write!(f, "Duration({time_unit:?})"),
+            Self::Interval(interval_unit) => write!(f, "Interval({interval_unit:?})"),
+ Self::Binary => write!(f, "Binary"),
+ Self::FixedSizeBinary(bytes_per_value) => {
+ write!(f, "FixedSizeBinary({bytes_per_value:?})")
+ }
+ Self::LargeBinary => write!(f, "LargeBinary"),
+ Self::BinaryView => write!(f, "BinaryView"),
+ Self::Utf8 => write!(f, "Utf8"),
+ Self::LargeUtf8 => write!(f, "LargeUtf8"),
+ Self::Utf8View => write!(f, "Utf8View"),
+            Self::ListView(field) => write!(f, "ListView({field})"), // TODO: make more readable
+            Self::LargeListView(field) => write!(f, "LargeListView({field})"), // TODO: make more readable
+ Self::List(field) | Self::LargeList(field) => {
+ let type_name = if matches!(self, Self::List(_)) {
+ "List"
+ } else {
+ "LargeList"
+ };
+
+ let name = field.name();
+                let maybe_nullable = if field.is_nullable() { "nullable " } else { "" };
+ let data_type = field.data_type();
+ let field_name_str = if name == "item" {
+ String::default()
+ } else {
+ format!(", field: '{name}'")
+ };
+ let metadata_str = format_metadata(field.metadata());
+
+                // e.g. `LargeList(nullable UInt32)`
+ write!(
+ f,
+                    "{type_name}({maybe_nullable}{data_type}{field_name_str}{metadata_str})"
+ )
+ }
+ Self::FixedSizeList(field, size) => {
+ let name = field.name();
+                let maybe_nullable = if field.is_nullable() { "nullable " } else { "" };
+ let data_type = field.data_type();
+ let field_name_str = if name == "item" {
+ String::default()
+ } else {
+ format!(", field: '{name}'")
+ };
+ let metadata_str = format_metadata(field.metadata());
+
+ write!(
+ f,
+                    "FixedSizeList({size} x {maybe_nullable}{data_type}{field_name_str}{metadata_str})",
+ )
+ }
+ Self::Struct(fields) => {
+ write!(f, "Struct(")?;
+ if !fields.is_empty() {
+ let fields_str = fields
+ .iter()
+ .map(|field| {
+ let name = field.name();
+                            let maybe_nullable = if field.is_nullable() { "nullable " } else { "" };
+ let data_type = field.data_type();
+                            let metadata_str = format_metadata(field.metadata());
+                            format!("{name} {maybe_nullable}{data_type}{metadata_str}")
+ })
+ .collect::<Vec<_>>()
+ .join(", ");
+ write!(f, "{fields_str}")?;
+ }
+ write!(f, ")")?;
+ Ok(())
+ }
+ Self::Union(union_fields, union_mode) => {
+ write!(f, "Union({union_fields:?}, {union_mode:?})")
+ }
+ Self::Dictionary(data_type, data_type1) => {
+ write!(f, "Dictionary({data_type}, {data_type1:?})")
+ }
+            Self::Decimal32(precision, scale) => write!(f, "Decimal32({precision:?}, {scale:?})"),
+            Self::Decimal64(precision, scale) => write!(f, "Decimal64({precision:?}, {scale:?})"),
+            Self::Decimal128(precision, scale) => write!(f, "Decimal128({precision:?}, {scale:?})"),
+            Self::Decimal256(precision, scale) => write!(f, "Decimal256({precision:?}, {scale:?})"),
+            Self::Map(field, keys_are_sorted) => write!(f, "Map({field}, {keys_are_sorted:?})"),
+ Self::RunEndEncoded(run_ends_field, values_field) => {
+ write!(f, "RunEndEncoded({run_ends_field}, {values_field})")
+ }
+ }
+ }
+}
+
+#[cfg(test)]
+mod tests {
+
+ use std::sync::Arc;
+
+ use crate::Field;
+
+ use super::*;
+
+ #[test]
+ fn test_display_list() {
+        let list_data_type = DataType::List(Arc::new(Field::new_list_field(DataType::Int32, true)));
+ let list_data_type_string = list_data_type.to_string();
+ let expected_string = "List(nullable Int32)";
+ assert_eq!(list_data_type_string, expected_string);
+ }
+
+ #[test]
+ fn test_display_list_with_named_field() {
+        let list_data_type = DataType::List(Arc::new(Field::new("foo", DataType::UInt64, false)));
+ let list_data_type_string = list_data_type.to_string();
+ let expected_string = "List(UInt64, field: 'foo')";
+ assert_eq!(list_data_type_string, expected_string);
+ }
+
+ #[test]
+ fn test_display_nested_list() {
+ let nested_data_type = DataType::List(Arc::new(Field::new_list_field(
+            DataType::List(Arc::new(Field::new_list_field(DataType::UInt64, false))),
+ false,
+ )));
+ let nested_data_type_string = nested_data_type.to_string();
+ let nested_expected_string = "List(List(UInt64))";
+ assert_eq!(nested_data_type_string, nested_expected_string);
+ }
+
+ #[test]
+ fn test_display_list_with_metadata() {
+ let mut field = Field::new_list_field(DataType::Int32, true);
+        let metadata = HashMap::from([("foo1".to_string(), "value1".to_string())]);
+ field.set_metadata(metadata);
+ let list_data_type = DataType::List(Arc::new(field));
+ let list_data_type_string = list_data_type.to_string();
+        let expected_string = "List(nullable Int32, metadata: {\"foo1\": \"value1\"})";
+
+ assert_eq!(list_data_type_string, expected_string);
+ }
+
+ #[test]
+ fn test_display_large_list() {
+ let large_list_data_type =
+            DataType::LargeList(Arc::new(Field::new_list_field(DataType::Int32, true)));
+ let large_list_data_type_string = large_list_data_type.to_string();
+ let expected_string = "LargeList(nullable Int32)";
+ assert_eq!(large_list_data_type_string, expected_string);
+
+ // Test with named field
+ let large_list_named =
+            DataType::LargeList(Arc::new(Field::new("bar", DataType::UInt64, false)));
+ let large_list_named_string = large_list_named.to_string();
+ let expected_named_string = "LargeList(UInt64, field: 'bar')";
+ assert_eq!(large_list_named_string, expected_named_string);
+
+ // Test with metadata
+ let mut field = Field::new_list_field(DataType::Int32, true);
+        let metadata = HashMap::from([("key1".to_string(), "value1".to_string())]);
+ field.set_metadata(metadata);
+ let large_list_metadata = DataType::LargeList(Arc::new(field));
+ let large_list_metadata_string = large_list_metadata.to_string();
+ let expected_metadata_string =
+ "LargeList(nullable Int32, metadata: {\"key1\": \"value1\"})";
+ assert_eq!(large_list_metadata_string, expected_metadata_string);
+ }
+
+ #[test]
+ fn test_display_fixed_size_list() {
+ let fixed_size_list =
+            DataType::FixedSizeList(Arc::new(Field::new_list_field(DataType::Int32, true)), 5);
+ let fixed_size_list_string = fixed_size_list.to_string();
+ let expected_string = "FixedSizeList(5 x nullable Int32)";
+ assert_eq!(fixed_size_list_string, expected_string);
+
+ // Test with named field
+ let fixed_size_named =
+            DataType::FixedSizeList(Arc::new(Field::new("baz", DataType::UInt64, false)), 3);
+ let fixed_size_named_string = fixed_size_named.to_string();
+ let expected_named_string = "FixedSizeList(3 x UInt64, field: 'baz')";
+ assert_eq!(fixed_size_named_string, expected_named_string);
+
+ // Test with metadata
+ let mut field = Field::new_list_field(DataType::Int32, true);
+        let metadata = HashMap::from([("key2".to_string(), "value2".to_string())]);
+ field.set_metadata(metadata);
+ let fixed_size_metadata = DataType::FixedSizeList(Arc::new(field), 4);
+ let fixed_size_metadata_string = fixed_size_metadata.to_string();
+ let expected_metadata_string =
+            "FixedSizeList(4 x nullable Int32, metadata: {\"key2\": \"value2\"})";
+ assert_eq!(fixed_size_metadata_string, expected_metadata_string);
+ }
+}
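For readers following along without the full source, the core idea of the new `datatype_display.rs` is a recursive `Display` impl that prints nested types tersely instead of falling back to the derived `Debug` output. A minimal stdlib-only sketch of that pattern (the `Ty` enum and its variants are invented for illustration, not arrow-rs types):

```rust
use std::fmt;

// Toy recursive type standing in for `DataType`.
#[derive(Debug)]
enum Ty {
    Int32,
    Utf8,
    List(Box<Ty>),
    Struct(Vec<(String, Ty)>),
}

// Terse, recursive Display: `Struct(key Utf8, values List(Int32))`
// instead of the verbose derived Debug form.
impl fmt::Display for Ty {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Ty::Int32 => write!(f, "Int32"),
            Ty::Utf8 => write!(f, "Utf8"),
            // Recurses via the inner value's own Display impl.
            Ty::List(inner) => write!(f, "List({inner})"),
            Ty::Struct(fields) => {
                write!(f, "Struct(")?;
                for (i, (name, ty)) in fields.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{name} {ty}")?;
                }
                write!(f, ")")
            }
        }
    }
}

fn main() {
    let ty = Ty::Struct(vec![
        ("key".to_string(), Ty::Utf8),
        ("values".to_string(), Ty::List(Box::new(Ty::Int32))),
    ]);
    assert_eq!(ty.to_string(), "Struct(key Utf8, values List(Int32))");
    // Contrast with the derived Debug output:
    println!("{ty} vs {ty:?}");
}
```
The real impl additionally folds in nullability, field names, and metadata, as the diff above shows.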
diff --git a/arrow-schema/src/datatype_parse.rs b/arrow-schema/src/datatype_parse.rs
index 7e71d53ccb..8b48ecd17f 100644
--- a/arrow-schema/src/datatype_parse.rs
+++ b/arrow-schema/src/datatype_parse.rs
@@ -38,14 +38,14 @@ fn make_error_expected(val: &str, expected: &Token, actual: &Token) -> ArrowErro
/// Implementation of `parse_data_type`, modeled after <https://github.com/sqlparser-rs/sqlparser-rs>
struct Parser<'a> {
val: &'a str,
- tokenizer: Tokenizer<'a>,
+ tokenizer: Peekable<Tokenizer<'a>>,
}
impl<'a> Parser<'a> {
fn new(val: &'a str) -> Self {
Self {
val,
- tokenizer: Tokenizer::new(val),
+ tokenizer: Tokenizer::new(val).peekable(),
}
}
@@ -345,8 +345,12 @@ impl<'a> Parser<'a> {
))
}
};
+ let nullable = self
+ .tokenizer
+ .next_if(|next| matches!(next, Ok(Token::Nullable)))
+ .is_some();
let field_type = self.parse_next_type()?;
- fields.push(Arc::new(Field::new(field_name, field_type, true)));
+                fields.push(Arc::new(Field::new(field_name, field_type, nullable)));
match self.next_token()? {
Token::Comma => continue,
Token::RParen => break,
@@ -551,7 +555,10 @@ impl<'a> Tokenizer<'a> {
"Some" => Token::Some,
"None" => Token::None,
+ "nullable" => Token::Nullable,
+
"Struct" => Token::Struct,
+
// If we don't recognize the word, treat it as a field name
word => Token::FieldName(word.to_string()),
};
@@ -618,6 +625,7 @@ enum Token {
LargeList,
FixedSizeList,
Struct,
+ Nullable,
FieldName(String),
}
@@ -649,6 +657,7 @@ impl Display for Token {
Token::Integer(v) => write!(f, "Integer({v})"),
            Token::DoubleQuotedString(s) => write!(f, "DoubleQuotedString({s})"),
Token::Struct => write!(f, "Struct"),
+ Token::Nullable => write!(f, "nullable"),
Token::FieldName(s) => write!(f, "FieldName({s})"),
}
}
@@ -670,7 +679,7 @@ mod test {
/// verifying it is the same
fn round_trip(data_type: DataType) {
let data_type_string = data_type.to_string();
- println!("Input '{data_type_string}' ({data_type:?})");
+ println!("Input '{data_type_string}' ({data_type})");
let parsed_type = parse_data_type(&data_type_string).unwrap();
assert_eq!(
data_type, parsed_type,
@@ -817,7 +826,7 @@ mod test {
];
for (data_type_string, expected_data_type) in cases {
-        println!("Parsing '{data_type_string}', expecting '{expected_data_type:?}'");
+        println!("Parsing '{data_type_string}', expecting '{expected_data_type}'");
let parsed_data_type = parse_data_type(data_type_string).unwrap();
assert_eq!(parsed_data_type, expected_data_type);
}
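The parser change above hinges on wrapping the tokenizer in `Peekable` so the optional `nullable` keyword can be consumed only when present, via `Peekable::next_if`. A tiny self-contained sketch of the same trick, using `split_whitespace` as a stand-in for the crate's `Tokenizer` (names here are illustrative, not the real API):

```rust
// Consume a leading "nullable" keyword if and only if it is present,
// then collect the remaining tokens.
fn parse_field(tokens: &str) -> (bool, Vec<String>) {
    let mut it = tokens.split_whitespace().peekable();
    // `next_if` advances the iterator only when the predicate matches,
    // so a non-matching first token is left for the next parse step.
    let nullable = it.next_if(|t| *t == "nullable").is_some();
    (nullable, it.map(str::to_string).collect())
}

fn main() {
    assert_eq!(
        parse_field("nullable Int32"),
        (true, vec!["Int32".to_string()])
    );
    assert_eq!(parse_field("Int32"), (false, vec!["Int32".to_string()]));
    println!("ok");
}
```
This is what lets `Struct(a nullable Int32, b Utf8)` round-trip: the keyword is optional per field, with no lookahead bookkeeping in the parser itself.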
diff --git a/arrow-schema/src/field.rs b/arrow-schema/src/field.rs
index 3beae35795..8017fa81b5 100644
--- a/arrow-schema/src/field.rs
+++ b/arrow-schema/src/field.rs
@@ -44,7 +44,7 @@ pub type FieldRef = Arc<Field>;
///
/// Arrow Extension types, are encoded in `Field`s metadata. See
/// [`Self::try_extension_type`] to retrieve the [`ExtensionType`], if any.
-#[derive(Debug, Clone)]
+#[derive(Clone, Debug)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct Field {
name: String,
@@ -860,10 +860,37 @@ impl Field {
}
}
-// TODO: improve display with crate https://crates.io/crates/derive_more ?
impl std::fmt::Display for Field {
- fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
- write!(f, "{self:?}")
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ #![expect(deprecated)] // Must still print dict_id, if set
+ let Self {
+ name,
+ data_type,
+ nullable,
+ dict_id,
+ dict_is_ordered,
+ metadata,
+ } = self;
+ let maybe_nullable = if *nullable { "nullable " } else { "" };
+ let metadata_str = if metadata.is_empty() {
+ String::new()
+ } else {
+ format!(", metadata: {metadata:?}")
+ };
+ let dict_id_str = if dict_id == &0 {
+ String::new()
+ } else {
+ format!(", dict_id: {dict_id}")
+ };
+ let dict_is_ordered_str = if *dict_is_ordered {
+ ", dict_is_ordered"
+ } else {
+ ""
+ };
+ write!(
+ f,
+            "Field {{ {name:?}: {maybe_nullable}{data_type}{dict_id_str}{dict_is_ordered_str}{metadata_str} }}"
+ )
}
}
diff --git a/arrow-schema/src/lib.rs b/arrow-schema/src/lib.rs
index d1befbd04f..785f2f5516 100644
--- a/arrow-schema/src/lib.rs
+++ b/arrow-schema/src/lib.rs
@@ -28,6 +28,7 @@ mod datatype;
pub use datatype::*;
use std::fmt::Display;
+mod datatype_display;
mod datatype_parse;
mod error;
pub use error::*;
diff --git a/arrow-schema/src/schema.rs b/arrow-schema/src/schema.rs
index 1e4fefbc28..dcb1b6183b 100644
--- a/arrow-schema/src/schema.rs
+++ b/arrow-schema/src/schema.rs
@@ -697,14 +697,13 @@ mod tests {
#[test]
fn create_schema_string() {
let schema = person_schema();
-        assert_eq!(schema.to_string(),
-        "Field { name: \"first_name\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {\"k\": \"v\"} }, \
-        Field { name: \"last_name\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, \
-        Field { name: \"address\", data_type: Struct([\
-        Field { name: \"street\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, \
-        Field { name: \"zip\", data_type: UInt16, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }\
-        ]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, \
-        Field { name: \"interests\", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 123, dict_is_ordered: true, metadata: {} }")
+        assert_eq!(
+            schema.to_string(),
+            "Field { \"first_name\": Utf8, metadata: {\"k\": \"v\"} }, \
+            Field { \"last_name\": Utf8 }, \
+            Field { \"address\": Struct(street Utf8, zip UInt16) }, \
+            Field { \"interests\": nullable Dictionary(Int32, Utf8), dict_id: 123, dict_is_ordered }"
+        )
}
#[test]
diff --git a/arrow/src/util/data_gen.rs b/arrow/src/util/data_gen.rs
index 7ea05811d5..70af62e6b4 100644
--- a/arrow/src/util/data_gen.rs
+++ b/arrow/src/util/data_gen.rs
@@ -267,7 +267,7 @@ fn create_random_decimal_array(field: &Field, size: usize, null_density: f32) ->
))
}
_ => Err(ArrowError::InvalidArgumentError(format!(
- "Cannot create decimal array for field {field:?}"
+ "Cannot create decimal array for field {field}"
))),
}
}
@@ -298,7 +298,7 @@ fn create_random_list_array(
}
_ => {
return Err(ArrowError::InvalidArgumentError(format!(
- "Cannot create list array for field {field:?}"
+ "Cannot create list array for field {field}"
)))
}
};
@@ -336,7 +336,7 @@ fn create_random_struct_array(
DataType::Struct(fields) => fields,
_ => {
return Err(ArrowError::InvalidArgumentError(format!(
- "Cannot create struct array for field {field:?}"
+ "Cannot create struct array for field {field}"
)))
}
};
diff --git a/parquet-variant-compute/src/arrow_to_variant.rs b/parquet-variant-compute/src/arrow_to_variant.rs
index 26713ce8ee..ad8958b7db 100644
--- a/parquet-variant-compute/src/arrow_to_variant.rs
+++ b/parquet-variant-compute/src/arrow_to_variant.rs
@@ -261,14 +261,14 @@ pub(crate) fn make_arrow_to_variant_row_builder<'a>(
}
_ => {
return Err(ArrowError::CastError(format!(
- "Unsupported run ends type: {:?}",
+ "Unsupported run ends type: {}",
run_ends.data_type()
)));
}
},
dt => {
return Err(ArrowError::CastError(format!(
- "Unsupported data type for casting to Variant: {dt:?}",
+ "Unsupported data type for casting to Variant: {dt}",
)));
}
};
diff --git a/parquet-variant-compute/src/variant_array.rs b/parquet-variant-compute/src/variant_array.rs
index ed4b6fe37e..dbed1a4fbb 100644
--- a/parquet-variant-compute/src/variant_array.rs
+++ b/parquet-variant-compute/src/variant_array.rs
@@ -872,7 +872,7 @@ fn typed_value_to_variant(typed_value: &ArrayRef, index: usize) -> Variant<'_, '
// https://github.com/apache/arrow-rs/issues/8091
debug_assert!(
false,
- "Unsupported typed_value type: {:?}",
+ "Unsupported typed_value type: {}",
typed_value.data_type()
);
Variant::Null
diff --git a/parquet/benches/arrow_reader_row_filter.rs b/parquet/benches/arrow_reader_row_filter.rs
index 0ef40ac823..ec403f8fd3 100644
--- a/parquet/benches/arrow_reader_row_filter.rs
+++ b/parquet/benches/arrow_reader_row_filter.rs
@@ -461,7 +461,7 @@ fn benchmark_filters_and_projections(c: &mut Criterion) {
        let projection_mask = ProjectionMask::roots(schema_descr, output_projection.clone());
        let pred_mask = ProjectionMask::roots(schema_descr, filter_col.clone());
- let benchmark_name = format!("{filter_type:?}/{proj_case}",);
+ let benchmark_name = format!("{filter_type}/{proj_case}",);
// run the benchmark for the async reader
let bench_id = BenchmarkId::new(benchmark_name.clone(), "async");
diff --git a/parquet/src/arrow/arrow_reader/mod.rs b/parquet/src/arrow/arrow_reader/mod.rs
index fcb4b63fe7..37ab5c1df9 100644
--- a/parquet/src/arrow/arrow_reader/mod.rs
+++ b/parquet/src/arrow/arrow_reader/mod.rs
@@ -638,7 +638,7 @@ impl ArrowReaderMetadata {
for (field1, field2) in field_iter {
if field1.data_type() != field2.data_type() {
errors.push(format!(
-                    "data type mismatch for field {}: requested {:?} but found {:?}",
+                    "data type mismatch for field {}: requested {} but found {}",
field1.name(),
field1.data_type(),
field2.data_type()
@@ -3185,7 +3185,7 @@ mod tests {
                    "Parquet argument error: Parquet error: encountered non UTF-8 data";
assert!(
err.to_string().contains(expected_err),
-                    "data type: {data_type:?}, expected: {expected_err}, got: {err}"
+                    "data type: {data_type}, expected: {expected_err}, got: {err}"
);
}
}
@@ -3224,7 +3224,7 @@ mod tests {
                    "Parquet argument error: Parquet error: encountered non UTF-8 data";
assert!(
err.to_string().contains(expected_err),
-                    "data type: {data_type:?}, expected: {expected_err}, got: {err}"
+                    "data type: {data_type}, expected: {expected_err}, got: {err}"
);
}
}
@@ -3677,8 +3677,8 @@ mod tests {
),
])),
        "Arrow: Incompatible supplied Arrow schema: data type mismatch for field nested: \
-            requested Struct([Field { name: \"nested1_valid\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"nested1_invalid\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]) \
-            but found Struct([Field { name: \"nested1_valid\", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"nested1_invalid\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }])",
+ requested Struct(nested1_valid Utf8, nested1_invalid Int32) \
+ but found Struct(nested1_valid Utf8, nested1_invalid Int64)",
);
}
diff --git a/parquet/src/arrow/arrow_writer/mod.rs b/parquet/src/arrow/arrow_writer/mod.rs
index c28ea7f99b..684d5cf747 100644
--- a/parquet/src/arrow/arrow_writer/mod.rs
+++ b/parquet/src/arrow/arrow_writer/mod.rs
@@ -1104,7 +1104,7 @@ impl ArrowColumnWriterFactory {
}
_ => return Err(ParquetError::NYI(
format!(
-                    "Attempting to write an Arrow type {data_type:?} to parquet that is not yet implemented"
+                    "Attempting to write an Arrow type {data_type} to parquet that is not yet implemented"
)
))
}
diff --git a/parquet/src/arrow/buffer/view_buffer.rs b/parquet/src/arrow/buffer/view_buffer.rs
index 97db778e47..9e9b8616c3 100644
--- a/parquet/src/arrow/buffer/view_buffer.rs
+++ b/parquet/src/arrow/buffer/view_buffer.rs
@@ -91,7 +91,7 @@ impl ViewBuffer {
let array = unsafe { builder.build_unchecked() };
make_array(array)
}
- _ => panic!("Unsupported data type: {data_type:?}"),
+ _ => panic!("Unsupported data type: {data_type}"),
}
}
}
diff --git a/parquet/src/basic.rs b/parquet/src/basic.rs
index c1e301136d..2cf5e46fea 100644
--- a/parquet/src/basic.rs
+++ b/parquet/src/basic.rs
@@ -941,7 +941,9 @@ impl From<Option<LogicalType>> for ConvertedType {
(16, false) => ConvertedType::UINT_16,
(32, false) => ConvertedType::UINT_32,
(64, false) => ConvertedType::UINT_64,
- t => panic!("Integer type {t:?} is not supported"),
+ (bit_width, is_signed) => panic!(
+                        "Integer type bit_width={bit_width}, signed={is_signed} is not supported"
+ ),
},
LogicalType::Json => ConvertedType::JSON,
LogicalType::Bson => ConvertedType::BSON,