Jefffrey commented on code in PR #18363:
URL: https://github.com/apache/datafusion/pull/18363#discussion_r2480004545
##########
datafusion/sqllogictest/test_files/array.slt:
##########
@@ -7737,6 +7737,12 @@ select flatten(arrow_cast(make_array(1, 2, 1, 3, 2),
'FixedSizeList(5, Int64)'))
----
[1, 2, 1, 3, 2] [1, 2, 3, NULL, 4, NULL, 5] [[1.1], [2.2], [3.3], [4.4]]
+query ??
+select flatten(arrow_cast(make_array([1], [2, 3], [null], make_array(4, null,
5)), 'FixedSizeList(4, LargeList(Int64))')),
+ flatten(arrow_cast(make_array([[1.1], [2.2]], [[3.3], [4.4]]),
'List(LargeList(FixedSizeList(1, Float64)))'));
+----
+[1, 2, 3, NULL, 4, NULL, 5] [[1.1], [2.2], [3.3], [4.4]]
Review Comment:
I think should also add a check for the output type via `arrow_typeof()` to
ensure we are indeed getting a large list
##########
datafusion/functions-nested/src/flatten.rs:
##########
@@ -179,7 +208,7 @@ pub fn flatten_inner(args: &[ArrayRef]) -> Result<ArrayRef>
{
Ok(Arc::new(flattened_array) as ArrayRef)
}
LargeList(_) => {
- let (inner_field, inner_offsets, inner_values, nulls) =
+ let (inner_field, inner_offsets, inner_values, _) = // _
instead of nulls?
Review Comment:
This is interesting; so this might be a bug in current behaviour? Do you
think you can mock up a test where it fails on main without this fix?
##########
datafusion/functions-nested/src/flatten.rs:
##########
@@ -216,17 +245,47 @@ fn get_offsets_for_flatten<O: OffsetSizeTrait>(
// Create new large offsets that are equivalent to `flatten` the array.
fn get_large_offsets_for_flatten<O: OffsetSizeTrait, P: OffsetSizeTrait>(
- offsets: OffsetBuffer<O>,
- indexes: OffsetBuffer<P>,
+ inner_offsets: OffsetBuffer<O>,
+ outer_offsets: OffsetBuffer<P>,
) -> OffsetBuffer<i64> {
- let buffer = offsets.into_inner();
- let offsets: Vec<i64> = indexes
+ let buffer = inner_offsets.into_inner();
+ let offsets: Vec<i64> = outer_offsets
.iter()
.map(|i| buffer[i.to_usize().unwrap()].to_i64().unwrap())
.collect();
OffsetBuffer::new(offsets.into())
}
+// Function for converting LargeList offsets into List offsets
+fn downcast_i64_inner_to_i32(
+ inner_offsets: &OffsetBuffer<i64>,
+ outer_offsets: &OffsetBuffer<i32>,
+) -> Result<OffsetBuffer<i32>, ArrowError> {
+ let buffer = inner_offsets.clone().into_inner();
+ let offsets: Result<Vec<i32>, _> = outer_offsets
+ .iter()
+ .map(|i| buffer[i.to_usize().unwrap()])
+ .map(|i| {
+ i32::try_from(i)
+ .map_err(|_| ArrowError::CastError(format!("Cannot downcast
offset {i}")))
+ })
+ .collect();
+ Ok(OffsetBuffer::new(offsets?.into()))
+}
+
+// In case the conversion fails we convert the outer offsets into i64
+fn keep_offsets_i64(
Review Comment:
I think this function should be renamed (and docstring adjusted) now?
`keep_offsets_i64` is a bit confusing to read for me
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]