LorrensP-2158466 commented on issue #10808:
URL: https://github.com/apache/datafusion/issues/10808#issuecomment-2159043547
So I have found a solution that i can compile, but at this stage I'm not
very happy with it.
I will first try to explain my reasoning as to how i got to my current
solution, it is a bit much, so if you are only interested the final you can
[Jump](#currently) to the end. I did underestimate this problem...
## Explanation
I first tried the solution Andrew mentioned, but it runs into following
problems:
1. The Join Plans and Recursive Query
Both the Join and CrossJoin hold the left and right inputs separately in an
Arc, so we need another variant `Double()` which needs to know if the first
input is retrieved and than the second. This is because you can't transform
both the Arc's into a `&[LogicalPlan]`. This also applies to Recursive Queries
which have 2 inputs.
2. The `Union` node holds a `Vec<Arc<LogicalPlan>>` which you also can't
transform safely into a `&[LogicalPlan]`.
3. The `Extension` node doesn't allow us to know how the inputs are stored,
we have to assume this can have multiple inputs all contained in Arc's, so we
get the same problem as the `Union` Node but a little worse because some
Extension nodes just have 1 input but we treat them as having multiple.
To support all these cases i came up with:
```rust
pub enum InputIterator<'parent> {
Empty(Empty<&'parent LogicalPlan>),
Single(Once<&'parent LogicalPlan>),
Double(DoubleIter<&'parent LogicalPlan>), // DoubleIter<T> =
std::array::IntoIter<T, 2>
Multiple(std::vec::IntoIter<&'parent LogicalPlan>),
}
```
### Built-In LogicalPlans
Most cases can be handled by the `Empty`, `Single` or `Double` variant.
The `Multiple` variant has a `Vec::IntoIter`, so we can handle the `Union`
case easier, this is because we have to transform the `Vec<Arc<LogicalPlan>>`
into a collection of references, which results in us allocating and
transforming it back into a iterator. This also happens in the
`inputs()`function but with just the `collect()` call.
### Extension Plans
To handle the `Extension` nodes i added the following function into the
`UserDefinedLogicalNode`trait:
```rust
fn inputs_iter(&self) -> InputIterator<'_>;
```
The user of this trait can then choose their own iterator, and we only have
to call `extension.node.inputs_iter()`.
### More Problems
But there are still some problems with this (i think):
1. The `Multiple`variant still causes us to allocate in a lot of cases, some
UserDefined plans in examples have their inputs stored as `Vec<LogicalPlan>`,
so we need to do `inputs.iter().collect().into_iter()`.
We could add another Iterator type:
```rust
Slice(slice::IntoIter<&'parent LogicalPlan>)
```
which is an slice iterator, so the above would be
`inputs.as_slice().into_iter()`. But adding another variant that still says "i
have multiple inputs" does not look nice.
2. Some Nodes, like `Union` hold their inputs in `Vec<Arc<LogicalPlan>>`
which also causes us to transform and collect this into a `Vec<&LogicalPlan>`
to turn this into a iterator the `Multiple` variant accepts.
### Solutions
To fix this i can split up the `Multiple` variant into 2 new variants:
```rust
pub enum InputIterator{
// ...
Slice(slice::IntoIter<&LogicalPlan>),
FromArcs(Map<Iter<'_, Arc<LogicalPlan>>, AsRef::as_ref>), // maps
Arc<LogicalPlan> into &LogicalPlan
}
```
This does sum up to a total of 5 different iterator types, but i don't know
how i can cover every possible way of holding onto multiple inputs. For example
if node implementation holds their inputs in some other collection (like
`BTreeMap`, `HashMap`, ...) their respective `Iter` implementations have
different types, so if someone wants to return a `InputIterator` they are
forced to also keep their inputs in a `Vec<LogicalPlan>` or
`Vec<Arc<LogicalPlan>>`.
### Currently <a name="currently"></a>
So currently the `InputIterator` looks like this...
```rust
pub enum InputIterator<'parent> {
Empty(Empty<&'parent LogicalPlan>),
Single(Once<&'parent LogicalPlan>),
Double(DoubleIter<&'parent LogicalPlan>),
Slice(SliceIter<'parent, LogicalPlan>),
FromArcs(FromArcsIter<'parent, LogicalPlan>),
}
```
I have made a macro which let's me "delegate" a call to this iterator to any
of the inner iterators, e.g. `fn next()`:
```rust
fn next(&mut self) -> Option<Self::Item> {
delegate!(self, iter => iter.next())
}
```
Things to do:
- find a way to accept many more IteratorTypes, (maybe just have a fallback
in the form of `Vec::into_iter` or `dyn Iterator`).
- Minimize Iter variants while still keeping the implementation clear, maybe
we can collapse the `Empty`, `Single` and `Double` into one variant, but this
does cause more checks.
If you guys really think this can be helpful i can open up a PR so you can
look at some details, but it is maybe worthwhile to just wait until Rust allows
us to return multiple types in a `-> impl Trait` function.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]