LorrensP-2158466 commented on issue #10808:
URL: https://github.com/apache/datafusion/issues/10808#issuecomment-2159043547

   So I have found a solution that i can compile, but at this stage I'm not 
very happy with it. 
   I will first try to explain my reasoning as to how i got to my current 
solution, it is a bit much, so if you are only interested the final you can 
[Jump](#currently) to the end. I did underestimate this problem...
   
   ## Explanation
   I first tried the solution Andrew mentioned, but it runs into following 
problems:
   1. The Join Plans and Recursive Query
   Both the Join and CrossJoin hold the left and right inputs separately in an 
Arc, so we need another variant `Double()` which needs to know if the first 
input is retrieved and than the second. This is because you can't transform 
both the Arc's into a `&[LogicalPlan]`. This also applies to Recursive Queries 
which have 2 inputs.
   2. The `Union` node holds a `Vec<Arc<LogicalPlan>>` which you also can't 
transform safely into a `&[LogicalPlan]`.
   3. The `Extension` node doesn't allow us to know how the inputs are stored, 
we have to assume this can have multiple inputs all contained in Arc's, so we 
get the same problem as the `Union` Node but a little worse because some 
Extension nodes just have 1 input but we treat them as having multiple.
   
   To support all these cases i came up with:
   ```rust
   pub enum InputIterator<'parent> {
       Empty(Empty<&'parent LogicalPlan>),
       Single(Once<&'parent LogicalPlan>),
       Double(DoubleIter<&'parent LogicalPlan>), // DoubleIter<T> = 
std::array::IntoIter<T, 2>
       Multiple(std::vec::IntoIter<&'parent LogicalPlan>),
   }
   ```
   ### Built-In LogicalPlans
   Most cases can be handled by the `Empty`, `Single` or `Double` variant.
   The `Multiple` variant has a `Vec::IntoIter`, so we can handle the `Union` 
case easier, this is because we have to transform the `Vec<Arc<LogicalPlan>>` 
into a collection of references, which results in us allocating and 
transforming it back into a iterator. This also happens in the 
`inputs()`function but with just the `collect()` call.
   
   ### Extension Plans
   To handle the `Extension` nodes i added the following function into the 
`UserDefinedLogicalNode`trait:
   ```rust
   fn inputs_iter(&self) -> InputIterator<'_>;
   ```
   The user of this trait can then choose their own iterator, and we only have 
to call `extension.node.inputs_iter()`.
   
   ### More Problems 
   But there are still some problems with this (i think):
   1. The `Multiple`variant still causes us to allocate in a lot of cases, some 
UserDefined plans in examples have their inputs stored as `Vec<LogicalPlan>`, 
so we need to do `inputs.iter().collect().into_iter()`.
   We could add another Iterator type: 
   ```rust
   Slice(slice::IntoIter<&'parent LogicalPlan>)
   ``` 
   which is an slice iterator, so the above would be 
`inputs.as_slice().into_iter()`. But adding another variant that still says "i 
have multiple inputs" does not look nice.
   2. Some Nodes, like `Union` hold their inputs in `Vec<Arc<LogicalPlan>>` 
which also causes us to transform and collect this into a `Vec<&LogicalPlan>` 
to turn this into a iterator the `Multiple` variant accepts.
   
   ### Solutions
   To fix this i can split up the `Multiple` variant into 2 new variants:
   ```rust
   pub enum InputIterator{
       // ...
       Slice(slice::IntoIter<&LogicalPlan>),
       FromArcs(Map<Iter<'_, Arc<LogicalPlan>>, AsRef::as_ref>), // maps 
Arc<LogicalPlan> into &LogicalPlan
   }
   ```
   This does sum up to a total of 5 different iterator types, but i don't know 
how i can cover every possible way of holding onto multiple inputs. For example 
if node implementation holds their inputs in some other collection (like 
`BTreeMap`, `HashMap`, ...) their respective `Iter` implementations have 
different types, so if someone wants to return a `InputIterator` they are 
forced to also keep their inputs in a `Vec<LogicalPlan>` or 
`Vec<Arc<LogicalPlan>>`.
   
   ### Currently <a name="currently"></a>
   
   So currently the `InputIterator` looks like this...
   ```rust
   pub enum InputIterator<'parent> {
       Empty(Empty<&'parent LogicalPlan>),
       Single(Once<&'parent LogicalPlan>),
       Double(DoubleIter<&'parent LogicalPlan>),
       Slice(SliceIter<'parent, LogicalPlan>),
       FromArcs(FromArcsIter<'parent, LogicalPlan>),
   }
   ```
   I have made a macro which let's me "delegate" a call to this iterator to any 
of the inner iterators, e.g. `fn next()`:
   ```rust
   fn next(&mut self) -> Option<Self::Item> {
       delegate!(self, iter => iter.next())
   }
   ```
   Things to do:
   - find a way to accept many more IteratorTypes, (maybe just have a fallback 
in the form of `Vec::into_iter` or `dyn Iterator`).
   - Minimize Iter variants while still keeping the implementation clear, maybe 
we can collapse the `Empty`, `Single` and `Double` into one variant, but this 
does cause more checks.
   
   If you guys really think this can be helpful i can open up a PR so you can 
look at some details, but it is maybe worthwhile to just wait until Rust allows 
us to return multiple types in a `-> impl Trait` function.
   
   
    
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to