Weston Pace created ARROW-17521:
-----------------------------------

             Summary: [Python] Add python bindings for NamedTableProvider for Substrait consumer
                 Key: ARROW-17521
                 URL: https://issues.apache.org/jira/browse/ARROW-17521
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Weston Pace


The C++ Substrait consumer currently supports a named table provider to handle 
the NamedTable relation:

{noformat}
using NamedTableProvider =
    std::function<Result<compute::Declaration>(const std::vector<std::string>&)>;
static NamedTableProvider kDefaultNamedTableProvider;

/// Options that control the conversion between Substrait and Acero representations of a
/// plan.
struct ConversionOptions {
  /// \brief How strictly the converter should adhere to the structure of the input.
  ConversionStrictness strictness = ConversionStrictness::BEST_EFFORT;
  /// \brief A custom strategy to be used for providing named tables
  ///
  /// The default behavior will return an invalid status if the plan has any
  /// named table relations.
  NamedTableProvider named_table_provider = kDefaultNamedTableProvider;
};
{noformat}

This is very useful for testing and experimenting, as it allows you to provide 
tables from memory (for example, using a table_source node).  We should add 
pyarrow bindings.  I don't think they need to expose the full 
compute::DeclarationInfo range of table sources.  A simple approach might be a 
function that, given a list of names, returns either a table, an iterable of 
batches, or a record batch reader.
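
A minimal sketch of what such a provider function could look like on the Python 
side. Nothing here is an existing pyarrow API: make_table_provider, TableLike, 
and the registry contents are hypothetical names chosen for illustration, and 
plain lists of rows stand in for a table, an iterable of batches, or a record 
batch reader so the example runs without pyarrow installed.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Hypothetical stand-in type: in real use the values would be a pyarrow.Table,
# an iterable of record batches, or a RecordBatchReader.
TableLike = Any


def make_table_provider(
    tables: Dict[Tuple[str, ...], TableLike],
) -> Callable[[Sequence[str]], TableLike]:
    """Build a provider that maps a NamedTable's list of names to a table."""

    def provider(names: Sequence[str]) -> TableLike:
        # The name parts arrive as a list; use a tuple so they can key a dict.
        key = tuple(names)
        try:
            return tables[key]
        except KeyError:
            # Loosely analogous to the C++ default provider, which returns an
            # invalid status for named table relations.
            raise KeyError(f"no table registered for {list(names)!r}") from None

    return provider


# Plain lists of rows stand in for in-memory tables in this sketch.
registry = {
    ("sales",): [{"id": 1, "amount": 10}],
    ("archive", "sales"): [{"id": 0, "amount": 5}],
}
provider = make_table_provider(registry)
```

Under these assumptions, provider(["sales"]) returns the object registered 
under that name, and an unregistered name raises KeyError. A binding would 
presumably wrap whatever the callable returns in a source declaration before 
handing it to the C++ consumer.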



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
