vibhatha commented on issue #39496: URL: https://github.com/apache/arrow/issues/39496#issuecomment-1880474193
> Is there a well-formed guide (code works too) of how to register it with substrait?

There is no explicit guideline or set of rules for adding new functionality, although a few examples and some documentation can help you get started. There is a basic example of using Arrow and Substrait [here](https://github.com/apache/arrow/blob/main/cpp/examples/arrow/engine_substrait_consumption.cc), which gives a brief idea of how to use Substrait with Arrow.

To understand how to add or support new functionality, the basic step is to start with Substrait: check the spec and see how the translation units can be updated to support those features. We don't have a thorough development guideline for this, but the existing PRs and source should give a good idea. The C++ source for the Substrait engine can be found [here](https://github.com/apache/arrow/tree/main/cpp/src/arrow/engine/substrait), and usage with Acero is documented [here](https://arrow.apache.org/docs/cpp/acero/substrait.html#acero-substrait). To go deeper into the integration components, I would suggest checking out the test cases in the [C++ source for Acero/Substrait integration](https://github.com/apache/arrow/blob/main/cpp/src/arrow/engine/substrait/serde_test.cc).

There are a few things to think about. Substrait's representation of entities is very generic, so when implementing extensions it is best to evaluate the engine's limitations and any custom features the engine may need in order to add the functionality. Engines do not all share the same configuration.

For adding new relational algebra operators, or sub-features of existing relational algebra operators, it is best to look at [relation_internal.cc](https://github.com/apache/arrow/blob/main/cpp/src/arrow/engine/substrait/relation_internal.cc). Likewise, there are other components specifically defined for adding such features.
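To make the shape of that work a bit more concrete, here is a minimal, self-contained sketch of the dispatch-on-relation-type pattern. It does not use Arrow or Substrait at all: `ReadRel`, `FilterRel`, `FetchRel`, `Rel`, `PlanNode`, and `TranslateRel` are all hypothetical stand-ins invented for illustration, and the real protobuf-generated classes and the code in relation_internal.cc look different. It only shows the idea of mapping each relation the spec can express to something the engine can run, and rejecting the ones the engine cannot handle.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for protobuf-generated relation messages.
// These are NOT the real Substrait classes; they only mimic a "oneof" of relation kinds.
struct Rel;

struct ReadRel {
  std::string table_name;
};
struct FilterRel {
  std::shared_ptr<Rel> input;
  std::string condition;
};
struct FetchRel {
  std::shared_ptr<Rel> input;
  long offset;
  long count;
};

struct Rel {
  std::variant<ReadRel, FilterRel, FetchRel> rel_type;
};

// Hypothetical engine-side plan node: a factory name, its options, and its inputs.
struct PlanNode {
  std::string factory_name;
  std::string options;
  std::vector<PlanNode> inputs;
};

// Translate one relation (and, recursively, its inputs) into an engine plan node.
// Supporting a new relation would mean adding a branch here that builds the
// matching engine node.
PlanNode TranslateRel(const Rel& rel) {
  if (const auto* read = std::get_if<ReadRel>(&rel.rel_type)) {
    return PlanNode{"source", read->table_name, {}};
  }
  if (const auto* filter = std::get_if<FilterRel>(&rel.rel_type)) {
    if (!filter->input) {
      throw std::invalid_argument("FilterRel has no input relation");
    }
    return PlanNode{"filter", filter->condition, {TranslateRel(*filter->input)}};
  }
  // FetchRel is expressible in the stand-in spec, but this sketch's engine has
  // no mapping for it yet, so the plan is rejected instead of silently ignored.
  throw std::runtime_error("unsupported relation type");
}
```

In this sketch, a `FilterRel` wrapping a `ReadRel` becomes a `"filter"` node whose single input is a `"source"` node, while a `FetchRel` raises an error until someone adds a mapping for it. That last case is the "evaluate the engine limitations" step in miniature.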
Mainly, what needs to be done is mapping the engine spec to the Substrait spec via the protobuf-generated classes. The existing PRs can help you get an idea of how to add a new feature. Community threads and discussion here can also be helpful. Hope this gives you an initial idea of how to approach it.
