vibhatha commented on a change in pull request #12033:
URL: https://github.com/apache/arrow/pull/12033#discussion_r781950741



##########
File path: docs/source/cpp/streaming_execution.rst
##########
@@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan
 nodes from that dataset. (Useful for a self-join, for example.)
 Note that producing two scan nodes like this will perform all
 reads and decodes twice.
+
+Constructing ``ExecNode`` using Options
+=======================================
+
+Using the execution plan we can construct various queries. 
+To construct such queries, we have provided a set of building blocks
+referred to as :class:`ExecNode` s. These nodes provide the ability to  
+construct operations like filtering, projection, join, etc. 
+
+The following operations are associated with the execution plan:
+
+.. list-table:: Operations and Options
+   :widths: 50 50
+   :header-rows: 1
+
+   * - Operation
+     - Options
+   * - ``source``
+     - :class:`arrow::compute::SourceNodeOptions`
+   * - ``filter``
+     - :class:`arrow::compute::FilterNodeOptions`
+   * - ``project``
+     - :class:`arrow::compute::ProjectNodeOptions`
+   * - ``aggregate``
+     - :class:`arrow::compute::AggregateNodeOptions`
+   * - ``sink``
+     - :class:`arrow::compute::SinkNodeOptions`
+   * - ``consuming_sink``
+     - :class:`arrow::compute::ConsumingSinkNodeOptions`
+   * - ``order_by_sink``
+     - :class:`arrow::compute::OrderBySinkNodeOptions`
+   * - ``select_k_sink``
+     - :class:`arrow::compute::SelectKSinkNodeOptions`
+   * - ``scan``
+     - :class:`arrow::dataset::ScanNodeOptions`
+   * - ``hash_join``
+     - :class:`arrow::compute::HashJoinNodeOptions`
+   * - ``write``
+     - :class:`arrow::dataset::WriteNodeOptions`
+   * - ``union``
+     - N/A
+
+
+.. _stream_execution_source_docs:
+
+``source``
+----------
+
+The ``source`` operation can be considered the entry point for creating a streaming execution plan.
+A source node is constructed using :class:`arrow::compute::SourceNodeOptions`.
+Creating these options requires the :class:`Schema` of the data passing through and a function that generates the data,
+``arrow::AsyncGenerator<arrow::util::optional<cp::ExecBatch>>``.
+Additionally, when using the ``source`` operator,
+data scanning operations like filter and project may need to be applied
+in a later part of the execution plan.
+
+Struct to hold the data generator definition:
+
+.. literalinclude:: ../../../cpp/examples/arrow/execution_plan_documentation_examples.cc
+  :language: cpp
+  :start-after: (Doc section: BatchesWithSchema Definition)
+  :end-before: (Doc section: BatchesWithSchema Definition)
+  :linenos:
+  :lineno-match:
+
+Generating batches for computation:
+
+.. literalinclude:: ../../../cpp/examples/arrow/execution_plan_documentation_examples.cc
+  :language: cpp
+  :start-after: (Doc section: MakeBasicBatches Definition)
+  :end-before: (Doc section: MakeBasicBatches Definition)
+  :linenos:
+  :lineno-match:
+
+Example of using ``source`` (usage of sink is explained in detail in :ref:`sink<stream_execution_sink_docs>`):
+
+.. literalinclude:: ../../../cpp/examples/arrow/execution_plan_documentation_examples.cc
+  :language: cpp
+  :start-after: (Doc section: Source Example)
+  :end-before: (Doc section: Source Example)
+  :linenos:
+  :lineno-match:
+
+.. _stream_execution_filter_docs:
+
+``filter``
+----------
+
+The ``filter`` operation, as the name suggests, provides an option to define data filtering
+criteria. It keeps only rows matching a given expression.
+Filters can be written using :class:`arrow::compute::Expression`.
+For example, if we wish to keep rows where the value of column ``b`` is greater than 3,
+then the filter can be written using
+:class:`arrow::compute::FilterNodeOptions` as follows::

Review comment:
       neat 👍 
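
For readers of the quoted hunk, the behaviour it describes (a source that generates batches until the stream ends, followed by a filter keeping rows where ``b`` is greater than 3) can be illustrated with a minimal stand-alone sketch. This is plain standard C++ and deliberately does not use Arrow: `ToyBatch`, `ToyGenerator`, `MakeToySource`, `FilterBGreaterThan`, and `RunToyPlan` are hypothetical names invented for illustration, and the synchronous `std::optional` generator is only an assumed simplification of the asynchronous generator named in the docs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch: two equal-length integer columns.
// This is NOT arrow::compute::ExecBatch.
struct ToyBatch {
  std::vector<int> a;
  std::vector<int> b;
};

// Hypothetical stand-in for the source's data generator: each call yields the
// next batch, and std::nullopt signals the end of the stream.
using ToyGenerator = std::function<std::optional<ToyBatch>()>;

// Builds a generator that hands out the given batches one at a time.
ToyGenerator MakeToySource(std::vector<ToyBatch> batches) {
  return [batches = std::move(batches),
          next = std::size_t{0}]() mutable -> std::optional<ToyBatch> {
    if (next == batches.size()) return std::nullopt;
    return batches[next++];
  };
}

// Keeps only the rows whose value in column `b` is greater than `threshold`.
ToyBatch FilterBGreaterThan(const ToyBatch& in, int threshold) {
  assert(in.a.size() == in.b.size());  // columns must have the same length
  ToyBatch out;
  for (std::size_t i = 0; i < in.b.size(); ++i) {
    if (in.b[i] > threshold) {
      out.a.push_back(in.a[i]);
      out.b.push_back(in.b[i]);
    }
  }
  return out;
}

// Drains the source and applies the filter to every batch it produces.
std::vector<ToyBatch> RunToyPlan(ToyGenerator source, int threshold) {
  std::vector<ToyBatch> results;
  while (std::optional<ToyBatch> batch = source()) {
    results.push_back(FilterBGreaterThan(*batch, threshold));
  }
  return results;
}
```

Calling `RunToyPlan(MakeToySource(batches), 3)` returns one filtered batch per input batch. The filter is applied per batch as the data streams through, which mirrors the idea that the source only produces data and later nodes decide what to keep.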




