amol- commented on a change in pull request #155:
URL: https://github.com/apache/arrow-cookbook/pull/155#discussion_r818672623



##########
File path: python/source/data.rst
##########
@@ -294,6 +294,146 @@ using :meth:`pyarrow.Table.set_column`
     item: [["Potato","Bean","Cucumber","Eggs"]]
     new_amount: [[30,20,15,40]]
 
+Group and Sort a Table
+======================
+
+If you have a table which needs to be grouped by a particular key, 
+you can use :meth:`pyarrow.Table.group_by` followed by an aggregation
+operation :meth:`pyarrow.TableGroupBy.aggregate`.
+
+For example, let’s say we have some data with a particular set of keys
+and values associated with those keys. We want to group the data by
+those keys and apply an aggregate function like sum to compute
+the total of the values for each unique key.
+
+.. testcode::
+
+  import pyarrow as pa
+
+  table = pa.table([
+       pa.array(["a", "a", "b", "b", "c", "d", "e", "c"]),
+       pa.array([11, 20, 3, 4, 5, 1, 4, 10]),
+      ], names=["keys", "values"])
+
+  print(table)
+
+.. testoutput::
+
+    pyarrow.Table
+    keys: string
+    values: int64
+    ----
+    keys: [["a","a","b","b","c","d","e","c"]]
+    values: [[11,20,3,4,5,1,4,10]]
+
+Now let's apply a groupby operation. Note that a groupby
+operation returns a :class:`pyarrow.TableGroupBy` object, which exposes
+the aggregate operator as :meth:`pyarrow.TableGroupBy.aggregate`.
+
+.. testcode::
+
+  grouped_table = table.group_by("keys")
+
+  print(type(grouped_table))
+
+.. testoutput::
+
+    <class 'pyarrow.lib.TableGroupBy'>
+
+The output will look similar to this. The table is now
+grouped by the field ``keys``, so let's apply the aggregate operation
+``sum`` to the values in the column ``values``. Note that an
+aggregation operation is paired with a column name.
+
+.. testcode::
+
+  aggregated_table = grouped_table.aggregate([("values", "sum")])
+
+  print(aggregated_table)

Review comment:
       If you want the reader to have access to a proper explanation of how
things work, you can link to the docs (
https://arrow.apache.org/docs/python/compute.html#grouped-aggregations ).
The purpose of the cookbook, though, is not to explain things but to provide
an immediately usable code snippet that solves the target problem.



