viirya commented on code in PR #52303:
URL: https://github.com/apache/spark/pull/52303#discussion_r2343321734


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonArrowInput.scala:
##########
@@ -194,3 +180,118 @@ private[python] trait BatchedPythonArrowInput extends BasicPythonArrowInput {
     }
   }
 }
+
+object BatchedPythonArrowInput {
+  /**
+   * Split a group into smaller Arrow batches within
+   * a separate and complete Arrow streaming format in order
+   * to work around Arrow 2G limit, see ARROW-4890.
+   *
+   * The return value is the number of rows in the batch.
+   *
+   * Note that `rowIter` here is always grouped batch. One group does not span
+   * multiple groups, see also [[org.apache.spark.sql.execution.GroupedIterator]].

Review Comment:
   Hmm, what does "One group does not span multiple groups" mean? Do you mean that the same group won't appear more than once, as with `GroupedIterator`?
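
   To make the question concrete, here is a minimal sketch of the batching idea described in the doc comment: the rows of a single group are split into consecutive batches of bounded size. It is only an illustration, not the PR's implementation. The names `GroupBatchingSketch`, `splitGroup` and `maxRowsPerBatch` are hypothetical, and the sketch bounds batches by row count only, whereas the real code would presumably also need to respect a byte limit. Note that Scala's `Iterator.grouped` used here is unrelated to Spark's `GroupedIterator`.

```scala
// Hypothetical sketch, not the code under review.
object GroupBatchingSketch {

  /**
   * Splits the rows of ONE group into consecutive batches of at most
   * `maxRowsPerBatch` rows. The caller is assumed to pass an iterator
   * that contains a single, complete group.
   */
  def splitGroup[T](rowIter: Iterator[T], maxRowsPerBatch: Int): Iterator[Seq[T]] = {
    require(maxRowsPerBatch > 0, "maxRowsPerBatch must be positive")
    // Iterator.grouped chunks lazily; the last chunk may be smaller.
    rowIter.grouped(maxRowsPerBatch).map(_.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val oneGroup = Iterator(1, 2, 3, 4, 5)
    val batchSizes = splitGroup(oneGroup, 2).map(_.size).toList
    println(batchSizes) // List(2, 2, 1)
  }
}
```

   Under this reading, one group may produce several batches, but every batch belongs to exactly one group.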



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

