[GitHub] [spark] ueshin commented on a diff in pull request #40692: [SPARK-43055][CONNECT][PYTHON] Support duplicated nested field names

via GitHub Fri, 07 Apr 2023 14:58:49 -0700


ueshin commented on code in PR #40692:
URL: https://github.com/apache/spark/pull/40692#discussion_r1160985748



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -60,13 +61,19 @@ private[sql] class SparkResult[T](
   private def processResponses(stopOnFirstNonEmptyResponse: Boolean): Boolean = {
     while (responses.hasNext) {
       val response = responses.next()
+      if (response.hasSchema) {
+        structType =

Review Comment:
   Now that the original schema arrives earlier than the Arrow batches, we should use it when it's available and otherwise fall back to the schema from the Arrow batch.
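   A minimal sketch of that selection rule, assuming a simplified, hypothetical response model (`Schema`, `Response`, and `SchemaSelection.batchSchemas` are illustrative stand-ins, not the Spark Connect client API). Each batch is decoded with the most recent original schema if one has arrived, and with the batch's own Arrow schema otherwise:

```scala
// Hypothetical, simplified stand-ins for the Spark Connect response types.
final case class Schema(fieldNames: Seq[String])
final case class Response(schema: Option[Schema], arrowBatchSchema: Option[Schema])

object SchemaSelection {
  /** Returns the schema chosen to decode each Arrow batch in `responses`. */
  def batchSchemas(responses: Iterator[Response]): Seq[Schema] = {
    // Mirrors `structType` in the diff: the original schema, once received.
    var structType: Option[Schema] = None
    val chosen = Seq.newBuilder[Schema]
    while (responses.hasNext) {
      val response = responses.next()
      // Remember the original schema whenever the server sends it.
      response.schema.foreach(s => structType = Some(s))
      response.arrowBatchSchema.foreach { arrowSchema =>
        // Prefer the original schema; fall back to the batch's Arrow schema.
        chosen += structType.getOrElse(arrowSchema)
      }
    }
    chosen.result()
  }
}
```

   A batch that arrives before any schema response is decoded with its own Arrow schema; every later batch uses the original schema.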
   
   Yes, the response schema and the Arrow schema could be inconsistent in their nested field names when there are duplicates, but that's not a problem for the encoder handling the `ColumnarBatch` as long as the data structure is consistent.
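   To make "the data structure is consistent" concrete, here is a sketch of a shape comparison that ignores field names, assuming a hypothetical, simplified type model (`DType`, `StructT`, and `StructureCheck.sameStructure` are illustrative, not Spark's `DataType` API). It also assumes the decoder matches nested fields by position rather than by name, and the renamed Arrow field names in the usage example are made up:

```scala
// Hypothetical, simplified model of a nested schema (not Spark's DataType API).
sealed trait DType
case object IntT extends DType
case object StrT extends DType
final case class StructT(fields: Seq[(String, DType)]) extends DType

object StructureCheck {
  /** True when both types have the same shape, ignoring all field names. */
  def sameStructure(a: DType, b: DType): Boolean = (a, b) match {
    case (StructT(fa), StructT(fb)) =>
      // Same number of fields, and each positional pair matches recursively.
      fa.length == fb.length &&
        fa.zip(fb).forall { case ((_, ta), (_, tb)) => sameStructure(ta, tb) }
    // Leaf types (or a struct vs. a leaf) must be identical.
    case _ => a == b
  }
}
```

   For example, a response schema `StructT(Seq("x" -> StructT(Seq("a" -> IntT, "a" -> StrT))))` with a duplicated nested name `a`, and an Arrow schema whose nested fields were hypothetically renamed to `a_1` and `a_2`, still have the same structure, so positional decoding would read the same columns.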
   
   Added some comments.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org
