[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-21 Thread GitBox


HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1028944132


##
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -271,8 +273,12 @@ class SparkConnectPlanner(session: SparkSession) {
   }
 
   private def transformLocalRelation(rel: proto.LocalRelation): LogicalPlan = {
-val attributes = 
rel.getAttributesList.asScala.map(transformAttribute(_)).toSeq
-new org.apache.spark.sql.catalyst.plans.logical.LocalRelation(attributes)
+val (rows, structType) = ArrowConverters.fromBatchWithSchemaIterator(
+  Seq(rel.getData.toByteArray).iterator,

Review Comment:
   Should we use the same protobuf message you added, @zhengruifeng ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-21 Thread GitBox


HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1028943446


##
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala:
##
@@ -253,16 +253,94 @@ private[sql] object ArrowConverters extends Logging {
 val vectorLoader = new VectorLoader(root)
 vectorLoader.load(arrowRecordBatch)
 arrowRecordBatch.close()
+vectorSchemaRootToIter(root)
+  }
+}
+  }
+
+  /**
+   * Maps iterator from serialized ArrowRecordBatches to InternalRows. 
Different from
+   * [[fromBatchIterator]], each input arrow batch starts with the schema.
+   */
+  private[sql] def fromBatchWithSchemaIterator(

Review Comment:
   Sorry for late reviews. Can we dedup the logic like 
`ArrowBatchWithSchemaIterator` is doing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-21 Thread GitBox


HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1028942760


##
core/src/main/scala/org/apache/spark/util/Utils.scala:
##
@@ -3257,6 +3257,14 @@ private[spark] object Utils extends Logging {
   case _ => math.max(sortedSize(len / 2), 1)
 }
   }
+
+  def closeAll(closeables: AutoCloseable*): Unit = {

Review Comment:
   I think this is too much to have it as a common util at the core module. 
It's only used twice ..



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-21 Thread GitBox


HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1028942267


##
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##
@@ -44,14 +47,18 @@ class SparkConnectProtoSuite extends PlanTest with 
SparkConnectPlanTest {
 
   lazy val connectTestRelation =
 createLocalRelationProto(
-  Seq(AttributeReference("id", IntegerType)(), AttributeReference("name", 
StringType)()))
+  Seq(AttributeReference("id", IntegerType)(), AttributeReference("name", 
StringType)()),
+  Seq())
 
   lazy val connectTestRelation2 =
 createLocalRelationProto(
-  Seq(AttributeReference("id", IntegerType)(), AttributeReference("name", 
StringType)()))
+  Seq(AttributeReference("id", IntegerType)(), AttributeReference("name", 
StringType)()),
+  Seq())

Review Comment:
   ```suggestion
 Seq.empty)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-18 Thread GitBox


HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1026348675


##
sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala:
##
@@ -21,24 +21,22 @@ import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}
 import java.text.SimpleDateFormat
 import java.util.Locale
-

Review Comment:
   Let's keep these newlines. I think Scala linter would complain about this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org