[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation
HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1028944132

## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:

```diff
@@ -271,8 +273,12 @@ class SparkConnectPlanner(session: SparkSession) {
   }

   private def transformLocalRelation(rel: proto.LocalRelation): LogicalPlan = {
-    val attributes = rel.getAttributesList.asScala.map(transformAttribute(_)).toSeq
-    new org.apache.spark.sql.catalyst.plans.logical.LocalRelation(attributes)
+    val (rows, structType) = ArrowConverters.fromBatchWithSchemaIterator(
+      Seq(rel.getData.toByteArray).iterator,
```

Review Comment:
Should we use the same protobuf message you added, @zhengruifeng ?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
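For context on what a schema-prefixed batch buys the planner here: because each serialized batch carries its own schema, the receiving side can reconstruct both the rows and the `StructType` from the bytes alone, with no separate schema channel. A minimal stand-alone analogy in plain Scala (this is *not* the real Arrow IPC format or the PR's `fromBatchWithSchemaIterator` — just the schema-prefixed framing idea, with toy string rows):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Toy "batch with schema" codec: a UTF-8 schema string followed by rows.
// Arrow IPC does something analogous with a real schema message up front.
def writeBatch(schema: String, rows: Seq[String]): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val out = new DataOutputStream(bytes)
  out.writeUTF(schema)          // the schema travels with the data
  out.writeInt(rows.length)
  rows.foreach(out.writeUTF)
  out.flush()
  bytes.toByteArray
}

def readBatch(data: Array[Byte]): (String, Seq[String]) = {
  val in = new DataInputStream(new ByteArrayInputStream(data))
  val schema = in.readUTF()     // recovered without a side channel
  val rows = Seq.fill(in.readInt())(in.readUTF())
  (schema, rows)
}

val payload = writeBatch("id INT, name STRING", Seq("1,Alice", "2,Bob"))
val (schema, rows) = readBatch(payload)
```

This is why the planner can turn a single `rel.getData.toByteArray` payload into a typed `LocalRelation` without consulting the attribute list separately.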
HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1028943446

## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala:

```diff
@@ -253,16 +253,94 @@ private[sql] object ArrowConverters extends Logging {
       val vectorLoader = new VectorLoader(root)
       vectorLoader.load(arrowRecordBatch)
       arrowRecordBatch.close()
+      vectorSchemaRootToIter(root)
+    }
+  }
+
+  /**
+   * Maps iterator from serialized ArrowRecordBatches to InternalRows. Different from
+   * [[fromBatchIterator]], each input arrow batch starts with the schema.
+   */
+  private[sql] def fromBatchWithSchemaIterator(
```

Review Comment:
Sorry for the late review. Can we dedup the logic, like `ArrowBatchWithSchemaIterator` does?
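The dedup being requested is the usual shared-base refactor: put the iteration machinery common to the plain and schema-prefixed variants in one place, and let each variant supply only the per-batch decoding that differs. A schematic sketch with hypothetical names (the real `ArrowConverters` iterators manage Arrow allocators and `VectorSchemaRoot`s, not this toy state):

```scala
// Hypothetical sketch of factoring shared iteration logic into a base class,
// analogous to what ArrowBatchWithSchemaIterator does on the serializing side.
abstract class BatchIteratorBase[T](batches: Iterator[Array[Byte]]) extends Iterator[T] {
  override def hasNext: Boolean = batches.hasNext
  override def next(): T = decode(batches.next())
  // Only the per-batch decoding differs between the two variants.
  protected def decode(batch: Array[Byte]): T
}

// Variant 1: batches are raw data; decode yields just the payload size.
class PlainBatchIterator(batches: Iterator[Array[Byte]])
    extends BatchIteratorBase[Int](batches) {
  protected def decode(batch: Array[Byte]): Int = batch.length
}

// Variant 2: pretend the first byte is a schema marker and the rest is data,
// mimicking "each input batch starts with the schema".
class WithSchemaBatchIterator(batches: Iterator[Array[Byte]])
    extends BatchIteratorBase[(Int, Int)](batches) {
  protected def decode(batch: Array[Byte]): (Int, Int) =
    (batch.head.toInt, batch.length - 1)
}

val plain = new PlainBatchIterator(Iterator(Array[Byte](1, 2, 3)))
val sized = plain.next()
val schemed = new WithSchemaBatchIterator(Iterator(Array[Byte](7, 2, 3))).next()
```

The `hasNext`/`next` plumbing lives in one class, so a bug fix there applies to both variants — which is the point of the review comment.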
HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1028942760

## core/src/main/scala/org/apache/spark/util/Utils.scala:

```diff
@@ -3257,6 +3257,14 @@ private[spark] object Utils extends Logging {
       case _ => math.max(sortedSize(len / 2), 1)
     }
   }
+
+  def closeAll(closeables: AutoCloseable*): Unit = {
```

Review Comment:
I think this is too much to have as a common util in the core module. It's only used twice.
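For reference, the helper under discussion is small enough to inline at its two call sites, which is the reviewer's point. One plausible shape for such a `closeAll` (a sketch — the PR's actual body is truncated in the diff above, and its exception-handling choices may differ):

```scala
// Close every resource, keep going past failures, and rethrow the
// first exception at the end so no resource is leaked by an early throw.
def closeAll(closeables: AutoCloseable*): Unit = {
  var firstException: Throwable = null
  closeables.foreach { c =>
    try c.close()
    catch {
      case t: Throwable => if (firstException == null) firstException = t
    }
  }
  if (firstException != null) throw firstException
}

// Usage: AutoCloseable is a SAM interface, so lambdas work (Scala 2.12+).
var closed = List.empty[String]
closeAll(() => closed ::= "a", () => closed ::= "b")
// Resources are closed in argument order: "a" first, then "b".
```

Whether it handles suppressed exceptions like this, or simply closes in sequence, is exactly the kind of policy decision that argues for keeping it next to its two call sites rather than in `core`'s `Utils`.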
HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1028942267

## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:

```diff
@@ -44,14 +47,18 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {

   lazy val connectTestRelation = createLocalRelationProto(
-    Seq(AttributeReference("id", IntegerType)(), AttributeReference("name", StringType)()))
+    Seq(AttributeReference("id", IntegerType)(), AttributeReference("name", StringType)()),
+    Seq())

   lazy val connectTestRelation2 = createLocalRelationProto(
-    Seq(AttributeReference("id", IntegerType)(), AttributeReference("name", StringType)()))
+    Seq(AttributeReference("id", IntegerType)(), AttributeReference("name", StringType)()),
+    Seq())
```

Review Comment:
```suggestion
    Seq.empty)
```
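`Seq.empty` over `Seq()` is a common Scala style preference: both evaluate to the same immutable empty sequence, but `Seq.empty` states the intent directly instead of going through the varargs `apply`, and it takes a type argument cleanly. A quick illustration:

```scala
// Both spellings produce the same empty immutable Seq.
val a = Seq.empty[Int]
val b = Seq[Int]()
val same = a == b
// Seq.empty reads as "deliberately empty" at call sites such as
// createLocalRelationProto(attrs, Seq.empty), which is what the
// suggestion above is asking for.
```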
HyukjinKwon commented on code in PR #38659:
URL: https://github.com/apache/spark/pull/38659#discussion_r1026348675

## sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala:

```diff
@@ -21,24 +21,22 @@ import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}
 import java.text.SimpleDateFormat
 import java.util.Locale
-
```

Review Comment:
Let's keep these newlines. I think the Scala linter would complain about this.