date:20220622

[GitHub] [spark] MaxGekk commented on a diff in pull request #36946: [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE`

2022-06-22 Thread GitBox



MaxGekk commented on code in PR #36946:
URL: https://github.com/apache/spark/pull/36946#discussion_r904609109


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala:
##
@@ -80,13 +91,26 @@ case class DescribeTableExec(
   }
 
   private def addPartitioning(rows: ArrayBuffer[InternalRow]): Unit = {
-rows += emptyRow()
-rows += toCatalystRow("# Partitioning", "", "")
-if (table.partitioning.isEmpty) {
-  rows += toCatalystRow("Not partitioned", "", "")
-} else {
-  rows ++= table.partitioning.zipWithIndex.map {
-case (transform, index) => toCatalystRow(s"Part $index", 
transform.describe(), "")
+if (table.partitioning.nonEmpty) {
+  val partitionColumnsOnly = table.partitioning.forall(t => 
t.isInstanceOf[IdentityTransform])
+  if (partitionColumnsOnly) {
+rows += toCatalystRow("# Partition Information", "", "")
+rows += toCatalystRow(s"# ${output(0).name}", output(1).name, 
output(2).name)
+rows ++= table.partitioning
+  .map(_.asInstanceOf[IdentityTransform].ref.fieldNames())
+  .flatMap(table.schema.findNestedField(_))

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] LuciferYang commented on pull request #36942: [SPARK-39545][SQL] Override `concat` method for `ExpressionSet` in Scala 2.13 to improve the performance

2022-06-22 Thread GitBox



LuciferYang commented on PR #36942:
URL: https://github.com/apache/spark/pull/36942#issuecomment-1163977592

   thanks all ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36946: [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE`

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36946:
URL: https://github.com/apache/spark/pull/36946#discussion_r904603138


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala:
##
@@ -80,13 +91,26 @@ case class DescribeTableExec(
   }
 
   private def addPartitioning(rows: ArrayBuffer[InternalRow]): Unit = {
-rows += emptyRow()
-rows += toCatalystRow("# Partitioning", "", "")
-if (table.partitioning.isEmpty) {
-  rows += toCatalystRow("Not partitioned", "", "")
-} else {
-  rows ++= table.partitioning.zipWithIndex.map {
-case (transform, index) => toCatalystRow(s"Part $index", 
transform.describe(), "")
+if (table.partitioning.nonEmpty) {
+  val partitionColumnsOnly = table.partitioning.forall(t => 
t.isInstanceOf[IdentityTransform])
+  if (partitionColumnsOnly) {
+rows += toCatalystRow("# Partition Information", "", "")
+rows += toCatalystRow(s"# ${output(0).name}", output(1).name, 
output(2).name)
+rows ++= table.partitioning
+  .map(_.asInstanceOf[IdentityTransform].ref.fieldNames())
+  .flatMap(table.schema.findNestedField(_))

Review Comment:
   Can we do the same here? assert that `table.schema.findNestedField` does not 
return None.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36696:
URL: https://github.com/apache/spark/pull/36696#issuecomment-1163974482

   No problem~ Thank you for informing that, @huaxingao . Take your time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a diff in pull request #36946: [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE`

2022-06-22 Thread GitBox



MaxGekk commented on code in PR #36946:
URL: https://github.com/apache/spark/pull/36946#discussion_r904601619


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala:
##
@@ -80,13 +91,26 @@ case class DescribeTableExec(
   }
 
   private def addPartitioning(rows: ArrayBuffer[InternalRow]): Unit = {
-rows += emptyRow()
-rows += toCatalystRow("# Partitioning", "", "")
-if (table.partitioning.isEmpty) {
-  rows += toCatalystRow("Not partitioned", "", "")
-} else {
-  rows ++= table.partitioning.zipWithIndex.map {
-case (transform, index) => toCatalystRow(s"Part $index", 
transform.describe(), "")
+if (table.partitioning.nonEmpty) {
+  val partitionColumnsOnly = table.partitioning.forall(t => 
t.isInstanceOf[IdentityTransform])
+  if (partitionColumnsOnly) {
+rows += toCatalystRow("# Partition Information", "", "")
+rows += toCatalystRow(s"# ${output(0).name}", output(1).name, 
output(2).name)
+rows ++= table.partitioning
+  .map(_.asInstanceOf[IdentityTransform].ref.fieldNames())
+  .flatMap(table.schema.findNestedField(_))

Review Comment:
   I have checked that v1 should trigger an assert in the case:
   1. describePartitionInfo()
   
https://github.com/apache/spark/blob/0b4739eb2c66ce69ffc16ad05ee0f12fe51d150b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L657
   2. table.partitionSchema
   3. assert(partitionFields.map(_.name) == partitionColumnNames)
   
https://github.com/apache/spark/blob/8bbbdb5a4667d90b27b202870cd73c9a19b74781/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L263



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] huaxingao commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-06-22 Thread GitBox



huaxingao commented on PR #36696:
URL: https://github.com/apache/spark/pull/36696#issuecomment-1163962503

   @dongjoon-hyun Thanks for the ping. Give me a couple of more days. I want to 
check one more time before mark this ready for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36963: [SPARK-39564][SS] Expose the information of catalog table to the logical plan in streaming query

2022-06-22 Thread GitBox



HeartSaVioR commented on code in PR #36963:
URL: https://github.com/apache/spark/pull/36963#discussion_r904526435


##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala:
##
@@ -77,7 +78,11 @@ class IncrementalExecution(
*/
   override
   lazy val optimizedPlan: LogicalPlan = 
executePhase(QueryPlanningTracker.OPTIMIZATION) {
-sparkSession.sessionState.optimizer.executeAndTrack(withCachedData,
+// Performing pre-optimization for streaming specific
+val preOptimized = withCachedData.transform {

Review Comment:
   Self-comment: additional comment to explain what it does.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala:
##
@@ -877,13 +879,18 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] 
extends Product with Tre
   t.copy(properties = Utils.redact(t.properties).toMap,
 options = Utils.redact(t.options).toMap) :: Nil
 case table: CatalogTable =>
-  table.storage.serde match {
-case Some(serde) => table.identifier :: serde :: Nil
-case _ => table.identifier :: Nil
-  }
+  stringArgsForCatalogTable(table)
+
 case other => other :: Nil
   }.mkString(", ")
 
+  private def stringArgsForCatalogTable(table: CatalogTable): Seq[Any] = {
+table.storage.serde match {
+  case Some(serde) => table.identifier :: serde :: Nil

Review Comment:
   I see inconsistence on representing table identifier on the plan - while 
this produces "quoted" string, I also see existing codebase to produce 
"unquoted" string. e.g. LogicalRelation.
   
   I'd like to hear the voice which is our preference. For now I used 
"unquoted" string for other places and leave this as it is (hence "quoted" 
string).



##
sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala:
##
@@ -167,6 +173,17 @@ case class StreamingDataSourceV2Relation(
 case _ =>
   Statistics(sizeInBytes = conf.defaultSizeInBytes)
   }
+
+  private val stringArgsVal: Seq[Any] = {
+val tableQualifier = (catalog, identifier) match {

Review Comment:
   Printing out `catalog` and `identifier` separately does not seem to give a 
good output, hence I changed to produce a single output for composite fields 
`catalog` and `identifier` whenever they are provided altogether in the logical 
node (only on streaming code path).



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala:
##
@@ -868,6 +868,8 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] 
extends Product with Tre
 case null => Nil
 case None => Nil
 case Some(null) => Nil
+case Some(table: CatalogTable) =>

Review Comment:
   This is needed to represent the CatalogTable properly if it is presented. 
(Before the fix it calls CatalogTable.toString which produces a bunch of 
information with multiple lines.)
   
   I feel it could be something be generalized, but I don't know we have other 
existing cases to handle.



##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##
@@ -576,14 +589,22 @@ class MicroBatchExecution(
 // Replace sources in the logical plan with data that has arrived since 
the last batch.
 val newBatchesPlan = logicalPlan transform {
   // For v1 sources.
-  case StreamingExecutionRelation(source, output) =>
+  case StreamingExecutionRelation(source, output, catalogTable) =>
 newData.get(source).map { dataPlan =>
   val hasFileMetadata = output.exists {
 case FileSourceMetadataAttribute(_) => true
 case _ => false
   }
   val finalDataPlan = dataPlan transformUp {
-case l: LogicalRelation if hasFileMetadata => 
l.withMetadataColumns()
+case l: LogicalRelation =>

Review Comment:
   Many DSv1 data sources produce LogicalRelation as leaf node, which contains 
Relation having a field of catalogTable. We fill out the information of the 
catalogTable in case when the source is not able to provide the information.



##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingRelation.scala:
##
@@ -42,7 +48,11 @@ object StreamingRelation {
  * It should be used to create [[Source]] and converted to 
[[StreamingExecutionRelation]] when
  * passing to [[StreamExecution]] to run a query.
  */
-case class StreamingRelation(dataSource: DataSource, sourceName: String, 
output: Seq[Attribute])
+case class StreamingRelation(
+dataSource: DataSource,
+sourceName: String,
+output: Seq[Attribute],
+catalogTable: Option[CatalogTable])

Review Comment:
   Self-comment: DataSource has a field `catalogTable`. Leverage it instead of 
adding a new column.



##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/WriteToMicroBatchDataSourceV1.scala:
##
@@

[GitHub] [spark] cloud-fan closed pull request #36947: [SPARK-39548][SQL] CreateView Command with a window clause query hit a wrong window definition not found issue

2022-06-22 Thread GitBox



cloud-fan closed pull request #36947: [SPARK-39548][SQL] CreateView Command 
with a window clause query hit a wrong window definition not found issue
URL: https://github.com/apache/spark/pull/36947


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #36948: [SPARK-39547][SQL] V2SessionCatalog should not throw NoSuchDatabaseException in loadNamspaceMetadata

2022-06-22 Thread GitBox



cloud-fan commented on PR #36948:
URL: https://github.com/apache/spark/pull/36948#issuecomment-1163947633

   thanks, merging to master/3.3


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] gengliangwang commented on pull request #36778: [SPARK-39383][SQL] Support DEFAULT columns in ALTER TABLE ALTER COLUMNS to V2 data sources

2022-06-22 Thread GitBox



gengliangwang commented on PR #36778:
URL: https://github.com/apache/spark/pull/36778#issuecomment-1163946455

   LGTM except for one comment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan closed pull request #36948: [SPARK-39547][SQL] V2SessionCatalog should not throw NoSuchDatabaseException in loadNamspaceMetadata

2022-06-22 Thread GitBox



cloud-fan closed pull request #36948: [SPARK-39547][SQL] V2SessionCatalog 
should not throw NoSuchDatabaseException in loadNamspaceMetadata
URL: https://github.com/apache/spark/pull/36948


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] gengliangwang commented on a diff in pull request #36778: [SPARK-39383][SQL] Support DEFAULT columns in ALTER TABLE ALTER COLUMNS to V2 data sources

2022-06-22 Thread GitBox



gengliangwang commented on code in PR #36778:
URL: https://github.com/apache/spark/pull/36778#discussion_r904530629


##
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala:
##
@@ -203,6 +203,14 @@ private[sql] object CatalogV2Util {
   })
   }
 
+case update: UpdateColumnDefaultValue =>
+  replace(schema, update.fieldNames, field =>
+if (update.newDefaultValue().nonEmpty) {

Review Comment:
   Oh wait, it seems that will be "\"\""
   Then when will the default value get cleared?



##
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala:
##
@@ -203,6 +203,14 @@ private[sql] object CatalogV2Util {
   })
   }
 
+case update: UpdateColumnDefaultValue =>
+  replace(schema, update.fieldNames, field =>
+if (update.newDefaultValue().nonEmpty) {

Review Comment:
   Oh wait, it seems that will be `"\"\""`
   Then when will the default value get cleared?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] gengliangwang commented on a diff in pull request #36778: [SPARK-39383][SQL] Support DEFAULT columns in ALTER TABLE ALTER COLUMNS to V2 data sources

2022-06-22 Thread GitBox



gengliangwang commented on code in PR #36778:
URL: https://github.com/apache/spark/pull/36778#discussion_r904530233


##
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala:
##
@@ -203,6 +203,14 @@ private[sql] object CatalogV2Util {
   })
   }
 
+case update: UpdateColumnDefaultValue =>
+  replace(schema, update.fieldNames, field =>
+if (update.newDefaultValue().nonEmpty) {

Review Comment:
   What if the new default value is an empty string? Do we just clear the 
default value?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mridulm commented on pull request #36885: [WIP][SPARK-39489][CORE] Improve event logging JsonProtocol performance by using Jackson instead of Json4s

2022-06-22 Thread GitBox



mridulm commented on PR #36885:
URL: https://github.com/apache/spark/pull/36885#issuecomment-1163944293

   +CC @shardulm94, @thejdeep 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mridulm commented on pull request #35683: [SPARK-30835][SPARK-39018][CORE][YARN] Add support for YARN decommissioning when ESS is disabled

2022-06-22 Thread GitBox



mridulm commented on PR #35683:
URL: https://github.com/apache/spark/pull/35683#issuecomment-1163943816

   Catching up on PR's ... this should have been caught by GA hooks, right ?
   Should have double checked, thanks for flagging this @tgravescs !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36957:
URL: https://github.com/apache/spark/pull/36957#discussion_r904529034


##
python/pyspark/sql/catalog.py:
##
@@ -341,6 +357,9 @@ def createTable(
 
 .. versionchanged:: 3.1
Added the ``description`` parameter.
+
+.. versionchanged:: 3.4
+   Made ``tableName`` support 3-layer namespace.

Review Comment:
   to be more general:
   ```
   Allowed ``tableName`` to be qualified with catalog name.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36696:
URL: https://github.com/apache/spark/pull/36696#issuecomment-1163943147

   What is the next step for us, @huaxingao ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan closed pull request #36941: [SPARK-39543] The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1

2022-06-22 Thread GitBox



cloud-fan closed pull request #36941: [SPARK-39543] The option of 
DataFrameWriterV2 should be passed to storage properties if fallback to v1
URL: https://github.com/apache/spark/pull/36941


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #36941: [SPARK-39543] The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1

2022-06-22 Thread GitBox



cloud-fan commented on PR #36941:
URL: https://github.com/apache/spark/pull/36941#issuecomment-1163932293

   thanks, merging to master/3.3/3.2!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR opened a new pull request, #36963: [SPARK-39564][SS] Expose the information of catalog table to the logical plan in streaming query

2022-06-22 Thread GitBox



HeartSaVioR opened a new pull request, #36963:
URL: https://github.com/apache/spark/pull/36963

   ### What changes were proposed in this pull request?
   
   This PR proposes to expose the information of catalog table (V1/V2) to the 
logical plan in streaming query, specifically, parsed plan and analyzed plan. 
(We may discard some information in optimized plan.)
   
   The major change is to propagate the information of catalog table from the 
place we resolve the table to the place we execute the query. In MicroBatch 
execution, we have several transformations on the logical plan which replace 
the node with another node, hence this PR touches multiple logical nodes which 
the code path passes through.
   
   Specifically for DSv1 sink, we don't have a specific write logical node, 
hence it's not feasible to expose the information for the destination. This PR 
introduces `WriteToMicroBatchDataSourceV1` which is DSv1 version of 
`WriteToMicroBatchDataSource` as a logical node for DSv1 sink. Worth noting 
that `WriteToMicroBatchDataSourceV1` plays as a marker - we eliminate this node 
in streaming specific optimization phase.
   
   ### Why are the changes needed?
   
   This PR give a better UX to end users who use table API for streaming query. 
Previously it's not easy or even not feasible to check which tables are being 
read and written from the streaming query. Most likely it requires end users to 
look into their code/query.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, in parsed/analyzed plan, we now expose the table information into the 
read/write logical node. Specifically for DSv1, we introduce a marker write 
node to expose the information for destination without majorly changing 
existing logic.
   
   > DSv1 read and write
   
   >> Before the patch
   
   https://user-images.githubusercontent.com/1317309/175210731-dcc4cc4d-a70b-467d-b577-79c20600db32.png;>
   
   >> After the patch
   
   https://user-images.githubusercontent.com/1317309/175210753-8f3c0b81-6ec5-45df-a2f8-37589960bba2.png;>
   
   > DSv2 read and write
   
   >> Before the patch
   
   https://user-images.githubusercontent.com/1317309/175210780-4a99c670-8a42-4511-959c-cafe0c24bc00.png;>
   
   >> After the patch
   
   https://user-images.githubusercontent.com/1317309/175210807-2b0b6935-e696-4051-b1a2-725d784d9d5f.png;>
   
   ### How was this patch tested?
   
   New test cases. Also manually tested via running following query and checked 
the UI page:
   
   > DSv1 read and write
   
   ```
   /*
   ./bin/spark-shell --conf "spark.sql.ui.explainMode=extended"
   */
   
   spark.sql("drop table if exists stream_source")
   
   spark.sql("drop table if exists stream_target")
   
   spark.sql("create table stream_source (col1 string, col2 int) using parquet")
   
   spark.sql("create table stream_target (col1 string, col2 int) using parquet")
   
   val checkpointDir = java.nio.file.Files.createTempDirectory("checkpoint-")
   
   val q = 
spark.readStream.table("stream_source").writeStream.format("parquet").option("checkpointLocation",
 checkpointDir.toString).toTable("stream_target")
   
   Thread.sleep(1)
   
   spark.sql("insert into stream_source values ('a', 1)")
   spark.sql("insert into stream_source values ('a', 2)")
   spark.sql("insert into stream_source values ('a', 3)")
   
   q.processAllAvailable()
   
   spark.sql("insert into stream_source values ('b', 1)")
   spark.sql("insert into stream_source values ('b', 2)")
   spark.sql("insert into stream_source values ('b', 3)")
   
   q.processAllAvailable()
   
   spark.sql("insert into stream_source values ('c', 1)")
   spark.sql("insert into stream_source values ('c', 2)")
   spark.sql("insert into stream_source values ('c', 3)")
   
   q.processAllAvailable()
   
   q.stop()
   ```
   
   > DSv2 read and write
   
   ```
   /*
   ./bin/spark-shell --packages 
org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.1\
   --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
   --conf spark.sql.catalog.local.type=hadoop \
   --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
   --conf spark.sql.ui.explainMode=extended
   */
   
   
   spark.sql("drop table if exists local.db.stream_target")
   
   spark.sql("create table local.db.stream_source (col1 string, col2 int) using 
iceberg")
   
   spark.sql("create table local.db.stream_target (col1 string, col2 int) using 
iceberg")
   
   val checkpointDir = java.nio.file.Files.createTempDirectory("checkpoint-")
   
   val q = 
spark.readStream.table("local.db.stream_source").writeStream.format("iceberg").option("checkpointLocation",
 checkpointDir.toString).toTable("local.db.stream_target")
   
   Thread.sleep(1)
   
   spark.sql("insert into local.db.stream_source values ('a', 1)")
   spark.sql("insert into local.db.stream_source values ('a', 2)")
   spark.sql("insert into local.db.stream_source values ('a', 3)")

[GitHub] [spark] zhengruifeng commented on a diff in pull request #36870: [SPARK-39483][PYTHON] Construct the schema from `np.dtype` when `createDataFrame` from a NumPy array

2022-06-22 Thread GitBox



zhengruifeng commented on code in PR #36870:
URL: https://github.com/apache/spark/pull/36870#discussion_r904503764


##
python/pyspark/sql/session.py:
##
@@ -975,6 +976,17 @@ def createDataFrame(  # type: ignore[misc]
 if data.ndim not in [1, 2]:
 raise ValueError("NumPy array input should be of 1 or 2 
dimensions.")
 column_names = ["value"] if data.ndim == 1 else ["_1", "_2"]
+if schema is None and not self._jconf.arrowPySparkEnabled():
+# Construct `schema` from `np.dtype` of the input NumPy array
+# TODO: Apply the logic below when 
self._jconf.arrowPySparkEnabled() is True
+spark_type = _from_numpy_type(data.dtype)
+if spark_type is not None:
+

Review Comment:
   It is nice to follow existing logic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-06-22 Thread GitBox



beliefer commented on code in PR #36663:
URL: https://github.com/apache/spark/pull/36663#discussion_r904495438


##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Extract.java:
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.expressions;
+
+import org.apache.spark.annotation.Evolving;
+
+import java.io.Serializable;
+
+/**
+ * Represent an extract expression, which contains a field to be extracted
+ * and a source expression where the field should be extracted.
+ * @since 3.4.0

Review Comment:
   ```suggestion
   
* @since 3.4.0
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #36960: [SPARK-39557][SQL] Support ARRAY, STRUCT, MAP types as DEFAULT values

2022-06-22 Thread GitBox



AmplabJenkins commented on PR #36960:
URL: https://github.com/apache/spark/pull/36960#issuecomment-1163897293

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatib

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904487769


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala:
##
@@ -589,4 +589,25 @@ abstract class Catalog {
* @since 2.0.0
*/
   def refreshByPath(path: String): Unit
+
+  /**
+   * Returns the current default catalog in this session.
+   *
+   * @since 3.4.0
+   */
+  def currentCatalog(): String

Review Comment:
   anyway, let's update the build file to make mima happy



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36295: [SPARK-38978][SQL] DS V2 supports push down OFFSET operator

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36295:
URL: https://github.com/apache/spark/pull/36295#discussion_r904487223


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala:
##
@@ -407,7 +407,67 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] 
with PredicateHelper wit
 case other => (other, false)
   }
 
-  def pushDownLimits(plan: LogicalPlan): LogicalPlan = plan.transform {
+  private def pushDownOffset(
+  plan: LogicalPlan,
+  offset: Int): Boolean = plan match {
+case sHolder: ScanBuilderHolder =>
+  val isPushed = PushDownUtils.pushOffset(sHolder.builder, offset)
+  if (isPushed) {
+sHolder.pushedOffset = Some(offset)
+  }
+  isPushed
+case Project(projectList, child) if projectList.forall(_.deterministic) =>
+  pushDownOffset(child, offset)
+case _ => false
+  }
+
+  def pushDownLimitAndOffset(plan: LogicalPlan): LogicalPlan = plan.transform {
+case offset @ LimitAndOffset(limit, offsetValue, child) =>
+  val (newChild, canRemoveLimit) = pushDownLimit(child, limit)
+  if (canRemoveLimit) {
+// Try to push down OFFSET only if the LIMIT operator has been pushed 
and can be removed.
+// For `df.limit(m).offset(n)`, try to push down `limit(m).offset(n)`.
+// For example, `df.limit(5).offset(3)`, we can push down 
`limit(5).offset(3)`.
+val isPushed = pushDownOffset(newChild, offsetValue)
+if (isPushed) {
+  newChild
+} else {
+  // For `df.limit(m).offset(n)`, only push down `limit(m)`.
+  // Keep the OFFSET operator if we failed to push down OFFSET to the 
data source.
+  offset.withNewChildren(Seq(newChild))
+}
+  } else {
+// Keep the OFFSET operator if we can't remove LIMIT operator.
+offset
+  }
+case globalLimit @ OffsetAndLimit(offset, limit, child) =>
+  // For `df.offset(n).limit(m)`, we can push down `limit(m + n)` first.
+  val (newChild, canRemoveLimit) = pushDownLimit(child, limit + offset)
+  if (canRemoveLimit) {
+// Try to push down OFFSET only if the LIMIT operator has been pushed 
and can be removed.
+// For `df.offset(n).limit(m)`, try to push down `limit(m + 
n).offset(n)`.
+// For example, `df.offset(3).limit(5)`, we can push down 
`limit(8).offset(3)`.
+val isPushed = pushDownOffset(newChild, offset)
+if (isPushed) {
+  newChild
+} else {
+  // Still keep the OFFSET operator if we can't push it down.
+  // For example, `df.offset(3).limit(5)`, `limit(8)` has been pushed
+  // and can be removed, Spark still do `offset(3)`.

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36295: [SPARK-38978][SQL] DS V2 supports push down OFFSET operator

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36295:
URL: https://github.com/apache/spark/pull/36295#discussion_r904486692


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala:
##
@@ -407,7 +407,67 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] 
with PredicateHelper wit
 case other => (other, false)
   }
 
-  def pushDownLimits(plan: LogicalPlan): LogicalPlan = plan.transform {
+  private def pushDownOffset(
+  plan: LogicalPlan,
+  offset: Int): Boolean = plan match {
+case sHolder: ScanBuilderHolder =>
+  val isPushed = PushDownUtils.pushOffset(sHolder.builder, offset)
+  if (isPushed) {
+sHolder.pushedOffset = Some(offset)
+  }
+  isPushed
+case Project(projectList, child) if projectList.forall(_.deterministic) =>
+  pushDownOffset(child, offset)
+case _ => false
+  }
+
+  def pushDownLimitAndOffset(plan: LogicalPlan): LogicalPlan = plan.transform {
+case offset @ LimitAndOffset(limit, offsetValue, child) =>
+  val (newChild, canRemoveLimit) = pushDownLimit(child, limit)
+  if (canRemoveLimit) {
+// Try to push down OFFSET only if the LIMIT operator has been pushed 
and can be removed.
+// For `df.limit(m).offset(n)`, try to push down `limit(m).offset(n)`.
+// For example, `df.limit(5).offset(3)`, we can push down 
`limit(5).offset(3)`.
+val isPushed = pushDownOffset(newChild, offsetValue)
+if (isPushed) {
+  newChild
+} else {
+  // For `df.limit(m).offset(n)`, only push down `limit(m)`.
+  // Keep the OFFSET operator if we failed to push down OFFSET to the 
data source.
+  offset.withNewChildren(Seq(newChild))
+}
+  } else {
+// Keep the OFFSET operator if we can't remove LIMIT operator.
+offset
+  }
+case globalLimit @ OffsetAndLimit(offset, limit, child) =>
+  // For `df.offset(n).limit(m)`, we can push down `limit(m + n)` first.
+  val (newChild, canRemoveLimit) = pushDownLimit(child, limit + offset)
+  if (canRemoveLimit) {
+// Try to push down OFFSET only if the LIMIT operator has been pushed 
and can be removed.
+// For `df.offset(n).limit(m)`, try to push down `limit(m + 
n).offset(n)`.
+// For example, `df.offset(3).limit(5)`, we can push down 
`limit(8).offset(3)`.

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36295: [SPARK-38978][SQL] DS V2 supports push down OFFSET operator

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36295:
URL: https://github.com/apache/spark/pull/36295#discussion_r904486458


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala:
##
@@ -407,7 +407,67 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] 
with PredicateHelper wit
 case other => (other, false)
   }
 
-  def pushDownLimits(plan: LogicalPlan): LogicalPlan = plan.transform {
+  private def pushDownOffset(
+  plan: LogicalPlan,
+  offset: Int): Boolean = plan match {
+case sHolder: ScanBuilderHolder =>
+  val isPushed = PushDownUtils.pushOffset(sHolder.builder, offset)
+  if (isPushed) {
+sHolder.pushedOffset = Some(offset)
+  }
+  isPushed
+case Project(projectList, child) if projectList.forall(_.deterministic) =>
+  pushDownOffset(child, offset)
+case _ => false
+  }
+
+  def pushDownLimitAndOffset(plan: LogicalPlan): LogicalPlan = plan.transform {
+case offset @ LimitAndOffset(limit, offsetValue, child) =>
+  val (newChild, canRemoveLimit) = pushDownLimit(child, limit)
+  if (canRemoveLimit) {
+// Try to push down OFFSET only if the LIMIT operator has been pushed 
and can be removed.
+// For `df.limit(m).offset(n)`, try to push down `limit(m).offset(n)`.
+// For example, `df.limit(5).offset(3)`, we can push down 
`limit(5).offset(3)`.
+val isPushed = pushDownOffset(newChild, offsetValue)
+if (isPushed) {
+  newChild
+} else {
+  // For `df.limit(m).offset(n)`, only push down `limit(m)`.

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36295: [SPARK-38978][SQL] DS V2 supports push down OFFSET operator

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36295:
URL: https://github.com/apache/spark/pull/36295#discussion_r904486283


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala:
##
@@ -407,7 +407,67 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] 
with PredicateHelper wit
 case other => (other, false)
   }
 
-  def pushDownLimits(plan: LogicalPlan): LogicalPlan = plan.transform {
+  private def pushDownOffset(
+  plan: LogicalPlan,
+  offset: Int): Boolean = plan match {
+case sHolder: ScanBuilderHolder =>
+  val isPushed = PushDownUtils.pushOffset(sHolder.builder, offset)
+  if (isPushed) {
+sHolder.pushedOffset = Some(offset)
+  }
+  isPushed
+case Project(projectList, child) if projectList.forall(_.deterministic) =>
+  pushDownOffset(child, offset)
+case _ => false
+  }
+
+  def pushDownLimitAndOffset(plan: LogicalPlan): LogicalPlan = plan.transform {
+case offset @ LimitAndOffset(limit, offsetValue, child) =>
+  val (newChild, canRemoveLimit) = pushDownLimit(child, limit)
+  if (canRemoveLimit) {
+// Try to push down OFFSET only if the LIMIT operator has been pushed 
and can be removed.
+// For `df.limit(m).offset(n)`, try to push down `limit(m).offset(n)`.
+// For example, `df.limit(5).offset(3)`, we can push down 
`limit(5).offset(3)`.

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compat

2022-06-22 Thread GitBox



HyukjinKwon commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904485439


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala:
##
@@ -589,4 +589,25 @@ abstract class Catalog {
* @since 2.0.0
*/
   def refreshByPath(path: String): Unit
+
+  /**
+   * Returns the current default catalog in this session.
+   *
+   * @since 3.4.0
+   */
+  def currentCatalog(): String

Review Comment:
   Yeah, acutlaly not sure why it complains.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36946: [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE`

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36946:
URL: https://github.com/apache/spark/pull/36946#discussion_r904482933


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala:
##
@@ -80,13 +91,26 @@ case class DescribeTableExec(
   }
 
   private def addPartitioning(rows: ArrayBuffer[InternalRow]): Unit = {
-rows += emptyRow()
-rows += toCatalystRow("# Partitioning", "", "")
-if (table.partitioning.isEmpty) {
-  rows += toCatalystRow("Not partitioned", "", "")
-} else {
-  rows ++= table.partitioning.zipWithIndex.map {
-case (transform, index) => toCatalystRow(s"Part $index", 
transform.describe(), "")
+if (table.partitioning.nonEmpty) {
+  val partitionColumnsOnly = table.partitioning.forall(t => 
t.isInstanceOf[IdentityTransform])
+  if (partitionColumnsOnly) {
+rows += toCatalystRow("# Partition Information", "", "")
+rows += toCatalystRow(s"# ${output(0).name}", output(1).name, 
output(2).name)
+rows ++= table.partitioning
+  .map(_.asInstanceOf[IdentityTransform].ref.fieldNames())
+  .flatMap(table.schema.findNestedField(_))

Review Comment:
   what's the v1 DESC TABLE behavior for malformed tables? e.g. partition 
column does not exist in the table schema? do we fail or silently ignore it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #36295: [SPARK-38978][SQL] DS V2 supports push down OFFSET operator

2022-06-22 Thread GitBox



beliefer commented on code in PR #36295:
URL: https://github.com/apache/spark/pull/36295#discussion_r904481912


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala:
##
@@ -407,7 +407,66 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] 
with PredicateHelper wit
 case other => (other, false)
   }
 
-  def pushDownLimits(plan: LogicalPlan): LogicalPlan = plan.transform {
+  private def pushDownOffset(
+  plan: LogicalPlan,
+  offset: Int): Boolean = plan match {
+case sHolder: ScanBuilderHolder =>
+  val isPushed = PushDownUtils.pushOffset(sHolder.builder, offset)
+  if (isPushed) {
+sHolder.pushedOffset = Some(offset)
+  }
+  isPushed
+case p: Project =>
+  pushDownOffset(p.child, offset)
+case _ => false
+  }
+
+  def pushDownLimitAndOffset(plan: LogicalPlan): LogicalPlan = plan.transform {
+case offset @ LimitAndOffset(limit, offsetValue, child) =>
+  val (newChild, canRemoveLimit) = pushDownLimit(child, limit)
+  if (canRemoveLimit) {
+// If we can remove limit, it indicates data source only have one 
partition.
+// For `dataset.limit(m).offset(n)`, try to push down 
`limit(m).offset(n)`.
+// For example, `dataset.limit(5).offset(3)`, we can push down 
`limit(5).offset(3)`.
+val isPushed = pushDownOffset(newChild, offsetValue)
+if (isPushed) {
+  newChild
+} else {
+  // For `dataset.limit(m).offset(n)`, only push down `limit(m)`.
+  // Keep the OFFSET operator if we failed to push down OFFSET to the 
data source.
+  offset.withNewChildren(Seq(newChild))
+}
+  } else {
+// Keep the OFFSET operator if we can't remove LIMIT operator.
+offset
+  }
+case globalLimit @ OffsetAndLimit(offset, limit, child) =>
+  val (newChild, canRemoveLimit) = pushDownLimit(child, limit + offset)

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatib

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904481141


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala:
##
@@ -589,4 +589,25 @@ abstract class Catalog {
* @since 2.0.0
*/
   def refreshByPath(path: String): Unit
+
+  /**
+   * Returns the current default catalog in this session.
+   *
+   * @since 3.4.0
+   */
+  def currentCatalog(): String
+
+  /**
+   * Sets the current default catalog in this session.
+   *
+   * @since 3.4.0
+   */
+  def setCurrentCatalog(catalogName: String): Unit
+
+  /**
+   * Returns a list of catalogs in this session.

Review Comment:
   ```suggestion
  * Returns a list of catalogs available in this session.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatib

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904480952


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala:
##
@@ -589,4 +589,25 @@ abstract class Catalog {
* @since 2.0.0
*/
   def refreshByPath(path: String): Unit
+
+  /**
+   * Returns the current default catalog in this session.
+   *
+   * @since 3.4.0
+   */
+  def currentCatalog(): String
+
+  /**
+   * Sets the current default catalog in this session.

Review Comment:
   ```suggestion
  * Sets the current catalog in this session.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatib

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904480863


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala:
##
@@ -589,4 +589,25 @@ abstract class Catalog {
* @since 2.0.0
*/
   def refreshByPath(path: String): Unit
+
+  /**
+   * Returns the current default catalog in this session.
+   *
+   * @since 3.4.0
+   */
+  def currentCatalog(): String

Review Comment:
   Do we need backward compatibility here? This is an interface that Spark 
implements and users call. Adding a new method shouldn't break anything.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Yikf commented on a diff in pull request #36941: [SPARK-39543] The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1

2022-06-22 Thread GitBox



Yikf commented on code in PR #36941:
URL: https://github.com/apache/spark/pull/36941#discussion_r904480670


##
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala:
##
@@ -531,6 +534,23 @@ class DataFrameWriterV2Suite extends QueryTest with 
SharedSparkSession with Befo
 assert(table.properties === (Map("provider" -> "foo") ++ 
defaultOwnership).asJava)
   }
 
+  test("SPARK-39543 writeOption should be passed to storage properties when 
fallback to v1") {
+val provider = classOf[InMemoryV1Provider].getName
+
+withSQLConf((SQLConf.USE_V1_SOURCE_LIST.key, provider)) {

Review Comment:
   Yea, Other tests trigger v1 fallback without set `USE_V1_SOURCE_LIST `, 
AFAIK, 
   - Other tests aim to test the read/write process, and the 
`InMemoryV1Provider` is actually a v2 format, and we trigger v1 fallback at the 
`newScanBuilder` & `newWriteBuilder` layer.
   - This test in PR needs to be fallback to V1 when the table is created, so 
we need to set `USE_V1_SOURCE_LIST`(see: 
[isV2Provider](https://github.com/apache/spark/blob/3d68ad8003a16229bd79f86cb31f618167814a7f/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L604))



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatib

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904480555


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala:
##
@@ -589,4 +589,25 @@ abstract class Catalog {
* @since 2.0.0
*/
   def refreshByPath(path: String): Unit
+
+  /**
+   * Returns the current default catalog in this session.

Review Comment:
   ```suggestion
  * Returns the current catalog in this session.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #36295: [SPARK-38978][SQL] DS V2 supports push down OFFSET operator

2022-06-22 Thread GitBox



beliefer commented on code in PR #36295:
URL: https://github.com/apache/spark/pull/36295#discussion_r904473795


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCScanBuilder.scala:
##
@@ -139,6 +142,25 @@ case class JDBCScanBuilder(
 false
   }
 
+  override def pushOffset(offset: Int): Boolean = {
+if (jdbcOptions.pushDownOffset && !isPartiallyPushed) {
+  // Spark pushes down LIMIT first, then OFFSET. In SQL statements, OFFSET 
is applied before
+  // LIMIT. Here we need to adjust the LIMIT value to match SQL statements.
+  // 1. For `dataset.limit(m).offset(n)`, try to push down `LIMIT (m - n) 
OFFSET n`.
+  //For example, `dataset.limit(5).offset(3)`, we can push down `LIMIT 
2 OFFSET 3`.
+  // 2. For `dataset.offset(n).limit(m)`, try to push down `LIMIT m OFFSET 
n`.
+  //For example, `dataset.offset(3).limit(5)`, we can push down `LIMIT 
5 OFFSET 3`.
+  // 3. For `dataset.offset(n)`, try to push down `OFFSET n`.

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] ulysses-you commented on a diff in pull request #36953: [SPARK-39551][SQL] Add AQE invalid plan check

2022-06-22 Thread GitBox



ulysses-you commented on code in PR #36953:
URL: https://github.com/apache/spark/pull/36953#discussion_r904468078


##
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ValidateSparkPlan.scala:
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.catalyst.optimizer.{BuildLeft, BuildRight}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, 
BroadcastNestedLoopJoinExec}
+
+/**
+ * Detects invalid physical plans generated by AQE replanning and throws 
`InvalidAQEPlanException`
+ * if such plans are detected. This rule should be called after 
EnsureRequirements where all
+ * necessary Exchange nodes are added.
+ */
+object ValidateSparkPlan extends Rule[SparkPlan] {
+
+  def apply(plan: SparkPlan): SparkPlan = {
+validate(plan)
+plan
+  }
+
+  /**
+   * Validate that the plan satisfies the following condition:
+   * - BroadcastQueryStage only appears as the immediate child and the build 
side of a broadcast
+   *   hash join or broadcast nested loop join.
+   */
+  private def validate(plan: SparkPlan): Unit = plan match {
+case b: BroadcastHashJoinExec =>
+  val (buildPlan, probePlan) = b.buildSide match {
+case BuildLeft => (b.left, b.right)
+case BuildRight => (b.right, b.left)
+  }
+  if (!buildPlan.isInstanceOf[BroadcastQueryStageExec]) {

Review Comment:
   oh, got it. It checks the `BroadcastQueryStageExec` can only appear at the 
direct child of broadcast join but not check the direct child must be the 
`BroadcastQueryStageExec`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] LuciferYang commented on pull request #36954: [WIP][SPARK-39519][BUILD] Add `-XX:NewRatio=4` to test javaOptions to improve the test stability of `sql/core` module.

2022-06-22 Thread GitBox



LuciferYang commented on PR #36954:
URL: https://github.com/apache/spark/pull/36954#issuecomment-1163859874

   > Thank you for investigation, @LuciferYang . Could you add several empty 
commits to see the flakiness?
   
   ok


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatib

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904460638


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala:
##
@@ -26,6 +26,22 @@ import 
org.apache.spark.sql.catalyst.DefinedByConstructorParams
 // Note: all classes here are expected to be wrapped in Datasets and so must 
extend
 // DefinedByConstructorParams for the catalog to be able to create encoders 
for them.
 
+/**
+ * A catalog in Spark, as returned by the `listCatalogs` method defined in 
[[Catalog]].
+ *
+ * @param name name of the catalog
+ * @since 3.2.0
+ */
+class CatalogMetadata(
+val name: String)

Review Comment:
   ah I see!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a diff in pull request #36295: [SPARK-38978][SQL] DS V2 supports push down OFFSET operator

2022-06-22 Thread GitBox



beliefer commented on code in PR #36295:
URL: https://github.com/apache/spark/pull/36295#discussion_r904460364


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala:
##
@@ -407,7 +407,66 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] 
with PredicateHelper wit
 case other => (other, false)
   }
 
-  def pushDownLimits(plan: LogicalPlan): LogicalPlan = plan.transform {
+  private def pushDownOffset(
+  plan: LogicalPlan,
+  offset: Int): Boolean = plan match {
+case sHolder: ScanBuilderHolder =>
+  val isPushed = PushDownUtils.pushOffset(sHolder.builder, offset)
+  if (isPushed) {
+sHolder.pushedOffset = Some(offset)
+  }
+  isPushed
+case p: Project =>
+  pushDownOffset(p.child, offset)
+case _ => false
+  }
+
+  def pushDownLimitAndOffset(plan: LogicalPlan): LogicalPlan = plan.transform {
+case offset @ LimitAndOffset(limit, offsetValue, child) =>
+  val (newChild, canRemoveLimit) = pushDownLimit(child, limit)
+  if (canRemoveLimit) {
+// If we can remove limit, it indicates data source only have one 
partition.

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

2022-06-22 Thread GitBox



HyukjinKwon commented on PR #36957:
URL: https://github.com/apache/spark/pull/36957#issuecomment-1163848082

   That's sort of annoying problem yeah. We should build Spark w/ `sbt 
test:package`, and then run the tests. Should probably have a logic like 
https://github.com/apache/spark/blob/master/python/pyspark/sql/tests/test_dataframe.py#L1204-L1238.
 Another option is to move the test codes into somewhere within the main code 
with `private[sql]`. the classes or methods with `private[sql]` can be accessed 
via Py4J (because that syntax is specific to Scala, and Java class files don't 
understand them).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] zhengruifeng commented on pull request #36957: [SPARK-39555][PYTHON] Make createTable and listTables in the python side support 3-layer-namespace

2022-06-22 Thread GitBox



zhengruifeng commented on PR #36957:
URL: https://github.com/apache/spark/pull/36957#issuecomment-1163845192

   @HyukjinKwon @cloud-fan @amaliujia @huaxingao 
   
   Please take a look when you are free.
   
   I encountered a problem that the python side also need `InMemoryCatalog` if 
we want to add UT for 3-layer-namespace, but `InMemoryCatalog` is only for 
`test` and not accessable from the python side. Is there some approach to work 
around? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #36784: [SPARK-39396][SQL] Fix LDAP login exception 'error code 49 - invalid credentials'

2022-06-22 Thread GitBox



HyukjinKwon commented on PR #36784:
URL: https://github.com/apache/spark/pull/36784#issuecomment-1163842938

   @xiuzhu9527 mind rebasing your fork to sync with the latest master branch in 
Apache Spark? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compat

2022-06-22 Thread GitBox



HyukjinKwon commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904453265


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -743,4 +722,30 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest with BeforeAndAf
 val catalogName2 = "catalog_not_exists"
 assert(!spark.catalog.databaseExists(Array(catalogName2, 
dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - cache table, isCached and 
uncacheTable") {

Review Comment:
   I would add a JIRA prefix here in the test title. e.g., `"SPARK-39506: three 
...`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compat

2022-06-22 Thread GitBox



HyukjinKwon commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904453265


##
sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala:
##
@@ -743,4 +722,30 @@ class CatalogSuite extends SharedSparkSession with 
AnalysisTest with BeforeAndAf
 val catalogName2 = "catalog_not_exists"
 assert(!spark.catalog.databaseExists(Array(catalogName2, 
dbName).mkString(".")))
   }
+
+  test("three layer namespace compatibility - cache table, isCached and 
uncacheTable") {

Review Comment:
   I would add a JIRA prefix here in the test title.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compat

2022-06-22 Thread GitBox



HyukjinKwon commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904452971


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala:
##
@@ -26,6 +26,24 @@ import 
org.apache.spark.sql.catalyst.DefinedByConstructorParams
 // Note: all classes here are expected to be wrapped in Datasets and so must 
extend
 // DefinedByConstructorParams for the catalog to be able to create encoders 
for them.
 
+/**
+ * A catalog in Spark, as returned by the `listCatalogs` method defined in 
[[Catalog]].
+ *
+ * @param name name of the catalog
+ * @since 3.4.0
+ */
+class CatalogMetadata(

Review Comment:
   Would have to be a case class.



##
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala:
##
@@ -26,6 +26,24 @@ import 
org.apache.spark.sql.catalyst.DefinedByConstructorParams
 // Note: all classes here are expected to be wrapped in Datasets and so must 
extend
 // DefinedByConstructorParams for the catalog to be able to create encoders 
for them.
 
+/**
+ * A catalog in Spark, as returned by the `listCatalogs` method defined in 
[[Catalog]].
+ *
+ * @param name name of the catalog

Review Comment:
   `description` is missing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compat

2022-06-22 Thread GitBox



HyukjinKwon commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904452603


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala:
##
@@ -589,4 +589,25 @@ abstract class Catalog {
* @since 2.0.0
*/
   def refreshByPath(path: String): Unit
+
+  /**
+   * Returns the current default catalog in this session.
+   *
+   * @since 3.4.0
+   */
+  def currentCatalog(): String

Review Comment:
   Would need to 1. have a default implementation for backward compatibility, 
or 2. exclude this from binary compatibility check.
   
   The real error is:
   
   ```
   [error] spark-sql: Failed binary compatibility check against 
org.apache.spark:spark-sql_2.12:3.2.0! Found 3 potential problems (filtered 602)
   [error]  * abstract method currentCatalog()java.lang.String in class 
org.apache.spark.sql.catalog.Catalog is present only in current version
   [error]filter with: 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.currentCatalog")
   [error]  * abstract method setCurrentCatalog(java.lang.String)Unit in class 
org.apache.spark.sql.catalog.Catalog is present only in current version
   [error]filter with: 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.setCurrentCatalog")
   [error]  * abstract method listCatalogs()org.apache.spark.sql.Dataset in 
class org.apache.spark.sql.catalog.Catalog is present only in current version
   [error]filter with: 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.listCatalogs")
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatible

2022-06-22 Thread GitBox



HyukjinKwon commented on PR #36904:
URL: https://github.com/apache/spark/pull/36904#issuecomment-1163839514

   Adding @zhengruifeng FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-06-22 Thread GitBox



HyukjinKwon commented on code in PR #36871:
URL: https://github.com/apache/spark/pull/36871#discussion_r904450412


##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##
@@ -529,6 +529,15 @@ private[sql] object QueryExecutionErrors extends 
QueryErrorsBase {
""".stripMargin)
   }
 
+  def inferDateWithLegacyTimeParserError(): Throwable = {
+new IllegalArgumentException(

Review Comment:
   For user-facing errors, it should inherit `SparkThrowable`. cc 
@gengliangwang @MaxGekk @cloud-fan fyi



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-06-22 Thread GitBox



HyukjinKwon commented on code in PR #36871:
URL: https://github.com/apache/spark/pull/36871#discussion_r904450171


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala:
##
@@ -195,6 +195,19 @@ class CSVOptions(
*/
   val enforceSchema = getBool("enforceSchema", default = true)
 
+  /**
+   * Infer columns with all valid date entries as date type (otherwise 
inferred as timestamp type).
+   * Disabled by default for backwards compatibility and performance. When 
enabled, date entries in
+   * timestamp columns will be cast to timestamp upon parsing. Not compatible 
with
+   * legacyTimeParserPolicy == LEGACY
+   */
+  val inferDate = {
+val inferDateFlag = getBool("inferDate")
+if (SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY && 
inferDateFlag) {
+  throw QueryExecutionErrors.inferDateWithLegacyTimeParserError()

Review Comment:
   Yeah, this is nice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-06-22 Thread GitBox



HyukjinKwon commented on code in PR #36871:
URL: https://github.com/apache/spark/pull/36871#discussion_r904449724


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala:
##
@@ -117,8 +123,19 @@ class CSVInferSchema(val options: CSVOptions) extends 
Serializable {
 case LongType => tryParseLong(field)
 case _: DecimalType => tryParseDecimal(field)
 case DoubleType => tryParseDouble(field)
-case TimestampNTZType => tryParseTimestampNTZ(field)
-case TimestampType => tryParseTimestamp(field)
+case DateType => tryParseDateTime(field)
+case TimestampNTZType =>

Review Comment:
   Maybe we can fix it as below:
   
   ```suggestion
   case TimestampNTZType options.inferDate => tryParseDateTime(field)
   case TimestampNTZType => tryParseTimestampNTZ(field)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #36942: [SPARK-39545][SQL] Override `concat` method for `ExpressionSet` in Scala 2.13 to improve the performance

2022-06-22 Thread GitBox



cloud-fan commented on PR #36942:
URL: https://github.com/apache/spark/pull/36942#issuecomment-1163820993

   late LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] wangyum commented on pull request #36848: [SPARK-39449][SQL] Propagate empty relation through Window

2022-06-22 Thread GitBox



wangyum commented on PR #36848:
URL: https://github.com/apache/spark/pull/36848#issuecomment-1163816151

   cc @sigmod @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun closed pull request #36962: [SPARK-39563][CORE][TESTS] Use `localHostNameForURI` in `UISuite`

2022-06-22 Thread GitBox



dongjoon-hyun closed pull request #36962: [SPARK-39563][CORE][TESTS] Use 
`localHostNameForURI` in `UISuite`
URL: https://github.com/apache/spark/pull/36962


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] sigmod commented on a diff in pull request #36909: [SPARK-39511][SQL] Push down limit 1 for right side of left semi/anti join if join condition is empty

2022-06-22 Thread GitBox



sigmod commented on code in PR #36909:
URL: https://github.com/apache/spark/pull/36909#discussion_r904429256


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##
@@ -753,6 +753,9 @@ object LimitPushDown extends Rule[LogicalPlan] {
 // Merge offset value and limit value into LocalLimit and pushes down 
LocalLimit through Offset.
 case LocalLimit(le, Offset(oe, grandChild)) =>
   Offset(oe, LocalLimit(Add(le, oe), grandChild))
+// Push down local limit 1 if join type is LeftSemiOrAnti and join 
condition is empty.
+case j @ Join(_, right, LeftSemiOrAnti(_), None, _) if 
!right.maxRows.exists(_ <= 1) =>

Review Comment:
   Gotcha, make sense. Thank you, @wangyum !



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun closed pull request #36961: [SPARK-39562][SQL][TESTS] Make `hive-thriftserver` module unit tests pass in IPv6 env

2022-06-22 Thread GitBox



dongjoon-hyun closed pull request #36961: [SPARK-39562][SQL][TESTS] Make 
`hive-thriftserver` module unit tests pass in IPv6 env
URL: https://github.com/apache/spark/pull/36961


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36961: [SPARK-39562][SQL][TESTS] Make `hive-thriftserver` module unit tests pass in IPv6 env

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36961:
URL: https://github.com/apache/spark/pull/36961#issuecomment-1163807315

   Thank you, @HyukjinKwon  and @wangyum .
   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] williamhyun opened a new pull request, #36962: [SPARK-39563][CORE][TESTS] Use localHostNameForURI in UISuite

2022-06-22 Thread GitBox



williamhyun opened a new pull request, #36962:
URL: https://github.com/apache/spark/pull/36962

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] wangyum commented on pull request #36951: [SPARK-38614][SQL] Don't push down limit through window that's using percent_rank

2022-06-22 Thread GitBox



wangyum commented on PR #36951:
URL: https://github.com/apache/spark/pull/36951#issuecomment-1163806447

   Merged to master, branch-3.3 and branch-3.2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] zhengruifeng commented on pull request #36926: [SPARK-39533][ML] Deprecate scoreLabelsWeight in BinaryClassificationMetrics

2022-06-22 Thread GitBox



zhengruifeng commented on PR #36926:
URL: https://github.com/apache/spark/pull/36926#issuecomment-1163802602

   Thank you @srowen @huaxingao !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36961: [SPARK-39562][SQL][TESTS] Make `hive-thriftserver` module unit tests pass in IPv6 env

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36961:
URL: https://github.com/apache/spark/pull/36961#issuecomment-1163800429

   Thank you, @wangyum !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] wangyum closed pull request #36951: [SPARK-38614][SQL] Don't push down limit through window that's using percent_rank

2022-06-22 Thread GitBox



wangyum closed pull request #36951: [SPARK-38614][SQL] Don't push down limit 
through window that's using percent_rank
URL: https://github.com/apache/spark/pull/36951


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] wangyum commented on a diff in pull request #36909: [SPARK-39511][SQL] Push down limit 1 for right side of left semi/anti join if join condition is empty

2022-06-22 Thread GitBox



wangyum commented on code in PR #36909:
URL: https://github.com/apache/spark/pull/36909#discussion_r904407888


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##
@@ -753,6 +753,9 @@ object LimitPushDown extends Rule[LogicalPlan] {
 // Merge offset value and limit value into LocalLimit and pushes down 
LocalLimit through Offset.
 case LocalLimit(le, Offset(oe, grandChild)) =>
   Offset(oe, LocalLimit(Add(le, oe), grandChild))
+// Push down local limit 1 if join type is LeftSemiOrAnti and join 
condition is empty.
+case j @ Join(_, right, LeftSemiOrAnti(_), None, _) if 
!right.maxRows.exists(_ <= 1) =>

Review Comment:
   ```scala
   import org.apache.spark.sql.catalyst.plans.logical.Join
   val df = spark.sql("select * from range(5) t1 left semi join (select * 
from range(10) where id < 0) t2")
   println(df.queryExecution.optimizedPlan.asInstanceOf[Join].right.maxRows)
   ```
   The `right.maxRows` is `Some(10)` but actually rows is 0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Jonathancui123 commented on pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-06-22 Thread GitBox



Jonathancui123 commented on PR #36871:
URL: https://github.com/apache/spark/pull/36871#issuecomment-1163777137

   I added a new `QueryExecutionError` called 
`inferDateWithLegacyTimeParserError` that is thrown when `inferDate=true` and 
SQL Configuration `LegacyTimeParserPolicy` is `LEGACY`. This prevents the 
legacy parser from being used with schema inference for date and fixes the 
failing test 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Jonathancui123 commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-06-22 Thread GitBox



Jonathancui123 commented on code in PR #36871:
URL: https://github.com/apache/spark/pull/36871#discussion_r904402104


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala:
##
@@ -110,15 +116,43 @@ class CSVInferSchema(val options: CSVOptions) extends 
Serializable {
   def inferField(typeSoFar: DataType, field: String): DataType = {
 if (field == null || field.isEmpty || field == options.nullValue) {
   typeSoFar
+} else
+if (options.inferDate) {
+  val typeElemInfer = typeSoFar match {
+case NullType => tryParseInteger(field)
+case IntegerType => tryParseInteger(field)
+case LongType => tryParseLong(field)
+case _: DecimalType => tryParseDecimal(field)
+case DoubleType => tryParseDouble(field)
+case DateType => tryParseDateTime(field)
+case TimestampNTZType => tryParseDateTime(field)
+case TimestampType => tryParseDateTime(field)
+case BooleanType => tryParseBoolean(field)
+case StringType => StringType
+case other: DataType =>
+  throw QueryExecutionErrors.dataTypeUnexpectedError(other)
+  }
+  compatibleType(typeSoFar, typeElemInfer).getOrElse(StringType)
 } else {
   val typeElemInfer = typeSoFar match {
 case NullType => tryParseInteger(field)
 case IntegerType => tryParseInteger(field)
 case LongType => tryParseLong(field)
 case _: DecimalType => tryParseDecimal(field)
 case DoubleType => tryParseDouble(field)
-case TimestampNTZType => tryParseTimestampNTZ(field)
-case TimestampType => tryParseTimestamp(field)
+case DateType => tryParseDateTime(field)
+case TimestampNTZType =>
+  if (options.inferDate) {

Review Comment:
   My bad, I meant to remove the first match expression but missed it during 
code cleanup



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #36951: [SPARK-38614][SQL] Don't push down limit through window that's using percent_rank

2022-06-22 Thread GitBox



HyukjinKwon commented on PR #36951:
URL: https://github.com/apache/spark/pull/36951#issuecomment-1163774647

   cc @hvanhovell FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on pull request #36930: [SPARK-39538][SQL] Avoid creating unnecessary SLF4J Logger

2022-06-22 Thread GitBox



srowen commented on PR #36930:
URL: https://github.com/apache/spark/pull/36930#issuecomment-1163770578

   Merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen closed pull request #36930: [SPARK-39538][SQL] Avoid creating unnecessary SLF4J Logger

2022-06-22 Thread GitBox



srowen closed pull request #36930: [SPARK-39538][SQL] Avoid creating 
unnecessary SLF4J Logger
URL: https://github.com/apache/spark/pull/36930


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun opened a new pull request, #36961: [SPARK-39562][SQL][TESTS] Make hive-thrift server module passes in IPv6 env

2022-06-22 Thread GitBox



dongjoon-hyun opened a new pull request, #36961:
URL: https://github.com/apache/spark/pull/36961

   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen closed pull request #36942: [SPARK-39545][SQL] Override `concat` method for `ExpressionSet` in Scala 2.13 to improve the performance

2022-06-22 Thread GitBox



srowen closed pull request #36942: [SPARK-39545][SQL] Override `concat` method 
for `ExpressionSet` in Scala 2.13 to improve the performance
URL: https://github.com/apache/spark/pull/36942


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on pull request #36942: [SPARK-39545][SQL] Override `concat` method for `ExpressionSet` in Scala 2.13 to improve the performance

2022-06-22 Thread GitBox



srowen commented on PR #36942:
URL: https://github.com/apache/spark/pull/36942#issuecomment-1163770025

   Merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] otterc commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-22 Thread GitBox



otterc commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r903254647


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -993,6 +1311,42 @@ AppShufflePartitionInfo getPartitionInfo() {
 }
   }
 
+  /**
+   * Simply encodes an application attempt ID.

Review Comment:
   Nit: Remove `Simply`



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1021,12 +1375,55 @@ public boolean isFinalized() {
 }
   }
 
+  /**
+   * Simply encodes an application attempt shuffle merge ID.

Review Comment:
   Nit: Remove `Simply`



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -632,6 +737,14 @@ public void registerExecutor(String appId, 
ExecutorShuffleInfo executorInfo) {
   appsShuffleInfo.compute(appId, (id, appShuffleInfo) -> {
 if (appShuffleInfo == null || attemptId > 
appShuffleInfo.attemptId) {
   originalAppShuffleInfo.set(appShuffleInfo);
+  AppPathsInfo appPathsInfo = new AppPathsInfo(appId, 
executorInfo.localDirs,
+  mergeDir, executorInfo.subDirsPerLocalDir);
+  // Clean up the outdated App Attempt local path info in the DB 
and
+  // put the newly registered local path info from newer attempt 
into the DB.
+  if (appShuffleInfo != null) {
+removeAppAttemptPathInfoFromDB(new AppAttemptId(appId, 
appShuffleInfo.attemptId));
+  }
+  writeAppPathsInfoToDb(appId, attemptId, appPathsInfo);

Review Comment:
   > If we have to guarantee the success of DB removal, but it fails, should we 
fail the new app attempt Executor registration here? I feel the later one is a 
little bit overkill. WDYT?
   
   I don't think it is an overkill. We can have one method that does something 
like this:
   ```
   writeAppsPathInfoAndDeleteOlder(olderAttempt, newAttempt) {
  try{
  deleteOldAttempt
  writeNewAttempt
} catch (IOException) {
  
}
   ```
   If delete fails, write will never be executed.



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -656,6 +771,206 @@ public void registerExecutor(String appId, 
ExecutorShuffleInfo executorInfo) {
 }
   }
 
+  /**
+   * Close the DB during shutdown
+   */
+  @Override
+  public void close() {
+if (db != null) {
+  try {
+db.close();
+  } catch (IOException e) {
+logger.error("Exception closing leveldb with registered app paths info 
and "
++ "shuffle partition info", e);
+  }
+}
+  }
+
+  /**
+   * Write the application attempt's local path information to the DB
+   */
+  private void writeAppPathsInfoToDb(String appId, int attemptId, AppPathsInfo 
appPathsInfo) {
+if (db != null) {
+  try {
+byte[] key = getDbAppAttemptPathsKey(new AppAttemptId(appId, 
attemptId));
+String valueStr = mapper.writeValueAsString(appPathsInfo);
+byte[] value = valueStr.getBytes(StandardCharsets.UTF_8);
+db.put(key, value);
+  } catch (Exception e) {
+logger.error("Error saving registered app paths info", e);
+  }
+}
+  }
+
+  /**
+   * Write the finalized shuffle merge partition information into the DB
+   */
+  private void writeAppAttemptShuffleMergeInfoToDB(
+  AppAttemptShuffleMergeId appAttemptShuffleMergeId) {
+if (db != null) {
+  // Write AppAttemptShuffleMergeId into LevelDB for finalized shuffles
+  try{
+byte[] dbKey = 
getDbAppAttemptShufflePartitionKey(appAttemptShuffleMergeId);
+db.put(dbKey, new byte[0]);
+  } catch (Exception e) {
+logger.error("Error saving active app shuffle partition", e);
+  }
+}
+
+  }
+
+  /**
+   * Parse the DB key with the prefix and the expected return value type
+   */
+  private  T parseDbKey(String key, String prefix, Class valueType) 
throws IOException {
+String json = key.substring(prefix.length() + 1);
+return mapper.readValue(json, valueType);
+  }
+
+  /**
+   * Generate AppAttemptId from the DB key
+   */
+  private AppAttemptId parseDbAppAttemptPathsKey(String key) throws 
IOException {
+return parseDbKey(key, APP_ATTEMPT_PATH_KEY_PREFIX, AppAttemptId.class);
+  }
+
+  /**
+   * Generate AppAttemptShuffleMergeId from the DB key
+   */
+  private AppAttemptShuffleMergeId parseDbAppAttemptShufflePartitionKey(
+  String key) throws IOException {
+return parseDbKey(
+key, APP_ATTEMPT_SHUFFLE_FINALIZE_STATUS_KEY_PREFIX, 
AppAttemptShuffleMergeId.class);
+  }
+
+  /**
+   * Generate the DB key with the key object and the specified string prefix
+   */
+  private byte[] getDbKey(Object key, String prefix) {
+// We add a common prefix on

[GitHub] [spark] dtenedor commented on pull request #36960: [SPARK-39557][SQL] Support ARRAY, STRUCT, MAP types as DEFAULT values

2022-06-22 Thread GitBox



dtenedor commented on PR #36960:
URL: https://github.com/apache/spark/pull/36960#issuecomment-1163679026

   Hi @gengliangwang this PR adds support for DEFAULT values with 
array/map/struct type. It should make it possible to use this feature for more 
varieties of columns.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dtenedor opened a new pull request, #36960: [SPARK-39557][SQL] Support ARRAY, STRUCT, MAP types as DEFAULT values

2022-06-22 Thread GitBox



dtenedor opened a new pull request, #36960:
URL: https://github.com/apache/spark/pull/36960

   ### What changes were proposed in this pull request?
   
   Support ARRAY, STRUCT, MAP types as DEFAULT values.
   
   Previously these types were not supported and DDL commands that attempted to 
use them returned error messages. Now they work wherever DEFAULT values are 
supported.
   
   ### Why are the changes needed?
   
   This new functionality expands the usefulness of DEFAULT column values and 
Spark SQL in general.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes.
   
   ### How was this patch tested?
   
   This PR adds test both positive and negative unit test coverage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun closed pull request #36958: [SPARK-39559][CORE][WEBUI] Support IPv6 in WebUI

2022-06-22 Thread GitBox



dongjoon-hyun closed pull request #36958: [SPARK-39559][CORE][WEBUI] Support 
IPv6 in WebUI
URL: https://github.com/apache/spark/pull/36958


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36958: [SPARK-39559][CORE][WEBUI] Support IPv6 in WebUI

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36958:
URL: https://github.com/apache/spark/pull/36958#issuecomment-1163658132

   Thank you so much, @viirya . Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36958: [SPARK-39559][CORE][WEBUI] Support IPv6 in WebUI

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36958:
URL: https://github.com/apache/spark/pull/36958#issuecomment-1163657925

   All tests passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun closed pull request #36959: [SPARK-39561][CORE] Improve `SparkContext` to propagate`java.net.preferIPv6Addresses`

2022-06-22 Thread GitBox



dongjoon-hyun closed pull request #36959: [SPARK-39561][CORE] Improve 
`SparkContext` to propagate`java.net.preferIPv6Addresses`
URL: https://github.com/apache/spark/pull/36959


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36959: [SPARK-39561][CORE] Improve `SparkContext` to propagate`java.net.preferIPv6Addresses`

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36959:
URL: https://github.com/apache/spark/pull/36959#issuecomment-1163656859

   Thank you so much, @huaxingao ! Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] huaxingao commented on pull request #36959: [SPARK-39561][CORE] Improve `SparkContext` to propagate`java.net.preferIPv6Addresses`

2022-06-22 Thread GitBox



huaxingao commented on PR #36959:
URL: https://github.com/apache/spark/pull/36959#issuecomment-1163641151

   LGTM. Thanks for pinging me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36958: [SPARK-39559][CORE][WEBUI] Support IPv6 in WebUI

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36958:
URL: https://github.com/apache/spark/pull/36958#issuecomment-1163628594

   Could you review this when you have some time, @viirya ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

2022-06-22 Thread GitBox



HyukjinKwon commented on PR #36940:
URL: https://github.com/apache/spark/pull/36940#issuecomment-1163598207

   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #36947: [SPARK-39548][SQL] CreateView Command with a window clause query hit a wrong window definition not found issue

2022-06-22 Thread GitBox



amaliujia commented on code in PR #36947:
URL: https://github.com/apache/spark/pull/36947#discussion_r904198563


##
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##
@@ -4449,6 +4449,28 @@ class SQLQuerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
 """.stripMargin),
   Seq(Row(2), Row(1)))
   }
+
+  test("CreateView will make query go into inline CTE code path") {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36959: [SPARK-39561][CORE] Improve `SparkContext` to propagate`java.net.preferIPv6Addresses`

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36959:
URL: https://github.com/apache/spark/pull/36959#issuecomment-1163557969

   Could you review this please, @huaxingao ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36959: [SPARK-39561][CORE] Improve `SparkContext` to propagate`java.net.preferIPv6Addresses`

2022-06-22 Thread GitBox



dongjoon-hyun commented on code in PR #36959:
URL: https://github.com/apache/spark/pull/36959#discussion_r904186486


##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -407,6 +407,7 @@ class SparkContext(config: SparkConf) extends Logging {
 SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)
 
 SparkContext.supplementJavaModuleOptions(_conf)
+SparkContext.supplementJavaIPv6Options(_conf)

Review Comment:
   This works in the same way with `SparkContext.supplementJavaModuleOptions` 
in the above line.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun opened a new pull request, #36959: [SPARK-39561][CORE] Improve `SparkContext` to propagate`java.net.preferIPv6Addresses`

2022-06-22 Thread GitBox



dongjoon-hyun opened a new pull request, #36959:
URL: https://github.com/apache/spark/pull/36959

   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun opened a new pull request, #36958: [SPARK-39559][CORE][WEBUI] Support IPv6 in WebUI

2022-06-22 Thread GitBox



dongjoon-hyun opened a new pull request, #36958:
URL: https://github.com/apache/spark/pull/36958

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] singhpk234 commented on a diff in pull request #36948: [SPARK-39547][SQL] V2SessionCatalog should not throw NoSuchDatabaseException in loadNamspaceMetadata

2022-06-22 Thread GitBox



singhpk234 commented on code in PR #36948:
URL: https://github.com/apache/spark/pull/36948#discussion_r904129674


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala:
##
@@ -244,7 +244,7 @@ class V2SessionCatalog(catalog: SessionCatalog)
 
   override def loadNamespaceMetadata(namespace: Array[String]): 
util.Map[String, String] = {
 namespace match {
-  case Array(db) =>
+  case Array(db) if catalog.databaseExists(db) =>
 catalog.getDatabaseMetadata(db).toMetadata

Review Comment:
   makes sense, we can certainly save a call in case db exists, made the 
changes, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on pull request #36943: [SPARK-39195][SQL][FOLLOWUP] Remove flaky test of OutputCommitCoordinator

2022-06-22 Thread GitBox



dongjoon-hyun commented on PR #36943:
URL: https://github.com/apache/spark/pull/36943#issuecomment-1163490433

   Thank you, @AngersZh , @HyukjinKwon , @cloud-fan .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] amaliujia commented on pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatible

2022-06-22 Thread GitBox



amaliujia commented on PR #36904:
URL: https://github.com/apache/spark/pull/36904#issuecomment-1163477190

   Also I am seeing this
   ```
   [info] spark-streaming-kinesis-asl: mimaPreviousArtifacts not set, not 
analyzing binary compatibility
   [error] spark-sql: Failed binary compatibility check against 
org.apache.spark:spark-sql_2.12:3.2.0! Found 3 potential problems (filtered 602)
   [error]  * abstract method currentCatalog()java.lang.String in class 
org.apache.spark.sql.catalog.Catalog is present only in current version
   [error]filter with: 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.currentCatalog")
   [error]  * abstract method setCurrentCatalog(java.lang.String)Unit in class 
org.apache.spark.sql.catalog.Catalog is present only in current version
   [error]filter with: 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.setCurrentCatalog")
   [error]  * abstract method listCatalogs()org.apache.spark.sql.Dataset in 
class org.apache.spark.sql.catalog.Catalog is present only in current version
   [error]filter with: 
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.listCatalogs")
   ```
   
   Any idea how to make the new API pass such API compatibility check?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #36904: [SPARK-39506][SQL] Make CacheTable, isCached, UncacheTable, setCurrentCatalog, currentCatalog, listCatalogs 3l namespace compatib

2022-06-22 Thread GitBox



amaliujia commented on code in PR #36904:
URL: https://github.com/apache/spark/pull/36904#discussion_r904112358


##
sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala:
##
@@ -26,6 +26,22 @@ import 
org.apache.spark.sql.catalyst.DefinedByConstructorParams
 // Note: all classes here are expected to be wrapped in Datasets and so must 
extend
 // DefinedByConstructorParams for the catalog to be able to create encoders 
for them.
 
+/**
+ * A catalog in Spark, as returned by the `listCatalogs` method defined in 
[[Catalog]].
+ *
+ * @param name name of the catalog
+ * @since 3.2.0
+ */
+class CatalogMetadata(
+val name: String)

Review Comment:
   Sounds good for keeping the class and additionally adding a description 
field.
   
   Regrading renaming to `Catalog`, it will have a naming conflict with 
https://github.com/apache/spark/blob/59eee98024dac42309f2e7196c7e68832317f284/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L33



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a diff in pull request #36946: [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE`

2022-06-22 Thread GitBox



MaxGekk commented on code in PR #36946:
URL: https://github.com/apache/spark/pull/36946#discussion_r904103207


##
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala:
##
@@ -74,11 +54,11 @@ class DescribeTableSuite extends 
command.DescribeTableSuiteBase with CommandSuit
   QueryTest.checkAnswer(
 descriptionDf,
 Seq(
-  Row("id", "bigint", ""),
-  Row("data", "string", ""),
-  Row("", "", ""),
-  Row("# Partitioning", "", ""),
-  Row("Part 0", "id", ""),
+  Row("id", "bigint", null),
+  Row("data", "string", null),
+  Row("# Partition Information", "", ""),
+  Row("# col_name", "data_type", "comment"),
+  Row("id", "bigint", null),

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a diff in pull request #36946: [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE`

2022-06-22 Thread GitBox



MaxGekk commented on code in PR #36946:
URL: https://github.com/apache/spark/pull/36946#discussion_r904072394


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala:
##
@@ -80,13 +81,25 @@ case class DescribeTableExec(
   }
 
   private def addPartitioning(rows: ArrayBuffer[InternalRow]): Unit = {
-rows += emptyRow()
-rows += toCatalystRow("# Partitioning", "", "")
-if (table.partitioning.isEmpty) {
-  rows += toCatalystRow("Not partitioned", "", "")
-} else {
-  rows ++= table.partitioning.zipWithIndex.map {
-case (transform, index) => toCatalystRow(s"Part $index", 
transform.describe(), "")
+if (table.partitioning.nonEmpty) {
+  val partitionColumnsOnly = table.partitioning.forall(t => 
t.isInstanceOf[IdentityTransform])
+  if (partitionColumnsOnly) {
+rows += toCatalystRow("# Partition Information", "", "")
+rows += toCatalystRow(s"# ${output(0).name}", output(1).name, 
output(2).name)
+val nameToField = table.schema.map(f => (f.name, f)).toMap
+rows ++= table.partitioning
+  .map(_.asInstanceOf[IdentityTransform])
+  .flatMap(_.ref.fieldNames())

Review Comment:
   I added a test for v2 implementation: partitioning by nested columns. Just 
in case, v1 doesn't support partitioning by nested columns. Also I fixed v2 
impl to pass the new test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on pull request #36784: [SPARK-39396][SQL] Fix LDAP login exception 'error code 49 - invalid credentials'

2022-06-22 Thread GitBox



srowen commented on PR #36784:
URL: https://github.com/apache/spark/pull/36784#issuecomment-1163429380

   Sorry, see "Testing with GitHub actions workflow" under 
https://spark.apache.org/developer-tools.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #36946: [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE`

2022-06-22 Thread GitBox



cloud-fan commented on code in PR #36946:
URL: https://github.com/apache/spark/pull/36946#discussion_r904039714


##
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala:
##
@@ -74,11 +54,11 @@ class DescribeTableSuite extends 
command.DescribeTableSuiteBase with CommandSuit
   QueryTest.checkAnswer(
 descriptionDf,
 Seq(
-  Row("id", "bigint", ""),
-  Row("data", "string", ""),
-  Row("", "", ""),
-  Row("# Partitioning", "", ""),
-  Row("Part 0", "id", ""),
+  Row("id", "bigint", null),
+  Row("data", "string", null),
+  Row("# Partition Information", "", ""),
+  Row("# col_name", "data_type", "comment"),
+  Row("id", "bigint", null),

Review Comment:
   I see, can we at least include `Table Type` in v2 command? It's simply 
checking if the table has a reserved `EXTERNAL` table property.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a diff in pull request #36946: [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE`

2022-06-22 Thread GitBox



MaxGekk commented on code in PR #36946:
URL: https://github.com/apache/spark/pull/36946#discussion_r904016227


##
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala:
##
@@ -74,11 +54,11 @@ class DescribeTableSuite extends 
command.DescribeTableSuiteBase with CommandSuit
   QueryTest.checkAnswer(
 descriptionDf,
 Seq(
-  Row("id", "bigint", ""),
-  Row("data", "string", ""),
-  Row("", "", ""),
-  Row("# Partitioning", "", ""),
-  Row("Part 0", "id", ""),
+  Row("id", "bigint", null),
+  Row("data", "string", null),
+  Row("# Partition Information", "", ""),
+  Row("# col_name", "data_type", "comment"),
+  Row("id", "bigint", null),

Review Comment:
   v1 (hive):
   ```
   
++--+---+
   |col_name|data_type  
   |comment|
   
++--+---+
   |data|string 
   |null   |
   |id  |bigint 
   |null   |
   |# Partition Information |   
   |   |
   |# col_name  |data_type  
   |comment|
   |id  |bigint 
   |null   |
   ||   
   |   |
   |# Detailed Table Information|   
   |   |
   |Database|ns 
   |   |
   |Table   |table  
   |   |
   |Owner   |maximgekk  
   |   |
   |Created Time|Wed Jun 22 09:39:42 PDT 2022   
   |   |
   |Last Access |UNKNOWN
   |   |
   |Created By  |Spark 3.4.0-SNAPSHOT   
   |   |
   |Type|EXTERNAL   
   |   |
   |Provider|hive   
   |   |
   |Comment |this is a test table   
   |   |
   |Table Properties|[transient_lastDdlTime=1655915982] 
   |   |
   |Location|file:/tmp/testcat/table_name   
   |   |
   |Serde Library   
|org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe|   |
   |InputFormat |org.apache.hadoop.mapred.TextInputFormat   
   |   |
   |OutputFormat
|org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat|   |
   |Storage Properties  |[serialization.format=1]   
   |   |
   |Partition Provider  |Catalog
   |   |
   
++--+---+
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] sigmod commented on a diff in pull request #36909: [SPARK-39511][SQL] Push down limit 1 for right side of left semi/anti join if join condition is empty

2022-06-22 Thread GitBox



sigmod commented on code in PR #36909:
URL: https://github.com/apache/spark/pull/36909#discussion_r904002034


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##
@@ -753,6 +753,9 @@ object LimitPushDown extends Rule[LogicalPlan] {
 // Merge offset value and limit value into LocalLimit and pushes down 
LocalLimit through Offset.
 case LocalLimit(le, Offset(oe, grandChild)) =>
   Offset(oe, LocalLimit(Add(le, oe), grandChild))
+// Push down local limit 1 if join type is LeftSemiOrAnti and join 
condition is empty.
+case j @ Join(_, right, LeftSemiOrAnti(_), None, _) if 
!right.maxRows.exists(_ <= 1) =>

Review Comment:
   > when right.maxRows > 0. It can not guarantee the result must not be empty.
   
   - For left semi with empty condition, it doesn't matter?  Non empty `right` 
means all `left` rows are join output rows?
   - For left anti with empty condition, it doesn't matter neither?  Non empty 
`right` means all `left` rows are killed and join doesn't output any row?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a diff in pull request #36946: [SPARK-39552][SQL] Unify v1 and v2 `DESCRIBE TABLE`

2022-06-22 Thread GitBox



MaxGekk commented on code in PR #36946:
URL: https://github.com/apache/spark/pull/36946#discussion_r904008641


##
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala:
##
@@ -74,11 +54,11 @@ class DescribeTableSuite extends 
command.DescribeTableSuiteBase with CommandSuit
   QueryTest.checkAnswer(
 descriptionDf,
 Seq(
-  Row("id", "bigint", ""),
-  Row("data", "string", ""),
-  Row("", "", ""),
-  Row("# Partitioning", "", ""),
-  Row("Part 0", "id", ""),
+  Row("id", "bigint", null),
+  Row("data", "string", null),
+  Row("# Partition Information", "", ""),
+  Row("# col_name", "data_type", "comment"),
+  Row("id", "bigint", null),

Review Comment:
   v2 (after the PR):
   ```
   
+++---+
   |col_name|data_type   |comment   
 |
   
+++---+
   |id  |bigint  |null  
 |
   |data|string  |null  
 |
   |# Partition Information ||  
 |
   |# col_name  |data_type   |comment   
 |
   |id  |bigint  |null  
 |
   |||  
 |
   |# Metadata Columns  ||  
 |
   |index   |int |Metadata column 
used to conflict with a data column|
   |_partition  |string  |Partition key 
used to store the row|
   |||  
 |
   |# Detailed Table Information||  
 |
   |Name|test_catalog.ns.table   |  
 |
   |Comment |this is a test table|  
 |
   |Location|file:/tmp/testcat/table_name|  
 |
   |Provider|_   |  
 |
   |Owner   |maximgekk   |  
 |
   |Table Properties|[bar=baz]   |  
 |
   
+++---+
   ```
   
   v1 in memory:
   ```
   +++---+
   |col_name|data_type   |comment|
   +++---+
   |data|string  |null   |
   |id  |bigint  |null   |
   |# Partition Information ||   |
   |# col_name  |data_type   |comment|
   |id  |bigint  |null   |
   |||   |
   |# Detailed Table Information||   |
   |Database|ns  |   |
   |Table   |table   |   |
   |Created Time|Wed Jun 22 09:37:48 PDT 2022|   |
   |Last Access |UNKNOWN |   |
   |Created By  |Spark 3.4.0-SNAPSHOT|   |
   |Type|EXTERNAL|   |
   |Provider|parquet |   |
   |Comment |this is a test table|   |
   |Table Properties|[bar=baz]   |   |
   |Location|file:/tmp/testcat/table_name|   |
   |Partition Provider  |Catalog |   |
   +++---+
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to

[GitHub] [spark] sigmod commented on a diff in pull request #36909: [SPARK-39511][SQL] Push down limit 1 for right side of left semi/anti join if join condition is empty

2022-06-22 Thread GitBox



sigmod commented on code in PR #36909:
URL: https://github.com/apache/spark/pull/36909#discussion_r904002034


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##
@@ -753,6 +753,9 @@ object LimitPushDown extends Rule[LogicalPlan] {
 // Merge offset value and limit value into LocalLimit and pushes down 
LocalLimit through Offset.
 case LocalLimit(le, Offset(oe, grandChild)) =>
   Offset(oe, LocalLimit(Add(le, oe), grandChild))
+// Push down local limit 1 if join type is LeftSemiOrAnti and join 
condition is empty.
+case j @ Join(_, right, LeftSemiOrAnti(_), None, _) if 
!right.maxRows.exists(_ <= 1) =>

Review Comment:
   > when right.maxRows > 0. It can not guarantee the result must not be empty.
   
   - For left semi with empty condition, it doesn't matter?  Non empty right 
means all left rows are join output rows?
   - For left anti with empty condition, it doesn't matter neither?  Non empty 
right means all left rows are killed and join doesn't output any row?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 >

1 - 100 of 190 matches

Mail list logo