date:20211212

[GitHub] [spark] SparkQA commented on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



SparkQA commented on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992199941


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50596/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



SparkQA commented on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992199502


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50598/
   



[GitHub] [spark] SparkQA commented on pull request #34825: [SPARK-37563][PYTHON] Implement days, seconds, microseconds properties of TimedeltaIndex

2021-12-12 Thread GitBox



SparkQA commented on pull request #34825:
URL: https://github.com/apache/spark/pull/34825#issuecomment-992199116


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50597/
   



[GitHub] [spark] LuciferYang removed a comment on pull request #34870: [SPARK-37615][BUILD][FOLLOWUP] Upgrade SBT to 1.5.6 in AppVeyor

2021-12-12 Thread GitBox



LuciferYang removed a comment on pull request #34870:
URL: https://github.com/apache/spark/pull/34870#issuecomment-992177373







[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



AngersZhuuuu commented on a change in pull request #34821:
URL: https://github.com/apache/spark/pull/34821#discussion_r767479297



##
File path: docs/sql-distributed-sql-engine-spark-sql-cli.md
##
@@ -0,0 +1,193 @@
+---
+layout: global
+title: Spark SQL CLI
+displayTitle: Spark SQL CLI
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+* Table of contents
+{:toc}
+
+
+The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute SQL
+queries input from the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.
+
+To start the Spark SQL CLI, run the following in the Spark directory:
+
+    ./bin/spark-sql
+
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
+
+## Spark SQL Command Line Options
+
+You may run `./bin/spark-sql --help` for a complete list of all available options.
+
+CLI options:
+ -d,--define      Variable substitution to apply to Hive
+                  commands. e.g. -d A=B or --define A=B
+    --database    Specify the database to use
+ -e               SQL from command line
+ -f               SQL from files
+ -H,--help        Print help information
+    --hiveconf    Use value for given property
+    --hivevar     Variable substitution to apply to Hive
+                  commands. e.g. --hivevar A=B
+ -i               Initialization SQL file
+ -S,--silent      Silent mode in interactive shell
+ -v,--verbose     Verbose mode (echo executed SQL to the
+                  console)
+
+## The hiverc File
+
+When invoked without the `-i` option, the Spark SQL CLI will attempt to load `$HIVE_HOME/bin/.hiverc` and `$HOME/.hiverc` as initialization files.
+
+## Supported comment types
+
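+<table>
+<tr><th>Comment</th><th>Example</th></tr>
+<tr>
+  <td>simple comment</td>
+  <td>
+  <code>
+  -- This is a simple comment.
+  <br>
+  SELECT 1;
+  </code>
+  </td>
+</tr>
+<tr>
+  <td>bracketed comment</td>
+  <td>
+    <code>
+    /* This is a bracketed comment. */
+    <br>
+    SELECT 1;
+    </code>
+  </td>
+</tr>
+<tr>
+  <td>nested bracketed comment</td>
+  <td>
+    <code>
+    /*  This is a /* nested bracketed comment*/ .*/
+    <br>
+    SELECT 1;
+    </code>
+  </td>
+</tr>
+</table>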
+
+## Spark SQL CLI Interactive Shell Commands
+
+When `./bin/spark-sql` is run without either the `-e` or `-f` option, it enters interactive shell mode.
+Use `;` (semicolon) to terminate commands. Notice:
+1. The CLI uses `;` to terminate a command only when it is at the end of a line and is not escaped as `\\;`.
+2. `;` is the only way to terminate commands. If the user types `SELECT 1` and presses enter, the console will just wait for input.
+3. If the user types multiple commands in one line like `SELECT 1; SELECT 2;`, the commands `SELECT 1` and `SELECT 2` will be executed separately.
+4. If `;` appears within a SQL statement (not at the end of the line), then it has no special meaning:
+   ```sql
+   -- This is a ; comment
+   SELECT ';' as a;
+   ```
+   This is just a comment line followed by a SQL query which returns a string literal.
+   ```sql
+   /* This is a comment contains ;
+   */ SELECT 1;
+   ```
+   However, if `;` is at the end of the line, it terminates the SQL statement. The example above will be split into `/* This is a comment contains ` and `*/ SELECT 1`; Spark will submit these two commands separately and throw a parser error (unclosed bracketed comment).
+
+
+<table>
+<tr><th>Command</th><th>Description</th></tr>
+<tr>
+  <td>quit or exit</td>
+  <td>Exits the interactive shell.</td>
+</tr>
+<tr>
+  <td>!</td>
+  <td>Executes a shell command from the Spark SQL CLI shell.</td>
+</tr>
+<tr>
+  <td>dfs</td>
+  <td>Executes an HDFS <a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfs">dfs</a> command from the Spark SQL CLI shell.</td>
+</tr>
+<tr>
+  <td></td>
+  <td>Executes a Spark SQL query and prints results to standard output.</td>
+</tr>
+<tr>
+  <td>source</td>
+  <td>Executes a script file inside the CLI.</td>
+</tr>
+</table>
+
+## Examples
+
+Example of running a query from the command line:
+
+    ./bin/spark-sql -e 'SELECT COL FROM TBL'
+
+Example of setting Hive configuration variables:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL' --hiveconf
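
The termination rules quoted in the diff above can be illustrated with a minimal Scala sketch. It is not the Spark SQL CLI's actual implementation: `StatementBuffer` and `commands` are hypothetical names, and the sketch only models rules 1 and 2 (a command ends at a line whose last non-blank character is an unescaped `;`). Splitting several commands typed on one line (rule 3) is deliberately left out.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, for illustration only; not part of Spark.
object StatementBuffer {

  /**
   * Groups input lines into commands. A command is complete once a line
   * ends (ignoring surrounding whitespace) with ';' that is not written
   * as the escaped form "\;". Lines after the last terminator stay
   * buffered and are not returned, mirroring a console that keeps
   * waiting for input.
   */
  def commands(lines: Seq[String]): Seq[String] = {
    val finished = ArrayBuffer.empty[String]
    val pending = new StringBuilder
    for (line <- lines) {
      if (pending.nonEmpty) pending.append('\n')
      pending.append(line)
      val trimmed = line.trim
      // "\\;" is the two-character sequence backslash + semicolon.
      if (trimmed.endsWith(";") && !trimmed.endsWith("\\;")) {
        finished += pending.toString
        pending.clear()
      }
    }
    finished.toSeq
  }
}
```

Under these assumptions, feeding the two lines of the bracketed-comment example to `StatementBuffer.commands` yields two separate commands, which is the situation the text describes as ending in a parser error.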

[GitHub] [spark] cloud-fan commented on a change in pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34821:
URL: https://github.com/apache/spark/pull/34821#discussion_r767478374



##
File path: docs/sql-distributed-sql-engine-spark-sql-cli.md
##

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34874: [SPARK-37622][K8S] Support K8s executor rolling policy

2021-12-12 Thread GitBox



dongjoon-hyun commented on a change in pull request #34874:
URL: https://github.com/apache/spark/pull/34874#discussion_r767477976



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##
@@ -145,6 +146,20 @@ private[spark] object Config extends Logging {
       .checkValue(_ >= 0, "Interval should be non-negative")
       .createWithDefault(0)
 
+  object ExecutorRollPolicy extends Enumeration {
+    val ID, ADD_TIME, TOTAL_GC_TIME, TOTAL_DURATION = Value
+  }
+
+  val EXECUTOR_ROLL_POLICY =
+    ConfigBuilder("spark.kubernetes.executor.rollPolicy")
+      .doc("Executor roll policy: Valid values are ID, ADD_TIME, TOTAL_GC_TIME (default), " +
+        "and TOTAL_DURATION")

Review comment:
   Thank you for review. Sure! I'll add.





[GitHub] [spark] SparkQA commented on pull request #34878: [SPARK-37626][BUILD] Upgrade libthrift to 0.15.0

2021-12-12 Thread GitBox



SparkQA commented on pull request #34878:
URL: https://github.com/apache/spark/pull/34878#issuecomment-992195550


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50600/
   



[GitHub] [spark] cloud-fan commented on a change in pull request #34852: [SPARK-37591][SQL] Support the GCM mode by `aes_encrypt()`/`aes_decrypt()`

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34852:
URL: https://github.com/apache/spark/pull/34852#discussion_r767477561



##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
##
@@ -386,7 +386,7 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
     }
 
     // Unsupported AES mode and padding in decrypt
-    checkUnsupportedMode(df2.selectExpr(s"aes_decrypt(value16, '$key16', 'GSM')"))
+    checkUnsupportedMode(df2.selectExpr(s"aes_decrypt(value16, '$key16', 'GSM', 'PKCS')"))

Review comment:
   shall we test `GCM`?





[GitHub] [spark] SparkQA removed a comment on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



SparkQA removed a comment on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992124560


   **[Test build #146118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146118/testReport)** for PR 34875 at commit [`9578511`](https://github.com/apache/spark/commit/9578511d4a7bec1e05ca4ab2bee14bce21cbd45e).



[GitHub] [spark] SparkQA commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



SparkQA commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992194460


   **[Test build #146118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146118/testReport)** for PR 34875 at commit [`9578511`](https://github.com/apache/spark/commit/9578511d4a7bec1e05ca4ab2bee14bce21cbd45e).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] viirya commented on a change in pull request #34874: [SPARK-37622][K8S] Support K8s executor rolling policy

2021-12-12 Thread GitBox



viirya commented on a change in pull request #34874:
URL: https://github.com/apache/spark/pull/34874#discussion_r767476322



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##
@@ -145,6 +146,20 @@ private[spark] object Config extends Logging {
       .checkValue(_ >= 0, "Interval should be non-negative")
       .createWithDefault(0)
 
+  object ExecutorRollPolicy extends Enumeration {
+    val ID, ADD_TIME, TOTAL_GC_TIME, TOTAL_DURATION = Value
+  }
+
+  val EXECUTOR_ROLL_POLICY =
+    ConfigBuilder("spark.kubernetes.executor.rollPolicy")
+      .doc("Executor roll policy: Valid values are ID, ADD_TIME, TOTAL_GC_TIME (default), " +
+        "and TOTAL_DURATION")

Review comment:
   Do we need to describe the meanings for each policy like the description does?





[GitHub] [spark] Ngone51 commented on a change in pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down

2021-12-12 Thread GitBox



Ngone51 commented on a change in pull request #34629:
URL: https://github.com/apache/spark/pull/34629#discussion_r767476270



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -620,11 +620,13 @@ private[spark] class BlockManager(
    * Note that this method must be called without any BlockInfo locks held.
    */
   def reregister(): Unit = {
-    // TODO: We might need to rate limit re-registering.
-    logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    if (!SparkEnv.get.isStopped) {

Review comment:
   Sorry, I mean you can update the fix with `HeartbeatReceiver` in this PR.
   






[GitHub] [spark] viirya commented on pull request #34845: [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

2021-12-12 Thread GitBox



viirya commented on pull request #34845:
URL: https://github.com/apache/spark/pull/34845#issuecomment-992193200


   Thanks. I will make a backport to branch-3.2.



[GitHub] [spark] cloud-fan commented on a change in pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34673:
URL: https://github.com/apache/spark/pull/34673#discussion_r767474830



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -1073,6 +1075,73 @@ object JdbcUtils extends Logging with SQLConfHelper {
     }
   }
 
+  /**
+   * Check if index exists in a table
+   */
+  def checkIfIndexExists(
+      conn: Connection,
+      sql: String,
+      options: JDBCOptions): Boolean = {
+    val statement = conn.createStatement
+    try {
+      statement.setQueryTimeout(options.queryTimeout)
+      val rs = statement.executeQuery(sql)
+      rs.next
+    } catch {
+      case _: Exception =>
+        logWarning("Cannot retrieved index info.")
+        false
+    } finally {
+      statement.close()
+    }
+  }
+
+  /**
+   * Process index properties and return tuple of indexType and list of the other index properties.
+   */
+  def processIndexProperties(
+      properties: util.Map[String, String],
+      catalogName: String
+    ): (String, Array[String]) = {
+    var indexType = ""
+    val indexPropertyList: ArrayBuffer[String] = ArrayBuffer[String]()
+    val supportedIndexTypeList = getSupportedIndexTypeList(catalogName)
+
+    if (!properties.isEmpty) {
+      properties.asScala.foreach { case (k, v) =>
+        if (k.equals(SupportsIndex.PROP_TYPE)) {
+          if (containsIndexTypeIgnoreCase(supportedIndexTypeList, v)) {
+            indexType = s"USING $v"
+          } else {
+            throw new UnsupportedOperationException(s"Index Type $v is not supported." +
+              s" The supported Index Types are: ${supportedIndexTypeList.mkString(" AND ")}")
+          }
+        } else {
+          indexPropertyList.append(s"$k = $v")
+        }
+      }
+    }
+    (indexType, indexPropertyList.toArray)
+  }
+
+  def containsIndexTypeIgnoreCase(supportedIndexTypeList: Array[String], value: String): Boolean = {
+    if (supportedIndexTypeList.isEmpty) {
+      throw new UnsupportedOperationException(s"None of index type is supported.")

Review comment:
   to be more user-facing: `Cannot specify 'USING index_type' in 'CREATE INDEX'`
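
A minimal sketch of how the suggested wording could read in context. Only the method signature and the `isEmpty` check appear in the quoted diff; the `IndexTypeCheck` wrapper object and the case-insensitive `exists` test are assumptions made for illustration, not the PR's code.

```scala
// Hypothetical wrapper object; in the quoted diff the method lives in JdbcUtils.
object IndexTypeCheck {

  def containsIndexTypeIgnoreCase(
      supportedIndexTypeList: Array[String],
      value: String): Boolean = {
    if (supportedIndexTypeList.isEmpty) {
      // Message text taken from the review suggestion.
      throw new UnsupportedOperationException(
        "Cannot specify 'USING index_type' in 'CREATE INDEX'")
    }
    // Assumed remainder of the method: case-insensitive membership test.
    supportedIndexTypeList.exists(_.equalsIgnoreCase(value))
  }
}
```

With this shape, a dialect that supports no index types rejects any `USING` clause with a message phrased in terms of the SQL the user wrote, rather than in terms of an internal list.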





[GitHub] [spark] huaxingao opened a new pull request #34879: [SPARK-37627][SQL] Add sorted column in BucketTransform

2021-12-12 Thread GitBox



huaxingao opened a new pull request #34879:
URL: https://github.com/apache/spark/pull/34879


   
   
   ### What changes were proposed in this pull request?
   In V1, we can create a table with sorted buckets as follows:
   ```
     sql("CREATE TABLE tbl(a INT, b INT) USING parquet " +
       "CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS")
   ```
   However, creating a table with sorted buckets in V2 fails with the exception
   `org.apache.spark.sql.AnalysisException: Cannot convert bucketing with sort columns to a transform.`
   
   
   ### Why are the changes needed?
   This PR adds the sorted column to `BucketTransform` so that we can create a table with sorted buckets in V2.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   new UT
   



[GitHub] [spark] cloud-fan commented on a change in pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34673:
URL: https://github.com/apache/spark/pull/34673#discussion_r767474830



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -1073,6 +1075,73 @@ object JdbcUtils extends Logging with SQLConfHelper {
     }
   }
 
+  /**
+   * Check if index exists in a table
+   */
+  def checkIfIndexExists(
+      conn: Connection,
+      sql: String,
+      options: JDBCOptions): Boolean = {
+    val statement = conn.createStatement
+    try {
+      statement.setQueryTimeout(options.queryTimeout)
+      val rs = statement.executeQuery(sql)
+      rs.next
+    } catch {
+      case _: Exception =>
+        logWarning("Cannot retrieved index info.")
+        false
+    } finally {
+      statement.close()
+    }
+  }
+
+  /**
+   * Process index properties and return tuple of indexType and list of the other index properties.
+   */
+  def processIndexProperties(
+      properties: util.Map[String, String],
+      catalogName: String
+    ): (String, Array[String]) = {
+    var indexType = ""
+    val indexPropertyList: ArrayBuffer[String] = ArrayBuffer[String]()
+    val supportedIndexTypeList = getSupportedIndexTypeList(catalogName)
+
+    if (!properties.isEmpty) {
+      properties.asScala.foreach { case (k, v) =>
+        if (k.equals(SupportsIndex.PROP_TYPE)) {
+          if (containsIndexTypeIgnoreCase(supportedIndexTypeList, v)) {
+            indexType = s"USING $v"
+          } else {
+            throw new UnsupportedOperationException(s"Index Type $v is not supported." +
+              s" The supported Index Types are: ${supportedIndexTypeList.mkString(" AND ")}")
+          }
+        } else {
+          indexPropertyList.append(s"$k = $v")
+        }
+      }
+    }
+    (indexType, indexPropertyList.toArray)
+  }
+
+  def containsIndexTypeIgnoreCase(supportedIndexTypeList: Array[String], value: String): Boolean = {
+    if (supportedIndexTypeList.isEmpty) {
+      throw new UnsupportedOperationException(s"None of index type is supported.")

Review comment:
   to be more user-facing: `Cannot specify index type in CREATE INDEX`





[GitHub] [spark] wankunde commented on a change in pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down

2021-12-12 Thread GitBox



wankunde commented on a change in pull request #34629:
URL: https://github.com/apache/spark/pull/34629#discussion_r767474069



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -620,11 +620,13 @@ private[spark] class BlockManager(
    * Note that this method must be called without any BlockInfo locks held.
    */
   def reregister(): Unit = {
-    // TODO: We might need to rate limit re-registering.
-    logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    if (!SparkEnv.get.isStopped) {

Review comment:
   I have updated the PR description.





[GitHub] [spark] cloud-fan commented on a change in pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34673:
URL: https://github.com/apache/spark/pull/34673#discussion_r767473789



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -1073,6 +1075,73 @@ object JdbcUtils extends Logging with SQLConfHelper {
     }
   }
 
+  /**
+   * Check if index exists in a table
+   */
+  def checkIfIndexExists(
+      conn: Connection,
+      sql: String,
+      options: JDBCOptions): Boolean = {
+    val statement = conn.createStatement
+    try {
+      statement.setQueryTimeout(options.queryTimeout)
+      val rs = statement.executeQuery(sql)
+      rs.next
+    } catch {
+      case _: Exception =>
+        logWarning("Cannot retrieved index info.")
+        false
+    } finally {
+      statement.close()
+    }
+  }
+
+  /**
+   * Process index properties and return tuple of indexType and list of the other index properties.
+   */
+  def processIndexProperties(
+      properties: util.Map[String, String],
+      catalogName: String
+    ): (String, Array[String]) = {

Review comment:
   ```suggestion
         catalogName: String): (String, Array[String]) = {
   ```





[GitHub] [spark] Ngone51 commented on a change in pull request #34831: [SPARK-37574][CORE][SHUFFLE] Simplify fetchBlocks w/o retry

2021-12-12 Thread GitBox



Ngone51 commented on a change in pull request #34831:
URL: https://github.com/apache/spark/pull/34831#discussion_r767473622



##
File path: 
core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala
##
@@ -139,14 +139,7 @@ private[spark] class NettyBlockTransferService(
   }
 }
   }
-
-  if (maxRetries > 0) {
-// Note this Fetcher will correctly handle maxRetries == 0; we avoid it just in case there's
-// a bug in this code. We should remove the if statement once we're sure of the stability.

Review comment:
   When `spark.shuffle.io.maxRetries=0`, it tests `OneForOneBlockFetcher` 
rather than `RetryingBlockTransferor`, right?
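
For illustration, here is a minimal Scala sketch of a retry wrapper that degrades to a single attempt when the retry budget is zero, which is the property the removed code comment relies on. `RetrySketch` and `fetchWithRetries` are hypothetical names chosen for this example; this is not Spark's `RetryingBlockTransferor`, and the real class may behave differently.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of Spark.
object RetrySketch {
  // Calls `fetch` with the zero-based attempt number. A failed attempt is
  // retried until `maxRetries` extra attempts have been used; after that the
  // last error is rethrown. With maxRetries == 0 there is exactly one attempt.
  @tailrec
  def fetchWithRetries[T](maxRetries: Int, attempt: Int = 0)(fetch: Int => T): T =
    Try(fetch(attempt)) match {
      case Success(value) => value
      case Failure(_) if attempt < maxRetries =>
        fetchWithRetries(maxRetries, attempt + 1)(fetch)
      case Failure(error) => throw error
    }
}
```

Under these assumptions, a wrapper of this shape needs no separate `maxRetries > 0` branch: the zero case falls out of the same code path.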





[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



AngersZhuuuu commented on a change in pull request #34821:
URL: https://github.com/apache/spark/pull/34821#discussion_r767473011



##
File path: docs/sql-distributed-sql-engine-spark-sql-cli.md
##
@@ -0,0 +1,193 @@
+---
+layout: global
+title: Spark SQL CLI
+displayTitle: Spark SQL CLI
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+* Table of contents
+{:toc}
+
+
+The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute SQL
+queries input from the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.
+
+To start the Spark SQL CLI, run the following in the Spark directory:
+
+./bin/spark-sql
+
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
+
+## Spark SQL Command Line Options
+
+You may run `./bin/spark-sql --help` for a complete list of all available options.
+
+CLI options:
+ -d,--define   Variable substitution to apply to Hive
+  commands. e.g. -d A=B or --define A=B
+--database  Specify the database to use
+ -e  SQL from command line
+ -f SQL from files
+ -H,--helpPrint help information
+--hiveconfUse value for given property
+--hivevar  Variable substitution to apply to Hive
+  commands. e.g. --hivevar A=B
+ -i Initialization SQL file
+ -S,--silent  Silent mode in interactive shell
+ -v,--verbose Verbose mode (echo executed SQL to the
+  console)
+
+## The hiverc File
+
+When invoked without the `-i` option, the Spark SQL CLI will attempt to load `$HIVE_HOME/bin/.hiverc` and `$HOME/.hiverc` as initialization files.
+
+## Supported comment types
+
+
+CommentExample
+
+  simple comment
+  
+  
+  -- This is a simple comment.
+  
+  SELECT 1;
+  
+  
+
+
+  bracketed comment
+  
+
+/* This is a bracketed comment. */
+
+SELECT 1;
+
+  
+
+
+  nested bracketed comment
+  
+
+/*  This is a /* nested bracketed comment*/ .*/
+
+SELECT 1;
+
+  
+
+
+
+## Spark SQL CLI Interactive Shell Commands
+
+When `./bin/spark-sql` is run without either the `-e` or `-f` option, it enters interactive shell mode.
+Use `;` (semicolon) to terminate commands. Notice:
+1. The CLI uses `;` to terminate commands only when it is at the end of a line and is not escaped by `\\;`.
+2. `;` is the only way to terminate commands. If the user types `SELECT 1` and presses enter, the console will just wait for input.
+3. If the user types multiple commands in one line like `SELECT 1; SELECT 2;`, the commands `SELECT 1` and `SELECT 2` will be executed separately.
+4. If `;` appears within a SQL statement (not at the end of the line), then it has no special meaning:
+   ```sql
+   -- This is a ; comment
+   SELECT ';' as a;
+   ```
+   This is just a comment line followed by a SQL query which returns a string literal.
+   ```sql
+   /* This is a comment contains ;
+   */ SELECT 1;
+   ```
+   However, if ';' is at the end of the line, it terminates the SQL statement. The example above will be split into `/* This is a comment contains ` and `*/ SELECT 1`; Spark will submit these two commands separately and throw a parser error (unclosed bracketed comment).
+
+
+
+CommandDescription
+
+  quit or exit
+  Exits the interactive shell.
+
+
+  !
+  Executes a shell command from the Spark SQL CLI shell.
+
+
+  dfs 
+  Executes a HDFS <a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfs">dfs</a> command from the Spark SQL CLI shell.
+
+
+  
+  Executes a Spark SQL query and prints results to standard output.
+
+
+  source 
+  Executes a script file inside the CLI.
+
+
+
+## Examples
+
+Example of running a query from the command line:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL'
+
+Example of setting Hive configuration variables:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL' --hiveconf

[GitHub] [spark] cloud-fan commented on a change in pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34673:
URL: https://github.com/apache/spark/pull/34673#discussion_r767472919



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -1073,6 +1076,81 @@ object JdbcUtils extends Logging with SQLConfHelper {
 }
   }
 
+  /**
+   * Check if index exists in a table
+   */
+  def checkIfIndexExists(
+  conn: Connection,
+  sql: String,
+  options: JDBCOptions): Boolean = {
+val statement = conn.createStatement
+try {
+  statement.setQueryTimeout(options.queryTimeout)
+  val rs = statement.executeQuery(sql)
+  rs.next
+} catch {
+  case _: Exception =>
+logWarning("Cannot retrieved index info.")
+false
+} finally {
+  statement.close()
+}
+  }
+
+  /**
+   * Process index properties and return tuple of indexType and list of the other index properties.
+   */
+  def processIndexProperties(
+  properties: util.Map[String, String],
+  catalogName: String
+): (String, Array[String]) = {
+var indexType = ""
+val indexPropertyList: ArrayBuffer[String] = ArrayBuffer[String]()
+val supportedIndexTypeList = getSupportedIndexTypeList(catalogName)
+
+if (!properties.isEmpty) {
+  properties.asScala.foreach { case (k, v) =>
+if (k.equals(SupportsIndex.PROP_TYPE)) {
+  if (containsIndexTypeIgnoreCase(supportedIndexTypeList, v)) {
+indexType = s"USING $v"
+  } else {
+throw new UnsupportedOperationException(s"Index Type $v is not supported." +
+  s" The supported Index Types are: ${supportedIndexTypeList.mkString(" AND ")}")
+  }
+} else {
+  indexPropertyList.append(convertPropertyPairToString(catalogName, k, v))
+}
+  }
+}
+(indexType, indexPropertyList.toArray)
+  }
+
+  def containsIndexTypeIgnoreCase(supportedIndexTypeList: Array[String], value: String): Boolean = {
+if (supportedIndexTypeList.isEmpty) {
+  throw new UnsupportedOperationException(s"None of index type is supported.")
+}
+for (indexType <- supportedIndexTypeList) {
+  if (value.equalsIgnoreCase(indexType)) return true
+}
+false
+  }
+
+  def getSupportedIndexTypeList(catalogName: String): Array[String] = {
+catalogName match {
+  case "mysql" => Array("BTREE", "HASH")
+  case "postgresql" => Array("BTREE", "HASH", "BRIN")
+  case _ => Array.empty
+}
+  }

Review comment:
   So the only benefit of adding this API in `JdbcDialect` is to share the 
code of checking unsupported index type earlier, which I don't think is 
worthwhile.





[GitHub] [spark] cloud-fan commented on a change in pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34673:
URL: https://github.com/apache/spark/pull/34673#discussion_r767472919



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -1073,6 +1076,81 @@ object JdbcUtils extends Logging with SQLConfHelper {
 }
   }
 
+  /**
+   * Check if index exists in a table
+   */
+  def checkIfIndexExists(
+  conn: Connection,
+  sql: String,
+  options: JDBCOptions): Boolean = {
+val statement = conn.createStatement
+try {
+  statement.setQueryTimeout(options.queryTimeout)
+  val rs = statement.executeQuery(sql)
+  rs.next
+} catch {
+  case _: Exception =>
+logWarning("Cannot retrieved index info.")
+false
+} finally {
+  statement.close()
+}
+  }
+
+  /**
+   * Process index properties and return tuple of indexType and list of the other index properties.
+   */
+  def processIndexProperties(
+  properties: util.Map[String, String],
+  catalogName: String
+): (String, Array[String]) = {
+var indexType = ""
+val indexPropertyList: ArrayBuffer[String] = ArrayBuffer[String]()
+val supportedIndexTypeList = getSupportedIndexTypeList(catalogName)
+
+if (!properties.isEmpty) {
+  properties.asScala.foreach { case (k, v) =>
+if (k.equals(SupportsIndex.PROP_TYPE)) {
+  if (containsIndexTypeIgnoreCase(supportedIndexTypeList, v)) {
+indexType = s"USING $v"
+  } else {
+throw new UnsupportedOperationException(s"Index Type $v is not supported." +
+  s" The supported Index Types are: ${supportedIndexTypeList.mkString(" AND ")}")
+  }
+} else {
+  indexPropertyList.append(convertPropertyPairToString(catalogName, k, v))
+}
+  }
+}
+(indexType, indexPropertyList.toArray)
+  }
+
+  def containsIndexTypeIgnoreCase(supportedIndexTypeList: Array[String], value: String): Boolean = {
+if (supportedIndexTypeList.isEmpty) {
+  throw new UnsupportedOperationException(s"None of index type is supported.")
+}
+for (indexType <- supportedIndexTypeList) {
+  if (value.equalsIgnoreCase(indexType)) return true
+}
+false
+  }
+
+  def getSupportedIndexTypeList(catalogName: String): Array[String] = {
+catalogName match {
+  case "mysql" => Array("BTREE", "HASH")
+  case "postgresql" => Array("BTREE", "HASH", "BRIN")
+  case _ => Array.empty
+}
+  }

Review comment:
   So the only benefit of adding this API in `JdbcDialect` is to share the 
code of checking unsupported index type earlier, which I don't think is 
worthwhile.





[GitHub] [spark] cloud-fan commented on a change in pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34673:
URL: https://github.com/apache/spark/pull/34673#discussion_r767472272



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -1073,6 +1076,81 @@ object JdbcUtils extends Logging with SQLConfHelper {
 }
   }
 
+  /**
+   * Check if index exists in a table
+   */
+  def checkIfIndexExists(
+  conn: Connection,
+  sql: String,
+  options: JDBCOptions): Boolean = {
+val statement = conn.createStatement
+try {
+  statement.setQueryTimeout(options.queryTimeout)
+  val rs = statement.executeQuery(sql)
+  rs.next
+} catch {
+  case _: Exception =>
+logWarning("Cannot retrieved index info.")
+false
+} finally {
+  statement.close()
+}
+  }
+
+  /**
+   * Process index properties and return tuple of indexType and list of the other index properties.
+   */
+  def processIndexProperties(
+  properties: util.Map[String, String],
+  catalogName: String
+): (String, Array[String]) = {
+var indexType = ""
+val indexPropertyList: ArrayBuffer[String] = ArrayBuffer[String]()
+val supportedIndexTypeList = getSupportedIndexTypeList(catalogName)
+
+if (!properties.isEmpty) {
+  properties.asScala.foreach { case (k, v) =>
+if (k.equals(SupportsIndex.PROP_TYPE)) {
+  if (containsIndexTypeIgnoreCase(supportedIndexTypeList, v)) {
+indexType = s"USING $v"
+  } else {
+throw new UnsupportedOperationException(s"Index Type $v is not supported." +
+  s" The supported Index Types are: ${supportedIndexTypeList.mkString(" AND ")}")
+  }
+} else {
+  indexPropertyList.append(convertPropertyPairToString(catalogName, k, v))
+}
+  }
+}
+(indexType, indexPropertyList.toArray)
+  }
+
+  def containsIndexTypeIgnoreCase(supportedIndexTypeList: Array[String], value: String): Boolean = {
+if (supportedIndexTypeList.isEmpty) {
+  throw new UnsupportedOperationException(s"None of index type is supported.")
+}
+for (indexType <- supportedIndexTypeList) {
+  if (value.equalsIgnoreCase(indexType)) return true
+}
+false
+  }
+
+  def getSupportedIndexTypeList(catalogName: String): Array[String] = {
+catalogName match {
+  case "mysql" => Array("BTREE", "HASH")
+  case "postgresql" => Array("BTREE", "HASH", "BRIN")
+  case _ => Array.empty
+}
+  }

Review comment:
   I don't think so. This is for failing earlier in `createIndex`, where 
the JDBC dialect implementation already has full control.
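
As a rough illustration of failing early inside the dialect's own `createIndex` path, the sketch below validates the requested index type while building the statement. `CreateIndexSketch`, `createIndexSql` and the `supported` list are hypothetical; the list only mirrors the `postgresql` case in the diff above and is not a complete list of PostgreSQL index methods, and this is not the PR's actual implementation.

```scala
// Hypothetical Postgres-flavoured helper, not Spark's JdbcDialect API.
object CreateIndexSketch {
  // Assumption: copied from the "postgresql" case in the diff under review.
  val supported: Seq[String] = Seq("BTREE", "HASH", "BRIN")

  // Builds a CREATE INDEX statement and rejects an unsupported index type
  // before any SQL is sent to the database.
  def createIndexSql(
      indexName: String,
      table: String,
      columns: Seq[String],
      indexType: Option[String]): String = {
    val using = indexType match {
      case Some(t) if supported.exists(_.equalsIgnoreCase(t)) => s" USING $t"
      case Some(t) =>
        throw new UnsupportedOperationException(
          s"Index type $t is not supported. Supported types: ${supported.mkString(", ")}")
      case None => ""
    }
    s"CREATE INDEX $indexName ON $table$using (${columns.mkString(", ")})"
  }
}
```

Because the check lives next to the SQL generation, a shared pre-check in `JdbcUtils` is not needed for this dialect in the sketch.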





[GitHub] [spark] SparkQA removed a comment on pull request #34878: [SPARK-37626][BUILD] Upgrade libthrift to 0.15.0

2021-12-12 Thread GitBox



SparkQA removed a comment on pull request #34878:
URL: https://github.com/apache/spark/pull/34878#issuecomment-992181842


   **[Test build #146125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146125/testReport)** for PR 34878 at commit [`1b7eb59`](https://github.com/apache/spark/commit/1b7eb598e4361a11e7e89f3d9a53af1b14610ade).



[GitHub] [spark] SparkQA commented on pull request #34878: [SPARK-37626][BUILD] Upgrade libthrift to 0.15.0

2021-12-12 Thread GitBox



SparkQA commented on pull request #34878:
URL: https://github.com/apache/spark/pull/34878#issuecomment-992187435


   **[Test build #146125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146125/testReport)** for PR 34878 at commit [`1b7eb59`](https://github.com/apache/spark/commit/1b7eb598e4361a11e7e89f3d9a53af1b14610ade).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] sarutak edited a comment on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



sarutak edited a comment on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992185125


   @xinrong-databricks I've modified the URL.
   I just cherry-picked the two commits in this change (9578511 and dd84175) to 
`branch-3.2`.



[GitHub] [spark] xinrong-databricks commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



xinrong-databricks commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992186363


   Got it, thanks @sarutak 



[GitHub] [spark] Kwafoor commented on pull request #34862: [SPARK-30471][SQL]Fix issue when comparing String and IntegerType

2021-12-12 Thread GitBox



Kwafoor commented on pull request #34862:
URL: https://github.com/apache/spark/pull/34862#issuecomment-992185361


   > > But I still think SparkSQL should remind user where you wrong.
   > 
   > This doesn't answer the question of why you only fix string integer 
comparison here.
   
   I think string-integer comparison is common and easy to run into; I 
   hadn't met the other cases. I have just tried them, and they hit the same 
   problem (long, short, etc.).
   And I agree with you that it's better not to introduce inconsistency into the 
   system.
   As for other operations such as arithmetic (+, -, *, /), string and integer 
   operands are handled correctly (Spark SQL casts the string to double type).
   
   Maybe the title is confusing. I found the issue when comparing String and 
   IntegerType, and the cause is an overflow in the string-to-int cast.
   So I fixed the issue by throwing an exception in the cast code.
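
To make the overflow concrete, here is a small Scala sketch of a string-to-int conversion that raises an error on overflow, next to a lenient variant that yields `None`. `StringToIntSketch`, `strictToInt` and `lenientToInt` are hypothetical names for illustration only; this is not Spark's `Cast` code and does not claim to reproduce its behaviour.

```scala
import scala.util.Try

// Hypothetical helpers, not part of Spark.
object StringToIntSketch {
  // Parses with arbitrary precision first, so a value outside the Int range
  // is detected instead of being wrapped or silently dropped.
  def strictToInt(s: String): Int = {
    val wide = BigInt(s.trim) // throws NumberFormatException on malformed input
    if (!wide.isValidInt) {
      throw new ArithmeticException(s"Casting '$s' to int causes overflow")
    }
    wide.toInt
  }

  // Lenient variant: overflow and malformed input both become None.
  def lenientToInt(s: String): Option[Int] = Try(strictToInt(s)).toOption
}
```

With the lenient variant, a comparison against a string such as `"2147483648"` has nothing to compare with, which is one way a query can quietly return no rows; the strict variant surfaces the problem instead.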



[GitHub] [spark] sarutak commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



sarutak commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992185125


   @xinrong-databricks I've modified the URL.



[GitHub] [spark] sarutak edited a comment on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



sarutak edited a comment on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992172755


   @HyukjinKwon 
   LGTM. BTW, the notebook raises an error which is not relevant to this change 
(the runtime is 3.2.0, while `pandas_api` is not in this version).
   ![Screenshot from 2021-12-13 16-09-18](https://user-images.githubusercontent.com/4736016/145768084-5a054996-d444-4d12-956d-21fb1c744e1f.png)
   
   So, I applied this change to `branch-3.2` and confirmed it works.
   
https://mybinder.org/v2/gh/sarutak/spark/SPARK-37624-branch-3.2?labpath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb



[GitHub] [spark] cloud-fan commented on a change in pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34673:
URL: https://github.com/apache/spark/pull/34673#discussion_r767468635



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -1073,6 +1076,81 @@ object JdbcUtils extends Logging with SQLConfHelper {
 }
   }
 
+  /**
+   * Check if index exists in a table
+   */
+  def checkIfIndexExists(
+  conn: Connection,
+  sql: String,
+  options: JDBCOptions): Boolean = {
+val statement = conn.createStatement
+try {
+  statement.setQueryTimeout(options.queryTimeout)
+  val rs = statement.executeQuery(sql)
+  rs.next
+} catch {
+  case _: Exception =>
+logWarning("Cannot retrieved index info.")
+false
+} finally {
+  statement.close()
+}
+  }
+
+  /**
+   * Process index properties and return tuple of indexType and list of the other index properties.
+   */
+  def processIndexProperties(
+  properties: util.Map[String, String],
+  catalogName: String
+): (String, Array[String]) = {
+var indexType = ""
+val indexPropertyList: ArrayBuffer[String] = ArrayBuffer[String]()
+val supportedIndexTypeList = getSupportedIndexTypeList(catalogName)
+
+if (!properties.isEmpty) {
+  properties.asScala.foreach { case (k, v) =>
+if (k.equals(SupportsIndex.PROP_TYPE)) {
+  if (containsIndexTypeIgnoreCase(supportedIndexTypeList, v)) {
+indexType = s"USING $v"
+  } else {
+throw new UnsupportedOperationException(s"Index Type $v is not supported." +
+  s" The supported Index Types are: ${supportedIndexTypeList.mkString(" AND ")}")
+  }
+} else {
+  indexPropertyList.append(convertPropertyPairToString(catalogName, k, v))
+}
+  }
+}
+(indexType, indexPropertyList.toArray)
+  }
+
+  def containsIndexTypeIgnoreCase(supportedIndexTypeList: Array[String], value: String): Boolean = {
+if (supportedIndexTypeList.isEmpty) {
+  throw new UnsupportedOperationException(s"None of index type is supported.")
+}
+for (indexType <- supportedIndexTypeList) {
+  if (value.equalsIgnoreCase(indexType)) return true
+}
+false
+  }
+
+  def getSupportedIndexTypeList(catalogName: String): Array[String] = {
+catalogName match {
+  case "mysql" => Array("BTREE", "HASH")
+  case "postgresql" => Array("BTREE", "HASH", "BRIN")
+  case _ => Array.empty
+}
+  }

Review comment:
   Yea +1 to put it in `JdbcDialect`, instead of hardcoding the dialect 
names here.
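
One possible shape for this, sketched in Scala under stated assumptions: each dialect declares its own supported index types and the shared check asks the dialect instead of matching on a name string. The `Dialect` trait, `supportedIndexTypes`, and the `MySQLLike`/`PostgresLike` objects are hypothetical stand-ins, not Spark's `JdbcDialect` API; the type lists are copied from the diff above.

```scala
// Hypothetical stand-ins for JDBC dialects, not Spark classes.
object DialectSketch {
  trait Dialect {
    // Index types this dialect accepts; empty means indexes are unsupported.
    def supportedIndexTypes: Seq[String] = Seq.empty

    // Case-insensitive membership test against the dialect's own list.
    def supportsIndexType(requested: String): Boolean =
      supportedIndexTypes.exists(_.equalsIgnoreCase(requested))
  }

  object MySQLLike extends Dialect {
    override def supportedIndexTypes: Seq[String] = Seq("BTREE", "HASH")
  }

  object PostgresLike extends Dialect {
    override def supportedIndexTypes: Seq[String] = Seq("BTREE", "HASH", "BRIN")
  }

  // A dialect that does not override the default supports no index types.
  object Unknown extends Dialect
}
```

In this arrangement, adding a new dialect means overriding one method in that dialect rather than extending a central `match` on dialect names.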





[GitHub] [spark] AmplabJenkins commented on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



AmplabJenkins commented on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992182882


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146123/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



SparkQA removed a comment on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992174522


   **[Test build #146123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146123/testReport)** for PR 34821 at commit [`a06b166`](https://github.com/apache/spark/commit/a06b1662c61639a5b2babc9f37f2e82dd3789607).



[GitHub] [spark] zhengruifeng commented on a change in pull request #34367: [SPARK-37099][SQL] Impl a rank-based filter to optimize top-k computation

2021-12-12 Thread GitBox



zhengruifeng commented on a change in pull request #34367:
URL: https://github.com/apache/spark/pull/34367#discussion_r767467761



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/RankLimitExec.scala
##
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.spark.sql.execution.window
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.plans.physical._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.metric.SQLMetrics
+import org.apache.spark.util.collection.Utils
+
+
+sealed trait RankLimitMode
+
+case object Partial extends RankLimitMode
+
+case object Final extends RankLimitMode
+
+
+
+/**
+ * This operator is designed to filter out unnecessary rows before WindowExec,
+ * for top-k computation.
+ * @param partitionSpec Should be the same as [[WindowExec#partitionSpec]]
+ * @param orderSpec Should be the same as [[WindowExec#orderSpec]]
+ * @param rankFunction The function to compute row rank, should be RowNumber/Rank/DenseRank.
+ */
+case class RankLimitExec(
+partitionSpec: Seq[Expression],
+orderSpec: Seq[SortOrder],
+rankFunction: Expression,
+limit: Int,
+mode: RankLimitMode,
+child: SparkPlan) extends UnaryExecNode {
+  assert(orderSpec.nonEmpty && limit > 0)
+
+  private val shouldPass = child match {
+case r: RankLimitExec =>
+  partitionSpec.size == r.partitionSpec.size &&
+partitionSpec.zip(r.partitionSpec).forall(p => p._1.semanticEquals(p._2)) &&
+orderSpec.size == r.orderSpec.size &&
+orderSpec.zip(r.orderSpec).forall(o => o._1.semanticEquals(o._2)) &&
+rankFunction.semanticEquals(r.rankFunction) &&
+mode == Final && r.mode == Partial && limit == r.limit
+case _ => false
+  }
+
+  private val shouldApplyTakeOrdered: Boolean = {
+rankFunction match {
+  case _: RowNumber => limit < conf.topKSortFallbackThreshold
+  case _: Rank => false
+  case _: DenseRank => false
+  case f => throw new IllegalArgumentException(s"Unsupported rank function: $f")
+}
+  }
+
+  override def output: Seq[Attribute] = child.output
+
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] = {
+if (shouldApplyTakeOrdered) {
+  Seq(partitionSpec.map(SortOrder(_, Ascending)))
+} else {
+  // Should be the same as [[WindowExec#requiredChildOrdering]]
+  Seq(partitionSpec.map(SortOrder(_, Ascending)) ++ orderSpec)
+}
+  }
+
+  override def outputOrdering: Seq[SortOrder] = {
+partitionSpec.map(SortOrder(_, Ascending)) ++ orderSpec
+  }
+
+  override def requiredChildDistribution: Seq[Distribution] = mode match {
+case Partial => super.requiredChildDistribution
+case Final =>
+  // Should be the same as [[WindowExec#requiredChildDistribution]]
+  if (partitionSpec.isEmpty) {
+AllTuples :: Nil
+  } else ClusteredDistribution(partitionSpec) :: Nil
+  }
+
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override lazy val metrics = Map(
+"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))
+
+  private lazy val ordering = GenerateOrdering.generate(orderSpec, output)
+
+  private lazy val limitFunction = rankFunction match {
+case _: RowNumber if shouldApplyTakeOrdered =>
+  (stream: Iterator[InternalRow]) =>
+Utils.takeOrdered(stream.map(_.copy()), limit)(ordering)

Review comment:
 ```
   val TOP_K_SORT_FALLBACK_THRESHOLD =
   buildConf("spark.sql.execution.topKSortFallbackThreshold")
 .doc("In SQL queries with a SORT followed by a LIMIT like " +
 "'SELECT x FROM t ORDER BY y LIMIT m', if m is under this threshold, do a top-K sort" +
 " in memory, otherwise do a global sort which spills to disk if necessary.")
 .version("2.4.0")
 .intConf
 .createWithDefault(ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH)
   ```
   
   I think we can still share the same threshold. The problem

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



AmplabJenkins removed a comment on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992182882


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146123/
   



[GitHub] [spark] sarutak commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



sarutak commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992182720


   @xinrong-databricks Oops. Let me check it.



[GitHub] [spark] SparkQA commented on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



SparkQA commented on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992182620


   **[Test build #146123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146123/testReport)** for PR 34821 at commit [`a06b166`](https://github.com/apache/spark/commit/a06b1662c61639a5b2babc9f37f2e82dd3789607).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] xinrong-databricks commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



xinrong-databricks commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992182008


   Hi @sarutak, I get "Binder not found" after clicking the attached link.
   
   May I ask which change you applied?



[GitHub] [spark] SparkQA commented on pull request #34878: [SPARK-37626][BUILD] Upgrade libthrift to 0.15.0

2021-12-12 Thread GitBox



SparkQA commented on pull request #34878:
URL: https://github.com/apache/spark/pull/34878#issuecomment-992181842


   **[Test build #146125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146125/testReport)** for PR 34878 at commit [`1b7eb59`](https://github.com/apache/spark/commit/1b7eb598e4361a11e7e89f3d9a53af1b14610ade).



[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34738:
URL: https://github.com/apache/spark/pull/34738#discussion_r767464502



##
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##
@@ -120,52 +122,124 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
       .table("h2.test.employee")
       .filter($"dept" > 1)
       .limit(1)
-    checkPushedLimit(df2, true, 1)
+    checkPushedLimit(df2, Some(1))
     checkAnswer(df2, Seq(Row(2, "alex", 12000.00, 1200.0)))
 
     val df3 = sql("SELECT name FROM h2.test.employee WHERE dept > 1 LIMIT 1")
     val scan = df3.queryExecution.optimizedPlan.collectFirst {
       case s: DataSourceV2ScanRelation => s
     }.get
     assert(scan.schema.names.sameElements(Seq("NAME")))
-    checkPushedLimit(df3, true, 1)
+    checkPushedLimit(df3, Some(1))
     checkAnswer(df3, Seq(Row("alex")))
 
     val df4 = spark.read
       .table("h2.test.employee")
       .groupBy("DEPT").sum("SALARY")
       .limit(1)
-    checkPushedLimit(df4, false, 0)
+    checkPushedLimit(df4, None)
     checkAnswer(df4, Seq(Row(1, 19000.00)))
 
+    val name = udf { (x: String) => x.matches("cat|dav|amy") }
+    val sub = udf { (x: String) => x.substring(0, 3) }
     val df5 = spark.read
       .table("h2.test.employee")
-      .sort("SALARY")
+      .select($"SALARY", $"BONUS", sub($"NAME").as("shortName"))
+      .filter(name($"shortName"))
+      .limit(1)
+    // LIMIT is pushed down only if all the filters are pushed down
+    checkPushedLimit(df5, None)
+    checkAnswer(df5, Seq(Row(1.00, 1000.0, "amy")))
+  }
+
+  private def checkPushedLimit(df: DataFrame, limit: Option[Int]): Unit = {

Review comment:
   shall we merge `checkPushedTopN` into this? `def checkPushedLimit(df: DataFrame, limit: Option[Int], sortValues: Seq[SortValue] = Nil)`
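
   The merge suggested here could look roughly like the following. This is a minimal Python sketch of the idea, not the Scala test helper itself: `PushedOperators`, `check_pushed_limit`, and the string encoding of sort values are hypothetical stand-ins for the scan's pushed-down state, `checkPushedLimit`, and `Seq[SortValue]`. A defaulted `sort_values` argument lets plain LIMIT checks keep their short call form, while top N checks pass the expected ordering.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PushedOperators:
    """Hypothetical stand-in for what a scan reports as pushed down."""
    limit: Optional[int] = None
    sort_values: Tuple[str, ...] = ()


def check_pushed_limit(
    pushed: PushedOperators,
    limit: Optional[int],
    sort_values: Sequence[str] = (),
) -> None:
    """Check LIMIT and, when expected, top N pushdown in one helper."""
    if pushed.limit != limit:
        raise AssertionError(f"expected limit {limit}, got {pushed.limit}")
    if tuple(pushed.sort_values) != tuple(sort_values):
        raise AssertionError(
            f"expected sort values {tuple(sort_values)}, got {pushed.sort_values}"
        )


# Plain LIMIT: the default empty sort_values covers the old call sites.
check_pushed_limit(PushedOperators(limit=1), 1)
# Nothing pushed down.
check_pushed_limit(PushedOperators(), None)
# Top N: same helper, with the expected ordering supplied.
check_pushed_limit(
    PushedOperators(limit=1, sort_values=("SALARY ASC",)), 1, ["SALARY ASC"]
)
```

   With the default in place, existing callers that only care about LIMIT do not change, and a mismatch in either the limit or the ordering fails the same way.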





[GitHub] [spark] bozhang2820 opened a new pull request #34878: [SPARK-37626][BUILD] Upgrade libthrift to 0.15.0

2021-12-12 Thread GitBox



bozhang2820 opened a new pull request #34878:
URL: https://github.com/apache/spark/pull/34878


   ### What changes were proposed in this pull request?
   This PR upgrades libthrift from 0.12.0 to 0.15.0.
   
   ### Why are the changes needed?
   This is to avoid https://nvd.nist.gov/vuln/detail/CVE-2020-13949.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Will rely on PR testing.
   



[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34738:
URL: https://github.com/apache/spark/pull/34738#discussion_r767463616



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##
@@ -246,16 +247,42 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  private def pushDownLimit(plan: LogicalPlan, limit: Int): LogicalPlan = plan match {
+    case operation @ ScanOperation(_, filter, sHolder: ScanBuilderHolder) if filter.isEmpty =>
+      val limitPushed = PushDownUtils.pushLimit(sHolder.builder, limit)
+      if (limitPushed) {
+        sHolder.pushedLimit = Some(limit)
+      }
+      operation
+    case s @ Sort(order, _, operation @ ScanOperation(_, filter, sHolder: ScanBuilderHolder))
+      if filter.isEmpty =>
+      val orders = DataSourceStrategy.translateSortOrders(order)
+      val topNPushed = PushDownUtils.pushTopN(sHolder.builder, orders.toArray, limit)
+      if (topNPushed) {
+        sHolder.pushedLimit = Some(limit)
+        sHolder.sortValues = orders
+        operation
+      } else {
+        s
+      }
+    case p: Project =>
+      val newChild = pushDownLimit(p.child, limit)
+      if (newChild == p.child) {
+        p
+      } else {
+        p.copy(child = newChild)
+      }
+    case other => other
+  }
+
   def applyLimit(plan: LogicalPlan): LogicalPlan = plan.transform {
     case globalLimit @ Limit(IntegerLiteral(limitValue), child) =>
-      child match {
-        case ScanOperation(_, filter, sHolder: ScanBuilderHolder) if filter.length == 0 =>
-          val limitPushed = PushDownUtils.pushLimit(sHolder.builder, limitValue)
-          if (limitPushed) {
-            sHolder.pushedLimit = Some(limitValue)
-          }
-          globalLimit
-        case _ => globalLimit
+      val newChild = pushDownLimit(child, limitValue)
+      if (newChild == child) {

Review comment:
   ditto





[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34738:
URL: https://github.com/apache/spark/pull/34738#discussion_r767463374



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##
@@ -246,16 +247,42 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
 
+  private def pushDownLimit(plan: LogicalPlan, limit: Int): LogicalPlan = plan match {
+    case operation @ ScanOperation(_, filter, sHolder: ScanBuilderHolder) if filter.isEmpty =>
+      val limitPushed = PushDownUtils.pushLimit(sHolder.builder, limit)
+      if (limitPushed) {
+        sHolder.pushedLimit = Some(limit)
+      }
+      operation
+    case s @ Sort(order, _, operation @ ScanOperation(_, filter, sHolder: ScanBuilderHolder))
+      if filter.isEmpty =>
+      val orders = DataSourceStrategy.translateSortOrders(order)
+      val topNPushed = PushDownUtils.pushTopN(sHolder.builder, orders.toArray, limit)
+      if (topNPushed) {
+        sHolder.pushedLimit = Some(limit)
+        sHolder.sortValues = orders
+        operation
+      } else {
+        s
+      }
+    case p: Project =>
+      val newChild = pushDownLimit(p.child, limit)
+      if (newChild == p.child) {

Review comment:
   we can call `p.withNewChildren(Seq(newChild))`, which does this check for you.
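
   The behavior described in this comment can be modeled in a few lines. The following is a hedged Python sketch, not Spark's `TreeNode` implementation: `Node` and `with_new_children` are hypothetical names, and the only property illustrated is that the method hands back the same instance when the children are unchanged, so the caller needs no explicit `if (newChild == p.child)` branch.

```python
from dataclasses import dataclass, replace
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable plan node; not Spark's TreeNode."""
    name: str
    children: Tuple["Node", ...] = ()

    def with_new_children(self, new_children: Sequence["Node"]) -> "Node":
        new_children = tuple(new_children)
        if len(new_children) != len(self.children):
            raise ValueError("incorrect number of children")
        # Return the same instance when nothing changed, so callers
        # do not need their own equality check before copying.
        if new_children == self.children:
            return self
        return replace(self, children=new_children)


scan = Node("scan")
project = Node("project", (scan,))

unchanged = project.with_new_children([scan])
rewritten = project.with_new_children([Node("scan_with_limit")])
print(unchanged is project)   # True
print(rewritten is project)   # False
```

   Under this model, the `Project` case reduces to a single call on the recursively rewritten child.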





[GitHub] [spark] LuciferYang commented on pull request #34870: [SPARK-37615][BUILD][FOLLOWUP] Upgrade SBT to 1.5.6 in AppVeyor

2021-12-12 Thread GitBox



LuciferYang commented on pull request #34870:
URL: https://github.com/apache/spark/pull/34870#issuecomment-992177373


   > Is that Maven plugin relevant to this PR, @LuciferYang ?
   
   No, I was just reporting another case where log4j2 is downloaded during the Spark build process. I will file a new Jira to track this later.
   
   



[GitHub] [spark] LuciferYang edited a comment on pull request #34870: [SPARK-37615][BUILD][FOLLOWUP] Upgrade SBT to 1.5.6 in AppVeyor

2021-12-12 Thread GitBox



LuciferYang edited a comment on pull request #34870:
URL: https://github.com/apache/spark/pull/34870#issuecomment-992177373


   > Is that Maven plugin relevant to this PR, @LuciferYang ?
   
   No, I was just reporting another issue: log4j2 is also downloaded during the Spark build process. I will file a new Jira to track this later.
   
   



[GitHub] [spark] SparkQA commented on pull request #34825: [SPARK-37563][PYTHON] Implement days, seconds, microseconds properties of TimedeltaIndex

2021-12-12 Thread GitBox



SparkQA commented on pull request #34825:
URL: https://github.com/apache/spark/pull/34825#issuecomment-992177100


   **[Test build #146124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146124/testReport)** for PR 34825 at commit [`caad4b4`](https://github.com/apache/spark/commit/caad4b458622c8c65901c40fbf26811069465a28).



[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34738:
URL: https://github.com/apache/spark/pull/34738#discussion_r767462841



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
##
@@ -196,6 +196,10 @@ class JDBCOptions(
   // This only applies to Data Source V2 JDBC
   val pushDownLimit = parameters.getOrElse(JDBC_PUSHDOWN_LIMIT, "false").toBoolean
 
+  // An option to allow/disallow pushing down query of top N into V2 JDBC data source
+  // This only applies to Data Source V2 JDBC
+  val pushDownTopN = parameters.getOrElse(JDBC_PUSHDOWN_TOP_N, "false").toBoolean

Review comment:
   I'm wondering if we should just reuse the limit pushdown option, as top N is kind of a special case of LIMIT.
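
   One way to read this suggestion, sketched in Python under stated assumptions: a single boolean option gates both plain LIMIT and top N pushdown. The key name `pushDownLimit`, the `PushDownFlags` type, and the function names are assumptions made for illustration; the diff only shows the constants `JDBC_PUSHDOWN_LIMIT` and `JDBC_PUSHDOWN_TOP_N`, not their string values.

```python
from typing import Mapping, NamedTuple


class PushDownFlags(NamedTuple):
    limit: bool
    top_n: bool


def to_boolean(text: str) -> bool:
    """Strict parse, loosely mirroring Scala's String.toBoolean."""
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def push_down_flags(parameters: Mapping[str, str]) -> PushDownFlags:
    # "pushDownLimit" is an assumed key name for this sketch.
    push_down_limit = to_boolean(parameters.get("pushDownLimit", "false"))
    # Top N (ORDER BY ... LIMIT n) reuses the same switch instead of
    # introducing a separate option.
    return PushDownFlags(limit=push_down_limit, top_n=push_down_limit)


print(push_down_flags({}))
print(push_down_flags({"pushDownLimit": "true"}))
```

   The trade-off is that users could no longer enable LIMIT pushdown while keeping top N pushdown off, which may or may not matter for a given data source.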





[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34738:
URL: https://github.com/apache/spark/pull/34738#discussion_r767462496



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -142,13 +142,25 @@ case class RowDataSourceScanExec(
   handledFilters
 }
 
+val topNOrLimitInfo =
+  if (pushedDownOperators.limit.isDefined && pushedDownOperators.sortValues.nonEmpty) {
+val pushedTopN =
+  s"""
+ |ORDER BY ${seqToString(pushedDownOperators.sortValues.map(_.describe()))}
+ |LIMIT ${pushedDownOperators.limit.get}
+ |""".stripMargin.replaceAll("\n", " ")
+Some("pushedTopN" -> pushedTopN)
+} else {
+pushedDownOperators.limit.map(value => "PushedLimit" -> s"LIMIT $value")

Review comment:
   indentation is wrong





[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34738:
URL: https://github.com/apache/spark/pull/34738#discussion_r767462354



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -142,13 +142,25 @@ case class RowDataSourceScanExec(
   handledFilters
 }
 
+val topNOrLimitInfo =
+  if (pushedDownOperators.limit.isDefined && pushedDownOperators.sortValues.nonEmpty) {
+val pushedTopN =

Review comment:
   ```
   val pushedTopN = s"ORDER BY ${seqToString(pushedDownOperators.sortValues.map(_.describe()))}" +
     s" LIMIT ${pushedDownOperators.limit.get}"
   ```





[GitHub] [spark] dongjoon-hyun commented on pull request #34845: [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

2021-12-12 Thread GitBox



dongjoon-hyun commented on pull request #34845:
URL: https://github.com/apache/spark/pull/34845#issuecomment-992175367


   Thank you, @viirya , @HyukjinKwon , @cloud-fan . Merged to master.
   
   Could you make a backport to branch-3.2, @viirya ?



[GitHub] [spark] SparkQA commented on pull request #34825: [SPARK-37563][PYTHON] Implement days, seconds, microseconds properties of TimedeltaIndex

2021-12-12 Thread GitBox



SparkQA commented on pull request #34825:
URL: https://github.com/apache/spark/pull/34825#issuecomment-992174495


   **[Test build #146122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146122/testReport)** for PR 34825 at commit [`48170bd`](https://github.com/apache/spark/commit/48170bd231b0bc140876160e2d4362a8f3249afb).



[GitHub] [spark] SparkQA commented on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



SparkQA commented on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992174522


   **[Test build #146123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146123/testReport)** for PR 34821 at commit [`a06b166`](https://github.com/apache/spark/commit/a06b1662c61639a5b2babc9f37f2e82dd3789607).



[GitHub] [spark] cloud-fan commented on a change in pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34821:
URL: https://github.com/apache/spark/pull/34821#discussion_r767460722



##
File path: docs/sql-distributed-sql-engine-spark-sql-cli.md
##
@@ -0,0 +1,193 @@
+---
+layout: global
+title: Spark SQL CLI
+displayTitle: Spark SQL CLI
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+* Table of contents
+{:toc}
+
+
+The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute SQL
+queries input from the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.
+
+To start the Spark SQL CLI, run the following in the Spark directory:
+
+./bin/spark-sql
+
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
+
+## Spark SQL Command Line Options
+
+You may run `./bin/spark-sql --help` for a complete list of all available options.
+
+CLI options:
+ -d,--define          Variable substitution to apply to Hive
+                      commands. e.g. -d A=B or --define A=B
+    --database        Specify the database to use
+ -e                   SQL from command line
+ -f                   SQL from files
+ -H,--help            Print help information
+    --hiveconf        Use value for given property
+    --hivevar         Variable substitution to apply to Hive
+                      commands. e.g. --hivevar A=B
+ -i                   Initialization SQL file
+ -S,--silent          Silent mode in interactive shell
+ -v,--verbose         Verbose mode (echo executed SQL to the
+                      console)
+
+## The hiverc File
+
+When invoked without the `-i`, the Spark SQL CLI will attempt to load `$HIVE_HOME/bin/.hiverc` and `$HOME/.hiverc` as initialization files.
+
+## Supported comment types
+
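+<table>
+<tr><th>Comment</th><th>Example</th></tr>
+<tr>
+  <td>simple comment</td>
+  <td>
+  <code>
+  -- This is a simple comment.
+  <br>
+  SELECT 1;
+  </code>
+  </td>
+</tr>
+<tr>
+  <td>bracketed comment</td>
+  <td>
+    <code>
+    /* This is a bracketed comment. */
+    <br>
+    SELECT 1;
+    </code>
+  </td>
+</tr>
+<tr>
+  <td>nested bracketed comment</td>
+  <td>
+    <code>
+    /*  This is a /* nested bracketed comment*/ .*/
+    <br>
+    SELECT 1;
+    </code>
+  </td>
+</tr>
+</table>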
+
+## Spark SQL CLI Interactive Shell Commands
+
+When `./bin/spark-sql` is run without either the `-e` or `-f` option, it enters interactive shell mode.
+Use `;` (semicolon) to terminate commands. Notice:
+1. The CLI uses `;` to terminate a command only when it is at the end of a line and is not escaped by `\\;`.
+2. `;` is the only way to terminate commands. If the user types `SELECT 1` and presses enter, the console will just wait for input.
+3. If the user types multiple commands in one line like `SELECT 1; SELECT 2;`, the commands `SELECT 1` and `SELECT 2` will be executed separately.
+4. If `;` appears within a SQL statement (not at the end of the line), then it has no special meaning:
+   ```sql
+   -- This is a ; comment
+   SELECT ';' as a;
+   ```
+   This is just a comment line followed by a SQL query which returns a string literal.
+   ```sql
+   /* This is a comment contains ;
+   */ SELECT 1;
+   ```
+   However, if `;` is at the end of the line, it terminates the SQL statement. The example above will be split into `/* This is a comment contains ` and `*/ SELECT 1`; Spark will submit these two commands and throw a parser error (unclosed bracketed comment).
+
+
+<table>
+<tr><th>Command</th><th>Description</th></tr>
+<tr>
+  <td>quit or exit</td>
+  <td>Exits the interactive shell.</td>
+</tr>
+<tr>
+  <td>!</td>
+  <td>Executes a shell command from the Spark SQL CLI shell.</td>
+</tr>
+<tr>
+  <td>dfs </td>
+  <td>Executes a HDFS <a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfs">dfs</a> command from the Spark SQL CLI shell.</td>
+</tr>
+<tr>
+  <td></td>
+  <td>Executes a Spark SQL query and prints results to standard output.</td>
+</tr>
+<tr>
+  <td>source </td>
+  <td>Executes a script file inside the CLI.</td>
+</tr>
+</table>
+
+## Examples
+
+Example of running a query from the command line:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL'
+
+Example of setting Hive configuration variables:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL' --hiveconf 
hiv

[GitHub] [spark] dongjoon-hyun closed pull request #34845: [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

2021-12-12 Thread GitBox



dongjoon-hyun closed pull request #34845:
URL: https://github.com/apache/spark/pull/34845


   



[GitHub] [spark] xinrong-databricks commented on a change in pull request #34825: [SPARK-37563][PYTHON] Implement days, seconds, microseconds properties of TimedeltaIndex

2021-12-12 Thread GitBox



xinrong-databricks commented on a change in pull request #34825:
URL: https://github.com/apache/spark/pull/34825#discussion_r767460516



##
File path: python/pyspark/pandas/tests/indexes/test_timedelta.py
##
@@ -0,0 +1,84 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from datetime import timedelta
+
+import pandas as pd
+
+import pyspark.pandas as ps
+from pyspark.testing.pandasutils import PandasOnSparkTestCase, TestUtils
+
+
+class TimedeltaIndexTest(PandasOnSparkTestCase, TestUtils):

Review comment:
   Makes sense, added!





[GitHub] [spark] sarutak commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



sarutak commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992172755


   @HyukjinKwon 
   LGTM. BTW, the notebook raises an error which is not relevant to this change (the runtime is 3.2.0, while `pandas_api` is not in this version).
   ![Screenshot from 2021-12-13 16-09-18](https://user-images.githubusercontent.com/4736016/145768084-5a054996-d444-4d12-956d-21fb1c744e1f.png)
   
   So, I applied this change to `branch-3.2` and confirmed it works.
   
https://hub.gke2.mybinder.org/user/sarutak-spark-wbwn4kx0/doc/workspaces/auto-O/tree/python/docs/source/getting_started/quickstart_ps.ipynb
 



[GitHub] [spark] dongjoon-hyun commented on pull request #34870: [SPARK-37615][BUILD][FOLLOWUP] Upgrade SBT to 1.5.6 in AppVeyor

2021-12-12 Thread GitBox



dongjoon-hyun commented on pull request #34870:
URL: https://github.com/apache/spark/pull/34870#issuecomment-992172435


   Is that Maven plugin relevant to this PR, @LuciferYang ?
   > Some Maven plugins, such as scala-maven-plugin, also implicitly introduce lower versions of log4j2; the import relationship is as follows:



[GitHub] [spark] AmplabJenkins commented on pull request #34877: [SPARK-37625][WIP] update log4j to 2.15

2021-12-12 Thread GitBox



AmplabJenkins commented on pull request #34877:
URL: https://github.com/apache/spark/pull/34877#issuecomment-992172402


   Can one of the admins verify this patch?



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



AngersZhuuuu commented on a change in pull request #34821:
URL: https://github.com/apache/spark/pull/34821#discussion_r767434768



##
File path: docs/sql-distributed-sql-engine-spark-sql-cli.md
##
@@ -0,0 +1,192 @@
+---
+layout: global
+title: Spark SQL CLI
+displayTitle: Spark SQL CLI
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+* Table of contents
+{:toc}
+
+
+The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute SQL
+queries input from the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.
+
+To start the Spark SQL CLI, run the following in the Spark directory:
+
+./bin/spark-sql
+
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
+
+## Spark SQL Command Line Options
+
+You may run `./bin/spark-sql --help` for a complete list of all available options.
+
+CLI options:
+ -d,--define     Variable substitution to apply to Hive
+                 commands. e.g. -d A=B or --define A=B
+    --database   Specify the database to use
+ -e              SQL from command line
+ -f              SQL from files
+ -H,--help       Print help information
+    --hiveconf   Use value for given property
+    --hivevar    Variable substitution to apply to Hive
+                 commands. e.g. --hivevar A=B
+ -i              Initialization SQL file
+ -S,--silent     Silent mode in interactive shell
+ -v,--verbose    Verbose mode (echo executed SQL to the
+                 console)
+
+## The hiverc File
+
+When invoked without the `-i`, the Spark SQL CLI will attempt to load `$HIVE_HOME/bin/.hiverc` and `$HOME/.hiverc` as initialization files.
+
+## Supported comment types
+
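+<table>
+<tr><th>Comment</th><th>Example</th></tr>
+<tr>
+  <td>simple comment</td>
+  <td>
+  <code>
+  -- This is a simple comment.
+  <br>
+  SELECT 1;
+  </code>
+  </td>
+</tr>
+<tr>
+  <td>bracketed comment</td>
+  <td>
+  <code>
+  /* This is a bracketed comment. */
+  <br>
+  SELECT 1;
+  </code>
+  </td>
+</tr>
+<tr>
+  <td>nested bracketed comment</td>
+  <td>
+  <code>
+  /*  This is a /* nested bracketed comment*/ .*/
+  <br>
+  SELECT 1;
+  </code>
+  </td>
+</tr>
+</table>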
+
+## Spark SQL CLI Interactive Shell Commands
+
+When `./bin/spark-sql` is run without either the `-e` or `-f` option, it enters interactive shell mode.
+Use `;` (semicolon) to terminate commands. Notice:
+1. The CLI uses `;` to terminate a command only when it is at the end of a line and is not escaped as `\\;`.
+2. `;` is the only way to terminate commands. If the user types `SELECT 1` and presses enter, the console will just wait for input.
+3. If the user types multiple commands in one line, like `SELECT 1; SELECT 2;`, the commands `SELECT 1` and `SELECT 2` will be executed separately.
+4. If `;` appears within a SQL statement (not at the end of the line), then it has no special meaning:
+   ```sql
+   -- This is a ; comment
+   SELECT ';' as a;
+   ```
+   This is just a comment line followed by a SQL query which returns a string literal.
+   ```sql
+   /* This is a comment contains ;
+   */ SELECT 1;
+   ```
+   However, if `;` is at the end of the line, it terminates the SQL statement. The example above is split into `/* This is a comment contains ` and `*/ SELECT 1`; Spark submits these two commands and throws a parser error (unclosed bracketed comment).
+
+
+<table>
+<tr><th>Command</th><th>Description</th></tr>
+<tr>
+  <td>quit or exit</td>
+  <td>Exits the interactive shell.</td>
+</tr>
+<tr>
+  <td>!</td>
+  <td>Executes a shell command from the Spark SQL CLI shell.</td>
+</tr>
+<tr>
+  <td>dfs</td>
+  <td>Executes a HDFS <a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfs">dfs</a> command from the Spark SQL CLI shell.</td>
+</tr>
+<tr>
+  <td></td>
+  <td>Executes a Spark SQL query and prints results to standard output.</td>
+</tr>
+<tr>
+  <td>source</td>
+  <td>Executes a script file inside the CLI.</td>
+</tr>
+</table>
+
+## Examples
+
+Example of running a query from the command line:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL'
+
+Example of setting Hive configuration variables:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL' --hiveconf
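
The line-termination rule described in the quoted document can be modelled with a short Python sketch. This is only a simplified illustration under stated assumptions, not Spark's actual CLI code: `is_terminated` and `group_commands` are hypothetical names, the sketch only checks whether a line ends in an unescaped `;`, and it does not implement the further splitting of several commands typed on one line.

```python
from typing import Iterable, List, Tuple


def is_terminated(line: str) -> bool:
    """Return True if the line ends with ';' that is not escaped as '\\;'."""
    stripped = line.rstrip()
    return stripped.endswith(";") and not stripped.endswith("\\;")


def group_commands(lines: Iterable[str]) -> Tuple[List[str], str]:
    """Group input lines into submitted commands plus any pending input.

    A command is submitted as soon as a line ends with an unescaped ';'.
    Whatever is left over is still waiting for a terminator.
    (Hypothetical helper; a simplified model of the documented rule.)
    """
    commands: List[str] = []
    buffer: List[str] = []
    for line in lines:
        buffer.append(line)
        if is_terminated(line):
            commands.append("\n".join(buffer))
            buffer = []
    return commands, "\n".join(buffer)


if __name__ == "__main__":
    # A ';' in the middle of a '--' comment line does not end the command.
    print(group_commands(["-- This is a ; comment", "SELECT ';' as a;"]))
    # A ';' at the end of a line inside a bracketed comment does end it,
    # which leaves two fragments, the first with an unclosed comment.
    print(group_commands(["/* This is a comment contains ;", "*/ SELECT 1;"]))
    # No terminator yet: nothing is submitted and the input stays pending.
    print(group_commands(["SELECT 1"]))
```

The second call reproduces the situation from note 4: because the check is purely line-based, the bracketed comment is cut in two before any SQL parsing happens.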

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34845: [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

2021-12-12 Thread GitBox



AmplabJenkins removed a comment on pull request #34845:
URL: https://github.com/apache/spark/pull/34845#issuecomment-992171056


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146114/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



AmplabJenkins removed a comment on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992171054







[GitHub] [spark] xinrong-databricks commented on a change in pull request #34825: [SPARK-37563][PYTHON] Implement days, seconds, microseconds properties of TimedeltaIndex

2021-12-12 Thread GitBox



xinrong-databricks commented on a change in pull request #34825:
URL: https://github.com/apache/spark/pull/34825#discussion_r767458829



##
File path: python/pyspark/pandas/indexes/timedelta.py
##
@@ -111,3 +126,80 @@ def __getattr__(self, item: str) -> Any:
                 return partial(property_or_func, self)
 
         raise AttributeError("'TimedeltaIndex' object has no attribute '{}'".format(item))
+
+    @property
+    def days(self) -> Index:
+        """
+        Number of days for each element.
+        """
+
+        @no_type_check
+        def pandas_days(x) -> int:
+            return x.days
+
+        return ps.Index(self.to_series().transform(pandas_days))
+
+    @property
+    def seconds(self) -> Index:
+        """
+        Number of seconds (>= 0 and less than 1 day) for each element.
+        """
+        sdf = self._internal.spark_frame
+        hour_scol_name = verify_temp_column_name(sdf, "__hour_column__")
+        minute_scol_name = verify_temp_column_name(sdf, "__minute_column__")
+        second_scol_name = verify_temp_column_name(sdf, "__second_column__")
+        sum_scol_name = verify_temp_column_name(sdf, "__sum_column__")
+
+        # Extract the hours part, minutes part, seconds part and its fractional part with microseconds
+        sdf = sdf.select(
+            F.expr("date_part('HOUR', %s)" % SPARK_DEFAULT_INDEX_NAME),
+            F.expr("date_part('MINUTE', %s)" % SPARK_DEFAULT_INDEX_NAME),
+            F.expr("date_part('SECOND', %s)" % SPARK_DEFAULT_INDEX_NAME),

Review comment:
   Thank you @HyukjinKwon and @ueshin ! Updated.
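
For readers unfamiliar with the property under review, the sketch below shows what a `seconds` component means for a single value, using only Python's standard `datetime.timedelta`. It illustrates the hours/minutes/seconds decomposition that the diff's comment describes and is not the pandas-on-Spark implementation; `seconds_component` is a hypothetical helper introduced here.

```python
from datetime import timedelta


def seconds_component(td: timedelta) -> int:
    """Seconds part of a timedelta, >= 0 and less than one day.

    Hypothetical helper: rebuilds the value from hours, minutes and
    whole seconds, mirroring the decomposition described in the diff.
    """
    # Remove the whole-day part; timedelta normalizes so this is in [0, 1 day).
    remainder = td - timedelta(days=td.days)
    # Floor to whole seconds, discarding the fractional (microsecond) part.
    total = remainder // timedelta(seconds=1)
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    samples = [
        timedelta(days=1, hours=2, minutes=3, seconds=4, microseconds=5),
        timedelta(seconds=-1),  # normalized to days=-1 plus a positive remainder
    ]
    for td in samples:
        # The helper should agree with the built-in attribute.
        print(td, seconds_component(td), td.seconds)
```

Because `timedelta` stores negative durations as a negative day count plus a non-negative remainder, the result stays within the range stated in the docstring of the property.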





[GitHub] [spark] AmplabJenkins commented on pull request #34845: [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

2021-12-12 Thread GitBox



AmplabJenkins commented on pull request #34845:
URL: https://github.com/apache/spark/pull/34845#issuecomment-992171056


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146114/
   



[GitHub] [spark] AmplabJenkins commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



AmplabJenkins commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992171054







[GitHub] [spark] SparkQA commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



SparkQA commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992170302


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50593/
   



[GitHub] [spark] SparkQA commented on pull request #34844: [SPARK-37592][SQL] Improve performance of `JoinSelection`

2021-12-12 Thread GitBox



SparkQA commented on pull request #34844:
URL: https://github.com/apache/spark/pull/34844#issuecomment-992166601


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50595/
   



[GitHub] [spark] 3333qwe opened a new pull request #34877: [SPARK-00000][WIP] update log4j to 2.15

2021-12-12 Thread GitBox



3333qwe opened a new pull request #34877:
URL: https://github.com/apache/spark/pull/34877


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



AngersZhuuuu commented on a change in pull request #34821:
URL: https://github.com/apache/spark/pull/34821#discussion_r767453102



##
File path: docs/sql-distributed-sql-engine-spark-sql-cli.md
##
@@ -0,0 +1,192 @@
+---
+layout: global
+title: Spark SQL CLI
+displayTitle: Spark SQL CLI
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+* Table of contents
+{:toc}
+
+
+The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute SQL
+queries input from the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.
+
+To start the Spark SQL CLI, run the following in the Spark directory:
+
+./bin/spark-sql
+
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
+
+## Spark SQL Command Line Options
+
+You may run `./bin/spark-sql --help` for a complete list of all available options.
+
+CLI options:
+ -d,--define     Variable substitution to apply to Hive
+                 commands. e.g. -d A=B or --define A=B
+    --database   Specify the database to use
+ -e              SQL from command line
+ -f              SQL from files
+ -H,--help       Print help information
+    --hiveconf   Use value for given property
+    --hivevar    Variable substitution to apply to Hive
+                 commands. e.g. --hivevar A=B
+ -i              Initialization SQL file
+ -S,--silent     Silent mode in interactive shell
+ -v,--verbose    Verbose mode (echo executed SQL to the
+                 console)
+
+## The hiverc File
+
+When invoked without the `-i`, the Spark SQL CLI will attempt to load `$HIVE_HOME/bin/.hiverc` and `$HOME/.hiverc` as initialization files.
+
+## Supported comment types
+
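+<table>
+<tr><th>Comment</th><th>Example</th></tr>
+<tr>
+  <td>simple comment</td>
+  <td>
+  <code>
+  -- This is a simple comment.
+  <br>
+  SELECT 1;
+  </code>
+  </td>
+</tr>
+<tr>
+  <td>bracketed comment</td>
+  <td>
+  <code>
+  /* This is a bracketed comment. */
+  <br>
+  SELECT 1;
+  </code>
+  </td>
+</tr>
+<tr>
+  <td>nested bracketed comment</td>
+  <td>
+  <code>
+  /*  This is a /* nested bracketed comment*/ .*/
+  <br>
+  SELECT 1;
+  </code>
+  </td>
+</tr>
+</table>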
+
+## Spark SQL CLI Interactive Shell Commands
+
+When `./bin/spark-sql` is run without either the `-e` or `-f` option, it enters interactive shell mode.
+Use `;` (semicolon) to terminate commands. Notice:
+1. The CLI uses `;` to terminate a command only when it is at the end of a line and is not escaped as `\\;`.
+2. `;` is the only way to terminate commands. If the user types `SELECT 1` and presses enter, the console will just wait for input.
+3. If the user types multiple commands in one line, like `SELECT 1; SELECT 2;`, the commands `SELECT 1` and `SELECT 2` will be executed separately.
+4. If `;` appears within a SQL statement (not at the end of the line), then it has no special meaning:
+   ```sql
+   -- This is a ; comment
+   SELECT ';' as a;
+   ```
+   This is just a comment line followed by a SQL query which returns a string literal.
+   ```sql
+   /* This is a comment contains ;
+   */ SELECT 1;
+   ```
+   However, if `;` is at the end of the line, it terminates the SQL statement. The example above is split into `/* This is a comment contains ` and `*/ SELECT 1`; Spark submits these two commands and throws a parser error (unclosed bracketed comment).
+
+
+<table>
+<tr><th>Command</th><th>Description</th></tr>
+<tr>
+  <td>quit or exit</td>
+  <td>Exits the interactive shell.</td>
+</tr>
+<tr>
+  <td>!</td>
+  <td>Executes a shell command from the Spark SQL CLI shell.</td>
+</tr>
+<tr>
+  <td>dfs</td>
+  <td>Executes a HDFS <a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfs">dfs</a> command from the Spark SQL CLI shell.</td>
+</tr>
+<tr>
+  <td></td>
+  <td>Executes a Spark SQL query and prints results to standard output.</td>
+</tr>
+<tr>
+  <td>source</td>
+  <td>Executes a script file inside the CLI.</td>
+</tr>
+</table>
+
+## Examples
+
+Example of running a query from the command line:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL'
+
+Example of setting Hive configuration variables:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL' --hiveconf

[GitHub] [spark] SparkQA commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



SparkQA commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992163882


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50594/
   



[GitHub] [spark] dongjoon-hyun commented on pull request #34874: [SPARK-37622][K8S] Support K8s executor rolling policy

2021-12-12 Thread GitBox



dongjoon-hyun commented on pull request #34874:
URL: https://github.com/apache/spark/pull/34874#issuecomment-992163124


   Could you review this please, @viirya ?



[GitHub] [spark] SparkQA commented on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



SparkQA commented on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992162971


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50596/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34845: [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

2021-12-12 Thread GitBox



SparkQA removed a comment on pull request #34845:
URL: https://github.com/apache/spark/pull/34845#issuecomment-992048197


   **[Test build #146114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146114/testReport)** for PR 34845 at commit [`a9862c3`](https://github.com/apache/spark/commit/a9862c39b9b37dadc988da33e86d1ed26f7defc3).



[GitHub] [spark] SparkQA commented on pull request #34845: [SPARK-37577][SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning

2021-12-12 Thread GitBox



SparkQA commented on pull request #34845:
URL: https://github.com/apache/spark/pull/34845#issuecomment-992161795


   **[Test build #146114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146114/testReport)** for PR 34845 at commit [`a9862c3`](https://github.com/apache/spark/commit/a9862c39b9b37dadc988da33e86d1ed26f7defc3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] pan3793 commented on a change in pull request #34831: [SPARK-37574][CORE][SHUFFLE] Simplify fetchBlocks w/o retry

2021-12-12 Thread GitBox



pan3793 commented on a change in pull request #34831:
URL: https://github.com/apache/spark/pull/34831#discussion_r767450865



##
File path: 
core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala
##
@@ -139,14 +139,7 @@ private[spark] class NettyBlockTransferService(
   }
 }
   }
-
-  if (maxRetries > 0) {
-// Note this Fetcher will correctly handle maxRetries == 0; we avoid 
it just in case there's
-// a bug in this code. We should remove the if statement once we're 
sure of the stability.

Review comment:
   There are several places that overwrite the conf to `0`; please search for `("spark.shuffle.io.maxRetries", "0")` in `ExternalShuffleIntegrationSuite` and `BlockManagerSuite`.

##
File path: 
core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala
##
@@ -139,14 +139,7 @@ private[spark] class NettyBlockTransferService(
   }
 }
   }
-
-  if (maxRetries > 0) {
-// Note this Fetcher will correctly handle maxRetries == 0; we avoid 
it just in case there's
-// a bug in this code. We should remove the if statement once we're 
sure of the stability.

Review comment:
   There are several places that overwrite the conf to `0`; please search for `("spark.shuffle.io.maxRetries", "0")` in `ExternalShuffleIntegrationSuite` and `BlockManagerSuite`.
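
The removed code comment in this hunk states that the fetcher already handles `maxRetries == 0` correctly, which is why the guarding `if` can go. As a language-neutral illustration of that idea, here is a minimal Python sketch of a retry loop that needs no special case for zero retries. It is not Spark's fetcher; `fetch_with_retries` and its parameters are hypothetical names chosen for this example.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(fetch: Callable[[], T], max_retries: int) -> T:
    """Call `fetch` once, then retry up to `max_retries` more times on IOError.

    Hypothetical helper for illustration only.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_error: Optional[IOError] = None
    # range(max_retries + 1) always yields at least one attempt, so
    # max_retries == 0 means "try once, never retry" without an extra branch.
    for _attempt in range(max_retries + 1):
        try:
            return fetch()
        except IOError as err:
            last_error = err
    # Every attempt failed; the loop ran at least once, so last_error is set.
    assert last_error is not None
    raise last_error


if __name__ == "__main__":
    attempts = []

    def flaky() -> str:
        # Fails on the first call and succeeds afterwards.
        attempts.append(1)
        if len(attempts) < 2:
            raise IOError("transient failure")
        return "block data"

    print(fetch_with_retries(flaky, max_retries=1), len(attempts))
```

With this shape, a caller that sets the retry count to zero goes through exactly the same code path as any other caller, which is the property the review discussion relies on.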





[GitHub] [spark] cloud-fan edited a comment on pull request #34862: [SPARK-30471][SQL]Fix issue when comparing String and IntegerType

2021-12-12 Thread GitBox



cloud-fan edited a comment on pull request #34862:
URL: https://github.com/apache/spark/pull/34862#issuecomment-992160298


   > But I still think SparkSQL should remind user where you wrong.
   
   This doesn't answer the question of why you only fix string integer 
comparison here.
   
   > Or we have to suggest user better to use cast explicitly
   
   This works for me. We should discourage relying on implicit cast.



[GitHub] [spark] cloud-fan commented on pull request #34862: [SPARK-30471][SQL]Fix issue when comparing String and IntegerType

2021-12-12 Thread GitBox



cloud-fan commented on pull request #34862:
URL: https://github.com/apache/spark/pull/34862#issuecomment-992160298


   > But I still think SparkSQL should remind user where you wrong.
   
   This doesn't answer the question of why you only fix string integer 
comparison.
   
   > Or we have to suggest user better to use cast explicitly
   
   This works for me. We should discourage relying on implicit cast.



[GitHub] [spark] cloud-fan commented on a change in pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34821:
URL: https://github.com/apache/spark/pull/34821#discussion_r767448960



##
File path: docs/sql-distributed-sql-engine-spark-sql-cli.md
##
@@ -0,0 +1,192 @@
+---
+layout: global
+title: Spark SQL CLI
+displayTitle: Spark SQL CLI
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+* Table of contents
+{:toc}
+
+
+The Spark SQL CLI is a convenient tool to run the Hive metastore service in local mode and execute SQL
+queries input from the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.
+
+To start the Spark SQL CLI, run the following in the Spark directory:
+
+./bin/spark-sql
+
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
+
+## Spark SQL Command Line Options
+
+You may run `./bin/spark-sql --help` for a complete list of all available options.
+
+CLI options:
+ -d,--define     Variable substitution to apply to Hive
+                 commands. e.g. -d A=B or --define A=B
+    --database   Specify the database to use
+ -e              SQL from command line
+ -f              SQL from files
+ -H,--help       Print help information
+    --hiveconf   Use value for given property
+    --hivevar    Variable substitution to apply to Hive
+                 commands. e.g. --hivevar A=B
+ -i              Initialization SQL file
+ -S,--silent     Silent mode in interactive shell
+ -v,--verbose    Verbose mode (echo executed SQL to the
+                 console)
+
+## The hiverc File
+
+When invoked without the `-i`, the Spark SQL CLI will attempt to load `$HIVE_HOME/bin/.hiverc` and `$HOME/.hiverc` as initialization files.
+
+## Supported comment types
+
+
+CommentExample
+
+  simple comment
+  
+  
+  -- This is a simple comment.
+  
+  SELECT 1;
+  
+  
+
+
+  bracketed comment
+  
+
+/* This is a bracketed comment. */
+
+SELECT 1;
+
+  
+
+
+  nested bracketed comment
+  
+
+/*  This is a /* nested bracketed comment*/ .*/
+
+SELECT 1;
+
+  
+
+
+
+## Spark SQL CLI Interactive Shell Commands
+
+When `./bin/spark-sql` is run without either the `-e` or `-f` option, it 
enters interactive shell mode.
+Use `;` (semicolon) to terminate commands. Notice:
+1. The CLI use `;` to terminate commands only when it's at the end of line, 
and it's not escaped by `\\;`.
+2. `;` is the only way to terminate commands. If the user types `SELECT 1` and 
presses enter, the console will just wait for input.
+3. If the user types multiple commands in one line like `SELECT 1; SELECT 2;`, 
the commands `SELECT 1` and `SELECT 2` will be executed separatly.
+4. If `;` appears within a SQL statement (not the end of the line), then it 
has no special meanings:
+   ```sql
+   -- This is a ; comment
+   SELECT ';' as a;
+   ```
+   This is just a comment line followed by a SQL query which returns a string 
literal.
+   ```sql
+   /* This is a comment contains ;
+   */ SELECT 1;
+   ```
+   However, if ';' is the end of the line, it terminates the SQL statement. 
The example above will be terminated into  `/* This is a comment contains ` and 
`*/ SELECT 1`, Spark will submit these two command and throw parser error 
(unclosed bracketed comment).
+
+
+
+CommandDescription
+
+  quit or exit
+  Exits the interactive shell.
+
+
+  !
+  Executes a shell command from the Spark SQL CLI shell.
+
+
+  dfs 
+  Executes a HDFS https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfs";>dfs
 command from the Spark SQL CLI shell.
+
+
+  
+  Executes a Spark SQL query and prints results to standard output.
+
+
+  source 
+  Executes a script file inside the CLI.
+
+
+
+## Examples
+
+Example of running a query from the command line:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL'
+
+Example of setting Hive configuration variables:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL' --hiveconf 
hiv

[GitHub] [spark] cloud-fan commented on a change in pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34821:
URL: https://github.com/apache/spark/pull/34821#discussion_r767448688



##
File path: docs/sql-distributed-sql-engine-spark-sql-cli.md
##
@@ -0,0 +1,192 @@
+---
+layout: global
+title: Spark SQL CLI
+displayTitle: Spark SQL CLI
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+* Table of contents
+{:toc}
+
+
+The Spark SQL CLI is a convenient tool to run the Hive metastore service in 
local mode and execute SQL
+queries input from the command line. Note that the Spark SQL CLI cannot talk 
to the Thrift JDBC server.
+
+To start the Spark SQL CLI, run the following in the Spark directory:
+
+./bin/spark-sql
+
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` 
and `hdfs-site.xml` files in `conf/`.
+
+## Spark SQL Command Line Options
+
+You may run `./bin/spark-sql --help` for a complete list of all available 
options.
+
+CLI options:
+ -d,--define   Variable substitution to apply to Hive
+  commands. e.g. -d A=B or --define A=B
+--database  Specify the database to use
+ -e  SQL from command line
+ -f SQL from files
+ -H,--helpPrint help information
+--hiveconfUse value for given property
+--hivevar  Variable substitution to apply to Hive
+  commands. e.g. --hivevar A=B
+ -i Initialization SQL file
+ -S,--silent  Silent mode in interactive shell
+ -v,--verbose Verbose mode (echo executed SQL to the
+  console)
+
+## The hiverc File
+
+When invoked without the `-i`, the Spark SQL CLI will attempt to load 
`$HIVE_HOME/bin/.hiverc` and `$HOME/.hiverc` as initialization files.
+
+## Supported comment types
+
+
+CommentExample
+
+  simple comment
+  
+  
+  -- This is a simple comment.
+  
+  SELECT 1;
+  
+  
+
+
+  bracketed comment
+  
+
+/* This is a bracketed comment. */
+
+SELECT 1;
+
+  
+
+
+  nested bracketed comment
+  
+
+/*  This is a /* nested bracketed comment*/ .*/
+
+SELECT 1;
+
+  
+
+
+
+## Spark SQL CLI Interactive Shell Commands
+
+When `./bin/spark-sql` is run without either the `-e` or `-f` option, it 
enters interactive shell mode.
+Use `;` (semicolon) to terminate commands. Notice:
+1. The CLI use `;` to terminate commands only when it's at the end of line, 
and it's not escaped by `\\;`.
+2. `;` is the only way to terminate commands. If the user types `SELECT 1` and 
presses enter, the console will just wait for input.
+3. If the user types multiple commands in one line like `SELECT 1; SELECT 2;`, 
the commands `SELECT 1` and `SELECT 2` will be executed separatly.
+4. If `;` appears within a SQL statement (not the end of the line), then it 
has no special meanings:
+   ```sql
+   -- This is a ; comment
+   SELECT ';' as a;
+   ```
+   This is just a comment line followed by a SQL query which returns a string 
literal.
+   ```sql
+   /* This is a comment contains ;
+   */ SELECT 1;
+   ```
+   However, if ';' is the end of the line, it terminates the SQL statement. 
The example above will be terminated into  `/* This is a comment contains ` and 
`*/ SELECT 1`, Spark will submit these two command and throw parser error 
(unclosed bracketed comment).
+
+
+
+CommandDescription
+
+  quit or exit
+  Exits the interactive shell.
+
+
+  !
+  Executes a shell command from the Spark SQL CLI shell.
+
+
+  dfs 
+  Executes a HDFS https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfs";>dfs
 command from the Spark SQL CLI shell.
+
+
+  
+  Executes a Spark SQL query and prints results to standard output.
+
+
+  source 
+  Executes a script file inside the CLI.
+
+
+
+## Examples
+
+Example of running a query from the command line:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL'
+
+Example of setting Hive configuration variables:
+
+./bin/spark-sql -e 'SELECT COL FROM TBL' --hiveconf 
hiv

[GitHub] [spark] cloud-fan commented on a change in pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34821:
URL: https://github.com/apache/spark/pull/34821#discussion_r767448205



##
File path: docs/sql-distributed-sql-engine-spark-sql-cli.md
##
@@ -0,0 +1,199 @@
+---
+layout: global
+title: Spark SQL CLI
+displayTitle: Spark SQL CLI
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+* Table of contents
+{:toc}
+
+
+The Spark SQL CLI is a convenient tool to run the Hive metastore service in 
local mode and execute SQL
+queries input from the command line. Note that the Spark SQL CLI cannot talk 
to the Thrift JDBC server.
+
+To start the Spark SQL CLI, run the following in the Spark directory:
+
+./bin/spark-sql
+
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` 
and `hdfs-site.xml` files in `conf/`.

Review comment:
   > The Spark SQL CLI is a convenient tool to run the Hive metastore 
service in local mode ...
   
   Then this is not accurate. Maybe `run the Hive metastore service (in local 
mode by default) ...`





[GitHub] [spark] Ngone51 commented on a change in pull request #34831: [SPARK-37574][CORE][SHUFFLE] Simplify fetchBlocks w/o retry

2021-12-12 Thread GitBox



Ngone51 commented on a change in pull request #34831:
URL: https://github.com/apache/spark/pull/34831#discussion_r767447666



##
File path: 
core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala
##
@@ -139,14 +139,7 @@ private[spark] class NettyBlockTransferService(
   }
 }
   }
-
-  if (maxRetries > 0) {
-// Note this Fetcher will correctly handle maxRetries == 0; we avoid 
it just in case there's
-// a bug in this code. We should remove the if statement once we're 
sure of the stability.

Review comment:
   I wonder how you would prove this, since `maxRetries` defaults to 3 rather than 0.
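
One way to approach the question above is to exercise the retry path with the retry count forced to zero instead of relying on the default of 3. The sketch below is a hypothetical Python model of a retrying fetch loop, not Spark's actual fetcher; `fetch_with_retries` and `make_flaky` are illustrative names that do not appear in the PR.

```python
def fetch_with_retries(fetch_once, max_retries):
    """Call fetch_once() up to max_retries + 1 times.

    Returns (result, attempts). With max_retries == 0 this degenerates
    to a single attempt, which is the case the review asks about.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return fetch_once(), attempts
        except IOError:
            # Give up once the initial attempt plus all retries are used.
            if attempts > max_retries:
                raise


def make_flaky(failures):
    """Return a fetch function that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def fetch_once():
        if state["left"] > 0:
            state["left"] -= 1
            raise IOError("simulated fetch failure")
        return "block-data"

    return fetch_once
```

A test along these lines would pin the retry count to 0 explicitly and check both the success path (one attempt) and the failure path (the error surfaces immediately), so the result does not depend on the configured default.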





[GitHub] [spark] cloud-fan commented on a change in pull request #34852: [SPARK-37591][SQL] Support the GCM mode by `aes_encrypt()`/`aes_decrypt()`

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34852:
URL: https://github.com/apache/spark/pull/34852#discussion_r767447652



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
##
@@ -375,23 +375,25 @@ case class AesEncrypt(
 @ExpressionDescription(
   usage = """
 _FUNC_(expr, key[, mode[, padding]]) - Returns a decrepted value of `expr` 
using AES in `mode` with `padding`.
-  Key lengths of 16, 24 and 32 bits are supported.
+  Key lengths of 16, 24 and 32 bits are supported. Supported combinations 
of (`mode`, `padding`) are ('ECB', 'PKCS') and ('GCM', 'NONE').
   """,
   arguments = """
 Arguments:
   * expr - The binary value to decrypt.
   * key - The passphrase to use to decrypt the data.
   * mode - Specifies which block cipher mode should be used to decrypt 
messages.
-   Valid modes: ECB.
+   Valid modes: ECB, GCM.
   * padding - Specifies how to pad messages whose length is not a multiple 
of the block size.
-  Valid values: PKCS.
+  Valid values: PKCS, NONE.

Review comment:
   > I think of introducing the DEFAULT value for padding
   
   Yeah, this also works, and we need to document the default padding for each mode.





[GitHub] [spark] SparkQA commented on pull request #34875: [SPARK-37624][PYTHON][DOCS] Suppress warnings for live pandas-on-Spark quickstart notebooks

2021-12-12 Thread GitBox



SparkQA commented on pull request #34875:
URL: https://github.com/apache/spark/pull/34875#issuecomment-992157976


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50592/
   



[GitHub] [spark] cloud-fan commented on a change in pull request #34729: [SPARK-37475][SQL] Add scale parameter to floor and ceil functions

2021-12-12 Thread GitBox



cloud-fan commented on a change in pull request #34729:
URL: https://github.com/apache/spark/pull/34729#discussion_r767446855



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
##
@@ -249,9 +249,9 @@ case class Cbrt(child: Expression) extends 
UnaryMathExpression(math.cbrt, "CBRT"
   """,
   since = "1.4.0",
   group = "math_funcs")
-case class Ceil(child: Expression) extends UnaryMathExpression(math.ceil, 
"CEIL") {
+ case class Ceil(child: Expression) extends UnaryMathExpression(math.ceil, 
"CEIL") {
   override def dataType: DataType = child.dataType match {

Review comment:
   Let's define the return type we want:
   1. If the input is a decimal type, we should follow `RoundBase` to define the return type.
   2. If the input is an integral type, returning `LongType` as before should be good.
   3. If the input is float/double, returning `LongType` is definitely wrong. I think returning `DoubleType` should be good.
   
   The problem is what to do if the `scale` parameter is not given. Shall we keep backward compatibility and use the same return type as before? Or do we prefer consistency within the system and change the return type?
   
   It looks weird if `ceil(c_double)` and `ceil(c_double, 0)` have different data types. Another idea is to only accept an integer constant as the scale parameter; then we can make `ceil` return long type for float/double input if `scale <= 0`.
   
   cc @maropu @viirya @srielau 
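
The rules discussed above can be sketched as a small decision function. This is a hypothetical Python model of the proposal, not Spark code: it assumes the "integer constant scale" idea is adopted and that an omitted scale behaves like a scale of 0, so that `ceil(x)` and `ceil(x, 0)` agree. The type names are plain strings, and the decimal case is left symbolic because the comment only says to follow `RoundBase`.

```python
INTEGRAL_TYPES = ("byte", "short", "int", "long")
FRACTIONAL_TYPES = ("float", "double")


def ceil_result_type(input_type, scale=None):
    """Return the result type name for ceil under the sketched rules.

    `scale` is None when the caller omitted it; otherwise it is assumed
    to be an integer constant known at analysis time.
    """
    if input_type == "decimal":
        # The exact precision/scale is not spelled out in the comment.
        return "decimal (as defined by RoundBase)"
    if input_type in INTEGRAL_TYPES:
        return "long"
    if input_type in FRACTIONAL_TYPES:
        # Assumption: an omitted scale is treated like scale 0.
        effective_scale = 0 if scale is None else scale
        return "long" if effective_scale <= 0 else "double"
    raise ValueError("unsupported input type: " + input_type)
```

Under these assumptions `ceil_result_type("double")` and `ceil_result_type("double", 0)` both give `"long"`, which keeps the previous return type for the one-argument form, while a positive scale gives `"double"`.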





[GitHub] [spark] MaxGekk commented on a change in pull request #34852: [SPARK-37591][SQL] Support the GCM mode by `aes_encrypt()`/`aes_decrypt()`

2021-12-12 Thread GitBox



MaxGekk commented on a change in pull request #34852:
URL: https://github.com/apache/spark/pull/34852#discussion_r767445549



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
##
@@ -375,23 +375,25 @@ case class AesEncrypt(
 @ExpressionDescription(
   usage = """
 _FUNC_(expr, key[, mode[, padding]]) - Returns a decrepted value of `expr` 
using AES in `mode` with `padding`.
-  Key lengths of 16, 24 and 32 bits are supported.
+  Key lengths of 16, 24 and 32 bits are supported. Supported combinations 
of (`mode`, `padding`) are ('ECB', 'PKCS') and ('GCM', 'NONE').
   """,
   arguments = """
 Arguments:
   * expr - The binary value to decrypt.
   * key - The passphrase to use to decrypt the data.
   * mode - Specifies which block cipher mode should be used to decrypt 
messages.
-   Valid modes: ECB.
+   Valid modes: ECB, GCM.
   * padding - Specifies how to pad messages whose length is not a multiple 
of the block size.
-  Valid values: PKCS.
+  Valid values: PKCS, NONE.

Review comment:
   > We can also require the mode and padding parameter to be constant, to 
simplify the implementation.
   
   I wouldn't do that because it is an unnecessary restriction from my point of view. I could imagine a use case where data is gathered from different sources, encrypted slightly differently (using different padding/modes), and processed in one place. What you propose requires somehow splitting the input data into separate dataframes (or selects + unions) and processing them separately. I don't see any reason to bring such pain to users.
   
   > Shall we have a smarter way to decide the default padding? e.g. if the 
mode is GCM, the default padding is NONE.
   
   I think of introducing the `DEFAULT` value for padding which the AES 
implementation substitutes by concrete value (`NONE` or `PKCS`) depending on 
the mode.
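
The `DEFAULT` idea can be sketched as a lookup that resolves the padding from the mode before validation. This is a hypothetical Python model, not the Spark implementation: `resolve_padding` is an illustrative name, and the only facts taken from the diff are the two supported combinations, ('ECB', 'PKCS') and ('GCM', 'NONE').

```python
# Supported (mode, padding) pairs as listed in the quoted usage text.
SUPPORTED = {("ECB", "PKCS"), ("GCM", "NONE")}

# Assumption: each mode's default is its only supported padding.
DEFAULT_PADDING_BY_MODE = {"ECB": "PKCS", "GCM": "NONE"}


def resolve_padding(mode, padding="DEFAULT"):
    """Replace 'DEFAULT' with the mode's concrete padding, then validate."""
    if mode not in DEFAULT_PADDING_BY_MODE:
        raise ValueError("unsupported mode: " + mode)
    if padding == "DEFAULT":
        padding = DEFAULT_PADDING_BY_MODE[mode]
    if (mode, padding) not in SUPPORTED:
        raise ValueError("unsupported combination: " + mode + "/" + padding)
    return padding
```

Because the resolution happens per call, `mode` and `padding` need not be constants: rows encrypted with different modes can be handled in one pass, which is the use case described above.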





[GitHub] [spark] LuciferYang commented on pull request #34870: [SPARK-37615][BUILD][FOLLOWUP] Upgrade SBT to 1.5.6 in AppVeyor

2021-12-12 Thread GitBox



LuciferYang commented on pull request #34870:
URL: https://github.com/apache/spark/pull/34870#issuecomment-992154407


   I am trying to submit a PR to `scala-maven-plugin` (https://github.com/davidB/scala-maven-plugin/pull/534). If it is accepted and a new version is released, I will try to solve SPARK-36547.
   




[GitHub] [spark] Ngone51 commented on a change in pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down

2021-12-12 Thread GitBox



Ngone51 commented on a change in pull request #34629:
URL: https://github.com/apache/spark/pull/34629#discussion_r767444251



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -620,11 +620,13 @@ private[spark] class BlockManager(
* Note that this method must be called without any BlockInfo locks held.
*/
   def reregister(): Unit = {
-// TODO: We might need to rate limit re-registering.
-logInfo(s"BlockManager $blockManagerId re-registering with master")
-master.registerBlockManager(blockManagerId, 
diskBlockManager.localDirsString, maxOnHeapMemory,
-  maxOffHeapMemory, storageEndpoint)
-reportAllBlocks()
+if (!SparkEnv.get.isStopped) {

Review comment:
   You can either update in this PR or in a separate PR as you like.





[GitHub] [spark] wankunde commented on a change in pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down

2021-12-12 Thread GitBox



wankunde commented on a change in pull request #34629:
URL: https://github.com/apache/spark/pull/34629#discussion_r767442890



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -620,11 +620,15 @@ private[spark] class BlockManager(
* Note that this method must be called without any BlockInfo locks held.
*/
   def reregister(): Unit = {
-// TODO: We might need to rate limit re-registering.
-logInfo(s"BlockManager $blockManagerId re-registering with master")
-master.registerBlockManager(blockManagerId, 
diskBlockManager.localDirsString, maxOnHeapMemory,
-  maxOffHeapMemory, storageEndpoint)
-reportAllBlocks()
+SparkContext.getActive.map { context =>

Review comment:
   @Ngone51 Thanks for your review. This is the wrong way to judge whether the blockManager is stopping; maybe we can use `SparkEnv` instead.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



AmplabJenkins removed a comment on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992149550


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146121/
   



[GitHub] [spark] SparkQA removed a comment on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



SparkQA removed a comment on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992142541


   **[Test build #146121 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146121/testReport)**
 for PR 34821 at commit 
[`6d6b0cf`](https://github.com/apache/spark/commit/6d6b0cf1675de6f8d05dbbaf25aacc0334ef4250).



[GitHub] [spark] AmplabJenkins commented on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



AmplabJenkins commented on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992149550


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146121/
   



[GitHub] [spark] SparkQA commented on pull request #34821: [SPARK-37558][DOC] Improve spark sql cli document

2021-12-12 Thread GitBox



SparkQA commented on pull request #34821:
URL: https://github.com/apache/spark/pull/34821#issuecomment-992149325


   **[Test build #146121 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146121/testReport)**
 for PR 34821 at commit 
[`6d6b0cf`](https://github.com/apache/spark/commit/6d6b0cf1675de6f8d05dbbaf25aacc0334ef4250).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] wankunde commented on a change in pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down

2021-12-12 Thread GitBox



wankunde commented on a change in pull request #34629:
URL: https://github.com/apache/spark/pull/34629#discussion_r767441206



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -620,11 +620,13 @@ private[spark] class BlockManager(
* Note that this method must be called without any BlockInfo locks held.
*/
   def reregister(): Unit = {
-// TODO: We might need to rate limit re-registering.
-logInfo(s"BlockManager $blockManagerId re-registering with master")
-master.registerBlockManager(blockManagerId, 
diskBlockManager.localDirsString, maxOnHeapMemory,
-  maxOffHeapMemory, storageEndpoint)
-reportAllBlocks()
+if (!SparkEnv.get.isStopped) {

Review comment:
   @Ngone51 Thanks for your review.
   
   Yes, this PR cannot fix the issue above, but I still think that adding the 
`!SparkEnv.get.isStopped` check is helpful, as I have seen several 
executors re-register while they are being shut down by the driver.
   
   I fully agree with fixing this issue in `HeartbeatReceiver`, and this PR can be 
closed.
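
   The guard discussed here can be sketched in isolation. This is a minimal sketch, not the code from the PR: `StoppableEnv` and `GuardedBlockManager` are hypothetical stand-ins for `SparkEnv` and `BlockManager`, and a counter replaces the real registration call to the master.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical stand-in for SparkEnv: exposes only a stopped flag.
final class StoppableEnv {
  private val stopped = new AtomicBoolean(false)
  def isStopped: Boolean = stopped.get()
  def stop(): Unit = stopped.set(true)
}

// Hypothetical, simplified block manager. A counter replaces the real
// registration call to the master so the sketch has no Spark dependency.
final class GuardedBlockManager(env: StoppableEnv) {
  private val registrations = new AtomicInteger(0)
  def registrationCount: Int = registrations.get()

  // Re-register only while the environment is still running.
  def reregister(): Unit = {
    if (!env.isStopped) {
      registrations.incrementAndGet()
    }
  }
}
```

   In this sketch the check is best-effort only: the flag can still flip right after it has been read, so a late registration is made less likely rather than impossible.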





[GitHub] [spark] AmplabJenkins commented on pull request #34876: fix: test

2021-12-12 Thread GitBox



AmplabJenkins commented on pull request #34876:
URL: https://github.com/apache/spark/pull/34876#issuecomment-992147009


   Can one of the admins verify this patch?



[GitHub] [spark] moshuni opened a new pull request #34876: fix: test

2021-12-12 Thread GitBox



moshuni opened a new pull request #34876:
URL: https://github.com/apache/spark/pull/34876


   **What changes were proposed in this pull request?**
   No changes. This is a test
   
   **Why are the changes needed?**
   There is no need for changes.
   
   **Does this PR introduce any user-facing change?**
   No
   
   **How was this patch tested?**
   Tests not included



[GitHub] [spark] Kwafoor commented on pull request #34862: [SPARK-30471][SQL]Fix issue when comparing String and IntegerType

2021-12-12 Thread GitBox



Kwafoor commented on pull request #34862:
URL: https://github.com/apache/spark/pull/34862#issuecomment-992144811


   > I don't agree to randomly pick string integer comparison and make it ANSI.
   
   ANSI mode is a solution to all of these problems. But I still think Spark SQL 
should tell users where they went wrong. Users coming from other platforms need 
time to change their SQL to accept ANSI, and they may not want to change it. So a 
message that points out where the SQL is wrong, rather than a null result, is needed.
   Otherwise, we have to suggest that users cast explicitly and change their 
code to accept ANSI.
   But returning null is really not obvious; it makes the error hard to find.
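
   To illustrate the difference being discussed, here is a minimal model in plain Scala. It is not Spark's implementation: `lenientEquals` and `strictEquals` are hypothetical helpers, `None` stands in for a SQL null, and the sketch assumes the string side is cast to an integer before the comparison.

```scala
import scala.util.Try

object ComparisonModel {
  // Lenient mode: a string that is not a valid Int makes the whole
  // comparison unknown (None stands in for SQL null), and no error is raised.
  def lenientEquals(s: String, i: Int): Option[Boolean] =
    Try(s.trim.toInt).toOption.map(_ == i)

  // Strict mode: the same bad input fails loudly and names the offending value.
  def strictEquals(s: String, i: Int): Boolean =
    Try(s.trim.toInt).toOption match {
      case Some(n) => n == i
      case None =>
        throw new NumberFormatException(s"Cannot cast '$s' to Int")
    }
}
```

   In this model the lenient path gives the caller no hint about which value was bad, while the strict path surfaces it in the error message.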



[GitHub] [spark] MaxGekk commented on a change in pull request #34852: [SPARK-37591][SQL] Support the GCM mode by `aes_encrypt()`/`aes_decrypt()`

2021-12-12 Thread GitBox



MaxGekk commented on a change in pull request #34852:
URL: https://github.com/apache/spark/pull/34852#discussion_r767438008



##
File path: sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala
##
@@ -56,6 +57,23 @@ class MiscFunctionsSuite extends QueryTest with SharedSparkSession {
   assert(e.getMessage.contains("current_user"))
 }
   }
+
+  test("SPARK-37591: AES functions - GCM mode") {
+    Seq(
+      ("abcdefghijklmnop", ""),
+      ("abcdefghijklmnop", "abcdefghijklmnop"),
+      ("abcdefghijklmnop12345678", "Spark"),
+      ("abcdefghijklmnop12345678ABCDEFGH", "GCM mode")
+    ).foreach { case (key, input) =>
+      val df = Seq((key, input)).toDF("key", "input")
+      val encrypted = df.selectExpr("aes_encrypt(input, key, 'GCM', 'NONE') AS enc", "input", "key")
+      assert(encrypted.schema("enc").dataType === BinaryType)
+      assert(encrypted.filter($"enc" === $"input").isEmpty)
+      val result = encrypted.selectExpr(
+        "CAST(aes_decrypt(enc, key, 'GCM', 'NONE') AS STRING) AS res", "input")
+      assert(!result.filter($"res" === $"input").isEmpty)

Review comment:
   `DataFrameFunctionsSuite` already has more than 3800 lines. I don't think it 
makes sense to place new misc tests there when there is a dedicated test suite for 
such functions.





[GitHub] [spark] SparkQA commented on pull request #34844: [SPARK-37592][SQL] Improve performance of `JoinSelection`

2021-12-12 Thread GitBox



SparkQA commented on pull request #34844:
URL: https://github.com/apache/spark/pull/34844#issuecomment-992142522


   **[Test build #146120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146120/testReport)** for PR 34844 at commit [`3ce77ee`](https://github.com/apache/spark/commit/3ce77ee19851d3b721d849314effa826cf6251d8).


