[GitHub] spark issue #17507: [SPARK-20190]'/applications/[app-id]/jobs' in rest api,s...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/17507 @srowen Could you help review this code? Thank you! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109300686

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala ---
@@ -113,6 +114,67 @@ class IndexedRowMatrix @Since("1.0.0") (
   }

   /**
+   * Converts to BlockMatrix. Creates blocks of `DenseMatrix` with size 1024 x 1024.
+   */
+  def toBlockMatrixDense(): BlockMatrix = {
--- End diff --

Is it a good idea to have both `toBlockMatrix` and `toBlockMatrixDense` for converting to `BlockMatrix`? Shall we combine them and have just one `toBlockMatrix` method?
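One way to read the suggestion: keep a single `toBlockMatrix` entry point and select dense vs. sparse block construction with a flag. The sketch below is purely illustrative (simplified `Block` types and invented names, not Spark's API); it only demonstrates the single-entry-point shape being proposed.

```scala
// Illustrative sketch of a single conversion entry point with a dense/sparse
// flag, instead of two parallel methods. Not the actual Spark API.
object UnifiedConversionSketch {
  sealed trait Block
  final case class DenseBlock(values: Array[Double]) extends Block
  final case class SparseBlock(indices: Array[Int], values: Array[Double]) extends Block

  // One method, one flag: callers choose the block representation explicitly.
  def toBlock(row: Array[Double], dense: Boolean): Block =
    if (dense) DenseBlock(row)
    else {
      // Keep only the non-zero entries together with their column indices.
      val nonZero = row.zipWithIndex.filter(_._1 != 0.0)
      SparseBlock(nonZero.map(_._2), nonZero.map(_._1))
    }

  def main(args: Array[String]): Unit = {
    val row = Array(0.0, 3.0, 0.0, 7.0)
    val d = toBlock(row, dense = true).asInstanceOf[DenseBlock]
    assert(d.values.length == 4)
    val s = toBlock(row, dense = false).asInstanceOf[SparseBlock]
    assert(s.indices.sameElements(Array(1, 3)))
    assert(s.values.sameElements(Array(3.0, 7.0)))
  }
}
```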
[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109300484

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala ---
@@ -89,11 +89,42 @@ class IndexedRowMatrixSuite extends SparkFunSuite with MLlibTestSparkContext {
   test("toBlockMatrix") {
     val idxRowMat = new IndexedRowMatrix(indexedRows)
+
+    // Tests when n % colsPerBlock != 0
+    val blockMat = idxRowMat.toBlockMatrix(2, 2)
+    assert(blockMat.numRows() === m)
+    assert(blockMat.numCols() === n)
+    assert(blockMat.toBreeze() === idxRowMat.toBreeze())
+
+    // Tests when m % rowsPerBlock != 0
+    val blockMat2 = idxRowMat.toBlockMatrix(3, 1)
+    assert(blockMat2.numRows() === m)
+    assert(blockMat2.numCols() === n)
+    assert(blockMat2.toBreeze() === idxRowMat.toBreeze())
+
+    intercept[IllegalArgumentException] {
+      idxRowMat.toBlockMatrix(-1, 2)
+    }
+    intercept[IllegalArgumentException] {
+      idxRowMat.toBlockMatrix(2, 0)
+    }
+  }
+
+  test("toBlockMatrixDense") {
--- End diff --

I don't see a test for the newly added `toBlockMatrixDense`, do you?
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109300476 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- @@ -1,205 +1,259 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 14 +-- Number of queries: 31 -- !query 0 -CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet PARTITIONED BY (c, d) COMMENT 'table_comment' +CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet + PARTITIONED BY (c, d) CLUSTERED BY (a) SORTED BY (b ASC) INTO 2 BUCKETS + COMMENT 'table_comment' -- !query 0 schema struct<> -- !query 0 output -- !query 1 -ALTER TABLE t ADD PARTITION (c='Us', d=1) +CREATE TEMPORARY VIEW temp_v AS SELECT * FROM t -- !query 1 schema struct<> -- !query 1 output -- !query 2 -DESCRIBE t +CREATE TEMPORARY VIEW temp_Data_Source_View + USING org.apache.spark.sql.sources.DDLScanSource + OPTIONS ( +From '1', +To '10', +Table 'test1') -- !query 2 schema -struct +struct<> -- !query 2 output -# Partition Information + + + +-- !query 3 +CREATE VIEW v AS SELECT * FROM t +-- !query 3 schema +struct<> +-- !query 3 output + + + +-- !query 4 +ALTER TABLE t ADD PARTITION (c='Us', d=1) +-- !query 4 schema +struct<> +-- !query 4 output + + + +-- !query 5 +DESCRIBE t +-- !query 5 schema +struct +-- !query 5 output # col_name data_type comment a string b int c string -c string d string +# Partition Information +# col_name data_type comment +c string d string --- !query 3 -DESC t --- !query 3 schema +-- !query 6 +DESC default.t +-- !query 6 schema struct --- !query 3 output -# Partition Information +-- !query 6 output # col_name data_type comment a string b int c string -c string d string +# Partition Information +# col_name data_type comment +c string d string --- !query 4 +-- !query 7 DESC TABLE t --- !query 4 schema +-- !query 7 schema struct --- !query 4 output -# Partition Information +-- !query 7 output # col_name data_type comment a 
string b int c string -c string d string +# Partition Information +# col_name data_type comment +c string d string --- !query 5 +-- !query 8 DESC FORMATTED t --- !query 5 schema +-- !query 8 schema struct --- !query 5 output -# Detailed Table Information -# Partition Information -# Storage Information +-- !query 8 output # col_name data_type comment -Comment: table_comment
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17468 Merged build finished. Test PASSed.
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75456/
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109300438 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same hunk as quoted in the earlier comment on this file)
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17468 **[Test build #75456 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75456/testReport)** for PR 17468 at commit [`756825d`](https://github.com/apache/spark/commit/756825d8b2bd2ee053c2df583114bf86496738a5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109299896

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala ---
@@ -98,6 +98,7 @@ class IndexedRowMatrix @Since("1.0.0") (
     toBlockMatrix(1024, 1024)
   }

+
--- End diff --

Please remove the extra blank line.
[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109299891

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala ---
@@ -113,6 +114,67 @@ class IndexedRowMatrix @Since("1.0.0") (
   }

   /**
+   * Converts to BlockMatrix. Creates blocks of `DenseMatrix` with size 1024 x 1024.
+   */
+  def toBlockMatrixDense(): BlockMatrix = {
+    toBlockMatrixDense(1024, 1024)
+  }
+
+  /**
+   * Converts to BlockMatrix. Creates blocks of `DenseMatrix`.
+   * @param rowsPerBlock The number of rows of each block. The blocks at the bottom edge may have
+   *                     a smaller value. Must be an integer value greater than 0.
+   * @param colsPerBlock The number of columns of each block. The blocks at the right edge may have
+   *                     a smaller value. Must be an integer value greater than 0.
+   * @return a [[BlockMatrix]]
+   */
+  def toBlockMatrixDense(rowsPerBlock: Int, colsPerBlock: Int): BlockMatrix = {
+    require(rowsPerBlock > 0,
+      s"rowsPerBlock needs to be greater than 0. rowsPerBlock: $rowsPerBlock")
+    require(colsPerBlock > 0,
+      s"colsPerBlock needs to be greater than 0. colsPerBlock: $colsPerBlock")
+
+    val m = numRows()
+    val n = numCols()
+    val lastRowBlockIndex = m / rowsPerBlock
+    val lastColBlockIndex = n / colsPerBlock
+    val lastRowBlockSize = (m % rowsPerBlock).toInt
+    val lastColBlockSize = (n % colsPerBlock).toInt
+    val numRowBlocks = math.ceil(m.toDouble / rowsPerBlock).toInt
+    val numColBlocks = math.ceil(n.toDouble / colsPerBlock).toInt
+
+    val blocks: RDD[((Int, Int), Matrix)] = rows.flatMap({ ir =>
+      val blockRow = ir.index / rowsPerBlock
+      val rowInBlock = ir.index % rowsPerBlock
+
+      ir.vector.toArray
+        .grouped(colsPerBlock)
+        .zipWithIndex
+        .map({ case (values, blockColumn) =>
+          ((blockRow.toInt, blockColumn), (rowInBlock.toInt, values))
+        })
+    }).groupByKey(GridPartitioner(numRowBlocks, numColBlocks, rowsPerBlock, colsPerBlock)).map({
--- End diff --

Unless I am missing something, the parameters of `GridPartitioner` are wrong. They should be:

GridPartitioner(numRowBlocks, numColBlocks, rows.partitions.length)
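To make the per-row logic inside the quoted `rows.flatMap` easier to follow, here is a hypothetical pure-Scala rendering of it (no Spark dependency; `rowToBlocks` is an illustrative name, not part of the PR): each row is cut into `colsPerBlock`-sized chunks, and each chunk is keyed by its (block row, block column) coordinates.

```scala
// Standalone sketch of the per-row block decomposition performed inside
// rows.flatMap in the quoted toBlockMatrixDense. Illustrative, not Spark code.
object BlockDecompositionSketch {
  def rowToBlocks(
      rowIndex: Long,
      values: Array[Double],
      rowsPerBlock: Int,
      colsPerBlock: Int): Seq[((Int, Int), (Int, Array[Double]))] = {
    val blockRow = (rowIndex / rowsPerBlock).toInt   // block row this row falls into
    val rowInBlock = (rowIndex % rowsPerBlock).toInt // row offset inside that block
    // Cut the row into colsPerBlock-sized chunks; the chunk number is the block column.
    values.grouped(colsPerBlock).zipWithIndex.map { case (chunk, blockColumn) =>
      ((blockRow, blockColumn), (rowInBlock, chunk))
    }.toSeq
  }

  def main(args: Array[String]): Unit = {
    // A row with global index 3 in a matrix split into 2 x 2 blocks:
    // 3 / 2 = 1 (block row), 3 % 2 = 1 (offset); 5 columns -> 3 block columns,
    // the last one ragged, matching the "blocks at the right edge" note above.
    val blocks = rowToBlocks(3L, Array(1.0, 2.0, 3.0, 4.0, 5.0), 2, 2)
    assert(blocks.map(_._1) == Seq((1, 0), (1, 1), (1, 2)))
    assert(blocks.forall(_._2._1 == 1))
    assert(blocks.last._2._2.sameElements(Array(5.0)))
  }
}
```

In the real method, these keyed chunks are then grouped by block coordinate and assembled into one `DenseMatrix` per block, which is where the `GridPartitioner` arguments discussed in the comment come in.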
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109299664 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same hunk as quoted in the earlier comment on this file)
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Merged build finished. Test PASSed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75455/
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109299543 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same hunk as quoted in the earlier comment on this file)
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394 **[Test build #75455 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75455/testReport)** for PR 17394 at commit [`43668be`](https://github.com/apache/spark/commit/43668be3b290b61129162fee27d13a73cece794a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DDLScanSource extends RelationProvider `
  * `case class SimpleDDLScan(`
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17468 **[Test build #75456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75456/testReport)** for PR 17468 at commit [`756825d`](https://github.com/apache/spark/commit/756825d8b2bd2ee053c2df583114bf86496738a5).
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17468 retest this please
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109298159 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,225 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo Some(percent.toDouble) } + /** + * Returns a percentage of rows meeting a binary comparison expression containing two columns. + * In SQL queries, we also see predicate expressions involving two columns + * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table. + * Note that, if column-1 and column-2 belong to different tables, then it is a join + * operator's work, NOT a filter operator's work. + * + * @param op a binary comparison operator, including =, <=>, <, <=, >, >= + * @param attrLeft the left Attribute (or a column) + * @param attrRight the right Attribute (or a column) + * @param update a boolean flag to specify if we need to update ColumnStat of the given columns + * for subsequent conditions + * @return an optional double value to show the percentage of rows meeting a given condition + */ + def evaluateBinaryForTwoColumns( + op: BinaryComparison, + attrLeft: Attribute, + attrRight: Attribute, + update: Boolean): Option[Double] = { + +if (!colStatsMap.contains(attrLeft)) { + logDebug("[CBO] No statistics for " + attrLeft) + return None +} +if (!colStatsMap.contains(attrRight)) { + logDebug("[CBO] No statistics for " + attrRight) + return None +} + +attrLeft.dataType match { + case StringType | BinaryType => +// TODO: It is difficult to support other binary comparisons for String/Binary +// type without min/max and advanced statistics like histogram. 
+logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft) +return None + case _ => +} + +val colStatLeft = colStatsMap(attrLeft) +val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) + .asInstanceOf[NumericRange] +val maxLeft = BigDecimal(statsRangeLeft.max) +val minLeft = BigDecimal(statsRangeLeft.min) + +val colStatRight = colStatsMap(attrRight) +val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType) + .asInstanceOf[NumericRange] +val maxRight = BigDecimal(statsRangeRight.max) +val minRight = BigDecimal(statsRangeRight.min) + +// determine the overlapping degree between predicate range and column's range +val (noOverlap: Boolean, completeOverlap: Boolean) = op match { + // Left < Right or Left <= Right + // - no overlap: + // minRight maxRight minLeft maxLeft + // +--++-+---> + // - complete overlap: (If null values exists, we set it to partial overlap.) + // minLeftmaxLeft minRight maxRight + // +--++-+---> + case _: LessThan => +(minLeft >= maxRight, + maxLeft < minRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0) + case _: LessThanOrEqual => +(minLeft > maxRight, + maxLeft <= minRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0) + + // Left > Right or Left >= Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // +--++-+---> + // - complete overlap: (If null values exists, we set it to partial overlap.) + // minRight maxRight minLeft maxLeft + // +--++-+---> + case _: GreaterThan => +(maxLeft <= minRight, + minLeft > maxRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0) + case _: GreaterThanOrEqual => +(maxLeft < minRight, + minLeft >= maxRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0) + + // Left = Right or Left <=> Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // +--++-+---> + // minRight maxRight minLeft maxLeft + // +--++-+---> + // - comple
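The no-overlap / complete-overlap classification in the quoted hunk can be shown standalone. Below is a hypothetical pure-Scala sketch (no Spark; `lessThanOverlap` is an illustrative name, not the actual method) of the `Left < Right` case, assuming both columns have zero null counts:

```scala
// Standalone sketch of FilterEstimation's range-overlap test for the
// predicate "left < right", using the two columns' min/max statistics.
// Illustrative only, not the actual Spark code.
object OverlapSketch {
  // Returns (noOverlap, completeOverlap), assuming nullCount == 0 for both columns.
  def lessThanOverlap(
      minLeft: BigDecimal, maxLeft: BigDecimal,
      minRight: BigDecimal, maxRight: BigDecimal): (Boolean, Boolean) =
    // no overlap: every left value is at least every right value;
    // complete overlap: every left value is strictly below every right value.
    (minLeft >= maxRight, maxLeft < minRight)

  def main(args: Array[String]): Unit = {
    // left in [10, 20], right in [0, 5]: "left < right" never holds -> selectivity 0
    assert(lessThanOverlap(10, 20, 0, 5) == ((true, false)))
    // left in [0, 5], right in [10, 20]: "left < right" always holds -> selectivity 1
    assert(lessThanOverlap(0, 5, 10, 20) == ((false, true)))
    // left in [0, 15], right in [10, 20]: partial overlap -> estimate a fraction
    assert(lessThanOverlap(0, 15, 10, 20) == ((false, false)))
  }
}
```

Per the quoted code, the `GreaterThan` cases mirror this with the two sides swapped, and a non-zero null count downgrades complete overlap to partial overlap.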
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109298152 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- (same hunk as quoted in the earlier comment on this file)
[GitHub] spark issue #17487: [Spark-20145] Fix range case insensitive bug in SQL
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17487 Merged build finished. Test FAILed.
[GitHub] spark issue #17487: [Spark-20145] Fix range case insensitive bug in SQL
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17487 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75453/
[GitHub] spark issue #17487: [Spark-20145] Fix range case insensitive bug in SQL
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17487

**[Test build #75453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75453/testReport)** for PR 17487 at commit [`407bdbf`](https://github.com/apache/spark/commit/407bdbf1ae66b73d47611477b9ce0f03dc37ff7b).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedTableValuedFunction(conf: SQLConf,`
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394

**[Test build #75455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75455/testReport)** for PR 17394 at commit [`43668be`](https://github.com/apache/spark/commit/43668be3b290b61129162fee27d13a73cece794a).
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109297793

```
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/DDLTestSuite.scala ---
@@ -1,123 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.sources
-
-import org.apache.spark.rdd.RDD
-import org.apache.spark.sql._
-import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.test.SharedSQLContext
-import org.apache.spark.sql.types._
-import org.apache.spark.unsafe.types.UTF8String
-
-class DDLScanSource extends RelationProvider {
-  override def createRelation(
-      sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
-    SimpleDDLScan(
-      parameters("from").toInt,
-      parameters("TO").toInt,
-      parameters("Table"))(sqlContext.sparkSession)
-  }
-}
-
-case class SimpleDDLScan(
```

--- End diff ---

These two classes are moved to `DataSourceTest.scala`
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109297784

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -550,6 +565,225 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
     Some(percent.toDouble)
   }

+  /**
+   * Returns a percentage of rows meeting a binary comparison expression containing two columns.
+   * In SQL queries, we also see predicate expressions involving two columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator, including =, <=>, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update ColumnStat of the given columns
+   *               for subsequent conditions
+   * @return an optional double value to show the percentage of rows meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+      op: BinaryComparison,
+      attrLeft: Attribute,
+      attrRight: Attribute,
+      update: Boolean): Option[Double] = {
+
+    if (!colStatsMap.contains(attrLeft)) {
+      logDebug("[CBO] No statistics for " + attrLeft)
+      return None
+    }
+    if (!colStatsMap.contains(attrRight)) {
+      logDebug("[CBO] No statistics for " + attrRight)
+      return None
+    }
+
+    attrLeft.dataType match {
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft)
+        return None
+      case _ =>
+    }
+
+    val colStatLeft = colStatsMap(attrLeft)
+    val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType)
+      .asInstanceOf[NumericRange]
+    val maxLeft = BigDecimal(statsRangeLeft.max)
+    val minLeft = BigDecimal(statsRangeLeft.min)
+
+    val colStatRight = colStatsMap(attrRight)
+    val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType)
+      .asInstanceOf[NumericRange]
+    val maxRight = BigDecimal(statsRangeRight.max)
+    val minRight = BigDecimal(statsRangeRight.min)
+
+    // determine the overlapping degree between predicate range and column's range
+    val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+      // Left < Right or Left <= Right
+      // - no overlap:
+      //      minRight           maxRight     minLeft       maxLeft
+      // --------+------------------+------------+-------------+------->
+      // - complete overlap: (If null values exists, we set it to partial overlap.)
+      //      minLeft            maxLeft      minRight      maxRight
+      // --------+------------------+------------+-------------+------->
+      case _: LessThan =>
+        (minLeft >= maxRight,
+          maxLeft < minRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0)
```

--- End diff ---

we can have a `val allNotNull = colStatLeft.nullCount == 0 && colStatRight.nullCount == 0`
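The overlap classification being reviewed can be illustrated with a small Python sketch (names are illustrative, not Spark's API). It computes the `(noOverlap, completeOverlap)` pair for the `Left < Right` case, including the reviewer's suggested `allNotNull` factoring:

```python
def less_than_overlap(min_left, max_left, min_right, max_right,
                      null_count_left, null_count_right):
    """Classify range overlap for the predicate `left < right`.

    Returns (no_overlap, complete_overlap). If either column contains
    nulls, a would-be complete overlap is demoted to partial, matching
    the nullCount == 0 guards in the diff above.
    """
    # the reviewer's suggested factoring of the repeated null checks
    all_not_null = null_count_left == 0 and null_count_right == 0
    # no overlap: the smallest left value already exceeds every right value
    no_overlap = min_left >= max_right
    # complete overlap: the largest left value is below every right value
    complete_overlap = max_left < min_right and all_not_null
    return no_overlap, complete_overlap
```

For example, a left range of [0, 5] against a right range of [10, 20] with no nulls is a complete overlap for `<`, while [30, 40] against [10, 20] has no overlap.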
[GitHub] spark issue #17508: [SPARK-20191][yarn] Crate wrapper for RackResolver so te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17508

Merged build finished. Test PASSed.
[GitHub] spark issue #17508: [SPARK-20191][yarn] Crate wrapper for RackResolver so te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17508

**[Test build #75454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75454/testReport)** for PR 17508 at commit [`70e48fb`](https://github.com/apache/spark/commit/70e48fb7cce549ab0f5f06e7596e94b228cea824).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17508: [SPARK-20191][yarn] Crate wrapper for RackResolver so te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17508

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75454/
[GitHub] spark issue #17508: [SPARK-20191][yarn] Crate wrapper for RackResolver so te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17508

**[Test build #75454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75454/testReport)** for PR 17508 at commit [`70e48fb`](https://github.com/apache/spark/commit/70e48fb7cce549ab0f5f06e7596e94b228cea824).
[GitHub] spark pull request #17508: [SPARK-20191][yarn] Crate wrapper for RackResolve...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/17508

[SPARK-20191][yarn] Crate wrapper for RackResolver so tests can override it.

Current test code tries to override the RackResolver used by setting configuration params, but because YARN libs statically initialize the resolver the first time it's used, that means that those configs don't really take effect during Spark tests. This change adds a wrapper class that easily allows tests to override the behavior of the resolver for the Spark code that uses it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-20191

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17508.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17508

commit 70e48fb7cce549ab0f5f06e7596e94b228cea824
Author: Marcelo Vanzin
Date: 2017-04-02T00:22:07Z

    [SPARK-20191][yarn] Crate wrapper for RackResolver so tests can override it.

    Current test code tries to override the RackResolver used by setting
    configuration params, but because YARN libs statically initialize the
    resolver the first time it's used, that means that those configs don't
    really take effect during Spark tests. This change adds a wrapper class
    that easily allows tests to override the behavior of the resolver for
    the Spark code that uses it.
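The testability pattern this PR describes can be sketched in a few lines of Python (all names here are illustrative, not the actual Spark/YARN API): instead of calling a statically initialized resolver directly, production code goes through a thin wrapper whose resolution function a test can inject.

```python
class RackResolverWrapper:
    """Hypothetical wrapper around a resolver that caches static state.

    Production code uses the default behavior; tests pass their own
    resolve_fn, bypassing the static initialization entirely.
    """

    def __init__(self, resolve_fn=None):
        self._resolve_fn = resolve_fn or self._default_resolve

    @staticmethod
    def _default_resolve(host):
        # stand-in for the real resolver, which reads its configuration
        # once, the first time it is used
        return "/default-rack"

    def resolve(self, host):
        return self._resolve_fn(host)

# Production path: default behavior.
prod = RackResolverWrapper()
# Test path: override rack resolution without touching static state.
fake = RackResolverWrapper(resolve_fn=lambda host: "/rack-" + host)
```

The design point is that the seam lives in Spark's own code, so tests never need the underlying library to honor late configuration changes.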
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17468

@gatorsmile, could this get merged maybe?
[GitHub] spark issue #17507: [SPARK-20190]'/applications/[app-id]/jobs' in rest api,s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17507

Can one of the admins verify this patch?
[GitHub] spark pull request #17507: [SPARK-20190]'/applications/[app-id]/jobs' in res...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/17507

[SPARK-20190] '/applications/[app-id]/jobs' in rest api, status should be [running|succeeded|failed|unknown]

## What changes were proposed in this pull request?

In the '/applications/[app-id]/jobs' REST API, `status` should be `[running|succeeded|failed|unknown]`. Currently the status is `[complete|succeeded|failed]`, but '/applications/[app-id]/jobs?status=complete' makes the server return 'HTTP ERROR 404'. Added '?status=running' and '?status=unknown'.

code:

```
public enum JobExecutionStatus {
  RUNNING,
  SUCCEEDED,
  FAILED,
  UNKNOWN;
}
```

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guoxiaolongzte/spark SPARK-20190

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17507.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17507

commit 555cef88fe09134ac98fd0ad056121c7df2539aa
Author: guoxiaolongzte
Date: 2017-04-02T00:16:08Z

    '/applications/[app-id]/jobs' in rest api, status should be [running|succeeded|failed|unknown]
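The fix described above amounts to validating the `?status=` query parameter against the actual `JobExecutionStatus` values. A minimal Python sketch of that validation (the helper name `parse_status` and the error handling are assumptions, not Spark's code):

```python
from enum import Enum

class JobExecutionStatus(Enum):
    """Mirrors the Java enum quoted in the PR description."""
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"

def parse_status(param):
    """Map a ?status= query value to a JobExecutionStatus.

    An unsupported value such as "complete" is rejected explicitly,
    the situation that previously surfaced as an HTTP 404.
    """
    try:
        return JobExecutionStatus(param.lower())
    except ValueError:
        raise ValueError("unsupported status value: " + param)
```

With this, `parse_status("running")` succeeds while `parse_status("complete")` raises, which a REST layer could translate into a 400 with a helpful message rather than a bare 404.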
[GitHub] spark issue #17487: [Spark-20145] Fix range case insensitive bug in SQL
Github user samelamin commented on the issue: https://github.com/apache/spark/pull/17487

Based on comments from @hvanhovell, I am depending on the case-sensitivity setting of the analyzer. That said, I had to make `functionName` a `var` to change the value to lower case, which feels like a code smell to me. I am happy to take suggestions on how I can improve it.
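The alternative being hinted at, resolving the name according to the analyzer's case-sensitivity setting instead of mutating it, can be sketched in Python (the function and registry here are hypothetical stand-ins, not Spark's resolver):

```python
def resolve_function(name, registry, case_sensitive):
    """Look up `name` in `registry` without mutating it.

    When case_sensitive is False, the comparison is lowered on the fly,
    so the parsed name itself never needs to be a mutable var.
    """
    if case_sensitive:
        return registry.get(name)
    lowered = {key.lower(): value for key, value in registry.items()}
    return lowered.get(name.lower())
```

Under a case-insensitive analyzer, `resolve_function("RANGE", {"range": fn}, False)` finds the registered function, while the case-sensitive path does not.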
[GitHub] spark issue #17487: [Spark-20145] [WIP] Fix range case insensitive bug in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17487

**[Test build #75453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75453/testReport)** for PR 17487 at commit [`407bdbf`](https://github.com/apache/spark/commit/407bdbf1ae66b73d47611477b9ce0f03dc37ff7b).
[GitHub] spark issue #17506: [SPARK-20189][DStream] Fix spark kinesis testcases to re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17506

Can one of the admins verify this patch?
[GitHub] spark pull request #17506: [SPARK-20189][DStream] Fix spark kinesis testcase...
GitHub user yssharma opened a pull request: https://github.com/apache/spark/pull/17506

[SPARK-20189][DStream] Fix spark kinesis testcases to remove deprecated createStream and use Builders

## What changes were proposed in this pull request?

The spark-kinesis testcases use KinesisUtils.createStream, which is deprecated now. Modify the testcases to use the recommended KinesisInputDStream.builder instead. This change will also enable the testcases to use the session tokens automatically.

## How was this patch tested?

All the existing testcases work fine as expected with the changes.

https://issues.apache.org/jira/browse/SPARK-20189

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yssharma/spark ysharma/cleanup_kinesis_testcases

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17506.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17506

commit 9ceab24d03c0eb226511b3e5e7917ce17cdaf395
Author: Yash Sharma
Date: 2017-04-01T23:26:19Z

    SPARK-20189 - Fix spark kinesis testcases to remove deprecated createStream and use Builders
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483

Merged build finished. Test PASSed.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75452/
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483

**[Test build #75452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75452/testReport)** for PR 17483 at commit [`aff13a8`](https://github.com/apache/spark/commit/aff13a860282375a650f6323987c73364ed439cd).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483

Merged build finished. Test PASSed.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75451/
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483

**[Test build #75451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75451/testReport)** for PR 17483 at commit [`2aea0cb`](https://github.com/apache/spark/commit/2aea0cb0f9b55acf741788aa72a573853560f3d9).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75449/
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415

Merged build finished. Test PASSed.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17415

**[Test build #75449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75449/testReport)** for PR 17415 at commit [`bf440db`](https://github.com/apache/spark/commit/bf440db0ee760de1e1cabe265a5129254a885a51).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17451

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75450/
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17451

Merged build finished. Test PASSed.
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17451

**[Test build #75450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75450/testReport)** for PR 17451 at commit [`3ceaca0`](https://github.com/apache/spark/commit/3ceaca02591dc1f11722f397a296ffac88c90448).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483

**[Test build #75452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75452/testReport)** for PR 17483 at commit [`aff13a8`](https://github.com/apache/spark/commit/aff13a860282375a650f6323987c73364ed439cd).
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17483

@gatorsmile I added a line about recoverPartitions, I think we should also be more clear in other language bindings? Also open https://issues.apache.org/jira/browse/SPARK-20188
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483

**[Test build #75451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75451/testReport)** for PR 17483 at commit [`2aea0cb`](https://github.com/apache/spark/commit/2aea0cb0f9b55acf741788aa72a573853560f3d9).
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user keypointt commented on the issue: https://github.com/apache/spark/pull/17451 Hi @MLnick, I'm stuck trying to add test cases for Python. I tried the code chunk below in the pyspark shell via `./bin/pyspark`:
```
from pyspark.ml.feature import Word2Vec

sent = ("a b " * 100 + "a c " * 10).split(" ")
doc = spark.createDataFrame([(sent,), (sent,)], ["sentence"])
word2Vec = Word2Vec(vectorSize=5, seed=42, inputCol="sentence", outputCol="model")
model = word2Vec.fit(doc)
model.findSynonyms("a", 2)
model.findSynonymsArray("a", 2)
```
For `findSynonyms()`, I got the result I expected:
```
>>> model.findSynonyms("a", 2)
hahaha: Dataset JavaObject id=o143
DataFrame[word: string, similarity: double]
```
but for `findSynonymsArray()` I got the following, which carries no data:
```
>>> model.findSynonymsArray("a", 2)
[{u'__class__': u'scala.Tuple2'}, {u'__class__': u'scala.Tuple2'}]
```
I tried to debug and found that `r` falls into the `elif isinstance(r, (JavaArray, JavaList)):` branch and is dumped directly. It seems Py4J is not handling the returned object properly? https://github.com/apache/spark/blob/master/python/pyspark/ml/common.py#L90 Could you please give me a hint here? I'm digging into Py4J now, but it could take me some time. Thank you very much.
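For reference, the array variant is expected to return a list of (word, similarity) pairs rather than a DataFrame. Here is a pure-Python sketch of that contract using toy vectors and cosine similarity — an illustration of the expected return shape, not the PySpark API itself (the function and variable names are hypothetical):

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def find_synonyms_array(vectors, word, num):
    """Return the `num` words most similar to `word` as (word, similarity)
    tuples, mirroring the shape findSynonymsArray is meant to produce."""
    target = vectors[word]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:num]

toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(find_synonyms_array(toy, "a", 2))
```

With these toy vectors, "b" ranks first (nearly parallel to "a") and "c" last (orthogonal, similarity 0).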
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17451 **[Test build #75450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75450/testReport)** for PR 17451 at commit [`3ceaca0`](https://github.com/apache/spark/commit/3ceaca02591dc1f11722f397a296ffac88c90448).
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17336 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75448/ Test PASSed.
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17336 Merged build finished. Test PASSed.
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17336 **[Test build #75448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75448/testReport)** for PR 17336 at commit [`a95a07a`](https://github.com/apache/spark/commit/a95a07ac1a430c67b13186d6dc383193ac3c3119).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17415 **[Test build #75449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75449/testReport)** for PR 17415 at commit [`bf440db`](https://github.com/apache/spark/commit/bf440db0ee760de1e1cabe265a5129254a885a51).
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17336 **[Test build #75448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75448/testReport)** for PR 17336 at commit [`a95a07a`](https://github.com/apache/spark/commit/a95a07ac1a430c67b13186d6dc383193ac3c3119).
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109293607
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
     Some(percent.toDouble)
   }

+  /**
+   * Returns a percentage of rows meeting a binary comparison expression containing two columns.
+   * In SQL queries, we also see predicate expressions involving two columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator, including =, <=>, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update ColumnStat of the given columns
+   *               for subsequent conditions
+   * @return an optional double value to show the percentage of rows meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+      op: BinaryComparison,
+      attrLeft: Attribute,
+      attrRight: Attribute,
+      update: Boolean): Option[Double] = {
+
+    if (!colStatsMap.contains(attrLeft)) {
+      logDebug("[CBO] No statistics for " + attrLeft)
+      return None
+    }
+    if (!colStatsMap.contains(attrRight)) {
+      logDebug("[CBO] No statistics for " + attrRight)
+      return None
+    }
+
+    attrLeft.dataType match {
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft)
+        return None
+      case _ =>
+    }
+
+    val colStatLeft = colStatsMap(attrLeft)
+    val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType)
+      .asInstanceOf[NumericRange]
+    val maxLeft = BigDecimal(statsRangeLeft.max)
+    val minLeft = BigDecimal(statsRangeLeft.min)
+
+    val colStatRight = colStatsMap(attrRight)
+    val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType)
+      .asInstanceOf[NumericRange]
+    val maxRight = BigDecimal(statsRangeRight.max)
+    val minRight = BigDecimal(statsRangeRight.min)
+
+    // determine the overlapping degree between predicate range and column's range
+    val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+      // Left < Right or Left <= Right
+      // - no overlap:
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      case _: LessThan =>
+        (minLeft >= maxRight, maxLeft < minRight)
+      case _: LessThanOrEqual =>
+        (minLeft > maxRight, maxLeft <= minRight)
+
+      // Left > Right or Left >= Right
+      // - no overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minRight     maxRight     minLeft      maxLeft
--- End diff --
Good point. Fixed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75446/ Test FAILed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Merged build finished. Test FAILed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394 **[Test build #75446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75446/testReport)** for PR 17394 at commit [`a6db8a3`](https://github.com/apache/spark/commit/a6db8a32b6ad498dde89d0e6358034ece21a5f8f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483 Merged build finished. Test FAILed.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75447/ Test FAILed.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483 **[Test build #75447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75447/testReport)** for PR 17483 at commit [`3c66930`](https://github.com/apache/spark/commit/3c6693035b023d7c9c9e2caa3014247c130eb037).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17483#discussion_r109291089
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2977,6 +2981,51 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })

+test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
+  expect_equal(currentDatabase(), "default")
+  expect_error(setCurrentDatabase("default"), NA)
+  expect_error(setCurrentDatabase("foo"),
+               "Error in setCurrentDatabase : analysis error - Database 'foo' does not exist")
+  dbs <- collect(listDatabases())
+  expect_equal(names(dbs), c("name", "description", "locationUri"))
+  expect_equal(dbs[[1]], "default")
+})
+
+test_that("catalog APIs, listTables, listColumns, listFunctions", {
+  tb <- listTables()
+  count <- count(suppressWarnings(tables()))
+  expect_equal(nrow(tb), count)
+  expect_equal(colnames(tb), c("name", "database", "description", "tableType", "isTemporary"))
+
+  createOrReplaceTempView(as.DataFrame(cars), "cars")
+
+  tb <- listTables()
+  expect_equal(nrow(tb), count + 1)
+  tbs <- collect(tb)
+  expect_true(nrow(tbs[tbs$name == "cars", ]) > 0)
+  expect_error(listTables("bar"),
+               "Error in listTables : no such database - Database 'bar' not found")
+
+  c <- listColumns("cars")
+  expect_equal(nrow(c), 2)
+  expect_equal(colnames(c),
+               c("name", "description", "dataType", "nullable", "isPartition", "isBucket"))
+  expect_equal(collect(c)[[1]][[1]], "speed")
+  expect_error(listColumns("foo", "default"),
+               "Error in listColumns : analysis error - Table 'foo' does not exist in database 'default'")
+
+  dropTempView("cars")
+
+  f <- listFunctions()
+  expect_true(nrow(f) >= 200) # 250
+  expect_equal(colnames(f),
+               c("name", "database", "description", "className", "isTemporary"))
+  expect_equal(take(orderBy(f, "className"), 1)$className,
+               "org.apache.spark.sql.catalyst.expressions.Abs")
+
+  expect_error(listFunctions("foo_db"),
+               "Error in listFunctions : analysis error - Database 'foo_db' does not exist")
+})
--- End diff --
Sharp eyes :) I was planning to add tests. I tested these manually, but the steps are more involved, and these are only thin wrappers in R; I think we should defer to the Scala tests.
[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17483#discussion_r109290624
--- Diff: R/pkg/R/utils.R ---
@@ -846,6 +846,24 @@ captureJVMException <- function(e, method) {
     # Extract the first message of JVM exception.
     first <- strsplit(msg[2], "\r?\n\tat")[[1]][1]
     stop(paste0(rmsg, "analysis error - ", first), call. = FALSE)
+  } else
+    if (any(grep("org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: ", stacktrace))) {
--- End diff --
OK, thanks.
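The pattern in `captureJVMException` — detecting a known exception class in the JVM stacktrace and surfacing only its first message line, cut off where the `\n\tat` stack frames begin — can be sketched in plain Python (function and variable names here are hypothetical, not Spark APIs):

```python
import re

def first_jvm_message(stacktrace, exc_class):
    """If `exc_class` appears in the stacktrace, return the message text
    between the class name and the first '\\n\\tat' frame marker, else None."""
    marker = exc_class + ": "
    if marker not in stacktrace:
        return None
    tail = stacktrace.split(marker, 1)[1]
    # The first message ends where the stack frames ('\n\tat ...') begin;
    # '\r?' also tolerates Windows line endings, as in the R code above.
    return re.split(r"\r?\n\tat", tail)[0]

trace = ("org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: "
         "Database 'foo' does not exist\n\tat org.apache.spark...")
print(first_jvm_message(
    trace, "org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException"))
```

This mirrors the `strsplit(msg[2], "\r?\n\tat")[[1]][1]` step in the R helper: only the human-readable first line of the JVM error reaches the R user.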
[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17483#discussion_r109290616
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -645,16 +645,17 @@ test_that("test tableNames and tables", {
   df <- read.json(jsonPath)
   createOrReplaceTempView(df, "table1")
   expect_equal(length(tableNames()), 1)
-  tables <- tables()
+  tables <- listTables()
--- End diff --
Changed.
[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17483#discussion_r109290611
--- Diff: R/pkg/R/catalog.R ---
@@ -0,0 +1,478 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# catalog.R: SparkSession catalog functions
+
+#' Create an external table
+#'
+#' Creates an external table based on the dataset in a data source,
+#' returns a SparkDataFrame associated with the external table.
+#'
+#' The data source is specified by the \code{source} and a set of options(...).
+#' If \code{source} is not specified, the default data source configured by
+#' "spark.sql.sources.default" will be used.
+#'
+#' @param tableName a name of the table.
+#' @param path the path of files to load.
+#' @param source the name of external data source.
+#' @param schema the schema of the data for certain data source.
+#' @param ... additional argument(s) passed to the method.
+#' @return A SparkDataFrame.
+#' @rdname createExternalTable
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' df <- createExternalTable("myjson", path = "path/to/json", source = "json", schema)
+#' }
+#' @name createExternalTable
+#' @method createExternalTable default
+#' @note createExternalTable since 1.4.0
+createExternalTable.default <- function(tableName, path = NULL, source = NULL, schema = NULL, ...) {
+  sparkSession <- getSparkSession()
+  options <- varargsToStrEnv(...)
+  if (!is.null(path)) {
+    options[["path"]] <- path
+  }
+  catalog <- callJMethod(sparkSession, "catalog")
+  if (is.null(schema)) {
+    sdf <- callJMethod(catalog, "createExternalTable", tableName, source, options)
+  } else {
+    sdf <- callJMethod(catalog, "createExternalTable", tableName, source, schema$jobj, options)
+  }
+  dataFrame(sdf)
+}
+
+createExternalTable <- function(x, ...) {
--- End diff --
Right, I was just concerned that with `data.table`, `read.table`, etc., table == data.frame in R, as opposed to a `hive table` or `managed table`, which could be fairly confusing. Anyway, I think I'll follow up with a PR for `createTable`, but as of now `path` is optional for `createExternalTable`; even though that's potentially misleading, it does work now.
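The wrapper's handling of the optional `path` argument — folding it into the generic data-source options map before delegating to the JVM catalog — can be sketched in plain Python (a hedged illustration of the option-merging step only; the function name is hypothetical and nothing here calls Spark):

```python
def build_options(path=None, **options):
    """Merge an optional `path` argument into the data-source options map,
    mirroring how the R wrapper folds `path` into the varargs options."""
    opts = dict(options)
    if path is not None:
        # An explicit path wins; it becomes just another option entry.
        opts["path"] = path
    return opts

print(build_options("path/to/json", mode="FAILFAST"))
```

The design point is that downstream code only ever sees one options map, so `path` stays truly optional, which is exactly why the comment above notes that calling `createExternalTable` without a path "does work now".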
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483 **[Test build #75447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75447/testReport)** for PR 17483 at commit [`3c66930`](https://github.com/apache/spark/commit/3c6693035b023d7c9c9e2caa3014247c130eb037).
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109289876
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
     Some(percent.toDouble)
   }

+  /**
+   * Returns a percentage of rows meeting a binary comparison expression containing two columns.
+   * In SQL queries, we also see predicate expressions involving two columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator, including =, <=>, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update ColumnStat of the given columns
+   *               for subsequent conditions
+   * @return an optional double value to show the percentage of rows meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+      op: BinaryComparison,
+      attrLeft: Attribute,
+      attrRight: Attribute,
+      update: Boolean): Option[Double] = {
+
+    if (!colStatsMap.contains(attrLeft)) {
+      logDebug("[CBO] No statistics for " + attrLeft)
+      return None
+    }
+    if (!colStatsMap.contains(attrRight)) {
+      logDebug("[CBO] No statistics for " + attrRight)
+      return None
+    }
+
+    attrLeft.dataType match {
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft)
+        return None
+      case _ =>
+    }
+
+    val colStatLeft = colStatsMap(attrLeft)
+    val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType)
+      .asInstanceOf[NumericRange]
+    val maxLeft = BigDecimal(statsRangeLeft.max)
+    val minLeft = BigDecimal(statsRangeLeft.min)
+
+    val colStatRight = colStatsMap(attrRight)
+    val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType)
+      .asInstanceOf[NumericRange]
+    val maxRight = BigDecimal(statsRangeRight.max)
+    val minRight = BigDecimal(statsRangeRight.min)
+
+    // determine the overlapping degree between predicate range and column's range
+    val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+      // Left < Right or Left <= Right
+      // - no overlap:
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      case _: LessThan =>
+        (minLeft >= maxRight, maxLeft < minRight)
+      case _: LessThanOrEqual =>
+        (minLeft > maxRight, maxLeft <= minRight)
+
+      // Left > Right or Left >= Right
+      // - no overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
+      case _: GreaterThan =>
+        (maxLeft <= minRight, minLeft > maxRight)
+      case _: GreaterThanOrEqual =>
+        (maxLeft < minRight, minLeft >= maxRight)
+
+      // Left = Right or Left <=> Right
+      // - no overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minLeft      maxLeft
+      //      minRight     maxRight
--- End diff --
How about?
```
(minRight == maxRight) && (minLeft == minRight) && (maxLeft == maxRight)
```
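The no-overlap / complete-overlap classification being discussed can be checked with a small pure-Python model of the two column ranges — a sketch for reasoning about the conditions, not the Spark implementation:

```python
def overlap_state(op, min_l, max_l, min_r, max_r):
    """Return (no_overlap, complete_overlap) for a predicate `left op right`,
    given left range [min_l, max_l] and right range [min_r, max_r],
    following the rules quoted in the diff."""
    if op == "<":
        return (min_l >= max_r, max_l < min_r)
    if op == "<=":
        return (min_l > max_r, max_l <= min_r)
    if op == ">":
        return (max_l <= min_r, min_l > max_r)
    if op == ">=":
        return (max_l < min_r, min_l >= max_r)
    if op in ("=", "<=>"):
        # No overlap: the two ranges are disjoint. Complete overlap (per the
        # suggested condition): both ranges collapse to the same single point.
        no = max_l < min_r or max_r < min_l
        complete = (min_r == max_r) and (min_l == min_r) and (max_l == max_r)
        return (no, complete)
    raise ValueError("unsupported operator: " + op)

# Left range [1, 3] lies entirely below right range [5, 9],
# so 'left < right' always holds: no_overlap=False, complete_overlap=True.
print(overlap_state("<", 1, 3, 5, 9))  # (False, True)
```

Exercising a few corner cases by hand this way makes it easy to spot mistakes like a `>=` where a `>` belongs in the boundary comparisons.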
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394 **[Test build #75446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75446/testReport)** for PR 17394 at commit [`a6db8a3`](https://github.com/apache/spark/commit/a6db8a32b6ad498dde89d0e6358034ece21a5f8f).
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109289750
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
     Some(percent.toDouble)
   }

+  /**
+   * Returns a percentage of rows meeting a binary comparison expression containing two columns.
+   * In SQL queries, we also see predicate expressions involving two columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator, including =, <=>, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update ColumnStat of the given columns
+   *               for subsequent conditions
+   * @return an optional double value to show the percentage of rows meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+      op: BinaryComparison,
+      attrLeft: Attribute,
+      attrRight: Attribute,
+      update: Boolean): Option[Double] = {
+
+    if (!colStatsMap.contains(attrLeft)) {
+      logDebug("[CBO] No statistics for " + attrLeft)
+      return None
+    }
+    if (!colStatsMap.contains(attrRight)) {
+      logDebug("[CBO] No statistics for " + attrRight)
+      return None
+    }
+
+    attrLeft.dataType match {
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft)
+        return None
+      case _ =>
+    }
+
+    val colStatLeft = colStatsMap(attrLeft)
+    val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType)
+      .asInstanceOf[NumericRange]
+    val maxLeft = BigDecimal(statsRangeLeft.max)
+    val minLeft = BigDecimal(statsRangeLeft.min)
+
+    val colStatRight = colStatsMap(attrRight)
+    val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType)
+      .asInstanceOf[NumericRange]
+    val maxRight = BigDecimal(statsRangeRight.max)
+    val minRight = BigDecimal(statsRangeRight.min)
+
+    // determine the overlapping degree between predicate range and column's range
+    val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+      // Left < Right or Left <= Right
+      // - no overlap:
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
--- End diff --
Uh, I missed that. Please feel free to remove it. Thanks!
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289677 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- @@ -1,205 +1,248 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 14 +-- Number of queries: 28 -- !query 0 -CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet PARTITIONED BY (c, d) COMMENT 'table_comment' +CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet + PARTITIONED BY (c, d) CLUSTERED BY (a) SORTED BY (b ASC) INTO 2 BUCKETS + COMMENT 'table_comment' -- !query 0 schema struct<> -- !query 0 output -- !query 1 -ALTER TABLE t ADD PARTITION (c='Us', d=1) +CREATE TEMPORARY VIEW temp_v AS SELECT * FROM t -- !query 1 schema struct<> -- !query 1 output -- !query 2 -DESCRIBE t +CREATE VIEW v AS SELECT * FROM t -- !query 2 schema -struct +struct<> -- !query 2 output -# Partition Information -# col_name data_type comment -a string -b int -c string -c string -d string -d string + -- !query 3 -DESC t +ALTER TABLE t ADD PARTITION (c='Us', d=1) -- !query 3 schema -struct +struct<> -- !query 3 output -# Partition Information -# col_name data_type comment -a string -b int -c string -c string -d string -d string + -- !query 4 -DESC TABLE t +DESCRIBE t -- !query 4 schema struct -- !query 4 output -# Partition Information -# col_name data_type comment a string b int c string -c string d string +# Partition Information +# col_name data_type comment +c string d string -- !query 5 -DESC FORMATTED t +DESC t -- !query 5 schema struct -- !query 5 output -# Detailed Table Information -# Partition Information -# Storage Information -# col_name data_type comment -Comment: table_comment -Compressed:No -Created: -Database: default -Last Access: -Location: sql/core/spark-warehouse/t -Owner: -Partition Provider:Catalog -Storage Desc Parameters: -Table Parameters: -Table Type:MANAGED a string b int c string -c string d string +# 
Partition Information +# co
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289263 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -637,21 +570,7 @@ case class DescribeTableCommand( } DDLUtils.verifyPartitionProviderIsHive(spark, metadata, "DESC PARTITION") val partition = catalog.getPartition(table, partitionSpec) -if (isExtended) { - describeExtendedDetailedPartitionInfo(table, metadata, partition, result) -} else if (isFormatted) { - describeFormattedDetailedPartitionInfo(table, metadata, partition, result) - describeStorageInfo(metadata, result) -} - } - - private def describeExtendedDetailedPartitionInfo( - tableIdentifier: TableIdentifier, - table: CatalogTable, - partition: CatalogTablePartition, - buffer: ArrayBuffer[Row]): Unit = { -append(buffer, "", "", "") -append(buffer, "Detailed Partition Information " + partition.toString, "", "") +if (isExtended) describeFormattedDetailedPartitionInfo(table, metadata, partition, result) --- End diff -- This function `describeDetailedPartitionInfo` will only be called for the DDL command ```SQL DESCRIBE [EXTENDED|FORMATTED] table_name PARTITION (partitionVal*) ```
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289119 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289117 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289114 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289107 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @mridulm Sorry for the late reply. I opened the PR for SPARK-19659 (https://github.com/apache/spark/pull/16989) and made these two PRs independent. Basically, this PR is meant to evaluate the performance (blocks are shuffled to disk) and stability (the size in `MapStatus` is inaccurate and OOM can happen) of the implementation proposed in SPARK-19659. I'd be thankful if you have time to comment on these two PRs.
[GitHub] spark issue #17501: [SPARK-20183][ML] Added outlierRatio arg to MLTestingUti...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17501 LGTM
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109286524 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo Some(percent.toDouble) } + /** + * Returns a percentage of rows meeting a binary comparison expression containing two columns. + * In SQL queries, we also see predicate expressions involving two columns + * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table. + * Note that, if column-1 and column-2 belong to different tables, then it is a join + * operator's work, NOT a filter operator's work. + * + * @param op a binary comparison operator, including =, <=>, <, <=, >, >= + * @param attrLeft the left Attribute (or a column) + * @param attrRight the right Attribute (or a column) + * @param update a boolean flag to specify if we need to update ColumnStat of the given columns + * for subsequent conditions + * @return an optional double value to show the percentage of rows meeting a given condition + */ + def evaluateBinaryForTwoColumns( + op: BinaryComparison, + attrLeft: Attribute, + attrRight: Attribute, + update: Boolean): Option[Double] = { + +if (!colStatsMap.contains(attrLeft)) { + logDebug("[CBO] No statistics for " + attrLeft) + return None +} +if (!colStatsMap.contains(attrRight)) { + logDebug("[CBO] No statistics for " + attrRight) + return None +} + +attrLeft.dataType match { + case StringType | BinaryType => +// TODO: It is difficult to support other binary comparisons for String/Binary +// type without min/max and advanced statistics like histogram. 
+logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft) +return None + case _ => +} + +val colStatLeft = colStatsMap(attrLeft) +val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) + .asInstanceOf[NumericRange] +val maxLeft = BigDecimal(statsRangeLeft.max) +val minLeft = BigDecimal(statsRangeLeft.min) + +val colStatRight = colStatsMap(attrRight) +val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType) + .asInstanceOf[NumericRange] +val maxRight = BigDecimal(statsRangeRight.max) +val minRight = BigDecimal(statsRangeRight.min) + +// determine the overlapping degree between predicate range and column's range +val (noOverlap: Boolean, completeOverlap: Boolean) = op match { + // Left < Right or Left <= Right + // - no overlap: + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> + // - complete overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + case _: LessThan => +(minLeft >= maxRight, maxLeft < minRight) + case _: LessThanOrEqual => +(minLeft > maxRight, maxLeft <= minRight) + + // Left > Right or Left >= Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + // - complete overlap: + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> + case _: GreaterThan => +(maxLeft <= minRight, minLeft > maxRight) + case _: GreaterThanOrEqual => +(maxLeft < minRight, minLeft >= maxRight) + + // Left = Right or Left <=> Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> + // - complete overlap: + // minLeftmaxLeft + // minRight maxRight --- End diff -- I think `Left = Right` is different from the other 2 cases, even the range completely overlaps, the filter selectivity is not 100%. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feat
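The point about equality can be illustrated with a toy calculation (Python, hypothetical numbers — not Spark's code): even when the two columns' ranges overlap completely, `left = right` only holds for a small fraction of rows under a uniform-distribution and independence assumption, which is why `=` cannot reuse the complete-overlap shortcut that `<` and `>` use.

```python
# Two columns, both ranging over [0, 9] with 10 distinct values each.
# The ranges overlap completely, yet a random pair matches only about
# 1/max(ndv_left, ndv_right) of the time under uniformity.
ndv_left, ndv_right = 10, 10
eq_selectivity = 1.0 / max(ndv_left, ndv_right)
print(eq_selectivity)  # 0.1, not 1.0
```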
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109286505 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo Some(percent.toDouble) } + /** + * Returns a percentage of rows meeting a binary comparison expression containing two columns. + * In SQL queries, we also see predicate expressions involving two columns + * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table. + * Note that, if column-1 and column-2 belong to different tables, then it is a join + * operator's work, NOT a filter operator's work. + * + * @param op a binary comparison operator, including =, <=>, <, <=, >, >= + * @param attrLeft the left Attribute (or a column) + * @param attrRight the right Attribute (or a column) + * @param update a boolean flag to specify if we need to update ColumnStat of the given columns + * for subsequent conditions + * @return an optional double value to show the percentage of rows meeting a given condition + */ + def evaluateBinaryForTwoColumns( + op: BinaryComparison, + attrLeft: Attribute, + attrRight: Attribute, + update: Boolean): Option[Double] = { + +if (!colStatsMap.contains(attrLeft)) { + logDebug("[CBO] No statistics for " + attrLeft) + return None +} +if (!colStatsMap.contains(attrRight)) { + logDebug("[CBO] No statistics for " + attrRight) + return None +} + +attrLeft.dataType match { + case StringType | BinaryType => +// TODO: It is difficult to support other binary comparisons for String/Binary +// type without min/max and advanced statistics like histogram. 
+logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft) +return None + case _ => +} + +val colStatLeft = colStatsMap(attrLeft) +val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) + .asInstanceOf[NumericRange] +val maxLeft = BigDecimal(statsRangeLeft.max) +val minLeft = BigDecimal(statsRangeLeft.min) + +val colStatRight = colStatsMap(attrRight) +val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType) + .asInstanceOf[NumericRange] +val maxRight = BigDecimal(statsRangeRight.max) +val minRight = BigDecimal(statsRangeRight.min) + +// determine the overlapping degree between predicate range and column's range +val (noOverlap: Boolean, completeOverlap: Boolean) = op match { + // Left < Right or Left <= Right + // - no overlap: + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> + // - complete overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + case _: LessThan => +(minLeft >= maxRight, maxLeft < minRight) + case _: LessThanOrEqual => +(minLeft > maxRight, maxLeft <= minRight) + + // Left > Right or Left >= Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + // - complete overlap: + // minRight maxRight minLeft maxLeft --- End diff -- doesn't the `complete overlap` here need to consider null? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
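The null concern can also be made concrete with a toy calculation (Python, hypothetical null fractions — not Spark's code): even when the ranges guarantee the comparison for every non-null pair, a row with a NULL on either side never satisfies the predicate, so the "complete overlap" selectivity is capped by the product of the non-null fractions rather than being 100%.

```python
# Suppose the ranges guarantee left > right for all non-null pairs,
# but 20% of left values and 10% of right values are NULL. Rows with a
# NULL on either side fail the predicate, so the selectivity must be
# scaled down accordingly.
null_frac_left, null_frac_right = 0.2, 0.1
selectivity = (1 - null_frac_left) * (1 - null_frac_right)
print(round(selectivity, 4))  # 0.72
```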
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109286468 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo Some(percent.toDouble) } + /** + * Returns a percentage of rows meeting a binary comparison expression containing two columns. + * In SQL queries, we also see predicate expressions involving two columns + * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table. + * Note that, if column-1 and column-2 belong to different tables, then it is a join + * operator's work, NOT a filter operator's work. + * + * @param op a binary comparison operator, including =, <=>, <, <=, >, >= + * @param attrLeft the left Attribute (or a column) + * @param attrRight the right Attribute (or a column) + * @param update a boolean flag to specify if we need to update ColumnStat of the given columns + * for subsequent conditions + * @return an optional double value to show the percentage of rows meeting a given condition + */ + def evaluateBinaryForTwoColumns( + op: BinaryComparison, + attrLeft: Attribute, + attrRight: Attribute, + update: Boolean): Option[Double] = { + +if (!colStatsMap.contains(attrLeft)) { + logDebug("[CBO] No statistics for " + attrLeft) + return None +} +if (!colStatsMap.contains(attrRight)) { + logDebug("[CBO] No statistics for " + attrRight) + return None +} + +attrLeft.dataType match { + case StringType | BinaryType => +// TODO: It is difficult to support other binary comparisons for String/Binary +// type without min/max and advanced statistics like histogram. 
+logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft) +return None + case _ => +} + +val colStatLeft = colStatsMap(attrLeft) +val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) + .asInstanceOf[NumericRange] +val maxLeft = BigDecimal(statsRangeLeft.max) +val minLeft = BigDecimal(statsRangeLeft.min) + +val colStatRight = colStatsMap(attrRight) +val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType) + .asInstanceOf[NumericRange] +val maxRight = BigDecimal(statsRangeRight.max) +val minRight = BigDecimal(statsRangeRight.min) + +// determine the overlapping degree between predicate range and column's range +val (noOverlap: Boolean, completeOverlap: Boolean) = op match { + // Left < Right or Left <= Right + // - no overlap: + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> --- End diff -- the starting `0` looks confusing, the `max`, `min` values doesn't need to be positive. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17487: [Spark-20145] [WIP] Fix range case insensitive bu...
Github user samelamin commented on a diff in the pull request: https://github.com/apache/spark/pull/17487#discussion_r109286453 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala --- @@ -105,7 +105,7 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { case u: UnresolvedTableValuedFunction if u.functionArgs.forall(_.resolved) => - builtinFunctions.get(u.functionName) match { + builtinFunctions.get(u.functionName.toLowerCase) match { --- End diff -- @hvanhovell instead of creating a new case class, is there a way I can reuse the UnresolvedTableValuedFunction case class and just add in the SQLConf class?
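The one-line fix under discussion — normalizing the looked-up function name to lower case — can be sketched outside Spark. A minimal Python analogue (hypothetical registry; not Spark's actual `ResolveTableValuedFunctions` code):

```python
# Registry keys are stored lowercase, and every lookup normalizes the
# queried name the same way, so RANGE / Range / range all resolve to the
# same builtin table-valued function.
builtin_functions = {"range": lambda n: list(range(n))}

def resolve(name):
    """Case-insensitive lookup; returns None for unknown functions."""
    return builtin_functions.get(name.lower())

print(resolve("RANGE")(3))  # [0, 1, 2]
```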
[GitHub] spark pull request #17492: [SPARK-19641][SQL] JSON schema inference in DROPM...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17492#discussion_r109286081 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala --- @@ -217,26 +221,43 @@ private[sql] object JsonInferSchema { } } + private def withParseMode( --- End diff -- shall we embed this method in `withCorruptField`?
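The parse modes being refactored here can be sketched in miniature. The Python snippet below is illustrative only (Spark's `JsonInferSchema` works on Jackson token streams, not whole records); it shows how PERMISSIVE, DROPMALFORMED, and FAILFAST diverge on a corrupt record during schema inference:

```python
import json

def infer_types(records, mode="PERMISSIVE"):
    """Toy schema inference: map each JSON record to its Python type name.
    Mode names mirror Spark's JSON parse modes; the logic is simplified."""
    inferred = []
    for rec in records:
        try:
            inferred.append(type(json.loads(rec)).__name__)
        except json.JSONDecodeError:
            if mode == "DROPMALFORMED":
                continue                      # silently skip the bad record
            if mode == "FAILFAST":
                raise                         # surface the parse error
            inferred.append("corrupt_record")  # PERMISSIVE: keep placeholder
    return inferred

print(infer_types(['{"a": 1}', 'oops'], mode="DROPMALFORMED"))  # ['dict']
```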
[GitHub] spark pull request #17504: [SPARK-20186][SQL] BroadcastHint should use child...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17504
[GitHub] spark issue #17504: [SPARK-20186][SQL] BroadcastHint should use child's stat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17504 thanks, merging to master!
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285879 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285811 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -637,21 +570,7 @@ case class DescribeTableCommand( } DDLUtils.verifyPartitionProviderIsHive(spark, metadata, "DESC PARTITION") val partition = catalog.getPartition(table, partitionSpec) -if (isExtended) { - describeExtendedDetailedPartitionInfo(table, metadata, partition, result) -} else if (isFormatted) { - describeFormattedDetailedPartitionInfo(table, metadata, partition, result) - describeStorageInfo(metadata, result) -} - } - - private def describeExtendedDetailedPartitionInfo( - tableIdentifier: TableIdentifier, - table: CatalogTable, - partition: CatalogTablePartition, - buffer: ArrayBuffer[Row]): Unit = { -append(buffer, "", "", "") -append(buffer, "Detailed Partition Information " + partition.toString, "", "") +if (isExtended) describeFormattedDetailedPartitionInfo(table, metadata, partition, result) --- End diff -- not related to this PR, but it looks weird that `DESC tbl` and `DESC tbl PARTITION (xxx)` have the same result.
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285540 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285423 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285410 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285396 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285367 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285298 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -214,6 +215,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { // Returns true if the plan is supposed to be sorted. def isSorted(plan: LogicalPlan): Boolean = plan match { --- End diff -- maybe call it `needSort`?
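The helper under discussion decides whether a query's output order is already pinned, so the test harness knows whether to sort rows before comparing them against the golden file. A minimal plain-Python sketch of that idea — the real Spark code pattern-matches on the logical plan rather than the SQL text, so the string-based heuristic here is purely illustrative, and `need_sort` follows the reviewer's suggested naming:

```python
def need_sort(sql: str) -> bool:
    """Return True if the query's row order is non-deterministic, so the
    harness should sort the output before comparing to the golden file.
    """
    normalized = " ".join(sql.strip().upper().split())
    # DESC/DESCRIBE/SHOW commands emit rows in a fixed, documented order.
    if normalized.startswith(("DESC", "DESCRIBE", "SHOW")):
        return False
    # A top-level ORDER BY pins the row order.
    if " ORDER BY " in normalized:
        return False
    return True


golden = [("a", 1), ("b", 2)]
actual = [("b", 2), ("a", 1)]
query = "SELECT * FROM t"
if need_sort(query):
    actual = sorted(actual)
print(actual == golden)  # True
```

Under this naming, a call site reads as a direct question (`if need_sort(query): ...`), which is the readability point behind renaming `isSorted`.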
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14617 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75444/ Test PASSed.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14617 Merged build finished. Test PASSed.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14617 **[Test build #75444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75444/testReport)** for PR 14617 at commit [`b30e7d0`](https://github.com/apache/spark/commit/b30e7d0c2e950179ef5801a697215ec9afd88226). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.