[GitHub] spark issue #17507: [SPARK-20190]'/applications/[app-id]/jobs' in rest api,s...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/17507 @srowen Could you help review this code? Thank you! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109300686

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala ---
@@ -113,6 +114,67 @@ class IndexedRowMatrix @Since("1.0.0") (
   }

   /**
+   * Converts to BlockMatrix. Creates blocks of `DenseMatrix` with size 1024 x 1024.
+   */
+  def toBlockMatrixDense(): BlockMatrix = {
--- End diff --

Is it a good idea to have both `toBlockMatrix` and `toBlockMatrixDense` for converting to `BlockMatrix`? Shall we combine them and have just one `toBlockMatrix` method?
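One way to read the suggestion: keep a single `toBlockMatrix` entry point and select dense vs. sparse block construction with a flag. The sketch below is purely illustrative (simplified `Block` types and invented names, not Spark's API); it only demonstrates the single-entry-point shape being proposed.

```scala
// Illustrative sketch of a single conversion entry point with a dense/sparse
// flag, instead of two parallel methods. Not the actual Spark API.
object UnifiedConversionSketch {
  sealed trait Block
  final case class DenseBlock(values: Array[Double]) extends Block
  final case class SparseBlock(indices: Array[Int], values: Array[Double]) extends Block

  // One method, one flag: callers choose the block representation explicitly.
  def toBlock(row: Array[Double], dense: Boolean): Block =
    if (dense) DenseBlock(row)
    else {
      // Keep only the non-zero entries together with their column indices.
      val nonZero = row.zipWithIndex.filter(_._1 != 0.0)
      SparseBlock(nonZero.map(_._2), nonZero.map(_._1))
    }

  def main(args: Array[String]): Unit = {
    val row = Array(0.0, 3.0, 0.0, 7.0)
    val d = toBlock(row, dense = true).asInstanceOf[DenseBlock]
    assert(d.values.length == 4)
    val s = toBlock(row, dense = false).asInstanceOf[SparseBlock]
    assert(s.indices.sameElements(Array(1, 3)))
    assert(s.values.sameElements(Array(3.0, 7.0)))
  }
}
```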
[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109300484

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrixSuite.scala ---
@@ -89,11 +89,42 @@ class IndexedRowMatrixSuite extends SparkFunSuite with MLlibTestSparkContext {
   test("toBlockMatrix") {
     val idxRowMat = new IndexedRowMatrix(indexedRows)
+
+    // Tests when n % colsPerBlock != 0
+    val blockMat = idxRowMat.toBlockMatrix(2, 2)
+    assert(blockMat.numRows() === m)
+    assert(blockMat.numCols() === n)
+    assert(blockMat.toBreeze() === idxRowMat.toBreeze())
+
+    // Tests when m % rowsPerBlock != 0
+    val blockMat2 = idxRowMat.toBlockMatrix(3, 1)
+    assert(blockMat2.numRows() === m)
+    assert(blockMat2.numCols() === n)
+    assert(blockMat2.toBreeze() === idxRowMat.toBreeze())
+
+    intercept[IllegalArgumentException] {
+      idxRowMat.toBlockMatrix(-1, 2)
+    }
+    intercept[IllegalArgumentException] {
+      idxRowMat.toBlockMatrix(2, 0)
+    }
+  }
+
+  test("toBlockMatrixDense") {
--- End diff --

I don't see a test for the newly added `toBlockMatrixDense`, do you?
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109300476 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- @@ -1,205 +1,259 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 14 +-- Number of queries: 31 -- !query 0 -CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet PARTITIONED BY (c, d) COMMENT 'table_comment' +CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet + PARTITIONED BY (c, d) CLUSTERED BY (a) SORTED BY (b ASC) INTO 2 BUCKETS + COMMENT 'table_comment' -- !query 0 schema struct<> -- !query 0 output -- !query 1 -ALTER TABLE t ADD PARTITION (c='Us', d=1) +CREATE TEMPORARY VIEW temp_v AS SELECT * FROM t -- !query 1 schema struct<> -- !query 1 output -- !query 2 -DESCRIBE t +CREATE TEMPORARY VIEW temp_Data_Source_View + USING org.apache.spark.sql.sources.DDLScanSource + OPTIONS ( +From '1', +To '10', +Table 'test1') -- !query 2 schema -struct +struct<> -- !query 2 output -# Partition Information + + + +-- !query 3 +CREATE VIEW v AS SELECT * FROM t +-- !query 3 schema +struct<> +-- !query 3 output + + + +-- !query 4 +ALTER TABLE t ADD PARTITION (c='Us', d=1) +-- !query 4 schema +struct<> +-- !query 4 output + + + +-- !query 5 +DESCRIBE t +-- !query 5 schema +struct +-- !query 5 output # col_name data_type comment a string b int c string -c string d string +# Partition Information +# col_name data_type comment +c string d string --- !query 3 -DESC t --- !query 3 schema +-- !query 6 +DESC default.t +-- !query 6 schema struct --- !query 3 output -# Partition Information +-- !query 6 output # col_name data_type comment a string b int c string -c string d string +# Partition Information +# col_name data_type comment +c string d string --- !query 4 +-- !query 7 DESC TABLE t --- !query 4 schema +-- !query 7 schema struct --- !query 4 output -# Partition Information +-- !query 7 output # col_name data_type comment a 
string b int c string -c string d string +# Partition Information +# col_name data_type comment +c string d string --- !query 5 +-- !query 8 DESC FORMATTED t --- !query 5 schema +-- !query 8 schema struct --- !query 5 output -# Detailed Table Information -# Partition Information -# Storage Information +-- !query 8 output # col_name data_type comment -Comment: table_comment
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17468 Merged build finished. Test PASSed.
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75456/
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109300438 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same hunk as quoted in the earlier comment on this file)
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17468 **[Test build #75456 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75456/testReport)** for PR 17468 at commit [`756825d`](https://github.com/apache/spark/commit/756825d8b2bd2ee053c2df583114bf86496738a5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109299896

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala ---
@@ -98,6 +98,7 @@ class IndexedRowMatrix @Since("1.0.0") (
     toBlockMatrix(1024, 1024)
   }

+
--- End diff --

Please remove the extra blank line.
[GitHub] spark pull request #17459: [SPARK-20109][MLlib] Added toBlockMatrixDense to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17459#discussion_r109299891

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala ---
@@ -113,6 +114,67 @@ class IndexedRowMatrix @Since("1.0.0") (
   }

   /**
+   * Converts to BlockMatrix. Creates blocks of `DenseMatrix` with size 1024 x 1024.
+   */
+  def toBlockMatrixDense(): BlockMatrix = {
+    toBlockMatrixDense(1024, 1024)
+  }
+
+  /**
+   * Converts to BlockMatrix. Creates blocks of `DenseMatrix`.
+   * @param rowsPerBlock The number of rows of each block. The blocks at the bottom edge may have
+   *                     a smaller value. Must be an integer value greater than 0.
+   * @param colsPerBlock The number of columns of each block. The blocks at the right edge may have
+   *                     a smaller value. Must be an integer value greater than 0.
+   * @return a [[BlockMatrix]]
+   */
+  def toBlockMatrixDense(rowsPerBlock: Int, colsPerBlock: Int): BlockMatrix = {
+    require(rowsPerBlock > 0,
+      s"rowsPerBlock needs to be greater than 0. rowsPerBlock: $rowsPerBlock")
+    require(colsPerBlock > 0,
+      s"colsPerBlock needs to be greater than 0. colsPerBlock: $colsPerBlock")
+
+    val m = numRows()
+    val n = numCols()
+    val lastRowBlockIndex = m / rowsPerBlock
+    val lastColBlockIndex = n / colsPerBlock
+    val lastRowBlockSize = (m % rowsPerBlock).toInt
+    val lastColBlockSize = (n % colsPerBlock).toInt
+    val numRowBlocks = math.ceil(m.toDouble / rowsPerBlock).toInt
+    val numColBlocks = math.ceil(n.toDouble / colsPerBlock).toInt
+
+    val blocks: RDD[((Int, Int), Matrix)] = rows.flatMap({ ir =>
+      val blockRow = ir.index / rowsPerBlock
+      val rowInBlock = ir.index % rowsPerBlock
+
+      ir.vector.toArray
+        .grouped(colsPerBlock)
+        .zipWithIndex
+        .map({ case (values, blockColumn) =>
+          ((blockRow.toInt, blockColumn), (rowInBlock.toInt, values))
+        })
+    }).groupByKey(GridPartitioner(numRowBlocks, numColBlocks, rowsPerBlock, colsPerBlock)).map({
--- End diff --

Unless I am missing something, the parameters of `GridPartitioner` are wrong. They should be:

GridPartitioner(numRowBlocks, numColBlocks, rows.partitions.length)
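To make the per-row logic inside the quoted `rows.flatMap` easier to follow, here is a hypothetical pure-Scala rendering of it (no Spark dependency; `rowToBlocks` is an illustrative name, not part of the PR): each row is cut into `colsPerBlock`-sized chunks, and each chunk is keyed by its (block row, block column) coordinates.

```scala
// Standalone sketch of the per-row block decomposition performed inside
// rows.flatMap in the quoted toBlockMatrixDense. Illustrative, not Spark code.
object BlockDecompositionSketch {
  def rowToBlocks(
      rowIndex: Long,
      values: Array[Double],
      rowsPerBlock: Int,
      colsPerBlock: Int): Seq[((Int, Int), (Int, Array[Double]))] = {
    val blockRow = (rowIndex / rowsPerBlock).toInt   // block row this row falls into
    val rowInBlock = (rowIndex % rowsPerBlock).toInt // row offset inside that block
    // Cut the row into colsPerBlock-sized chunks; the chunk number is the block column.
    values.grouped(colsPerBlock).zipWithIndex.map { case (chunk, blockColumn) =>
      ((blockRow, blockColumn), (rowInBlock, chunk))
    }.toSeq
  }

  def main(args: Array[String]): Unit = {
    // A row with global index 3 in a matrix split into 2 x 2 blocks:
    // 3 / 2 = 1 (block row), 3 % 2 = 1 (offset); 5 columns -> 3 block columns,
    // the last one ragged, matching the "blocks at the right edge" note above.
    val blocks = rowToBlocks(3L, Array(1.0, 2.0, 3.0, 4.0, 5.0), 2, 2)
    assert(blocks.map(_._1) == Seq((1, 0), (1, 1), (1, 2)))
    assert(blocks.forall(_._2._1 == 1))
    assert(blocks.last._2._2.sameElements(Array(5.0)))
  }
}
```

In the real method, these keyed chunks are then grouped by block coordinate and assembled into one `DenseMatrix` per block, which is where the `GridPartitioner` arguments discussed in the comment come in.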
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109299664 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same hunk as quoted in the earlier comment on this file)
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Merged build finished. Test PASSed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75455/
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109299543 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same hunk as quoted in the earlier comment on this file)
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394 **[Test build #75455 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75455/testReport)** for PR 17394 at commit [`43668be`](https://github.com/apache/spark/commit/43668be3b290b61129162fee27d13a73cece794a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DDLScanSource extends RelationProvider `
  * `case class SimpleDDLScan(`
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17468 **[Test build #75456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75456/testReport)** for PR 17468 at commit [`756825d`](https://github.com/apache/spark/commit/756825d8b2bd2ee053c2df583114bf86496738a5).
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17468 retest this please
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109298159 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,225 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo Some(percent.toDouble) } + /** + * Returns a percentage of rows meeting a binary comparison expression containing two columns. + * In SQL queries, we also see predicate expressions involving two columns + * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table. + * Note that, if column-1 and column-2 belong to different tables, then it is a join + * operator's work, NOT a filter operator's work. + * + * @param op a binary comparison operator, including =, <=>, <, <=, >, >= + * @param attrLeft the left Attribute (or a column) + * @param attrRight the right Attribute (or a column) + * @param update a boolean flag to specify if we need to update ColumnStat of the given columns + * for subsequent conditions + * @return an optional double value to show the percentage of rows meeting a given condition + */ + def evaluateBinaryForTwoColumns( + op: BinaryComparison, + attrLeft: Attribute, + attrRight: Attribute, + update: Boolean): Option[Double] = { + +if (!colStatsMap.contains(attrLeft)) { + logDebug("[CBO] No statistics for " + attrLeft) + return None +} +if (!colStatsMap.contains(attrRight)) { + logDebug("[CBO] No statistics for " + attrRight) + return None +} + +attrLeft.dataType match { + case StringType | BinaryType => +// TODO: It is difficult to support other binary comparisons for String/Binary +// type without min/max and advanced statistics like histogram. 
+logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft) +return None + case _ => +} + +val colStatLeft = colStatsMap(attrLeft) +val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) + .asInstanceOf[NumericRange] +val maxLeft = BigDecimal(statsRangeLeft.max) +val minLeft = BigDecimal(statsRangeLeft.min) + +val colStatRight = colStatsMap(attrRight) +val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType) + .asInstanceOf[NumericRange] +val maxRight = BigDecimal(statsRangeRight.max) +val minRight = BigDecimal(statsRangeRight.min) + +// determine the overlapping degree between predicate range and column's range +val (noOverlap: Boolean, completeOverlap: Boolean) = op match { + // Left < Right or Left <= Right + // - no overlap: + // minRight maxRight minLeft maxLeft + // +--++-+---> + // - complete overlap: (If null values exists, we set it to partial overlap.) + // minLeftmaxLeft minRight maxRight + // +--++-+---> + case _: LessThan => +(minLeft >= maxRight, + maxLeft < minRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0) + case _: LessThanOrEqual => +(minLeft > maxRight, + maxLeft <= minRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0) + + // Left > Right or Left >= Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // +--++-+---> + // - complete overlap: (If null values exists, we set it to partial overlap.) + // minRight maxRight minLeft maxLeft + // +--++-+---> + case _: GreaterThan => +(maxLeft <= minRight, + minLeft > maxRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0) + case _: GreaterThanOrEqual => +(maxLeft < minRight, + minLeft >= maxRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0) + + // Left = Right or Left <=> Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // +--++-+---> + // minRight maxRight minLeft maxLeft + // +--++-+---> + // - comple
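The no-overlap / complete-overlap classification in the quoted hunk can be shown standalone. Below is a hypothetical pure-Scala sketch (no Spark; `lessThanOverlap` is an illustrative name, not the actual method) of the `Left < Right` case, assuming both columns have zero null counts:

```scala
// Standalone sketch of FilterEstimation's range-overlap test for the
// predicate "left < right", using the two columns' min/max statistics.
// Illustrative only, not the actual Spark code.
object OverlapSketch {
  // Returns (noOverlap, completeOverlap), assuming nullCount == 0 for both columns.
  def lessThanOverlap(
      minLeft: BigDecimal, maxLeft: BigDecimal,
      minRight: BigDecimal, maxRight: BigDecimal): (Boolean, Boolean) =
    // no overlap: every left value is at least every right value;
    // complete overlap: every left value is strictly below every right value.
    (minLeft >= maxRight, maxLeft < minRight)

  def main(args: Array[String]): Unit = {
    // left in [10, 20], right in [0, 5]: "left < right" never holds -> selectivity 0
    assert(lessThanOverlap(10, 20, 0, 5) == ((true, false)))
    // left in [0, 5], right in [10, 20]: "left < right" always holds -> selectivity 1
    assert(lessThanOverlap(0, 5, 10, 20) == ((false, true)))
    // left in [0, 15], right in [10, 20]: partial overlap -> estimate a fraction
    assert(lessThanOverlap(0, 15, 10, 20) == ((false, false)))
  }
}
```

Per the quoted code, the `GreaterThan` cases mirror this with the two sides swapped, and a non-zero null count downgrades complete overlap to partial overlap.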
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109298152 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- (same hunk as quoted in the earlier comment on this file)
[GitHub] spark issue #17487: [Spark-20145] Fix range case insensitive bug in SQL
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17487 Merged build finished. Test FAILed.
[GitHub] spark issue #17487: [Spark-20145] Fix range case insensitive bug in SQL
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17487 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75453/
[GitHub] spark issue #17487: [Spark-20145] Fix range case insensitive bug in SQL
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17487

**[Test build #75453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75453/testReport)** for PR 17487 at commit [`407bdbf`](https://github.com/apache/spark/commit/407bdbf1ae66b73d47611477b9ce0f03dc37ff7b).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedTableValuedFunction(conf: SQLConf,`
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394

**[Test build #75455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75455/testReport)** for PR 17394 at commit [`43668be`](https://github.com/apache/spark/commit/43668be3b290b61129162fee27d13a73cece794a).
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109297793

```
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/DDLTestSuite.scala ---
@@ -1,123 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.sources
-
-import org.apache.spark.rdd.RDD
-import org.apache.spark.sql._
-import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.test.SharedSQLContext
-import org.apache.spark.sql.types._
-import org.apache.spark.unsafe.types.UTF8String
-
-class DDLScanSource extends RelationProvider {
-  override def createRelation(
-      sqlContext: SQLContext,
-      parameters: Map[String, String]): BaseRelation = {
-    SimpleDDLScan(
-      parameters("from").toInt,
-      parameters("TO").toInt,
-      parameters("Table"))(sqlContext.sparkSession)
-  }
-}
-
-case class SimpleDDLScan(
```

--- End diff ---

These two classes are moved to `DataSourceTest.scala`
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109297784

```
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -550,6 +565,225 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
     Some(percent.toDouble)
   }

+  /**
+   * Returns a percentage of rows meeting a binary comparison expression containing two columns.
+   * In SQL queries, we also see predicate expressions involving two columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator, including =, <=>, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update ColumnStat of the given columns
+   *               for subsequent conditions
+   * @return an optional double value to show the percentage of rows meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+      op: BinaryComparison,
+      attrLeft: Attribute,
+      attrRight: Attribute,
+      update: Boolean): Option[Double] = {
+
+    if (!colStatsMap.contains(attrLeft)) {
+      logDebug("[CBO] No statistics for " + attrLeft)
+      return None
+    }
+    if (!colStatsMap.contains(attrRight)) {
+      logDebug("[CBO] No statistics for " + attrRight)
+      return None
+    }
+
+    attrLeft.dataType match {
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft)
+        return None
+      case _ =>
+    }
+
+    val colStatLeft = colStatsMap(attrLeft)
+    val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType)
+      .asInstanceOf[NumericRange]
+    val maxLeft = BigDecimal(statsRangeLeft.max)
+    val minLeft = BigDecimal(statsRangeLeft.min)
+
+    val colStatRight = colStatsMap(attrRight)
+    val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType)
+      .asInstanceOf[NumericRange]
+    val maxRight = BigDecimal(statsRangeRight.max)
+    val minRight = BigDecimal(statsRangeRight.min)
+
+    // determine the overlapping degree between predicate range and column's range
+    val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+      // Left < Right or Left <= Right
+      // - no overlap:
+      //      minRight           maxRight     minLeft       maxLeft
+      // --------+------------------+------------+-------------+------->
+      // - complete overlap: (If null values exists, we set it to partial overlap.)
+      //      minLeft            maxLeft      minRight      maxRight
+      // --------+------------------+------------+-------------+------->
+      case _: LessThan =>
+        (minLeft >= maxRight,
+          maxLeft < minRight && colStatLeft.nullCount == 0 && colStatRight.nullCount == 0)
```

--- End diff ---

we can have a `val allNotNull = colStatLeft.nullCount == 0 && colStatRight.nullCount == 0`
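The overlap classification being reviewed can be illustrated with a small Python sketch (names are illustrative, not Spark's API). It computes the `(noOverlap, completeOverlap)` pair for the `Left < Right` case, including the reviewer's suggested `allNotNull` factoring:

```python
def less_than_overlap(min_left, max_left, min_right, max_right,
                      null_count_left, null_count_right):
    """Classify range overlap for the predicate `left < right`.

    Returns (no_overlap, complete_overlap). If either column contains
    nulls, a would-be complete overlap is demoted to partial, matching
    the nullCount == 0 guards in the diff above.
    """
    # the reviewer's suggested factoring of the repeated null checks
    all_not_null = null_count_left == 0 and null_count_right == 0
    # no overlap: the smallest left value already exceeds every right value
    no_overlap = min_left >= max_right
    # complete overlap: the largest left value is below every right value
    complete_overlap = max_left < min_right and all_not_null
    return no_overlap, complete_overlap
```

For example, a left range of [0, 5] against a right range of [10, 20] with no nulls is a complete overlap for `<`, while [30, 40] against [10, 20] has no overlap.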
[GitHub] spark issue #17508: [SPARK-20191][yarn] Crate wrapper for RackResolver so te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17508

Merged build finished. Test PASSed.
[GitHub] spark issue #17508: [SPARK-20191][yarn] Crate wrapper for RackResolver so te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17508

**[Test build #75454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75454/testReport)** for PR 17508 at commit [`70e48fb`](https://github.com/apache/spark/commit/70e48fb7cce549ab0f5f06e7596e94b228cea824).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17508: [SPARK-20191][yarn] Crate wrapper for RackResolver so te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17508

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75454/
[GitHub] spark issue #17508: [SPARK-20191][yarn] Crate wrapper for RackResolver so te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17508

**[Test build #75454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75454/testReport)** for PR 17508 at commit [`70e48fb`](https://github.com/apache/spark/commit/70e48fb7cce549ab0f5f06e7596e94b228cea824).
[GitHub] spark pull request #17508: [SPARK-20191][yarn] Crate wrapper for RackResolve...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/17508

[SPARK-20191][yarn] Crate wrapper for RackResolver so tests can override it.

Current test code tries to override the RackResolver used by setting configuration params, but because YARN libs statically initialize the resolver the first time it's used, that means that those configs don't really take effect during Spark tests. This change adds a wrapper class that easily allows tests to override the behavior of the resolver for the Spark code that uses it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-20191

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17508.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17508

commit 70e48fb7cce549ab0f5f06e7596e94b228cea824
Author: Marcelo Vanzin
Date: 2017-04-02T00:22:07Z

    [SPARK-20191][yarn] Crate wrapper for RackResolver so tests can override it.

    Current test code tries to override the RackResolver used by setting
    configuration params, but because YARN libs statically initialize the
    resolver the first time it's used, that means that those configs don't
    really take effect during Spark tests. This change adds a wrapper class
    that easily allows tests to override the behavior of the resolver for
    the Spark code that uses it.
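The testability pattern this PR describes can be sketched in a few lines of Python (all names here are illustrative, not the actual Spark/YARN API): instead of calling a statically initialized resolver directly, production code goes through a thin wrapper whose resolution function a test can inject.

```python
class RackResolverWrapper:
    """Hypothetical wrapper around a resolver that caches static state.

    Production code uses the default behavior; tests pass their own
    resolve_fn, bypassing the static initialization entirely.
    """

    def __init__(self, resolve_fn=None):
        self._resolve_fn = resolve_fn or self._default_resolve

    @staticmethod
    def _default_resolve(host):
        # stand-in for the real resolver, which reads its configuration
        # once, the first time it is used
        return "/default-rack"

    def resolve(self, host):
        return self._resolve_fn(host)

# Production path: default behavior.
prod = RackResolverWrapper()
# Test path: override rack resolution without touching static state.
fake = RackResolverWrapper(resolve_fn=lambda host: "/rack-" + host)
```

The design point is that the seam lives in Spark's own code, so tests never need the underlying library to honor late configuration changes.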
[GitHub] spark issue #17468: [SPARK-20143][SQL] DataType.fromJson should throw an exc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17468

@gatorsmile, could this get merged maybe?
[GitHub] spark issue #17507: [SPARK-20190]'/applications/[app-id]/jobs' in rest api,s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17507

Can one of the admins verify this patch?
[GitHub] spark pull request #17507: [SPARK-20190]'/applications/[app-id]/jobs' in res...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/17507

[SPARK-20190] '/applications/[app-id]/jobs' in rest api, status should be [running|succeeded|failed|unknown]

## What changes were proposed in this pull request?

In the '/applications/[app-id]/jobs' REST API, `status` should be `[running|succeeded|failed|unknown]`. Currently the status is `[complete|succeeded|failed]`, but '/applications/[app-id]/jobs?status=complete' makes the server return 'HTTP ERROR 404'. Added '?status=running' and '?status=unknown'.

code:

```
public enum JobExecutionStatus {
  RUNNING,
  SUCCEEDED,
  FAILED,
  UNKNOWN;
}
```

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guoxiaolongzte/spark SPARK-20190

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17507.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17507

commit 555cef88fe09134ac98fd0ad056121c7df2539aa
Author: guoxiaolongzte
Date: 2017-04-02T00:16:08Z

    '/applications/[app-id]/jobs' in rest api, status should be [running|succeeded|failed|unknown]
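The fix described above amounts to validating the `?status=` query parameter against the actual `JobExecutionStatus` values. A minimal Python sketch of that validation (the helper name `parse_status` and the error handling are assumptions, not Spark's code):

```python
from enum import Enum

class JobExecutionStatus(Enum):
    """Mirrors the Java enum quoted in the PR description."""
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"

def parse_status(param):
    """Map a ?status= query value to a JobExecutionStatus.

    An unsupported value such as "complete" is rejected explicitly,
    the situation that previously surfaced as an HTTP 404.
    """
    try:
        return JobExecutionStatus(param.lower())
    except ValueError:
        raise ValueError("unsupported status value: " + param)
```

With this, `parse_status("running")` succeeds while `parse_status("complete")` raises, which a REST layer could translate into a 400 with a helpful message rather than a bare 404.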
[GitHub] spark issue #17487: [Spark-20145] Fix range case insensitive bug in SQL
Github user samelamin commented on the issue: https://github.com/apache/spark/pull/17487

Based on comments from @hvanhovell, I am depending on the case-sensitivity setting of the analyzer. That said, I had to make `functionName` a `var` to change the value to lower case, which feels like a code smell to me. I am happy to take suggestions on how I can improve it.
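The alternative being hinted at, resolving the name according to the analyzer's case-sensitivity setting instead of mutating it, can be sketched in Python (the function and registry here are hypothetical stand-ins, not Spark's resolver):

```python
def resolve_function(name, registry, case_sensitive):
    """Look up `name` in `registry` without mutating it.

    When case_sensitive is False, the comparison is lowered on the fly,
    so the parsed name itself never needs to be a mutable var.
    """
    if case_sensitive:
        return registry.get(name)
    lowered = {key.lower(): value for key, value in registry.items()}
    return lowered.get(name.lower())
```

Under a case-insensitive analyzer, `resolve_function("RANGE", {"range": fn}, False)` finds the registered function, while the case-sensitive path does not.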
[GitHub] spark issue #17487: [Spark-20145] [WIP] Fix range case insensitive bug in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17487

**[Test build #75453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75453/testReport)** for PR 17487 at commit [`407bdbf`](https://github.com/apache/spark/commit/407bdbf1ae66b73d47611477b9ce0f03dc37ff7b).
[GitHub] spark issue #17506: [SPARK-20189][DStream] Fix spark kinesis testcases to re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17506

Can one of the admins verify this patch?
[GitHub] spark pull request #17506: [SPARK-20189][DStream] Fix spark kinesis testcase...
GitHub user yssharma opened a pull request: https://github.com/apache/spark/pull/17506

[SPARK-20189][DStream] Fix spark kinesis testcases to remove deprecated createStream and use Builders

## What changes were proposed in this pull request?

The spark-kinesis testcases use KinesisUtils.createStream, which is deprecated now. Modify the testcases to use the recommended KinesisInputDStream.builder instead. This change will also enable the testcases to use the session tokens automatically.

## How was this patch tested?

All the existing testcases work fine as expected with the changes.

https://issues.apache.org/jira/browse/SPARK-20189

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yssharma/spark ysharma/cleanup_kinesis_testcases

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17506.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17506

commit 9ceab24d03c0eb226511b3e5e7917ce17cdaf395
Author: Yash Sharma
Date: 2017-04-01T23:26:19Z

    SPARK-20189 - Fix spark kinesis testcases to remove deprecated createStream and use Builders
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483

Merged build finished. Test PASSed.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75452/
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483

**[Test build #75452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75452/testReport)** for PR 17483 at commit [`aff13a8`](https://github.com/apache/spark/commit/aff13a860282375a650f6323987c73364ed439cd).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483

Merged build finished. Test PASSed.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75451/
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483

**[Test build #75451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75451/testReport)** for PR 17483 at commit [`2aea0cb`](https://github.com/apache/spark/commit/2aea0cb0f9b55acf741788aa72a573853560f3d9).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75449/
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17415

Merged build finished. Test PASSed.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17415

**[Test build #75449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75449/testReport)** for PR 17415 at commit [`bf440db`](https://github.com/apache/spark/commit/bf440db0ee760de1e1cabe265a5129254a885a51).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17451

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75450/
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17451

Merged build finished. Test PASSed.
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17451

**[Test build #75450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75450/testReport)** for PR 17451 at commit [`3ceaca0`](https://github.com/apache/spark/commit/3ceaca02591dc1f11722f397a296ffac88c90448).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483

**[Test build #75452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75452/testReport)** for PR 17483 at commit [`aff13a8`](https://github.com/apache/spark/commit/aff13a860282375a650f6323987c73364ed439cd).
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17483

@gatorsmile I added a line about recoverPartitions, I think we should also be more clear in other language bindings? Also open https://issues.apache.org/jira/browse/SPARK-20188
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483

**[Test build #75451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75451/testReport)** for PR 17483 at commit [`2aea0cb`](https://github.com/apache/spark/commit/2aea0cb0f9b55acf741788aa72a573853560f3d9).
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user keypointt commented on the issue: https://github.com/apache/spark/pull/17451 Hi @MLnick, I'm stuck trying to add test cases for Python. I tried the code chunk below in the pyspark shell via `./bin/pyspark`:
```
from pyspark.ml.feature import Word2Vec

sent = ("a b " * 100 + "a c " * 10).split(" ")
doc = spark.createDataFrame([(sent,), (sent,)], ["sentence"])
word2Vec = Word2Vec(vectorSize=5, seed=42, inputCol="sentence", outputCol="model")
model = word2Vec.fit(doc)
model.findSynonyms("a", 2)
model.findSynonymsArray("a", 2)
```
For `findSynonyms()`, I got the result I expected:
```
>>> model.findSynonyms("a", 2)
hahaha: Dataset JavaObject id=o143
DataFrame[word: string, similarity: double]
```
but for `findSynonymsArray()` I got the following, which carries no data:
```
>>> model.findSynonymsArray("a", 2)
[{u'__class__': u'scala.Tuple2'}, {u'__class__': u'scala.Tuple2'}]
```
I tried to debug and found that `r` falls into the `elif isinstance(r, (JavaArray, JavaList)):` branch and is dumped directly. It seems Py4J is not handling the returned object properly? https://github.com/apache/spark/blob/master/python/pyspark/ml/common.py#L90 Could you please give me a hint here? I'm digging into Py4J now, but it could take me some time. Thank you very much.
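For reference, the array variant is expected to return a list of (word, similarity) pairs rather than a DataFrame. Here is a pure-Python sketch of that contract using toy vectors and cosine similarity — an illustration of the expected return shape, not the PySpark API itself (the function and variable names are hypothetical):

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def find_synonyms_array(vectors, word, num):
    """Return the `num` words most similar to `word` as (word, similarity)
    tuples, mirroring the shape findSynonymsArray is meant to produce."""
    target = vectors[word]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:num]

toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(find_synonyms_array(toy, "a", 2))
```

With these toy vectors, "b" ranks first (nearly parallel to "a") and "c" last (orthogonal, similarity 0).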
[GitHub] spark issue #17451: [SPARK-19866][ML][PySpark] Add local version of Word2Vec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17451 **[Test build #75450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75450/testReport)** for PR 17451 at commit [`3ceaca0`](https://github.com/apache/spark/commit/3ceaca02591dc1f11722f397a296ffac88c90448).
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17336 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75448/ Test PASSed.
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17336 Merged build finished. Test PASSed.
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17336 **[Test build #75448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75448/testReport)** for PR 17336 at commit [`a95a07a`](https://github.com/apache/spark/commit/a95a07ac1a430c67b13186d6dc383193ac3c3119).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17415: [SPARK-19408][SQL] filter estimation on two columns of s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17415 **[Test build #75449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75449/testReport)** for PR 17415 at commit [`bf440db`](https://github.com/apache/spark/commit/bf440db0ee760de1e1cabe265a5129254a885a51).
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17336 **[Test build #75448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75448/testReport)** for PR 17336 at commit [`a95a07a`](https://github.com/apache/spark/commit/a95a07ac1a430c67b13186d6dc383193ac3c3119).
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109293607
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
     Some(percent.toDouble)
   }

+  /**
+   * Returns a percentage of rows meeting a binary comparison expression containing two columns.
+   * In SQL queries, we also see predicate expressions involving two columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator, including =, <=>, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update ColumnStat of the given columns
+   *               for subsequent conditions
+   * @return an optional double value to show the percentage of rows meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+      op: BinaryComparison,
+      attrLeft: Attribute,
+      attrRight: Attribute,
+      update: Boolean): Option[Double] = {
+
+    if (!colStatsMap.contains(attrLeft)) {
+      logDebug("[CBO] No statistics for " + attrLeft)
+      return None
+    }
+    if (!colStatsMap.contains(attrRight)) {
+      logDebug("[CBO] No statistics for " + attrRight)
+      return None
+    }
+
+    attrLeft.dataType match {
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft)
+        return None
+      case _ =>
+    }
+
+    val colStatLeft = colStatsMap(attrLeft)
+    val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType)
+      .asInstanceOf[NumericRange]
+    val maxLeft = BigDecimal(statsRangeLeft.max)
+    val minLeft = BigDecimal(statsRangeLeft.min)
+
+    val colStatRight = colStatsMap(attrRight)
+    val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType)
+      .asInstanceOf[NumericRange]
+    val maxRight = BigDecimal(statsRangeRight.max)
+    val minRight = BigDecimal(statsRangeRight.min)
+
+    // determine the overlapping degree between predicate range and column's range
+    val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+      // Left < Right or Left <= Right
+      // - no overlap:
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      case _: LessThan =>
+        (minLeft >= maxRight, maxLeft < minRight)
+      case _: LessThanOrEqual =>
+        (minLeft > maxRight, maxLeft <= minRight)
+
+      // Left > Right or Left >= Right
+      // - no overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minRight     maxRight     minLeft      maxLeft
--- End diff --
Good point. Fixed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75446/ Test FAILed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17394 Merged build finished. Test FAILed.
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394 **[Test build #75446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75446/testReport)** for PR 17394 at commit [`a6db8a3`](https://github.com/apache/spark/commit/a6db8a32b6ad498dde89d0e6358034ece21a5f8f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483 Merged build finished. Test FAILed.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17483 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75447/ Test FAILed.
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483 **[Test build #75447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75447/testReport)** for PR 17483 at commit [`3c66930`](https://github.com/apache/spark/commit/3c6693035b023d7c9c9e2caa3014247c130eb037).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17483#discussion_r109291089
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2977,6 +2981,51 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
   expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
 })

+test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
+  expect_equal(currentDatabase(), "default")
+  expect_error(setCurrentDatabase("default"), NA)
+  expect_error(setCurrentDatabase("foo"),
+               "Error in setCurrentDatabase : analysis error - Database 'foo' does not exist")
+  dbs <- collect(listDatabases())
+  expect_equal(names(dbs), c("name", "description", "locationUri"))
+  expect_equal(dbs[[1]], "default")
+})
+
+test_that("catalog APIs, listTables, listColumns, listFunctions", {
+  tb <- listTables()
+  count <- count(suppressWarnings(tables()))
+  expect_equal(nrow(tb), count)
+  expect_equal(colnames(tb), c("name", "database", "description", "tableType", "isTemporary"))
+
+  createOrReplaceTempView(as.DataFrame(cars), "cars")
+
+  tb <- listTables()
+  expect_equal(nrow(tb), count + 1)
+  tbs <- collect(tb)
+  expect_true(nrow(tbs[tbs$name == "cars", ]) > 0)
+  expect_error(listTables("bar"),
+               "Error in listTables : no such database - Database 'bar' not found")
+
+  c <- listColumns("cars")
+  expect_equal(nrow(c), 2)
+  expect_equal(colnames(c),
+               c("name", "description", "dataType", "nullable", "isPartition", "isBucket"))
+  expect_equal(collect(c)[[1]][[1]], "speed")
+  expect_error(listColumns("foo", "default"),
+               "Error in listColumns : analysis error - Table 'foo' does not exist in database 'default'")
+
+  dropTempView("cars")
+
+  f <- listFunctions()
+  expect_true(nrow(f) >= 200) # 250
+  expect_equal(colnames(f),
+               c("name", "database", "description", "className", "isTemporary"))
+  expect_equal(take(orderBy(f, "className"), 1)$className,
+               "org.apache.spark.sql.catalyst.expressions.Abs")
+
+  expect_error(listFunctions("foo_db"),
+               "Error in listFunctions : analysis error - Database 'foo_db' does not exist")
+})
--- End diff --
Sharp eyes :) I was planning to add tests. I tested these manually, but the steps are more involved, and these are only thin wrappers in R; I think we should defer to the Scala tests.
[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17483#discussion_r109290624
--- Diff: R/pkg/R/utils.R ---
@@ -846,6 +846,24 @@ captureJVMException <- function(e, method) {
     # Extract the first message of JVM exception.
     first <- strsplit(msg[2], "\r?\n\tat")[[1]][1]
     stop(paste0(rmsg, "analysis error - ", first), call. = FALSE)
+  } else
+    if (any(grep("org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: ", stacktrace))) {
--- End diff --
OK, thanks.
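The pattern in `captureJVMException` — detecting a known exception class in the JVM stacktrace and surfacing only its first message line, cut off where the `\n\tat` stack frames begin — can be sketched in plain Python (function and variable names here are hypothetical, not Spark APIs):

```python
import re

def first_jvm_message(stacktrace, exc_class):
    """If `exc_class` appears in the stacktrace, return the message text
    between the class name and the first '\\n\\tat' frame marker, else None."""
    marker = exc_class + ": "
    if marker not in stacktrace:
        return None
    tail = stacktrace.split(marker, 1)[1]
    # The first message ends where the stack frames ('\n\tat ...') begin;
    # '\r?' also tolerates Windows line endings, as in the R code above.
    return re.split(r"\r?\n\tat", tail)[0]

trace = ("org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: "
         "Database 'foo' does not exist\n\tat org.apache.spark...")
print(first_jvm_message(
    trace, "org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException"))
```

This mirrors the `strsplit(msg[2], "\r?\n\tat")[[1]][1]` step in the R helper: only the human-readable first line of the JVM error reaches the R user.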
[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17483#discussion_r109290616
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -645,16 +645,17 @@ test_that("test tableNames and tables", {
   df <- read.json(jsonPath)
   createOrReplaceTempView(df, "table1")
   expect_equal(length(tableNames()), 1)
-  tables <- tables()
+  tables <- listTables()
--- End diff --
Changed.
[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17483#discussion_r109290611
--- Diff: R/pkg/R/catalog.R ---
@@ -0,0 +1,478 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# catalog.R: SparkSession catalog functions
+
+#' Create an external table
+#'
+#' Creates an external table based on the dataset in a data source,
+#' returns a SparkDataFrame associated with the external table.
+#'
+#' The data source is specified by the \code{source} and a set of options(...).
+#' If \code{source} is not specified, the default data source configured by
+#' "spark.sql.sources.default" will be used.
+#'
+#' @param tableName a name of the table.
+#' @param path the path of files to load.
+#' @param source the name of external data source.
+#' @param schema the schema of the data for certain data source.
+#' @param ... additional argument(s) passed to the method.
+#' @return A SparkDataFrame.
+#' @rdname createExternalTable
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' df <- createExternalTable("myjson", path = "path/to/json", source = "json", schema)
+#' }
+#' @name createExternalTable
+#' @method createExternalTable default
+#' @note createExternalTable since 1.4.0
+createExternalTable.default <- function(tableName, path = NULL, source = NULL, schema = NULL, ...) {
+  sparkSession <- getSparkSession()
+  options <- varargsToStrEnv(...)
+  if (!is.null(path)) {
+    options[["path"]] <- path
+  }
+  catalog <- callJMethod(sparkSession, "catalog")
+  if (is.null(schema)) {
+    sdf <- callJMethod(catalog, "createExternalTable", tableName, source, options)
+  } else {
+    sdf <- callJMethod(catalog, "createExternalTable", tableName, source, schema$jobj, options)
+  }
+  dataFrame(sdf)
+}
+
+createExternalTable <- function(x, ...) {
--- End diff --
Right, I was just concerned that with `data.table`, `read.table`, etc., table == data.frame in R, as opposed to a `hive table` or `managed table`, which could be fairly confusing. Anyway, I think I'll follow up with a PR for `createTable`, but as of now `path` is optional for `createExternalTable`; even though that's potentially misleading, it does work now.
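The wrapper's handling of the optional `path` argument — folding it into the generic data-source options map before delegating to the JVM catalog — can be sketched in plain Python (a hedged illustration of the option-merging step only; the function name is hypothetical and nothing here calls Spark):

```python
def build_options(path=None, **options):
    """Merge an optional `path` argument into the data-source options map,
    mirroring how the R wrapper folds `path` into the varargs options."""
    opts = dict(options)
    if path is not None:
        # An explicit path wins; it becomes just another option entry.
        opts["path"] = path
    return opts

print(build_options("path/to/json", mode="FAILFAST"))
```

The design point is that downstream code only ever sees one options map, so `path` stays truly optional, which is exactly why the comment above notes that calling `createExternalTable` without a path "does work now".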
[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17483 **[Test build #75447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75447/testReport)** for PR 17483 at commit [`3c66930`](https://github.com/apache/spark/commit/3c6693035b023d7c9c9e2caa3014247c130eb037).
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109289876
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
     Some(percent.toDouble)
   }

+  /**
+   * Returns a percentage of rows meeting a binary comparison expression containing two columns.
+   * In SQL queries, we also see predicate expressions involving two columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator, including =, <=>, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update ColumnStat of the given columns
+   *               for subsequent conditions
+   * @return an optional double value to show the percentage of rows meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+      op: BinaryComparison,
+      attrLeft: Attribute,
+      attrRight: Attribute,
+      update: Boolean): Option[Double] = {
+
+    if (!colStatsMap.contains(attrLeft)) {
+      logDebug("[CBO] No statistics for " + attrLeft)
+      return None
+    }
+    if (!colStatsMap.contains(attrRight)) {
+      logDebug("[CBO] No statistics for " + attrRight)
+      return None
+    }
+
+    attrLeft.dataType match {
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft)
+        return None
+      case _ =>
+    }
+
+    val colStatLeft = colStatsMap(attrLeft)
+    val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType)
+      .asInstanceOf[NumericRange]
+    val maxLeft = BigDecimal(statsRangeLeft.max)
+    val minLeft = BigDecimal(statsRangeLeft.min)
+
+    val colStatRight = colStatsMap(attrRight)
+    val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType)
+      .asInstanceOf[NumericRange]
+    val maxRight = BigDecimal(statsRangeRight.max)
+    val minRight = BigDecimal(statsRangeRight.min)
+
+    // determine the overlapping degree between predicate range and column's range
+    val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+      // Left < Right or Left <= Right
+      // - no overlap:
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      case _: LessThan =>
+        (minLeft >= maxRight, maxLeft < minRight)
+      case _: LessThanOrEqual =>
+        (minLeft > maxRight, maxLeft <= minRight)
+
+      // Left > Right or Left >= Right
+      // - no overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
+      case _: GreaterThan =>
+        (maxLeft <= minRight, minLeft > maxRight)
+      case _: GreaterThanOrEqual =>
+        (maxLeft < minRight, minLeft >= maxRight)
+
+      // Left = Right or Left <=> Right
+      // - no overlap:
+      //      minLeft      maxLeft      minRight     maxRight
+      // 0 ------+------------+------------+------------+------->
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
+      // - complete overlap:
+      //      minLeft      maxLeft
+      //      minRight     maxRight
--- End diff --
How about?
```
(minRight == maxRight) && (minLeft == minRight) && (maxLeft == maxRight)
```
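The no-overlap / complete-overlap classification being discussed can be checked with a small pure-Python model of the two column ranges — a sketch for reasoning about the conditions, not the Spark implementation:

```python
def overlap_state(op, min_l, max_l, min_r, max_r):
    """Return (no_overlap, complete_overlap) for a predicate `left op right`,
    given left range [min_l, max_l] and right range [min_r, max_r],
    following the rules quoted in the diff."""
    if op == "<":
        return (min_l >= max_r, max_l < min_r)
    if op == "<=":
        return (min_l > max_r, max_l <= min_r)
    if op == ">":
        return (max_l <= min_r, min_l > max_r)
    if op == ">=":
        return (max_l < min_r, min_l >= max_r)
    if op in ("=", "<=>"):
        # No overlap: the two ranges are disjoint. Complete overlap (per the
        # suggested condition): both ranges collapse to the same single point.
        no = max_l < min_r or max_r < min_l
        complete = (min_r == max_r) and (min_l == min_r) and (max_l == max_r)
        return (no, complete)
    raise ValueError("unsupported operator: " + op)

# Left range [1, 3] lies entirely below right range [5, 9],
# so 'left < right' always holds: no_overlap=False, complete_overlap=True.
print(overlap_state("<", 1, 3, 5, 9))  # (False, True)
```

Exercising a few corner cases by hand this way makes it easy to spot mistakes like a `>=` where a `>` belongs in the boundary comparisons.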
[GitHub] spark issue #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Commands Usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17394 **[Test build #75446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75446/testReport)** for PR 17394 at commit [`a6db8a3`](https://github.com/apache/spark/commit/a6db8a32b6ad498dde89d0e6358034ece21a5f8f).
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109289750
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
     Some(percent.toDouble)
   }

+  /**
+   * Returns a percentage of rows meeting a binary comparison expression containing two columns.
+   * In SQL queries, we also see predicate expressions involving two columns
+   * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table.
+   * Note that, if column-1 and column-2 belong to different tables, then it is a join
+   * operator's work, NOT a filter operator's work.
+   *
+   * @param op a binary comparison operator, including =, <=>, <, <=, >, >=
+   * @param attrLeft the left Attribute (or a column)
+   * @param attrRight the right Attribute (or a column)
+   * @param update a boolean flag to specify if we need to update ColumnStat of the given columns
+   *               for subsequent conditions
+   * @return an optional double value to show the percentage of rows meeting a given condition
+   */
+  def evaluateBinaryForTwoColumns(
+      op: BinaryComparison,
+      attrLeft: Attribute,
+      attrRight: Attribute,
+      update: Boolean): Option[Double] = {
+
+    if (!colStatsMap.contains(attrLeft)) {
+      logDebug("[CBO] No statistics for " + attrLeft)
+      return None
+    }
+    if (!colStatsMap.contains(attrRight)) {
+      logDebug("[CBO] No statistics for " + attrRight)
+      return None
+    }
+
+    attrLeft.dataType match {
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft)
+        return None
+      case _ =>
+    }
+
+    val colStatLeft = colStatsMap(attrLeft)
+    val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType)
+      .asInstanceOf[NumericRange]
+    val maxLeft = BigDecimal(statsRangeLeft.max)
+    val minLeft = BigDecimal(statsRangeLeft.min)
+
+    val colStatRight = colStatsMap(attrRight)
+    val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType)
+      .asInstanceOf[NumericRange]
+    val maxRight = BigDecimal(statsRangeRight.max)
+    val minRight = BigDecimal(statsRangeRight.min)
+
+    // determine the overlapping degree between predicate range and column's range
+    val (noOverlap: Boolean, completeOverlap: Boolean) = op match {
+      // Left < Right or Left <= Right
+      // - no overlap:
+      //      minRight     maxRight     minLeft      maxLeft
+      // 0 ------+------------+------------+------------+------->
--- End diff --
Uh, I missed that. Please feel free to remove it. Thanks!
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289677 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- @@ -1,205 +1,248 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 14 +-- Number of queries: 28 -- !query 0 -CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet PARTITIONED BY (c, d) COMMENT 'table_comment' +CREATE TABLE t (a STRING, b INT, c STRING, d STRING) USING parquet + PARTITIONED BY (c, d) CLUSTERED BY (a) SORTED BY (b ASC) INTO 2 BUCKETS + COMMENT 'table_comment' -- !query 0 schema struct<> -- !query 0 output -- !query 1 -ALTER TABLE t ADD PARTITION (c='Us', d=1) +CREATE TEMPORARY VIEW temp_v AS SELECT * FROM t -- !query 1 schema struct<> -- !query 1 output -- !query 2 -DESCRIBE t +CREATE VIEW v AS SELECT * FROM t -- !query 2 schema -struct +struct<> -- !query 2 output -# Partition Information -# col_name data_type comment -a string -b int -c string -c string -d string -d string + -- !query 3 -DESC t +ALTER TABLE t ADD PARTITION (c='Us', d=1) -- !query 3 schema -struct +struct<> -- !query 3 output -# Partition Information -# col_name data_type comment -a string -b int -c string -c string -d string -d string + -- !query 4 -DESC TABLE t +DESCRIBE t -- !query 4 schema struct -- !query 4 output -# Partition Information -# col_name data_type comment a string b int c string -c string d string +# Partition Information +# col_name data_type comment +c string d string -- !query 5 -DESC FORMATTED t +DESC t -- !query 5 schema struct -- !query 5 output -# Detailed Table Information -# Partition Information -# Storage Information -# col_name data_type comment -Comment: table_comment -Compressed:No -Created: -Database: default -Last Access: -Location: sql/core/spark-warehouse/t -Owner: -Partition Provider:Catalog -Storage Desc Parameters: -Table Parameters: -Table Type:MANAGED a string b int c string -c string d string +# 
Partition Information +# co
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289263 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -637,21 +570,7 @@ case class DescribeTableCommand( } DDLUtils.verifyPartitionProviderIsHive(spark, metadata, "DESC PARTITION") val partition = catalog.getPartition(table, partitionSpec) -if (isExtended) { - describeExtendedDetailedPartitionInfo(table, metadata, partition, result) -} else if (isFormatted) { - describeFormattedDetailedPartitionInfo(table, metadata, partition, result) - describeStorageInfo(metadata, result) -} - } - - private def describeExtendedDetailedPartitionInfo( - tableIdentifier: TableIdentifier, - table: CatalogTable, - partition: CatalogTablePartition, - buffer: ArrayBuffer[Row]): Unit = { -append(buffer, "", "", "") -append(buffer, "Detailed Partition Information " + partition.toString, "", "") +if (isExtended) describeFormattedDetailedPartitionInfo(table, metadata, partition, result) --- End diff -- This function `describeDetailedPartitionInfo` will only be called for the DDL command ```SQL DESCRIBE [EXTENDED|FORMATTED] table_name PARTITION (partitionVal*) ```
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289119 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289117 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289114 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109289107 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @mridulm Sorry for the late reply. I opened the PR for SPARK-19659 (https://github.com/apache/spark/pull/16989) and made these two PRs independent. Basically, this PR is meant to evaluate the performance (blocks are shuffled to disk) and stability (the size in `MapStatus` is inaccurate and OOM can happen) of the implementation proposed in SPARK-19659. I'd be thankful if you have time to comment on these two PRs.
[GitHub] spark issue #17501: [SPARK-20183][ML] Added outlierRatio arg to MLTestingUti...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17501 LGTM
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109286524 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo Some(percent.toDouble) } + /** + * Returns a percentage of rows meeting a binary comparison expression containing two columns. + * In SQL queries, we also see predicate expressions involving two columns + * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table. + * Note that, if column-1 and column-2 belong to different tables, then it is a join + * operator's work, NOT a filter operator's work. + * + * @param op a binary comparison operator, including =, <=>, <, <=, >, >= + * @param attrLeft the left Attribute (or a column) + * @param attrRight the right Attribute (or a column) + * @param update a boolean flag to specify if we need to update ColumnStat of the given columns + * for subsequent conditions + * @return an optional double value to show the percentage of rows meeting a given condition + */ + def evaluateBinaryForTwoColumns( + op: BinaryComparison, + attrLeft: Attribute, + attrRight: Attribute, + update: Boolean): Option[Double] = { + +if (!colStatsMap.contains(attrLeft)) { + logDebug("[CBO] No statistics for " + attrLeft) + return None +} +if (!colStatsMap.contains(attrRight)) { + logDebug("[CBO] No statistics for " + attrRight) + return None +} + +attrLeft.dataType match { + case StringType | BinaryType => +// TODO: It is difficult to support other binary comparisons for String/Binary +// type without min/max and advanced statistics like histogram. 
+logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft) +return None + case _ => +} + +val colStatLeft = colStatsMap(attrLeft) +val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) + .asInstanceOf[NumericRange] +val maxLeft = BigDecimal(statsRangeLeft.max) +val minLeft = BigDecimal(statsRangeLeft.min) + +val colStatRight = colStatsMap(attrRight) +val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType) + .asInstanceOf[NumericRange] +val maxRight = BigDecimal(statsRangeRight.max) +val minRight = BigDecimal(statsRangeRight.min) + +// determine the overlapping degree between predicate range and column's range +val (noOverlap: Boolean, completeOverlap: Boolean) = op match { + // Left < Right or Left <= Right + // - no overlap: + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> + // - complete overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + case _: LessThan => +(minLeft >= maxRight, maxLeft < minRight) + case _: LessThanOrEqual => +(minLeft > maxRight, maxLeft <= minRight) + + // Left > Right or Left >= Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + // - complete overlap: + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> + case _: GreaterThan => +(maxLeft <= minRight, minLeft > maxRight) + case _: GreaterThanOrEqual => +(maxLeft < minRight, minLeft >= maxRight) + + // Left = Right or Left <=> Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> + // - complete overlap: + // minLeftmaxLeft + // minRight maxRight --- End diff -- I think `Left = Right` is different from the other 2 cases, even the range completely overlaps, the filter selectivity is not 100%. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feat
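The point about equality can be illustrated with a toy calculation (Python, hypothetical numbers — not Spark's code): even when the two columns' ranges overlap completely, `left = right` only holds for a small fraction of rows under a uniform-distribution and independence assumption, which is why `=` cannot reuse the complete-overlap shortcut that `<` and `>` use.

```python
# Two columns, both ranging over [0, 9] with 10 distinct values each.
# The ranges overlap completely, yet a random pair matches only about
# 1/max(ndv_left, ndv_right) of the time under uniformity.
ndv_left, ndv_right = 10, 10
eq_selectivity = 1.0 / max(ndv_left, ndv_right)
print(eq_selectivity)  # 0.1, not 1.0
```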
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109286505 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo Some(percent.toDouble) } + /** + * Returns a percentage of rows meeting a binary comparison expression containing two columns. + * In SQL queries, we also see predicate expressions involving two columns + * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table. + * Note that, if column-1 and column-2 belong to different tables, then it is a join + * operator's work, NOT a filter operator's work. + * + * @param op a binary comparison operator, including =, <=>, <, <=, >, >= + * @param attrLeft the left Attribute (or a column) + * @param attrRight the right Attribute (or a column) + * @param update a boolean flag to specify if we need to update ColumnStat of the given columns + * for subsequent conditions + * @return an optional double value to show the percentage of rows meeting a given condition + */ + def evaluateBinaryForTwoColumns( + op: BinaryComparison, + attrLeft: Attribute, + attrRight: Attribute, + update: Boolean): Option[Double] = { + +if (!colStatsMap.contains(attrLeft)) { + logDebug("[CBO] No statistics for " + attrLeft) + return None +} +if (!colStatsMap.contains(attrRight)) { + logDebug("[CBO] No statistics for " + attrRight) + return None +} + +attrLeft.dataType match { + case StringType | BinaryType => +// TODO: It is difficult to support other binary comparisons for String/Binary +// type without min/max and advanced statistics like histogram. 
+logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft) +return None + case _ => +} + +val colStatLeft = colStatsMap(attrLeft) +val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) + .asInstanceOf[NumericRange] +val maxLeft = BigDecimal(statsRangeLeft.max) +val minLeft = BigDecimal(statsRangeLeft.min) + +val colStatRight = colStatsMap(attrRight) +val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType) + .asInstanceOf[NumericRange] +val maxRight = BigDecimal(statsRangeRight.max) +val minRight = BigDecimal(statsRangeRight.min) + +// determine the overlapping degree between predicate range and column's range +val (noOverlap: Boolean, completeOverlap: Boolean) = op match { + // Left < Right or Left <= Right + // - no overlap: + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> + // - complete overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + case _: LessThan => +(minLeft >= maxRight, maxLeft < minRight) + case _: LessThanOrEqual => +(minLeft > maxRight, maxLeft <= minRight) + + // Left > Right or Left >= Right + // - no overlap: + // minLeftmaxLeft minRight maxRight + // 0 --+--++-+---> + // - complete overlap: + // minRight maxRight minLeft maxLeft --- End diff -- doesn't the `complete overlap` here need to consider null? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
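The null concern can also be made concrete with a toy calculation (Python, hypothetical null fractions — not Spark's code): even when the ranges guarantee the comparison for every non-null pair, a row with a NULL on either side never satisfies the predicate, so the "complete overlap" selectivity is capped by the product of the non-null fractions rather than being 100%.

```python
# Suppose the ranges guarantee left > right for all non-null pairs,
# but 20% of left values and 10% of right values are NULL. Rows with a
# NULL on either side fail the predicate, so the selectivity must be
# scaled down accordingly.
null_frac_left, null_frac_right = 0.2, 0.1
selectivity = (1 - null_frac_left) * (1 - null_frac_right)
print(round(selectivity, 4))  # 0.72
```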
[GitHub] spark pull request #17415: [SPARK-19408][SQL] filter estimation on two colum...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17415#discussion_r109286468 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -550,6 +565,220 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo Some(percent.toDouble) } + /** + * Returns a percentage of rows meeting a binary comparison expression containing two columns. + * In SQL queries, we also see predicate expressions involving two columns + * such as "column-1 (op) column-2" where column-1 and column-2 belong to same table. + * Note that, if column-1 and column-2 belong to different tables, then it is a join + * operator's work, NOT a filter operator's work. + * + * @param op a binary comparison operator, including =, <=>, <, <=, >, >= + * @param attrLeft the left Attribute (or a column) + * @param attrRight the right Attribute (or a column) + * @param update a boolean flag to specify if we need to update ColumnStat of the given columns + * for subsequent conditions + * @return an optional double value to show the percentage of rows meeting a given condition + */ + def evaluateBinaryForTwoColumns( + op: BinaryComparison, + attrLeft: Attribute, + attrRight: Attribute, + update: Boolean): Option[Double] = { + +if (!colStatsMap.contains(attrLeft)) { + logDebug("[CBO] No statistics for " + attrLeft) + return None +} +if (!colStatsMap.contains(attrRight)) { + logDebug("[CBO] No statistics for " + attrRight) + return None +} + +attrLeft.dataType match { + case StringType | BinaryType => +// TODO: It is difficult to support other binary comparisons for String/Binary +// type without min/max and advanced statistics like histogram. 
+logDebug("[CBO] No range comparison statistics for String/Binary type " + attrLeft) +return None + case _ => +} + +val colStatLeft = colStatsMap(attrLeft) +val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) + .asInstanceOf[NumericRange] +val maxLeft = BigDecimal(statsRangeLeft.max) +val minLeft = BigDecimal(statsRangeLeft.min) + +val colStatRight = colStatsMap(attrRight) +val statsRangeRight = Range(colStatRight.min, colStatRight.max, attrRight.dataType) + .asInstanceOf[NumericRange] +val maxRight = BigDecimal(statsRangeRight.max) +val minRight = BigDecimal(statsRangeRight.min) + +// determine the overlapping degree between predicate range and column's range +val (noOverlap: Boolean, completeOverlap: Boolean) = op match { + // Left < Right or Left <= Right + // - no overlap: + // minRight maxRight minLeft maxLeft + // 0 --+--++-+---> --- End diff -- the starting `0` looks confusing, the `max`, `min` values doesn't need to be positive. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17487: [Spark-20145] [WIP] Fix range case insensitive bu...
Github user samelamin commented on a diff in the pull request: https://github.com/apache/spark/pull/17487#discussion_r109286453 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala --- @@ -105,7 +105,7 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { case u: UnresolvedTableValuedFunction if u.functionArgs.forall(_.resolved) => - builtinFunctions.get(u.functionName) match { + builtinFunctions.get(u.functionName.toLowerCase) match { --- End diff -- @hvanhovell instead of creating a new case class, is there a way I can reuse the UnresolvedTableValuedFunction case class and just add in the SQLConf class?
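The one-line fix under discussion — normalizing the looked-up function name to lower case — can be sketched outside Spark. A minimal Python analogue (hypothetical registry; not Spark's actual `ResolveTableValuedFunctions` code):

```python
# Registry keys are stored lowercase, and every lookup normalizes the
# queried name the same way, so RANGE / Range / range all resolve to the
# same builtin table-valued function.
builtin_functions = {"range": lambda n: list(range(n))}

def resolve(name):
    """Case-insensitive lookup; returns None for unknown functions."""
    return builtin_functions.get(name.lower())

print(resolve("RANGE")(3))  # [0, 1, 2]
```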
[GitHub] spark pull request #17492: [SPARK-19641][SQL] JSON schema inference in DROPM...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17492#discussion_r109286081 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala --- @@ -217,26 +221,43 @@ private[sql] object JsonInferSchema { } } + private def withParseMode( --- End diff -- shall we embed this method in `withCorruptField`?
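The parse modes being refactored here can be sketched in miniature. The Python snippet below is illustrative only (Spark's `JsonInferSchema` works on Jackson token streams, not whole records); it shows how PERMISSIVE, DROPMALFORMED, and FAILFAST diverge on a corrupt record during schema inference:

```python
import json

def infer_types(records, mode="PERMISSIVE"):
    """Toy schema inference: map each JSON record to its Python type name.
    Mode names mirror Spark's JSON parse modes; the logic is simplified."""
    inferred = []
    for rec in records:
        try:
            inferred.append(type(json.loads(rec)).__name__)
        except json.JSONDecodeError:
            if mode == "DROPMALFORMED":
                continue                      # silently skip the bad record
            if mode == "FAILFAST":
                raise                         # surface the parse error
            inferred.append("corrupt_record")  # PERMISSIVE: keep placeholder
    return inferred

print(infer_types(['{"a": 1}', 'oops'], mode="DROPMALFORMED"))  # ['dict']
```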
[GitHub] spark pull request #17504: [SPARK-20186][SQL] BroadcastHint should use child...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17504
[GitHub] spark issue #17504: [SPARK-20186][SQL] BroadcastHint should use child's stat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17504 thanks, merging to master!
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285879 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out ---
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285811 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -637,21 +570,7 @@ case class DescribeTableCommand( } DDLUtils.verifyPartitionProviderIsHive(spark, metadata, "DESC PARTITION") val partition = catalog.getPartition(table, partitionSpec) -if (isExtended) { - describeExtendedDetailedPartitionInfo(table, metadata, partition, result) -} else if (isFormatted) { - describeFormattedDetailedPartitionInfo(table, metadata, partition, result) - describeStorageInfo(metadata, result) -} - } - - private def describeExtendedDetailedPartitionInfo( - tableIdentifier: TableIdentifier, - table: CatalogTable, - partition: CatalogTablePartition, - buffer: ArrayBuffer[Row]): Unit = { -append(buffer, "", "", "") -append(buffer, "Detailed Partition Information " + partition.toString, "", "") +if (isExtended) describeFormattedDetailedPartitionInfo(table, metadata, partition, result) --- End diff -- not related to this PR, but it looks weird that `DESC tbl` and `DESC tbl PARTITION (xxx)` have the same result.
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285540 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285423 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285410 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285396 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285367 --- Diff: sql/core/src/test/resources/sql-tests/results/describe.sql.out --- (same truncated diff hunk as quoted above)
[GitHub] spark pull request #17394: [SPARK-20067] [SQL] Unify and Clean Up Desc Comma...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17394#discussion_r109285298 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -214,6 +215,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { // Returns true if the plan is supposed to be sorted. def isSorted(plan: LogicalPlan): Boolean = plan match { --- End diff -- maybe call it `needSort`?
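The helper under discussion decides whether a query's output order is already pinned, so the test harness knows whether to sort rows before comparing them against the golden file. A minimal plain-Python sketch of that idea — the real Spark code pattern-matches on the logical plan rather than the SQL text, so the string-based heuristic here is purely illustrative, and `need_sort` follows the reviewer's suggested naming:

```python
def need_sort(sql: str) -> bool:
    """Return True if the query's row order is non-deterministic, so the
    harness should sort the output before comparing to the golden file.
    """
    normalized = " ".join(sql.strip().upper().split())
    # DESC/DESCRIBE/SHOW commands emit rows in a fixed, documented order.
    if normalized.startswith(("DESC", "DESCRIBE", "SHOW")):
        return False
    # A top-level ORDER BY pins the row order.
    if " ORDER BY " in normalized:
        return False
    return True


golden = [("a", 1), ("b", 2)]
actual = [("b", 2), ("a", 1)]
query = "SELECT * FROM t"
if need_sort(query):
    actual = sorted(actual)
print(actual == golden)  # True
```

Under this naming, a call site reads as a direct question (`if need_sort(query): ...`), which is the readability point behind renaming `isSorted`.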
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14617 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75444/ Test PASSed.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14617 Merged build finished. Test PASSed.
[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14617 **[Test build #75444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75444/testReport)** for PR 14617 at commit [`b30e7d0`](https://github.com/apache/spark/commit/b30e7d0c2e950179ef5801a697215ec9afd88226). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.