[GitHub] spark issue #14874: [SPARK-17180][SPARK-17309][SPARK-17323][SQL] create Alte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14874 **[Test build #64703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64703/consoleFull)** for PR 14874 at commit [`8480945`](https://github.com/apache/spark/commit/8480945f2cc30972f33f1c55100c4263b83a3497). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bu...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14531#discussion_r76932912

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

```
@@ -58,18 +63,32 @@ case class CreateTableLikeCommand(
       throw new AnalysisException(
         s"Source table in CREATE TABLE LIKE does not exist: '$sourceTable'")
     }
-    if (catalog.isTemporaryTable(sourceTable)) {
-      throw new AnalysisException(
-        s"Source table in CREATE TABLE LIKE cannot be temporary: '$sourceTable'")
-    }
-    val tableToCreate = catalog.getTableMetadata(sourceTable).copy(
-      identifier = targetTable,
-      tableType = CatalogTableType.MANAGED,
-      createTime = System.currentTimeMillis,
-      lastAccessTime = -1).withNewStorage(locationUri = None)
+    val sourceTableDesc = catalog.getTableMetadata(sourceTable)

-    catalog.createTable(tableToCreate, ifNotExists)
+    val newSerdeProp =
+      if (DDLUtils.isDatasourceTable(sourceTableDesc)) {
+        val newPath = catalog.defaultTablePath(targetTable)
+        sourceTableDesc.storage.properties.filterKeys(_.toLowerCase != "path") ++
+          Map("path" -> newPath)
+      } else {
+        sourceTableDesc.storage.properties
+      }
+    val newStorage = sourceTableDesc.storage.copy(
+      locationUri = None,
+      properties = newSerdeProp)
+
+    val newTableDesc =
+      CatalogTable(
+        identifier = targetTable,
+        tableType = CatalogTableType.MANAGED,
+        storage = newStorage,
+        schema = sourceTableDesc.schema,
+        provider = sourceTableDesc.provider,
```

--- End diff --

uh... You are right! So many things happened in the past 3 weeks. : ) Let me fix it now.
[GitHub] spark issue #14823: [SPARK-17257][SQL] the physical plan of CREATE TABLE or ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14823 I missed this ping. Will review it tomorrow.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate bui...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14859 **[Test build #64702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64702/consoleFull)** for PR 14859 at commit [`b1a5076`](https://github.com/apache/spark/commit/b1a50764dcc71981fdc96e5a4b8d2e208f7692ec).
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76932253

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala ---

```
@@ -140,7 +145,12 @@ private[hive] case class MetastoreRelation(
           sparkSession.sessionState.conf.defaultSizeInBytes
         })
       }
-    )
+    if (catalogTable.catalogStats.isDefined) {
```

--- End diff --

Actually, the `catalogStats` here is already obtained from Hive's numbers in `constructStatsFromHive` below.
[GitHub] spark issue #14889: [SPARK-17326][SPARKR] Fix tests with HiveContext in Spar...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14889 Thanks @HyukjinKwon - This is a great catch. LGTM pending tests.
[GitHub] spark pull request #14874: [SPARK-17180][SPARK-17309][SPARK-17323][SQL] crea...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14874#discussion_r76932176

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---

```
@@ -274,6 +276,75 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("should not allow ALTER VIEW AS when the view does not exist") {
+    intercept[NoSuchTableException](
+      sql("ALTER VIEW testView AS SELECT 1, 2")
+    )
+
+    intercept[NoSuchTableException](
+      sql("ALTER VIEW default.testView AS SELECT 1, 2")
+    )
+  }
+
+  test("ALTER VIEW AS should try to alter temp view first if view name has no database part") {
+    withTempView("test_view") {
+      withView("test_view") {
```

--- End diff --

The same here. We just need to change the ordering:
```
withView("test_view") {
  withTempView("test_view") {
```
[GitHub] spark pull request #14874: [SPARK-17180][SPARK-17309][SPARK-17323][SQL] crea...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14874#discussion_r76932113

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---

```
@@ -274,6 +276,75 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("should not allow ALTER VIEW AS when the view does not exist") {
+    intercept[NoSuchTableException](
+      sql("ALTER VIEW testView AS SELECT 1, 2")
+    )
+
+    intercept[NoSuchTableException](
+      sql("ALTER VIEW default.testView AS SELECT 1, 2")
+    )
+  }
+
+  test("ALTER VIEW AS should try to alter temp view first if view name has no database part") {
+    withTempView("test_view") {
+      withView("test_view") {
+        sql("CREATE VIEW test_view AS SELECT 1 AS a, 2 AS b")
+        sql("CREATE TEMP VIEW test_view AS SELECT 1 AS a, 2 AS b")
+
+        sql("ALTER VIEW test_view AS SELECT 3 AS i, 4 AS j")
+
+        // The temporary view should be updated.
+        checkAnswer(spark.table("test_view"), Row(3, 4))
+
+        // The permanent view should stay same.
+        checkAnswer(spark.table("default.test_view"), Row(1, 2))
+      }
+    }
+  }
+
+  test("ALTER VIEW AS should alter permanent view if view name has database part") {
+    withTempView("test_view") {
+      withView("test_view") {
```

--- End diff --

Based on my understanding, this will drop the temporary view (because of the resolution preference of `DROP VIEW`), and then `withTempView` is unable to find any temporary view.
[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate bui...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14859 HiveContext tests with SparkR are already being skipped due to https://github.com/apache/spark/pull/14889. I manually fixed this and tested it here: https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13873 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64698/ Test PASSed.
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13873 Merged build finished. Test PASSed.
[GitHub] spark pull request #14874: [SPARK-17180][SPARK-17309][SPARK-17323][SQL] crea...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14874#discussion_r76931872

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala ---

```
@@ -274,6 +276,75 @@ class SQLViewSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }

+  test("should not allow ALTER VIEW AS when the view does not exist") {
+    intercept[NoSuchTableException](
+      sql("ALTER VIEW testView AS SELECT 1, 2")
+    )
+
+    intercept[NoSuchTableException](
+      sql("ALTER VIEW default.testView AS SELECT 1, 2")
+    )
+  }
+
+  test("ALTER VIEW AS should try to alter temp view first if view name has no database part") {
+    withTempView("test_view") {
+      withView("test_view") {
+        sql("CREATE VIEW test_view AS SELECT 1 AS a, 2 AS b")
+        sql("CREATE TEMP VIEW test_view AS SELECT 1 AS a, 2 AS b")
+
+        sql("ALTER VIEW test_view AS SELECT 3 AS i, 4 AS j")
+
+        // The temporary view should be updated.
+        checkAnswer(spark.table("test_view"), Row(3, 4))
+
+        // The permanent view should stay same.
+        checkAnswer(spark.table("default.test_view"), Row(1, 2))
+      }
+    }
+  }
+
+  test("ALTER VIEW AS should alter permanent view if view name has database part") {
+    withTempView("test_view") {
+      withView("test_view") {
+        sql("CREATE VIEW test_view AS SELECT 1 AS a, 2 AS b")
+        sql("CREATE TEMP VIEW test_view AS SELECT 1 AS a, 2 AS b")
+
+        sql("ALTER VIEW default.test_view AS SELECT 3 AS i, 4 AS j")
+
+        // The temporary view should stay same.
+        checkAnswer(spark.table("test_view"), Row(1, 2))
+
+        // The permanent view should be updated.
+        checkAnswer(spark.table("default.test_view"), Row(3, 4))
+      }
+    }
+  }
+
+  test("ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.") {
+    withTempView("test_view") {
```

--- End diff --

`test_view` is not a temporary view, right?
[GitHub] spark issue #14874: [SPARK-17180][SPARK-17309][SPARK-17323][SQL] create Alte...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14874 LGTM except one minor comment.
[GitHub] spark issue #14889: [SPARK-17326][SPARKR] Fix tests with HiveContext in Spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14889 **[Test build #64700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64700/consoleFull)** for PR 14889 at commit [`2cdcf4f`](https://github.com/apache/spark/commit/2cdcf4f17fd6023d35852f524e2826cc685814dd).
[GitHub] spark issue #14862: [SPARK-17295][SQL] Create TestHiveSessionState use refle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14862 **[Test build #64701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64701/consoleFull)** for PR 14862 at commit [`714e3a9`](https://github.com/apache/spark/commit/714e3a99c8af857f6ec275bba97160d5bd5d998c).
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13873 **[Test build #64698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64698/consoleFull)** for PR 13873 at commit [`b22867b`](https://github.com/apache/spark/commit/b22867b365dc679b71f8b7df8ce3516382f9f119).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76931629

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---

```
@@ -235,14 +235,18 @@ class SessionCatalog(
    * Note: If the underlying implementation does not support altering a certain field,
    * this becomes a no-op.
    */
-  def alterTable(tableDefinition: CatalogTable): Unit = {
+  def alterTable(tableDefinition: CatalogTable, fromAnalyze: Boolean = false): Unit = {
```

--- End diff --

I'll fix this, thank you.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76931067

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala ---

```
@@ -140,7 +145,12 @@ private[hive] case class MetastoreRelation(
           sparkSession.sessionState.conf.defaultSizeInBytes
         })
       }
-    )
+    if (catalogTable.catalogStats.isDefined) {
```

--- End diff --

When `catalogStats` is defined, why do we still use Hive's numbers?
[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14883 After reading the code, it sounds like adding a session-scoped ADD JAR is not simple if we want to pass it to every worker node after the PR: https://github.com/apache/spark/pull/8909 CC @davies
[GitHub] spark issue #14889: [SPARK-17326][SPARKR] Fix tests with HiveContext in Spar...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14889 cc @rxin, @felixcheung and @shivaram
[GitHub] spark pull request #14889: [SPARK-17326][SPARKR] Fix tests with HiveContext ...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14889

[SPARK-17326][SPARKR] Fix tests with HiveContext in SparkR not to be skipped always

## What changes were proposed in this pull request?

Currently, `HiveContext` in SparkR is not being tested and is always skipped. This is because the initiation of `TestHiveContext` fails while trying to load non-existing data paths (test tables). This was introduced by https://github.com/apache/spark/pull/14005.

This enables the tests with SparkR.

## How was this patch tested?

Manually,

**Before** (on Mac OS)

```
...
Skipped
1. create DataFrame from RDD (@test_sparkSQL.R#200) - Hive is not build with SparkSQL, skipped
2. test HiveContext (@test_sparkSQL.R#1041) - Hive is not build with SparkSQL, skipped
3. read/write ORC files (@test_sparkSQL.R#1748) - Hive is not build with SparkSQL, skipped
4. enableHiveSupport on SparkSession (@test_sparkSQL.R#2480) - Hive is not build with SparkSQL, skipped
...
```

**After** (on Mac OS)

```
...
Skipped
1. sparkJars tag in SparkContext (@test_Windows.R#21) - This test is only for Windows, skipped
...
```

Please refer to the tests below (on Windows):

- Before: https://ci.appveyor.com/project/HyukjinKwon/spark/build/45-test123
- After: https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17326

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14889.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14889

commit 2cdcf4f17fd6023d35852f524e2826cc685814dd
Author: hyukjinkwon
Date: 2016-08-31T06:32:29Z

    Tests with HiveContext in SparkR being skipped always
[GitHub] spark pull request #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14531#discussion_r76930695

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

```
@@ -58,18 +63,32 @@ case class CreateTableLikeCommand(
       throw new AnalysisException(
         s"Source table in CREATE TABLE LIKE does not exist: '$sourceTable'")
     }
-    if (catalog.isTemporaryTable(sourceTable)) {
-      throw new AnalysisException(
-        s"Source table in CREATE TABLE LIKE cannot be temporary: '$sourceTable'")
-    }
-    val tableToCreate = catalog.getTableMetadata(sourceTable).copy(
-      identifier = targetTable,
-      tableType = CatalogTableType.MANAGED,
-      createTime = System.currentTimeMillis,
-      lastAccessTime = -1).withNewStorage(locationUri = None)
+    val sourceTableDesc = catalog.getTableMetadata(sourceTable)

-    catalog.createTable(tableToCreate, ifNotExists)
+    val newSerdeProp =
+      if (DDLUtils.isDatasourceTable(sourceTableDesc)) {
+        val newPath = catalog.defaultTablePath(targetTable)
+        sourceTableDesc.storage.properties.filterKeys(_.toLowerCase != "path") ++
+          Map("path" -> newPath)
+      } else {
+        sourceTableDesc.storage.properties
+      }
+    val newStorage = sourceTableDesc.storage.copy(
+      locationUri = None,
+      properties = newSerdeProp)
+
+    val newTableDesc =
+      CatalogTable(
+        identifier = targetTable,
+        tableType = CatalogTableType.MANAGED,
+        storage = newStorage,
+        schema = sourceTableDesc.schema,
+        provider = sourceTableDesc.provider,
```

--- End diff --

permanent view too
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76930682

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala ---

```
@@ -140,7 +145,12 @@ private[hive] case class MetastoreRelation(
           sparkSession.sessionState.conf.defaultSizeInBytes
         })
       }
-    )
+    if (catalogTable.catalogStats.isDefined) {
```

--- End diff --

We can skip the above computation of `sizeInBytes` if `catalogStats` is defined.
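The suggestion above can be sketched as follows (a hypothetical illustration only, not the PR's actual code; `effectiveSizeInBytes` and its parameters are made-up names, and the by-name `estimatedSize` parameter stands in for the expensive size computation in `MetastoreRelation`):

```scala
// Hypothetical sketch: prefer statistics recorded in the catalog (e.g. by
// ANALYZE TABLE) and evaluate the estimated size only when they are absent.
// The by-name parameter means the fallback computation is skipped entirely
// whenever catalog statistics exist.
def effectiveSizeInBytes(catalogStats: Option[BigInt], estimatedSize: => BigInt): BigInt =
  catalogStats.getOrElse(estimatedSize)
```

Because the fallback is lazy, `effectiveSizeInBytes(Some(BigInt(100)), expensiveScan())` never runs `expensiveScan()`.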
[GitHub] spark pull request #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14531#discussion_r76930187

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

```
@@ -58,18 +63,32 @@ case class CreateTableLikeCommand(
       throw new AnalysisException(
         s"Source table in CREATE TABLE LIKE does not exist: '$sourceTable'")
     }
-    if (catalog.isTemporaryTable(sourceTable)) {
-      throw new AnalysisException(
-        s"Source table in CREATE TABLE LIKE cannot be temporary: '$sourceTable'")
-    }
-    val tableToCreate = catalog.getTableMetadata(sourceTable).copy(
-      identifier = targetTable,
-      tableType = CatalogTableType.MANAGED,
-      createTime = System.currentTimeMillis,
-      lastAccessTime = -1).withNewStorage(locationUri = None)
+    val sourceTableDesc = catalog.getTableMetadata(sourceTable)

-    catalog.createTable(tableToCreate, ifNotExists)
+    val newSerdeProp =
+      if (DDLUtils.isDatasourceTable(sourceTableDesc)) {
+        val newPath = catalog.defaultTablePath(targetTable)
+        sourceTableDesc.storage.properties.filterKeys(_.toLowerCase != "path") ++
+          Map("path" -> newPath)
+      } else {
+        sourceTableDesc.storage.properties
+      }
+    val newStorage = sourceTableDesc.storage.copy(
+      locationUri = None,
+      properties = newSerdeProp)
+
+    val newTableDesc =
+      CatalogTable(
+        identifier = targetTable,
+        tableType = CatalogTableType.MANAGED,
+        storage = newStorage,
+        schema = sourceTableDesc.schema,
+        provider = sourceTableDesc.provider,
```

--- End diff --

If the source table is a temp view, the provider is `None`; we should set a default provider here.
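The defaulting suggested above could look roughly like this (a hypothetical sketch; `providerForCopy` is a made-up helper, and the default provider name would in practice come from the session's data source configuration rather than a literal):

```scala
// Hypothetical sketch: a temporary-view source carries no provider, so fall
// back to a session-default data source name when copying the table metadata.
def providerForCopy(sourceProvider: Option[String], defaultProvider: String): Option[String] =
  sourceProvider.orElse(Some(defaultProvider))
```

With this, `providerForCopy(None, "parquet")` yields `Some("parquet")`, while an existing provider such as `Some("orc")` is preserved.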
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64697/ Test PASSed.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Merged build finished. Test PASSed.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14710 **[Test build #64697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64697/consoleFull)** for PR 14710 at commit [`5a2f30f`](https://github.com/apache/spark/commit/5a2f30f7a31bd8edba1932cabcaf71332837b92d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14883 @cloud-fan @viirya After reading Hive's code, the JAR's scope should be session-based. See the code: https://github.com/apache/hive/blob/0438701395161325a429b4fd8211213276aa0fef/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L1188-L1200 Let me think about how to fix it.
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Yes, this is only for a bug fix. @shivaram mentioned in a previous email exchange it would be good to see some performance benchmarks as well.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76929362 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -401,6 +401,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } } + override def alterTableStats(tableDefinition: CatalogTable): Unit = withClient { --- End diff -- I see. OK, I'll remove this.
[GitHub] spark issue #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bugs in C...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14531 **[Test build #64699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64699/consoleFull)** for PR 14531 at commit [`eabf31f`](https://github.com/apache/spark/commit/eabf31fdc1b9491bca0f051808e7db0c1b6e12d3).
[GitHub] spark issue #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bugs in C...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14531 Let me do it now.
[GitHub] spark issue #14873: [SPARK-17308]Improved the spark core code by replacing a...
Github user shiv4nsh commented on the issue: https://github.com/apache/spark/pull/14873 @srowen: Are we good to merge this, or does this PR require some additional changes?
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64696/
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14868 Merged build finished. Test PASSed.
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14868 **[Test build #64696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64696/consoleFull)** for PR 14868 at commit [`3f08c02`](https://github.com/apache/spark/commit/3f08c027add03c59251583420c76582a085b3573). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14888: [SPARK-17324] [SQL] Remove Direct Usage of HiveClient in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14888 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64694/
[GitHub] spark issue #14888: [SPARK-17324] [SQL] Remove Direct Usage of HiveClient in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14888 Merged build finished. Test PASSed.
[GitHub] spark issue #14888: [SPARK-17324] [SQL] Remove Direct Usage of HiveClient in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14888 **[Test build #64694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64694/consoleFull)** for PR 14888 at commit [`d03e65d`](https://github.com/apache/spark/commit/d03e65d0f9b119ed767da124da360cfcf9e966b8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14868 Merged build finished. Test PASSed.
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64695/
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14868 **[Test build #64695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64695/consoleFull)** for PR 14868 at commit [`bc70a00`](https://github.com/apache/spark/commit/bc70a0023bb24175c06c03cb7acad7f9a6d34e36). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76926260 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -168,6 +169,81 @@ class StatisticsSuite extends QueryTest with TestHiveSingleton with SQLTestUtils TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false) } + private def checkMetastoreRelationStats( + tableName: String, + totalSize: Long, + rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: MetastoreRelation => + rel.statistics + assert(rel.statistics.sizeInBytes === totalSize) + assert(rel.statistics.rowCount === rowCount) +} +assert(relations.size === 1) + } + + private def checkLogicalRelationStats(tableName: String, rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: LogicalRelation => + assert(rel.statistics.sizeInBytes === rel.relation.sizeInBytes) --- End diff -- If this is the reason, you need to leave a TODO task in the code. Otherwise, we might forget it.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76926133 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -401,6 +401,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } } + override def alterTableStats(tableDefinition: CatalogTable): Unit = withClient { --- End diff -- Yeah, agree with @cloud-fan. If we want to set it using `alter table`, we should use a dedicated command (just like what Hive does):

```sql
ALTER TABLE UPDATE STATISTICS SET
```

Let us remove `alterTableStats` and minimize the code changes. We can discuss how to do it properly when we start this JIRA: https://issues.apache.org/jira/browse/SPARK-17282
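Until a dedicated statistics command exists, the behavior the reviewers suggest (rejecting user-set statistics keys in `ALTER TABLE ... SET TBLPROPERTIES`) could be sketched as below. The property key names are assumptions for illustration, not necessarily Spark's actual reserved keys.

```scala
// Sketch: reject reserved statistics keys passed via table properties.
// Key names are illustrative assumptions, not Spark's actual reserved keys.
object StatsPropertyGuard {
  val reservedStatsKeys: Set[String] =
    Set("spark.sql.statistics.totalSize", "spark.sql.statistics.numRows")

  // Throws if the user tries to set a reserved statistics property directly.
  def validate(props: Map[String, String]): Unit = {
    val bad = props.keySet.intersect(reservedStatsKeys)
    if (bad.nonEmpty) {
      throw new IllegalArgumentException(
        s"Cannot set reserved statistics properties via ALTER TABLE: ${bad.mkString(", ")}")
    }
  }
}

StatsPropertyGuard.validate(Map("comment" -> "ok")) // passes
```

The validation would run inside the `ALTER TABLE` command before the catalog call, keeping `alterTableStats` out of the external catalog API as the review proposes.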
[GitHub] spark issue #13252: [SPARK-15473][SQL] CSV data source writes header for emp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13252 Merged build finished. Test PASSed.
[GitHub] spark issue #13252: [SPARK-15473][SQL] CSV data source writes header for emp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13252 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64693/
[GitHub] spark issue #13252: [SPARK-15473][SQL] CSV data source writes header for emp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13252 **[Test build #64693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64693/consoleFull)** for PR 13252 at commit [`031c9da`](https://github.com/apache/spark/commit/031c9dacba77c6197626d02ceb0e1081b18e187b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76924362 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -401,6 +401,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } } + override def alterTableStats(tableDefinition: CatalogTable): Unit = withClient { --- End diff -- I'm not sure we want to support the second way to set properties. If users set them with ALTER TABLE, we should throw an exception. cc @yhuai @gatorsmile what do you think?
[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14883 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources Looks like it is.
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r76923998 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala --- @@ -171,6 +171,7 @@ private[sql] class SessionState(sparkSession: SparkSession) { } def addJar(path: String): Unit = { --- End diff -- hmm, I think the addition of resources should be session-scoped.
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Eliminate redundant cast from ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13704
[GitHub] spark issue #13873: [SPARK-16167][SQL] RowEncoder should preserve array/map ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13873 **[Test build #64698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64698/consoleFull)** for PR 13873 at commit [`b22867b`](https://github.com/apache/spark/commit/b22867b365dc679b71f8b7df8ce3516382f9f119).
[GitHub] spark pull request #10225: [SPARK-12196][Core] Store/retrieve blocks in diff...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10225#discussion_r76923680 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -50,35 +50,98 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolea private val shutdownHook = addShutdownHook() + private abstract class FileAllocationStrategy { +def apply(filename: String): File + +protected def getFile(filename: String, storageDirs: Array[File]): File = { + require(storageDirs.nonEmpty, "could not find file when the directories are empty") + + // Figure out which local directory it hashes to, and which subdirectory in that + val hash = Utils.nonNegativeHash(filename) + val dirId = localDirs.indexOf(storageDirs(hash % storageDirs.length)) + val subDirId = (hash / storageDirs.length) % subDirsPerLocalDir + + // Create the subdirectory if it doesn't already exist + val subDir = subDirs(dirId).synchronized { +val old = subDirs(dirId)(subDirId) +if (old != null) { + old +} else { + val newDir = new File(localDirs(dirId), "%02x".format(subDirId)) + if (!newDir.exists() && !newDir.mkdir()) { +throw new IOException(s"Failed to create local dir in $newDir.") + } + subDirs(dirId)(subDirId) = newDir + newDir +} + } + + new File(subDir, filename) +} + } + /** Looks up a file by hashing it into one of our local subdirectories. */ // This method should be kept in sync with // org.apache.spark.network.shuffle.ExternalShuffleBlockResolver#getFile(). 
- def getFile(filename: String): File = { -// Figure out which local directory it hashes to, and which subdirectory in that -val hash = Utils.nonNegativeHash(filename) -val dirId = hash % localDirs.length -val subDirId = (hash / localDirs.length) % subDirsPerLocalDir - -// Create the subdirectory if it doesn't already exist -val subDir = subDirs(dirId).synchronized { - val old = subDirs(dirId)(subDirId) - if (old != null) { -old - } else { -val newDir = new File(localDirs(dirId), "%02x".format(subDirId)) -if (!newDir.exists() && !newDir.mkdir()) { - throw new IOException(s"Failed to create local dir in $newDir.") -} -subDirs(dirId)(subDirId) = newDir -newDir + private object hashAllocator extends FileAllocationStrategy { +def apply(filename: String): File = getFile(filename, localDirs) + } + + /** Looks up a file by hierarchy way in different speed storage devices. */ + private val hierarchyStore = conf.getOption("spark.storage.hierarchyStore") + private class HierarchyAllocator extends FileAllocationStrategy { +case class LayerInfo(key: String, threshold: Long, dirs: Array[File]) +val hsSpecs: Array[(String, Long)] = + // e.g.: hierarchyStore = "ssd 200GB, hdd 100GB" + hierarchyStore.get.trim.split(",").map { +s => val x = s.trim.split(" +") + (x(0).toLowerCase, Utils.byteStringAsBytes(x(1))) } +val hsLayers: Array[LayerInfo] = hsSpecs.map( + s => LayerInfo(s._1, s._2, localDirs.filter(_.getPath.toLowerCase.containsSlice(s._1))) +) +val lastLayerDirs = localDirs.filter(dir => !hsLayers.exists(_.dirs.contains(dir))) +val allLayers: Array[LayerInfo] = hsLayers :+ + LayerInfo("Last Storage", 10.toLong, lastLayerDirs) +val finalLayers: Array[LayerInfo] = allLayers.filter(_.dirs.nonEmpty) +logInfo("Hierarchy store info:") +for (layer <- finalLayers) { + logInfo("Layer: %s, Threshold: %s".format(layer.key, Utils.bytesToString(layer.threshold))) + layer.dirs.foreach { dir => logInfo("\t%s".format(dir.getCanonicalPath)) } } -new File(subDir, filename) +def apply(filename: 
String): File = { + var availableFile: File = null + for (layer <- finalLayers) { --- End diff -- Once you get `availableFile`, you can stop this loop early to prevent creating useless subdirs.
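The early-exit viirya suggests — stop scanning layers as soon as one yields a usable file, so later layers never create unused subdirectories — can be sketched as follows. `Layer` and `tryAllocate` are illustrative stand-ins for the PR's `finalLayers` loop, not the actual types in `DiskBlockManager`.

```scala
// Illustrative model of hierarchical storage layers (names are assumptions).
case class Layer(key: String, freeBytes: Long, threshold: Long)

// Pretend allocation: a layer can host the file only if it has headroom.
def tryAllocate(layer: Layer, filename: String): Option[String] =
  if (layer.freeBytes > layer.threshold) Some(s"${layer.key}/$filename") else None

// Lazy scan: the iterator stops at the first layer that succeeds,
// so no work is done for layers after the winner.
def allocate(layers: Seq[Layer], filename: String): Option[String] =
  layers.iterator.map(tryAllocate(_, filename)).collectFirst { case Some(f) => f }

val layers = Seq(Layer("ssd", 0L, 10L), Layer("hdd", 100L, 10L))
assert(allocate(layers, "block1") == Some("hdd/block1"))
```

In the PR's imperative loop the equivalent fix is a `return availableFile` (or a guard condition) once the first candidate is found.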
[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13704 thanks, merging to master!
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76923549 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -21,25 +21,55 @@ import scala.util.control.NonFatal import org.apache.hadoop.fs.{FileSystem, Path} -import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.{AnalysisException, Dataset, Row, SparkSession} import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable} +import org.apache.spark.sql.catalyst.plans.logical.Statistics +import org.apache.spark.sql.execution.datasources.LogicalRelation /** * Analyzes the given table in the current database to generate statistics, which will be * used in query optimizations. - * - * Right now, it only supports Hive tables and it only updates the size of a Hive table - * in the Hive metastore. */ -case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { +case class AnalyzeTableCommand(tableName: String, noscan: Boolean = true) extends RunnableCommand { override def run(sparkSession: SparkSession): Seq[Row] = { val sessionState = sparkSession.sessionState val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName) val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent)) +def updateTableStats( +catalogTable: CatalogTable, +oldTotalSize: Long, +oldRowCount: Long, +newTotalSize: Long): Unit = { + + var newStats: Option[Statistics] = None + if (newTotalSize > 0 && newTotalSize != oldTotalSize) { +newStats = Some(Statistics(sizeInBytes = newTotalSize)) + } + if (!noscan) { +val newRowCount = Dataset.ofRows(sparkSession, relation).count() +if (newRowCount >= 0 && newRowCount != oldRowCount) { + newStats = if (newStats.isDefined) { +newStats.map(_.copy(rowCount = Some(BigInt(newRowCount + } else { 
+Some(Statistics(sizeInBytes = oldTotalSize, rowCount = Some(BigInt(newRowCount + } +} + } + // Update the metastore if the above statistics of the table are different from those + // recorded in the metastore. + if (newStats.isDefined) { +sessionState.catalog.alterTable( + catalogTable.copy(catalogStats = newStats), fromAnalyze = true) + +// Refresh the cache of the table in the catalog. --- End diff -- This comment is confusing. We have two caches: one is the data cache, the other is the logical plan cache for data source tables.
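The conditional-update flow under review (recompute statistics, compare against the recorded values, and write back to the metastore only on change) can be condensed into a small sketch. `Stats` and its fields are simplified stand-ins, not Spark's actual `Statistics` class.

```scala
// Simplified stand-in for the statistics recorded in the metastore.
case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

// Return new statistics only when they differ from the recorded ones,
// mirroring the "update only if changed" logic in the ANALYZE command.
def newStatsIfChanged(recorded: Stats,
                      newTotalSize: Long,
                      newRowCount: Option[Long]): Option[Stats] = {
  val sizeChanged = newTotalSize > 0 && BigInt(newTotalSize) != recorded.sizeInBytes
  val rowsChanged = newRowCount.exists(r => !recorded.rowCount.contains(BigInt(r)))
  if (sizeChanged || rowsChanged) {
    val size = if (newTotalSize > 0) BigInt(newTotalSize) else recorded.sizeInBytes
    Some(Stats(size, newRowCount.map(BigInt(_)).orElse(recorded.rowCount)))
  } else None
}

// Unchanged size and no row count: nothing to write back.
assert(newStatsIfChanged(Stats(100), 100, None).isEmpty)
```

Only when this returns `Some(...)` would the command call `alterTable` and refresh the cached plan, which is the step gatorsmile's comment asks to document more precisely.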
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76923392 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -21,25 +21,55 @@ import scala.util.control.NonFatal import org.apache.hadoop.fs.{FileSystem, Path} -import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.{AnalysisException, Dataset, Row, SparkSession} import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable} +import org.apache.spark.sql.catalyst.plans.logical.Statistics +import org.apache.spark.sql.execution.datasources.LogicalRelation /** * Analyzes the given table in the current database to generate statistics, which will be * used in query optimizations. - * - * Right now, it only supports Hive tables and it only updates the size of a Hive table - * in the Hive metastore. */ -case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { +case class AnalyzeTableCommand(tableName: String, noscan: Boolean = true) extends RunnableCommand { override def run(sparkSession: SparkSession): Seq[Row] = { val sessionState = sparkSession.sessionState val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName) val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent)) +def updateTableStats( --- End diff -- uh, this function interrupts the whole flow. Maybe you can move it out of this `run` function?
[GitHub] spark issue #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bugs in C...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14531 can you resolve the conflicts? thanks!
[GitHub] spark issue #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionState to...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14883 @gatorsmile is the `ADD JAR` command in Hive session-scoped? Our current implementation may be wrong...
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r76922568 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala --- @@ -171,6 +171,7 @@ private[sql] class SessionState(sparkSession: SparkSession) { } def addJar(path: String): Unit = { --- End diff -- so the `addJar` is not session-scoped by definition? I think `sparkSession.sparkContext.addJar(path)` is also cross-session
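The scoping concern can be made concrete with a small Python sketch (hypothetical class names, not Spark's code): when every session forwards `addJar` to one shared context, the jar becomes visible to all sessions, so the operation is cross-session by construction, whatever the per-session API suggests.

```python
# Illustrative model of the session-vs-shared scoping question.

class SharedContext:
    """Stands in for the one SparkContext shared by every session."""
    def __init__(self):
        self.jars = []          # shared across every session
    def add_jar(self, path):
        self.jars.append(path)

class Session:
    """Stands in for a per-user session that delegates to the context."""
    def __init__(self, ctx):
        self.ctx = ctx
        self.local_jars = []    # what a truly session-scoped design would use
    def add_jar(self, path):
        self.ctx.add_jar(path)  # delegating here makes the jar cross-session
        self.local_jars.append(path)

ctx = SharedContext()
s1, s2 = Session(ctx), Session(ctx)
s1.add_jar("udf.jar")
# s2 never called add_jar, yet the jar is visible through the shared context.
```

A genuinely session-scoped design would have to keep the resource list on the session and resolve jars per session instead of delegating to the shared context.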
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r76922145 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -509,4 +509,10 @@ class InMemoryCatalog( StringUtils.filterPattern(catalog(db).functions.keysIterator.toSeq, pattern) } + // -- + // Resources + // -- + + override def addJar(path: String): Unit = { /* no-op */ } --- End diff -- Yeah, I think throwing an exception is better.
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14750 Merged build finished. Test PASSed.
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14750 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64691/ Test PASSed.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76922102 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -168,6 +169,81 @@ class StatisticsSuite extends QueryTest with TestHiveSingleton with SQLTestUtils TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false) } + private def checkMetastoreRelationStats( + tableName: String, + totalSize: Long, + rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: MetastoreRelation => + rel.statistics + assert(rel.statistics.sizeInBytes === totalSize) + assert(rel.statistics.rowCount === rowCount) +} +assert(relations.size === 1) + } + + private def checkLogicalRelationStats(tableName: String, rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: LogicalRelation => + assert(rel.statistics.sizeInBytes === rel.relation.sizeInBytes) --- End diff -- btw, spark 2.0 has some bugs on Windows during tests, mainly about paths.
[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14750 **[Test build #64691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64691/consoleFull)** for PR 14750 at commit [`52db0ed`](https://github.com/apache/spark/commit/52db0ed8dd7d9bb3f201b648a999068597942d26). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #10225: [SPARK-12196][Core] Store/retrieve blocks in diff...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10225#discussion_r76922041 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -50,35 +50,98 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolea private val shutdownHook = addShutdownHook() + private abstract class FileAllocationStrategy { +def apply(filename: String): File + +protected def getFile(filename: String, storageDirs: Array[File]): File = { + require(storageDirs.nonEmpty, "could not find file when the directories are empty") + + // Figure out which local directory it hashes to, and which subdirectory in that + val hash = Utils.nonNegativeHash(filename) + val dirId = localDirs.indexOf(storageDirs(hash % storageDirs.length)) + val subDirId = (hash / storageDirs.length) % subDirsPerLocalDir + + // Create the subdirectory if it doesn't already exist + val subDir = subDirs(dirId).synchronized { +val old = subDirs(dirId)(subDirId) +if (old != null) { + old +} else { + val newDir = new File(localDirs(dirId), "%02x".format(subDirId)) + if (!newDir.exists() && !newDir.mkdir()) { +throw new IOException(s"Failed to create local dir in $newDir.") + } + subDirs(dirId)(subDirId) = newDir + newDir +} + } + + new File(subDir, filename) +} + } + /** Looks up a file by hashing it into one of our local subdirectories. */ // This method should be kept in sync with // org.apache.spark.network.shuffle.ExternalShuffleBlockResolver#getFile(). 
- def getFile(filename: String): File = { -// Figure out which local directory it hashes to, and which subdirectory in that -val hash = Utils.nonNegativeHash(filename) -val dirId = hash % localDirs.length -val subDirId = (hash / localDirs.length) % subDirsPerLocalDir - -// Create the subdirectory if it doesn't already exist -val subDir = subDirs(dirId).synchronized { - val old = subDirs(dirId)(subDirId) - if (old != null) { -old - } else { -val newDir = new File(localDirs(dirId), "%02x".format(subDirId)) -if (!newDir.exists() && !newDir.mkdir()) { - throw new IOException(s"Failed to create local dir in $newDir.") -} -subDirs(dirId)(subDirId) = newDir -newDir + private object hashAllocator extends FileAllocationStrategy { +def apply(filename: String): File = getFile(filename, localDirs) + } + + /** Looks up a file by hierarchy way in different speed storage devices. */ + private val hierarchyStore = conf.getOption("spark.storage.hierarchyStore") + private class HierarchyAllocator extends FileAllocationStrategy { +case class LayerInfo(key: String, threshold: Long, dirs: Array[File]) +val hsSpecs: Array[(String, Long)] = + // e.g.: hierarchyStore = "ssd 200GB, hdd 100GB" + hierarchyStore.get.trim.split(",").map { +s => val x = s.trim.split(" +") + (x(0).toLowerCase, Utils.byteStringAsBytes(x(1))) --- End diff -- It would be better to add error handling here to reject a malformed format.
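The error handling being requested could look like the following Python sketch. It parses the `"ssd 200GB, hdd 100GB"` format shown in the diff; the function name and behavior are illustrative, not the actual patch. The point is to validate each entry up front and fail with a clear message, instead of letting a malformed entry surface later as an index or number-format error deep inside block allocation.

```python
import re

# Fail-fast parser for a hierarchy-store spec such as "ssd 200GB, hdd 100GB".
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def parse_hierarchy_spec(spec):
    """Return [(layer_name, threshold_bytes), ...] or raise ValueError."""
    layers = []
    for entry in spec.split(","):
        entry = entry.strip()
        # Expect: <name> <integer><unit>, e.g. "ssd 200GB"
        m = re.fullmatch(r"(\w+)\s+(\d+)\s*([a-zA-Z]+)", entry)
        if m is None:
            raise ValueError(f"malformed hierarchy-store entry: {entry!r}")
        name, num, unit = m.group(1).lower(), int(m.group(2)), m.group(3).lower()
        if unit not in _UNITS:
            raise ValueError(f"unknown size unit {unit!r} in entry {entry!r}")
        layers.append((name, num * _UNITS[unit]))
    return layers
```

A bad entry such as `"ssd twohundred GB"` then produces one readable `ValueError` naming the offending entry, rather than an opaque failure at file-allocation time.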
[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14783 @clarkfitzg, your patch is a bug fix, not a performance improvement, right? If so, since there is no performance regression according to your benchmark, let's focus on the functionality. We can address performance issues in other JIRA issues.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76921850 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -88,15 +116,21 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { } }.getOrElse(0L) -// Update the Hive metastore if the total size of the table is different than the size -// recorded in the Hive metastore. -// This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(). -if (newTotalSize > 0 && newTotalSize != oldTotalSize) { - sessionState.catalog.alterTable( -catalogTable.copy( - properties = relation.catalogTable.properties + -(AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString))) -} +updateTableStats( + catalogTable, + oldTotalSize = catalogTable.catalogStats.map(_.sizeInBytes.toLong).getOrElse(0L), + oldRowCount = catalogTable.catalogStats.flatMap(_.rowCount.map(_.toLong)).getOrElse(-1L), + newTotalSize = newTotalSize) + + // data source tables have been converted into LogicalRelations + case logicalRel: LogicalRelation if logicalRel.metastoreTableIdentifier.isDefined => +val tableIdentifier = logicalRel.metastoreTableIdentifier.get +val catalogTable = sessionState.catalog.getTableMetadata(tableIdentifier) +updateTableStats( + catalogTable, + oldTotalSize = logicalRel.statistics.sizeInBytes.toLong, + oldRowCount = logicalRel.statistics.rowCount.map(_.toLong).getOrElse(-1L), + newTotalSize = logicalRel.relation.sizeInBytes) case otherRelation => throw new AnalysisException(s"ANALYZE TABLE is only supported for Hive tables, " + --- End diff -- This message is out of date.
[GitHub] spark issue #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr s...
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14744 LGTM
[GitHub] spark issue #14887: [SPARK-17321][YARN] YARN shuffle service should use good...
Github user SaintBacchus commented on the issue: https://github.com/apache/spark/pull/14887 If there are bad disks in local-dirs, `NodeManager` will not pass these bad disks to the Spark executor, so it's not necessary to check them.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76921588 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -168,6 +169,81 @@ class StatisticsSuite extends QueryTest with TestHiveSingleton with SQLTestUtils TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false) } + private def checkMetastoreRelationStats( + tableName: String, + totalSize: Long, + rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: MetastoreRelation => + rel.statistics + assert(rel.statistics.sizeInBytes === totalSize) + assert(rel.statistics.rowCount === rowCount) +} +assert(relations.size === 1) + } + + private def checkLogicalRelationStats(tableName: String, rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: LogicalRelation => + assert(rel.statistics.sizeInBytes === rel.relation.sizeInBytes) --- End diff -- It seems that the Parquet file size differs between Windows and Linux. I set an expected value initially; it worked on Windows but failed in Spark CI.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14710 **[Test build #64697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64697/consoleFull)** for PR 14710 at commit [`5a2f30f`](https://github.com/apache/spark/commit/5a2f30f7a31bd8edba1932cabcaf71332837b92d).
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14883#discussion_r76921445 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -509,4 +509,10 @@ class InMemoryCatalog( StringUtils.filterPattern(catalog(db).functions.keysIterator.toSeq, pattern) } + // -- + // Resources + // -- + + override def addJar(path: String): Unit = { /* no-op */ } --- End diff -- Will a no-op here make the user think that the jar is loaded?
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14710 retest this please
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76921301 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -33,7 +33,8 @@ import org.apache.spark.util.Utils case class LogicalRelation( relation: BaseRelation, expectedOutputAttributes: Option[Seq[Attribute]] = None, -metastoreTableIdentifier: Option[TableIdentifier] = None) +metastoreTableIdentifier: Option[TableIdentifier] = None, +inheritedStats: Option[Statistics] = None) --- End diff -- it uses catalogStats of CatalogTable in MetastoreRelation
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76921020 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -168,6 +169,81 @@ class StatisticsSuite extends QueryTest with TestHiveSingleton with SQLTestUtils TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false) } + private def checkMetastoreRelationStats( + tableName: String, + totalSize: Long, + rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: MetastoreRelation => + rel.statistics + assert(rel.statistics.sizeInBytes === totalSize) + assert(rel.statistics.rowCount === rowCount) +} +assert(relations.size === 1) + } + + private def checkLogicalRelationStats(tableName: String, rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: LogicalRelation => + assert(rel.statistics.sizeInBytes === rel.relation.sizeInBytes) --- End diff -- Can you put a comment to explain why you just compare these two values, instead of comparing them with the expected values?
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76920921 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -168,6 +169,81 @@ class StatisticsSuite extends QueryTest with TestHiveSingleton with SQLTestUtils TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false) } + private def checkMetastoreRelationStats( + tableName: String, + totalSize: Long, + rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: MetastoreRelation => + rel.statistics --- End diff -- this is useless, right?
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Merged build finished. Test FAILed.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64692/ Test FAILed.
[GitHub] spark issue #14710: [SPARK-16533][CORE] resolve deadlocking in driver when e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14710 **[Test build #64692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64692/consoleFull)** for PR 14710 at commit [`5a2f30f`](https://github.com/apache/spark/commit/5a2f30f7a31bd8edba1932cabcaf71332837b92d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76920817 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -168,6 +169,81 @@ class StatisticsSuite extends QueryTest with TestHiveSingleton with SQLTestUtils TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false) } + private def checkMetastoreRelationStats( + tableName: String, + totalSize: Long, + rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: MetastoreRelation => + rel.statistics + assert(rel.statistics.sizeInBytes === totalSize) + assert(rel.statistics.rowCount === rowCount) +} +assert(relations.size === 1) + } + + private def checkLogicalRelationStats(tableName: String, rowCount: Option[BigInt]): Unit = { +val df = sql(s"SELECT * FROM $tableName") +val relations = df.queryExecution.analyzed.collect { case rel: LogicalRelation => + assert(rel.statistics.sizeInBytes === rel.relation.sizeInBytes) + assert(rel.statistics.rowCount === rowCount) +} +assert(relations.size === 1) + } + + test("test table-level statistics for hive tables created in HiveExternalCatalog") { +val textTable = "textTable" +val parquetTable = "parquetTable" +val orcTable = "orcTable" +withTable(textTable, parquetTable, orcTable) { + sql(s"CREATE TABLE $textTable (key STRING, value STRING) STORED AS TEXTFILE") + sql(s"INSERT INTO TABLE $textTable SELECT * FROM src") + + // noscan won't count the number of rows + sql(s"ANALYZE TABLE $textTable COMPUTE STATISTICS noscan") + checkMetastoreRelationStats(textTable, 5812, None) + + // without noscan, we count the number of rows + sql(s"ANALYZE TABLE $textTable COMPUTE STATISTICS") + checkMetastoreRelationStats(textTable, 5812, Some(500)) + + // test whether the old stats are removed + sql(s"INSERT INTO TABLE $textTable SELECT * FROM src") + sql(s"ANALYZE TABLE $textTable COMPUTE 
STATISTICS noscan") + checkMetastoreRelationStats(textTable, 11624, None) + + // test statistics of LogicalRelation inherited from MetastoreRelation + sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) STORED AS PARQUET") + sql(s"CREATE TABLE $orcTable (key STRING, value STRING) STORED AS ORC") + sql(s"INSERT INTO TABLE $parquetTable SELECT * FROM src") + sql(s"INSERT INTO TABLE $orcTable SELECT * FROM src") + sql(s"ANALYZE TABLE $parquetTable COMPUTE STATISTICS") + sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS") + + checkLogicalRelationStats(parquetTable, Some(500)) + + withSQLConf("spark.sql.hive.convertMetastoreOrc" -> "true") { +checkLogicalRelationStats(orcTable, Some(500)) + } +} + } + + test("test table-level statistics for data source table created in HiveExternalCatalog") { +val parquetTable = "parquetTable" +withTable(parquetTable) { + sql(s"CREATE TABLE $parquetTable (key STRING, value STRING) USING PARQUET") --- End diff -- Maybe you can check its `CatalogTable` and confirm it is a datasource table through `DDLUtils.isDatasourceTable(table)`.
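As a side note for readers of the test above, the `noscan` distinction it exercises can be summarized in a few lines of Python (an illustrative in-memory stand-in for the metastore, not Spark code): `ANALYZE TABLE ... COMPUTE STATISTICS noscan` records only the total size, the full form also records a row count, and re-analyzing overwrites whatever was stored before.

```python
# Illustrative model of ANALYZE TABLE semantics; `stats` stands in for the
# metastore's per-table statistics.

def analyze(stats, table, total_size, row_count, noscan):
    # noscan: cheap pass that records size only; a full scan also counts rows.
    stats[table] = {"totalSize": total_size,
                    "rowCount": None if noscan else row_count}
    return stats[table]

stats = {}
analyze(stats, "textTable", 5812, 500, noscan=True)    # size only
analyze(stats, "textTable", 5812, 500, noscan=False)   # size and row count
analyze(stats, "textTable", 11624, 1000, noscan=True)  # old row count dropped
```

The last call mirrors the "test whether the old stats are removed" step in the suite: after the table grows and a `noscan` re-analysis, the previously collected row count must not survive.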
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76920785 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -33,7 +33,8 @@ import org.apache.spark.util.Utils case class LogicalRelation( relation: BaseRelation, expectedOutputAttributes: Option[Seq[Attribute]] = None, -metastoreTableIdentifier: Option[TableIdentifier] = None) +metastoreTableIdentifier: Option[TableIdentifier] = None, +inheritedStats: Option[Statistics] = None) --- End diff -- For MetastoreRelation, isn't LogicalRelation simply using MetastoreRelation's statistics?
[GitHub] spark issue #14874: [SPARK-17180][SPARK-17309][SPARK-17323][SQL] create Alte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64690/ Test PASSed.
[GitHub] spark issue #14874: [SPARK-17180][SPARK-17309][SPARK-17323][SQL] create Alte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14874 Merged build finished. Test PASSed.
[GitHub] spark issue #14874: [SPARK-17180][SPARK-17309][SPARK-17323][SQL] create Alte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14874 **[Test build #64690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64690/consoleFull)** for PR 14874 at commit [`51726ff`](https://github.com/apache/spark/commit/51726ff82fa818717f9ec52b89ca17a62ca8bb14). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AlterViewAsCommand(`
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76920593

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---

```scala
      TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false)
  }

  private def checkMetastoreRelationStats(
      tableName: String,
      totalSize: Long,
      rowCount: Option[BigInt]): Unit = {
    val df = sql(s"SELECT * FROM $tableName")
    val relations = df.queryExecution.analyzed.collect { case rel: MetastoreRelation =>
      rel.statistics
      assert(rel.statistics.sizeInBytes === totalSize)
      assert(rel.statistics.rowCount === rowCount)
    }
    assert(relations.size === 1)
  }

  private def checkLogicalRelationStats(tableName: String, rowCount: Option[BigInt]): Unit = {
    val df = sql(s"SELECT * FROM $tableName")
    val relations = df.queryExecution.analyzed.collect { case rel: LogicalRelation =>
      assert(rel.statistics.sizeInBytes === rel.relation.sizeInBytes)
      assert(rel.statistics.rowCount === rowCount)
    }
    assert(relations.size === 1)
  }

  test("test table-level statistics for hive tables created in HiveExternalCatalog") {
    val textTable = "textTable"
    val parquetTable = "parquetTable"
    val orcTable = "orcTable"
    withTable(textTable, parquetTable, orcTable) {
      sql(s"CREATE TABLE $textTable (key STRING, value STRING) STORED AS TEXTFILE")
      sql(s"INSERT INTO TABLE $textTable SELECT * FROM src")
```

--- End diff --

To ensure correctness, we should also call `checkMetastoreRelationStats` before the data changes (`INSERT`) and before statistics collection (`ANALYZE`).
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76920340

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (same hunk as quoted earlier) ---

```scala
  test("test table-level statistics for hive tables created in HiveExternalCatalog") {
```

--- End diff --

Can you split this test case into multiple smaller, independent ones?
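A rough sketch of the split the reviewer asks for might look as follows (test names and bodies are illustrative, not taken from the PR):

```scala
// Illustrative only: one independent test per storage format, so a failure in,
// say, the ORC path does not mask the results for the text and Parquet paths.
test("table-level statistics for a Hive text table") {
  val textTable = "textTable"
  withTable(textTable) {
    sql(s"CREATE TABLE $textTable (key STRING, value STRING) STORED AS TEXTFILE")
    sql(s"INSERT INTO TABLE $textTable SELECT * FROM src")
    // `noscan` collects totalSize but not the row count.
    sql(s"ANALYZE TABLE $textTable COMPUTE STATISTICS noscan")
    checkMetastoreRelationStats(textTable, 5812, None)
  }
}

test("table-level statistics for a Hive Parquet table") {
  // same shape, with STORED AS PARQUET and its own expected sizes
}
```

Each test then sets up and tears down its own table via `withTable`, so they can run (and fail) independently.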
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76920328

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---

```diff
 case class LogicalRelation(
     relation: BaseRelation,
     expectedOutputAttributes: Option[Seq[Attribute]] = None,
-    metastoreTableIdentifier: Option[TableIdentifier] = None)
+    metastoreTableIdentifier: Option[TableIdentifier] = None,
+    inheritedStats: Option[Statistics] = None)
```

--- End diff --

Since a `LogicalRelation` is converted from a Parquet/ORC `MetastoreRelation` or a `SimpleCatalogRelation`, I think the current name is more indicative.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76920300

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (same hunk as quoted earlier) ---

```scala
      // noscan won't count the number of rows
      sql(s"ANALYZE TABLE $textTable COMPUTE STATISTICS noscan")
      checkMetastoreRelationStats(textTable, 5812, None)
```

--- End diff --

`checkMetastoreRelationStats(textTable, 5812, None)` => `checkMetastoreRelationStats(textTable, expectedTotalSize = 5812, expectedRowCount = None)`
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76920234

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (same hunk as quoted earlier) ---

```scala
  private def checkMetastoreRelationStats(
      tableName: String,
      totalSize: Long,
      rowCount: Option[BigInt]): Unit = {
```

--- End diff --

`totalSize` => `expectedTotalSize`, `rowCount` => `expectedRowCount`
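Putting the two renaming suggestions together, the helper and its call site might look like this (a sketch of the reviewer's proposal, not the merged code):

```scala
// Sketch: parameter names state what is expected, and named arguments make
// the call site self-documenting.
private def checkMetastoreRelationStats(
    tableName: String,
    expectedTotalSize: Long,
    expectedRowCount: Option[BigInt]): Unit = {
  val df = sql(s"SELECT * FROM $tableName")
  val relations = df.queryExecution.analyzed.collect { case rel: MetastoreRelation =>
    assert(rel.statistics.sizeInBytes === expectedTotalSize)
    assert(rel.statistics.rowCount === expectedRowCount)
  }
  // Exactly one MetastoreRelation is expected in the analyzed plan.
  assert(relations.size === 1)
}

// At the call site, it is now obvious which value is which:
checkMetastoreRelationStats(textTable, expectedTotalSize = 5812, expectedRowCount = None)
```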
[GitHub] spark pull request #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Autom...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14859#discussion_r76920073

--- Diff: appveyor.yml ---

```yaml
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "{build}-{branch}"

shallow_clone: true

platform: x64
configuration: Debug

cache:
  - C:\Users\appveyor\.m2

install:
  # Install maven and dependencies
  - ps: .\dev\appveyor-install-dependencies.ps1
  # Required package for R unit tests
  - cmd: R -e "install.packages('testthat', repos='http://cran.us.r-project.org')"

build_script:
  - cmd: mvn -DskipTests -Psparkr package
```

--- End diff --

Thanks @dongjoon-hyun! I am testing with extra profiles. I will take a look and address your comment as far as I can!
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76919947

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---

```diff
-  def alterTable(tableDefinition: CatalogTable): Unit = {
+  def alterTable(tableDefinition: CatalogTable, fromAnalyze: Boolean = false): Unit = {
```

--- End diff --

please see my [comment](https://github.com/apache/spark/pull/14712#discussion_r76919518)
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76919518

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---

```scala
  override def alterTableStats(tableDefinition: CatalogTable): Unit = withClient {
```

--- End diff --

@viirya There are two ways to set/replace these properties:

1. Use the statistics info in `CatalogTable` to set the properties — this is the path taken by the ANALYZE command, and I put that logic into the `alterTableStats` method.
2. Set the properties directly — this is the path of the ALTER TABLE command.

If we put the `alterTableStats` logic into the original `alterTable` method, we could no longer set the properties via the second path, because they would always be replaced by the statistics in `CatalogTable`.
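The separation described above can be sketched roughly as follows (method bodies, property keys, and the `setTableProperties` helper are all illustrative, not the actual `HiveExternalCatalog` code):

```scala
// Sketch only: why table statistics get their own entry point.

// Path 1 (ANALYZE): derive the stats-related table properties from the
// statistics recorded in CatalogTable, overwriting whatever stats
// properties were stored before.
def alterTableStats(table: CatalogTable): Unit = {
  val statsProps: Map[String, String] = table.stats.map { s =>
    Map("totalSize" -> s.sizeInBytes.toString) ++
      s.rowCount.map(c => "numRows" -> c.toString)
  }.getOrElse(Map.empty)
  setTableProperties(table.identifier, statsProps) // hypothetical helper
}

// Path 2 (ALTER TABLE ... SET TBLPROPERTIES): write the user-supplied
// properties as-is. If this went through the logic above instead, the
// user's values would always be clobbered by the stats in CatalogTable.
def alterTable(table: CatalogTable): Unit =
  setTableProperties(table.identifier, table.properties)
```

Keeping the two entry points separate is what lets the ALTER TABLE path remain a plain property write while ANALYZE owns the statistics keys.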
[GitHub] spark pull request #14868: [SPARK-16283][SQL][WIP] Implements percentile_app...
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14868#discussion_r76918794

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.catalyst.expressions.aggregate

import java.nio.ByteBuffer

import com.google.common.primitives.{Doubles, Ints, Longs}

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile.PercentileDigest
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.catalyst.util.QuantileSummaries
import org.apache.spark.sql.catalyst.util.QuantileSummaries.{defaultCompressThreshold, Stats}
import org.apache.spark.sql.types._

/**
 * The ApproximatePercentile function returns the approximate percentile(s) of a column at the
 * given percentage(s). A percentile is a watermark value below which a given percentage of the
 * column values fall. For example, the percentile of column `col` at percentage 50% is the
 * median of column `col`.
 *
 * This function supports partial aggregation.
 *
 * @param child child expression that can produce column value with `child.eval(inputRow)`
 * @param percentageExpression Expression that represents a single percentage value or
 *                             an array of percentage values. Each percentage value must be
 *                             between 0.0 and 1.0.
 * @param accuracyExpression Integer literal expression of approximation accuracy. Higher value
 *                           yields better accuracy, the default value is
 *                           DEFAULT_PERCENTILE_ACCURACY.
 */
@ExpressionDescription(
  usage =
    """
      _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
      column `col` at the given percentage. The value of percentage must be between 0.0
      and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
      controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
      better accuracy, `1.0/accuracy` is the relative error of the approximation.

      _FUNC_(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
      percentile array of column `col` at the given percentage array. Each value of the
      percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000)
      is a positive integer literal which controls approximation accuracy at the cost of memory.
      Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
      the approximation.
    """)
case class ApproximatePercentile(
    child: Expression,
    percentageExpression: Expression,
    accuracyExpression: Expression,
    override val mutableAggBufferOffset: Int,
    override val inputAggBufferOffset: Int) extends TypedImperativeAggregate[PercentileDigest] {

  def this(child: Expression, percentageExpression: Expression, accuracyExpression: Expression) = {
    this(child, percentageExpression, accuracyExpression, 0, 0)
  }

  def this(child: Expression, percentageExpression: Expression) = {
    this(child, percentageExpression, Literal(ApproximatePercentile.DEFAULT_PERCENTILE_ACCURACY))
  }

  // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
  private lazy val
```
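For context, the function described in the diff above would be used like this once registered (a hypothetical session; the exact results depend on the data and on the `accuracy` chosen):

```scala
// Sketch: calling percentile_approx from SQL. Session setup and values are
// illustrative; the function name and argument shapes follow the usage
// string quoted in the diff above.
val spark = org.apache.spark.sql.SparkSession.builder()
  .master("local[2]")
  .appName("percentile-approx-demo")
  .getOrCreate()

spark.range(0, 1000).createOrReplaceTempView("t")

// Median (percentage = 0.5), with the default accuracy.
spark.sql("SELECT percentile_approx(id, 0.5) FROM t").show()

// Several percentiles at once, with an explicit accuracy: a higher value
// trades memory for precision (relative error is roughly 1.0 / accuracy).
spark.sql("SELECT percentile_approx(id, array(0.25, 0.5, 0.75), 10000) FROM t").show()
```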
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14868 **[Test build #64696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64696/consoleFull)** for PR 14868 at commit [`3f08c02`](https://github.com/apache/spark/commit/3f08c027add03c59251583420c76582a085b3573).
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76918381

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala (same hunk as quoted earlier) ---

```scala
    inheritedStats: Option[Statistics] = None)
```

--- End diff --

How about `expectedStatistics`, or `catalogStatistics`?
[GitHub] spark issue #14888: [SPARK-17324] [SQL] Remove Direct Usage of HiveClient in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14888 **[Test build #64694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64694/consoleFull)** for PR 14888 at commit [`d03e65d`](https://github.com/apache/spark/commit/d03e65d0f9b119ed767da124da360cfcf9e966b8).
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14868 **[Test build #64695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64695/consoleFull)** for PR 14868 at commit [`bc70a00`](https://github.com/apache/spark/commit/bc70a0023bb24175c06c03cb7acad7f9a6d34e36).
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r76918174

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---

```diff
-  def alterTable(tableDefinition: CatalogTable): Unit = {
+  def alterTable(tableDefinition: CatalogTable, fromAnalyze: Boolean = false): Unit = {
```

--- End diff --

The additional flag parameter `fromAnalyze` looks weird to me. Why would we need two behaviors behind one alter-table API, selected by a flag like this?
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64687/
[GitHub] spark issue #14868: [SPARK-16283][SQL][WIP] Implements percentile_approx agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14868 Merged build finished. Test PASSed.