[GitHub] spark issue #16389: [SPARK-18981][Core]The job hang problem when speculation...
Github user zhaorongsheng commented on the issue: https://github.com/apache/spark/pull/16389 @zsxwing I think it may cause other problems. For example, if we get an ExecutorLostFailure while the speculated task is running on that executor, `numRunningTasks` will never reach zero. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16422 After rethinking it: `DESC EXTENDED/FORMATTED COLUMN` discloses data patterns and statistics. This information is quite sensitive, and not all users should be allowed to access it. We might face security-related complaints about this feature. Also cc @rxin @yhuai
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16422 To get the column names and types, we do not need `DESC COLUMN`. For retrieving statistics, each vendor has its own way. Normally, users can access the statistics from the catalog tables/views or data dictionary views. AFAIK, no system offers `DESC COLUMN` except Hive-like systems. [Hive 2.x also has a different syntax from Hive 1.x](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Hive2.0+:SyntaxChange). In this PR, we follow Hive 2.x. Complex types can be supported in an RDBMS via UDTs. For example, in Oracle, the logical mapping of structured types is abstract data types. DB2 also documents how to use structured types at [this link](http://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.structypes.doc/doc/t0006603.html); to access a nested field, it uses double dots (e.g., `col1..field1`). : )
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16341 Merged build finished. Test FAILed.
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16341 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70762/
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16341 **[Test build #70762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70762/testReport)** for PR 16341 at commit [`16b6030`](https://github.com/apache/spark/commit/16b6030ca56a538abd1c35d7949c6fa33a576f3f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94267919

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
+        if (columnName.contains(".")) {
+          throw new ParseException(
+            "DESC TABLE COLUMN for an inner column of a nested type is not supported", ctx)
--- End diff --

This might generate a confusing error message.

```
sql("describe formatted default.tab1.s").show(false)

org.apache.spark.sql.catalyst.parser.ParseException: DESC TABLE COLUMN for an inner column of a nested type is not supported(line 1, pos 0)
```
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70761/
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Merged build finished. Test PASSed.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16417 **[Test build #70761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70761/testReport)** for PR 16417 at commit [`a805b41`](https://github.com/apache/spark/commit/a805b4103d16310bd751588985318e4e2a213660).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16341 **[Test build #70762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70762/testReport)** for PR 16341 at commit [`16b6030`](https://github.com/apache/spark/commit/16b6030ca56a538abd1c35d7949c6fa33a576f3f).
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16441 Thanks for the PR; I do want to get this fixed. However, I don't think this is the right way to make probability predictions for GBTs. I believe it should depend on the loss used. E.g., check out page 8 of Friedman (1999), "Greedy Function Approximation: A Gradient Boosting Machine".
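For concreteness on the loss-dependent mapping mentioned above (a sketch, not the PR's implementation): in Friedman's formulation of two-class gradient boosting with binomial log-loss and labels in {-1, +1}, the raw ensemble margin F(x) maps to a class probability via a logistic transform.

```python
import math

def margin_to_probability(margin: float) -> float:
    """P(y = +1 | x) under Friedman's two-class log-loss: 1 / (1 + exp(-2 F(x))).
    A different loss (e.g. an exponential/AdaBoost-style loss) would need a
    different margin-to-probability mapping, which is the point of the comment."""
    return 1.0 / (1.0 + math.exp(-2.0 * margin))

print(margin_to_probability(0.0))  # 0.5: a zero margin is maximally uncertain
```

A thresholded class prediction (margin > 0) is unchanged by this transform; only the probability calibration depends on the loss.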
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16417 **[Test build #70761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70761/testReport)** for PR 16417 at commit [`a805b41`](https://github.com/apache/spark/commit/a805b4103d16310bd751588985318e4e2a213660).
[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16387 cc @rxin @zsxwing too
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16417 retest this please.
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94264691

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -244,6 +251,18 @@ public static void throwException(Throwable t) {
     LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
     FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
     DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+    // determine whether double access should be aligned.
+    String arch = System.getProperty("os.arch", "");
+    if (arch.matches("^(arm|arm32)")) {
--- End diff --

Thanks for your clarification. I was afraid that ARM 64 may return `arm`.
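For context on the check being discussed: Java's `String.matches` only succeeds when the pattern matches the entire input, so `"^(arm|arm32)"` accepts exactly the values `arm` or `arm32` and nothing longer. A minimal sketch of the same semantics, using Python's `re.fullmatch` to mirror Java's whole-string matching (the sample `os.arch` values are illustrative):

```python
import re

def is_arm32(arch: str) -> bool:
    # Java's String.matches anchors the whole input, so "^(arm|arm32)"
    # accepts only the exact strings "arm" or "arm32"; re.fullmatch
    # reproduces that behavior in Python.
    return re.fullmatch(r"arm|arm32", arch) is not None

for arch in ["arm", "arm32", "aarch64", "armv7l"]:
    print(arch, is_arm32(arch))
```

This is why `aarch64` (the usual 64-bit ARM value) cannot accidentally trip the 32-bit branch, which is the concern raised in the comment.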
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94264468

--- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-    testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
+    testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
--- End diff --

Ah, OK, that makes sense - I was looking at the changes purely from a pep8 perspective, but if the files need to compile under Python 3 to run the py3 pep8 check, that makes sense (of course, a follow-up issue for proper py3 support is the best place to address the issues that are not blocking pep8 testing).
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94263914

--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
--- End diff --

Yeah, I think this is a valid point. Let me first check the actual length against the limit to be sure.
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94263510

--- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-    testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
+    testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
--- End diff --

Using a tuple in a lambda to unpack arguments causes errors in Python 3. It seems http://www.python.org/dev/peps/pep-3113, which removed tuple parameter unpacking, is the related issue.
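To illustrate the point above: `lambda (v, p): ...` is Python 2-only syntax; PEP 3113 removed tuple parameter unpacking, so Python 3 rejects it at compile time. A minimal standalone sketch of the py3-compatible rewrite (plain lists stand in for the RDDs in the example file):

```python
labels_and_predictions = [(3.0, 2.5), (1.0, 1.5), (0.0, 0.5)]

# Python 2 only -- a SyntaxError under Python 3 (PEP 3113):
#   mse = sum(map(lambda (v, p): (v - p) ** 2, labels_and_predictions)) / 3
# Python 3-compatible: index into the tuple instead of unpacking it.
mse = sum(map(lambda lp: (lp[0] - lp[1]) ** 2,
              labels_and_predictions)) / len(labels_and_predictions)
print(mse)  # 0.25
```

The indexing form works unchanged on both Python 2 and 3, which is what the pep8 check over all files requires.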
[GitHub] spark pull request #16424: [SPARK-19016][SQL][DOC] Document scalable partiti...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16424
[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16424 OK, I'm merging this to master and branch-2.1. Thanks for the review!
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16441 **[Test build #70760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70760/testReport)** for PR 16441 at commit [`489e0e6`](https://github.com/apache/spark/commit/489e0e6db1d8c7ae519ee90f852cdfa3b7932e05).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16441 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70760/
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16441 Merged build finished. Test FAILed.
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16441 **[Test build #70760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70760/testReport)** for PR 16441 at commit [`489e0e6`](https://github.com/apache/spark/commit/489e0e6db1d8c7ae519ee90f852cdfa3b7932e05).
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 Jenkins, retest this please
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94259691

--- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-    testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
+    testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
--- End diff --

Why did we get rid of the `lambda (v, p)` here and similar elsewhere?
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94259548

--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
--- End diff --

I'm slightly concerned that this list might eventually become too long to pass in the shell (on Linux, bash's ARG_MAX is pretty high, but that's not the case everywhere; although we would probably have to double the number of Python files before this started being an issue even in Cygwin).
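One common way around that limit (a sketch of the general technique, not something from the PR): invoke the linter several times over chunks of the file list, the way `xargs` does, keeping each invocation's argument bytes under a budget. The helper below is hypothetical and the budget value is illustrative; real ARG_MAX values are much larger and platform-dependent.

```python
def chunk_args(paths, byte_budget=4096):
    """Split a list of paths into chunks whose space-joined length stays
    under byte_budget, xargs-style. Each chunk would then be passed to
    one linter invocation."""
    chunks, current, size = [], [], 0
    for p in paths:
        extra = len(p) + 1  # +1 for the separating space
        if current and size + extra > byte_budget:
            chunks.append(current)
            current, size = [], 0
        current.append(p)
        size += extra
    if current:
        chunks.append(current)
    return chunks

paths = ["./python/file%03d.py" % i for i in range(500)]
chunks = chunk_args(paths, byte_budget=1024)
print(len(chunks), max(len(" ".join(c)) for c in chunks))
```

In a shell script the same effect is what `find ... -print0 | xargs -0 pycodestyle` gives for free, since `xargs` batches arguments below the system limit.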
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94259189

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -22,10 +22,14 @@
 import java.lang.reflect.Method;
 import java.nio.ByteBuffer;
 
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import sun.misc.Cleaner;
 import sun.misc.Unsafe;
 
 public final class Platform {
--- End diff --

I missed that. I'll address it when we determine the final path here (per my comment below).
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16441 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70759/
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16441 **[Test build #70759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70759/testReport)** for PR 16441 at commit [`4468891`](https://github.com/apache/spark/commit/4468891cd83760a2f97ef257ef176a34bc79e5cd).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16441 Merged build finished. Test FAILed.
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94257179

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -244,6 +251,18 @@ public static void throwException(Throwable t) {
     LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
     FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
     DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+    // determine whether double access should be aligned.
+    String arch = System.getProperty("os.arch", "");
+    if (arch.matches("^(arm|arm32)")) {
--- End diff --

@kiszk I have tested on ARM 64 (`aarch64`). [Any alignment works for double access there](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch08s02.html), though 8-byte-aligned access looks to be about 10% faster than unaligned access. Using an intermediate long buffer (your idea) is about 5% slower than direct access, regardless of alignment. In both cases, I tested on an ODROID C2 using Oracle Java 8.
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70758/ Test PASSed.
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16441 **[Test build #70759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70759/testReport)** for PR 16441 at commit [`4468891`](https://github.com/apache/spark/commit/4468891cd83760a2f97ef257ef176a34bc79e5cd).
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13077 Merged build finished. Test PASSed.
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13077 **[Test build #70758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70758/testReport)** for PR 13077 at commit [`ea896ef`](https://github.com/apache/spark/commit/ea896efb70b0bf7a78214f5817f83b2251c7bb83).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/16441

[SPARK-14975][ML][WIP] Fixed GBTClassifier to predict probability per training instance and fixed interfaces

## What changes were proposed in this pull request?

All of the classifiers in MLlib can predict probabilities except GBTClassifier. Moreover, all classifiers inherit from ProbabilisticClassifier, but GBTClassifier strangely inherits from Predictor, which is a bug. This change corrects the interface and adds the ability for the classifier to produce a probability vector.

## How was this patch tested?

The basic ML tests were run after making the changes. I've marked this as WIP as I need to add more tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/imatiach-msft/spark ilmat/fix-GBT

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16441.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16441

commit 63a9574a0858ed9e4c27a4b698cb50d2475afc0b
Author: Ilya Matiach
Date: 2016-12-30T20:15:12Z
[SPARK-14975][ML][WIP] Fixed GBTClassifier to predict probability per training instance and fixed interfaces

commit 4468891cd83760a2f97ef257ef176a34bc79e5cd
Author: Ilya Matiach
Date: 2016-12-30T20:20:43Z
Fixed scala style empty line
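For context on what "predict probability" means for a GBT: a boosted ensemble produces a real-valued margin (a weighted sum of per-tree predictions), and for log-loss training that margin is commonly mapped to a class probability with a logistic link. The sketch below uses the `1 / (1 + exp(-2 * margin))` form from Friedman's log-loss formulation; whether the final Spark implementation uses exactly this constant is an assumption here, and all names are illustrative.

```java
public class GbtProbability {
    // Margin of a boosted ensemble: weighted sum of the individual tree
    // predictions for one instance (illustrative, not Spark's internals).
    public static double margin(double[] treePredictions, double[] weights) {
        double m = 0.0;
        for (int i = 0; i < treePredictions.length; i++) {
            m += weights[i] * treePredictions[i];
        }
        return m;
    }

    // Logistic link for a log-loss-trained GBT: maps the unbounded margin
    // to P(y = 1 | x) in (0, 1). The factor 2 follows Friedman's derivation.
    public static double probability(double margin) {
        return 1.0 / (1.0 + Math.exp(-2.0 * margin));
    }

    public static void main(String[] args) {
        double m = margin(new double[] {1.0, 1.0, -1.0},
                          new double[] {0.4, 0.3, 0.3});
        System.out.println(probability(m));
    }
}
```

A margin of zero maps to probability 0.5, and large positive or negative margins saturate toward 1 or 0, which is the behavior a ProbabilisticClassifier contract needs.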
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13077 **[Test build #70758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70758/testReport)** for PR 13077 at commit [`ea896ef`](https://github.com/apache/spark/commit/ea896efb70b0bf7a78214f5817f83b2251c7bb83).
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94253091

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   }

   /**
+   * Returns the default statistics or statistics estimated by cbo based on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+    if (conf.cboEnabled) {
+      if (estimatedStats == null) {
+        estimatedStats = cboStatistics(conf)
+      }
+      estimatedStats
+    } else {
+      statistics
+    }
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the
+   * default statistics.
+   */
+  def cboStatistics(conf: CatalystConf): Statistics = statistics
--- End diff --

protected?
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70757/ Test PASSed.
[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...
Github user michaelkamprath commented on the issue: https://github.com/apache/spark/pull/16403

@srowen To answer the use case question, it is primarily academic, for learning and testing. Students and researchers build clusters of Raspberry Pi, ODROID, or other SBCs to get cost-effective access to a multi-node hardware cluster. [Here](http://likemagicappears.com/projects/raspberry-pi-cluster/) [are](http://coen.boisestate.edu/ece/research-areas/raspberry-pi/) [some](https://www.raspberrypi.org/magpi/pi-spark-supercomputer/) [examples](http://hackaday.com/2016/05/09/designing-a-high-performance-parallel-personal-cluster/) [of](http://katie.atomicburn.com/2016/06/12/2016-school-gt-exhibition-raspberry-piodroid-c2-supercomputer/) [projects](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4803722/). There is even [a commercial vendor](https://www.picocluster.com/collections/) selling these SBC clusters. [In my own case](http://diybigdata.net/odroid-xu4-cluster/), it's being used to learn economically how to deal with problems of efficiency (it's easier to spot and work through patterns of inefficiency on constrained systems than on full-powered systems).

I am personally not aware of any current server-class CPUs that require double alignment. Alignment-strict SPARC processors used to be the bane of my existence in the early 2000s, but that was over a decade ago. My understanding is that today x86 supports unaligned double access with [a theoretical performance hit](https://developers.redhat.com/blog/2016/06/01/how-to-avoid-wasting-megabytes-of-memory-a-few-bytes-at-a-time/) that [in practice is rarely seen](http://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/). Typically you never concern yourself with alignment in Java because the JVM takes care of it for you, but here we are delving into the world of Unsafe, which bypasses the protections the JVM provides. Admittedly, it took me a long while to even figure out that my problem was related to alignment because, as indicated, I haven't dealt with such issues in over a decade.

With all that said, maybe a better approach here is to create a patch that users can apply to make a Spark build when they want to run Spark on a system that requires double alignment, which to the best of my knowledge is currently just 32-bit ARM CPUs. That would also let the code be more concise, without needing to determine at runtime which method to use. And if a server-class CPU with alignment requirements should ever arise, we know what to do. Given that
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Merged build finished. Test PASSed.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70757/testReport)** for PR 16233 at commit [`4af4a11`](https://github.com/apache/spark/commit/4af4a11caab2d7b777c2f0881c574c0bda703d5d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94251780

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/estimation/EstimationSuite.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.estimation
--- End diff --

estimation? Any better name?
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94251558

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -642,6 +642,13 @@ object SQLConf {
       .doubleConf
       .createWithDefault(0.05)
+
+  val CBO_ENABLED =
+    SQLConfigBuilder("spark.sql.cbo.enabled")
+      .internal()
--- End diff --

Internal?
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94251460

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   }

   /**
+   * Returns the default statistics or statistics estimated by cbo based on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+    if (conf.cboEnabled) {
+      if (estimatedStats == null) {
+        estimatedStats = cboStatistics(conf)
+      }
+      estimatedStats
+    } else {
+      statistics
+    }
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the
+   * default statistics.
+   */
+  def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be computed once. */
+  private var estimatedStats: Statistics = _
--- End diff --

Use `Option` here? Or use `@Nullable` to explicitly mark it nullable
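The reviewer's suggestion here (replace the nullable `estimatedStats` var with an `Option` so that "not yet computed" is explicit) can be sketched as follows, in plain Java with `Optional` standing in for Scala's `Option` and a row count standing in for the real `Statistics` type. All names are hypothetical.

```java
import java.util.Optional;

// Sketch of compute-once plan statistics with an explicit "empty" state
// instead of a null sentinel.
public class PlanStats {
    private Optional<Long> estimatedStats = Optional.empty();
    private int computeCalls = 0; // counts estimations, to show caching works

    // Placeholder for an expensive CBO estimate over the plan.
    private long cboStatistics() {
        computeCalls++;
        return 42L;
    }

    // Mirrors the planStats logic in the diff: use the cached CBO estimate
    // when CBO is enabled, otherwise fall back to the default statistics.
    public long planStats(boolean cboEnabled, long defaultStats) {
        if (!cboEnabled) {
            return defaultStats;
        }
        if (!estimatedStats.isPresent()) {
            estimatedStats = Optional.of(cboStatistics());
        }
        return estimatedStats.get();
    }

    public int calls() {
        return computeCalls;
    }
}
```

The `Optional` makes the uninitialized state impossible to dereference by accident, at the cost of one extra object per plan node, which is the trade-off the review comment raises.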
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15664 Thank you, @gatorsmile . Happy New Year! :)
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16320 LGTM cc @cloud-fan
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94249364

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -244,6 +251,18 @@ public static void throwException(Throwable t) {
     LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
     FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
     DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+    // determine whether double access should be aligned.
+    String arch = System.getProperty("os.arch", "");
+    if (arch.matches("^(arm|arm32)")) {
--- End diff --

What happens on ARM 64-bit?
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16404 LGTM cc @rxin
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15664 Merging to master. Thanks!
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15664
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15664 LGTM
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16371 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70756/ Test PASSed.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16371 Merged build finished. Test PASSed.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70756/testReport)** for PR 16371 at commit [`5c6b02a`](https://github.com/apache/spark/commit/5c6b02af16ed1b960242af74932a050f1c390a6e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70757/testReport)** for PR 16233 at commit [`4af4a11`](https://github.com/apache/spark/commit/4af4a11caab2d7b777c2f0881c574c0bda703d5d).
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70755/ Test PASSed.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Merged build finished. Test PASSed.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #70755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70755/testReport)** for PR 15880 at commit [`821cca6`](https://github.com/apache/spark/commit/821cca6cd836f11ea917c89938f288f126d633ab).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70754/ Test PASSed.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16404 Merged build finished. Test PASSed.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70753/ Test PASSed.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16404 Merged build finished. Test PASSed.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16404 **[Test build #70753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70753/testReport)** for PR 16404 at commit [`f145188`](https://github.com/apache/spark/commit/f1451883df9077ecbf31f3a86d2427b60262f863).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16404 **[Test build #70754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70754/testReport)** for PR 16404 at commit [`f145188`](https://github.com/apache/spark/commit/f1451883df9077ecbf31f3a86d2427b60262f863).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user cjuexuan commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94239683 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { * indicates a timestamp format. Custom date formats follow the formats at * `java.text.SimpleDateFormat`. This applies to timestamp type. * + * `writeEncoding`(default `utf-8`) save dataFrame 2 csv by giving encoding --- End diff -- OK, I will write my unit test and modify this pull request.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94239452 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { * indicates a timestamp format. Custom date formats follow the formats at * `java.text.SimpleDateFormat`. This applies to timestamp type. * + * `writeEncoding`(default `utf-8`) save dataFrame 2 csv by giving encoding --- End diff -- We should also add the same documentation in `readwriter.py`.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user cjuexuan commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94239100 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -71,7 +71,9 @@ private[csv] class CSVOptions(@transient private val parameters: CaseInsensitive val delimiter = CSVTypeCast.toChar( parameters.getOrElse("sep", parameters.getOrElse("delimiter", ","))) private val parseMode = parameters.getOrElse("mode", "PERMISSIVE") - val charset = parameters.getOrElse("encoding", + val readCharSet = parameters.getOrElse("encoding", +parameters.getOrElse("charset", StandardCharsets.UTF_8.name())) + val writeCharSet = parameters.getOrElse("writeEncoding", --- End diff -- @HyukjinKwon I think so
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94238157 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -71,7 +71,9 @@ private[csv] class CSVOptions(@transient private val parameters: CaseInsensitive val delimiter = CSVTypeCast.toChar( parameters.getOrElse("sep", parameters.getOrElse("delimiter", ","))) private val parseMode = parameters.getOrElse("mode", "PERMISSIVE") - val charset = parameters.getOrElse("encoding", + val readCharSet = parameters.getOrElse("encoding", +parameters.getOrElse("charset", StandardCharsets.UTF_8.name())) + val writeCharSet = parameters.getOrElse("writeEncoding", --- End diff -- I think we should not necessarily introduce an additional option. We could just use the `charset` variable, because other options such as `nullValue` are already applied to both reading and writing.
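The single-option approach suggested above can be sketched as follows. This is an illustrative standalone snippet, not Spark's actual `CSVOptions` (the class name and the plain `Map` stand in for Spark's case-insensitive parameter map): one `charset` value, resolved once with an `encoding` → `charset` fallback, would be shared by the read and write paths the same way `nullValue` already is.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for CSVOptions: a single charset used for both
// reading and writing, instead of separate readCharSet/writeCharSet values.
class CsvOptionsSketch(parameters: Map[String, String]) {
  // "encoding" wins over "charset"; both fall back to UTF-8.
  val charset: String = parameters.getOrElse(
    "encoding", parameters.getOrElse("charset", StandardCharsets.UTF_8.name()))
}
```

Both the reader and the writer code paths would then consult this one `charset` field, so users set a single option regardless of direction.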
[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16428 Ah, I meant to add a test there in this PR.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Merged build finished. Test FAILed.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70752/ Test FAILed.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16417 **[Test build #70752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70752/testReport)** for PR 16417 at commit [`a805b41`](https://github.com/apache/spark/commit/a805b4103d16310bd751588985318e4e2a213660). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user cjuexuan commented on the issue: https://github.com/apache/spark/pull/16428 @HyukjinKwon, I already ran `CSVSuite`, and all tests passed.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70756/testReport)** for PR 16371 at commit [`5c6b02a`](https://github.com/apache/spark/commit/5c6b02af16ed1b960242af74932a050f1c390a6e).
[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user cjuexuan commented on the issue: https://github.com/apache/spark/pull/16428 @HyukjinKwon, I see. Because my version is `2.0.2`, we use `ByteArrayOutputStream` and call its `toString` method, which decodes with `Charset.defaultCharset()` and is therefore bound to the environment. In the master branch this is already fixed, so I agree with @srowen: we should just avoid hard-coding UTF-8, and users can set their own writer encoding.
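The environment dependence described above can be demonstrated in isolation. This is a minimal sketch, not Spark code (the object and method names are invented): `ByteArrayOutputStream.toString()` with no argument decodes via `Charset.defaultCharset()`, so its result varies with the JVM's `file.encoding`, while passing an explicit charset name makes the round trip deterministic.

```scala
import java.io.ByteArrayOutputStream

object CharsetDemo {
  // Encode with an explicit charset and decode the same way: deterministic.
  // Using out.toString() (no argument) instead would decode with
  // Charset.defaultCharset(), which depends on the environment the JVM runs in.
  def roundTrip(s: String, charsetName: String): String = {
    val out = new ByteArrayOutputStream()
    out.write(s.getBytes(charsetName))
    out.toString(charsetName)
  }
}
```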
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94234598 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -510,32 +539,91 @@ class Analyzer( * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog. */ object ResolveRelations extends Rule[LogicalPlan] { -private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = { + +// If the unresolved relation is running directly on files, we just return the original +// UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog +// and change the default database name if it is a view. +// We usually look up a table from the default database if the table identifier has an empty +// database part, for a view the default database should be the currentDb when the view was +// created. When the case comes to resolving a nested view, the view may have different default +// database with that the referenced view has, so we need to use the variable `defaultDatabase` +// to track the current default database. +// When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and +// then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`, +// we look up the relations that the view references using the default database. +// For example: +// |- view1 (defaultDatabase = db1) +// |- operator +// |- table2 (defaultDatabase = db1) +// |- view2 (defaultDatabase = db2) +//|- view3 (defaultDatabase = db3) +// |- view4 (defaultDatabase = db4) +// In this case, the view `view1` is a nested view, it directly references `table2`, `view2` +// and `view4`, the view `view2` references `view3`. On resolving the table, we look up the +// relations `table2`, `view2`, `view4` using the default database `db1`, and look up the +// relation `view3` using the default database `db2`.
+// +// Note this is compatible with the views defined by older versions of Spark(before 2.2), which +// have empty defaultDatabase and all the relations in viewText have database part defined. +def resolveRelation( +plan: LogicalPlan, +defaultDatabase: Option[String] = None): LogicalPlan = plan match { + case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) => +u + case u: UnresolvedRelation => +val defaultDatabase = AnalysisContext.get.defaultDatabase +val relation = lookupTableFromCatalog(u, defaultDatabase) +resolveRelation(relation, defaultDatabase) + // Hive support is required to resolve a persistent view, the logical plan returned by + // catalog.lookupRelation() should be: + // `SubqueryAlias(_, View(desc: CatalogTable, desc.output, child: LogicalPlan), _)`, + // where the child should be a logical plan parsed from `desc.viewText`. + // If the child of a view is empty, we will throw an AnalysisException later in + // `checkAnalysis`. + case view @ View(desc, _, Some(child)) => +val context = AnalysisContext(defaultDatabase = desc.viewDefaultDatabase) +// Resolve all the UnresolvedRelations and Views in the child. +val newChild = AnalysisContext.withAnalysisContext(context) { + execute(child) +} +view.copy(child = Some(newChild)) + case p @ SubqueryAlias(_, view: View, _) => +val newChild = resolveRelation(view, defaultDatabase) +p.copy(child = newChild) + case _ => plan +} + +def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { + case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u))) + case u: UnresolvedRelation => resolveRelation(u) +} + +// Look up the table with the given name from catalog. The database we look up the table from +// is decided follow the steps: +// 1. If the database part is defined in the table identifier, use that database name; +// 2. 
Else If the defaultDatabase is defined, use the default database name; +// 3. Else use the currentDb of the SessionCatalog. +private def lookupTableFromCatalog( +u: UnresolvedRelation, +defaultDatabase: Option[String] = None): LogicalPlan = { try { -catalog.lookupRelation(u.tableIdentifier, u.alias) +
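The default-database bookkeeping described in the comment above can be illustrated with a toy model. This is not Spark's implementation; the `Plan` types and the `resolve` function are invented for the example. A view node carries the database it was created under, and relations nested inside it resolve against that database rather than the caller's current one.

```scala
object ViewResolutionSketch {
  sealed trait Plan
  case class Table(db: Option[String], name: String) extends Plan
  case class View(defaultDb: String, children: Seq[Plan]) extends Plan

  // Precedence mirrors the lookup rules quoted above:
  // explicit db in the identifier > enclosing view's default db > current db.
  def resolve(plan: Plan, defaultDb: Option[String]): Seq[String] = plan match {
    case Table(db, name) =>
      Seq(db.orElse(defaultDb).getOrElse("currentDb") + "." + name)
    case View(vdb, children) =>
      // Inside a view, children resolve against that view's default database.
      children.flatMap(resolve(_, Some(vdb)))
  }
}
```

Resolving the example tree from the comment (`view1` created in `db1`, containing `table2`, a `view2` created in `db2`, and `view4`) looks everything up in `db1` except the relations inside `view2`, which resolve against `db2`.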
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94234459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala --- @@ -44,39 +44,48 @@ abstract class Collect extends ImperativeAggregate { override def dataType: DataType = ArrayType(child.dataType) - override def supportsPartial: Boolean = false - - override def aggBufferAttributes: Seq[AttributeReference] = Nil - - override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) - - override def inputAggBufferAttributes: Seq[AttributeReference] = Nil - // Both `CollectList` and `CollectSet` are non-deterministic since their results depend on the // actual order of input rows. override def deterministic: Boolean = false - protected[this] val buffer: Growable[Any] with Iterable[Any] - - override def initialize(b: InternalRow): Unit = { -buffer.clear() + private def generateOutput(results: Iterable[Any]): Any = { +if (results.isEmpty) { --- End diff -- fixed.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94234334 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- I will only remove this test in this PR.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94234295 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- OK, makes sense to me.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94234184 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- If we remove the logic from RewriteDistinctAggregates, we need to make sure that non-partial aggregates do not exist anymore. Let's do this in a follow-up.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94233910 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- Because we don't have any non-partial aggregate functions now?
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94233881 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- We should remove the support for non-partial aggregation in that case.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94233778 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- we can remove that logic in `RewriteDistinctAggregates` too
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94232541 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- I thought about removing it. However, the `RewriteDistinctAggregates` rule has logic for this case, and this test is the only one covering that logic, so I didn't remove it in the end.
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16401 Just one minor question about the config; otherwise LGTM.
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94232049 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -642,6 +642,13 @@ object SQLConf { .doubleConf .createWithDefault(0.05) + val CBO_ENABLED = --- End diff -- Is this meant for enabling the whole CBO framework, or just for controlling how the plan statistics are calculated?
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Merged build finished. Test PASSed.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70750/ Test PASSed.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70750/testReport)** for PR 16233 at commit [`ff9add6`](https://github.com/apache/spark/commit/ff9add61c86af097c33f3ac99cb0839cfe1fdd51). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #70755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70755/testReport)** for PR 15880 at commit [`821cca6`](https://github.com/apache/spark/commit/821cca6cd836f11ea917c89938f288f126d633ab).
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15880 retest this please
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16401 LGTM. What if we just add the conf parameter to the `statistics` method and give it a default value? e.g. `def statistics(conf: CatalystConf = SimpleCatalystConf)`. How much code do we need to update?
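The default-argument idea above is plain Scala; here is a minimal sketch. The names (`ConfSketch`, `PlanSketch`) and the returned values are invented for illustration, and note that in Spark a real default would need a concrete conf instance rather than the bare class name. The point is that existing call sites of `statistics()` keep compiling unchanged, and only CBO-aware callers pass a conf explicitly.

```scala
// Toy model of adding a defaulted conf parameter without touching old call sites.
case class ConfSketch(cboEnabled: Boolean = false)

class PlanSketch(rowCount: Long, cboRowCount: Long) {
  // Old callers: statistics(). New CBO-aware callers: statistics(ConfSketch(true)).
  def statistics(conf: ConfSketch = ConfSketch()): Long =
    if (conf.cboEnabled) cboRowCount else rowCount
}
```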
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16404 **[Test build #70754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70754/testReport)** for PR 16404 at commit [`f145188`](https://github.com/apache/spark/commit/f1451883df9077ecbf31f3a86d2427b60262f863).
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16404 **[Test build #70753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70753/testReport)** for PR 16404 at commit [`f145188`](https://github.com/apache/spark/commit/f1451883df9077ecbf31f3a86d2427b60262f863).
[GitHub] spark pull request #16404: [SPARK-18969][SQL] Support grouping by nondetermi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16404#discussion_r94229396

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1918,28 +1918,37 @@ class Analyzer(
       case p: Project => p
       case f: Filter => f

+      case a: Aggregate if a.groupingExpressions.exists(!_.deterministic) =>
+        val nondeterToAttr = getNondeterToAttr(a.groupingExpressions)
+        val newChild = Project(a.child.output ++ nondeterToAttr.values, a.child)
+        a.transformExpressions { case e =>
+          nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
+        }.copy(child = newChild)
+
       // todo: It's hard to write a general rule to pull out nondeterministic expressions
       // from LogicalPlan, currently we only do it for UnaryNode which has same output
       // schema with its child.
       case p: UnaryNode if p.output == p.child.output && p.expressions.exists(!_.deterministic) =>
-        val nondeterministicExprs = p.expressions.filterNot(_.deterministic).flatMap { expr =>
-          val leafNondeterministic = expr.collect {
-            case n: Nondeterministic => n
-          }
-          leafNondeterministic.map { e =>
-            val ne = e match {
-              case n: NamedExpression => n
-              case _ => Alias(e, "_nondeterministic")(isGenerated = true)
-            }
-            new TreeNodeRef(e) -> ne
-          }
-        }.toMap
+        val nondeterToAttr = getNondeterToAttr(p.expressions)
         val newPlan = p.transformExpressions { case e =>
-          nondeterministicExprs.get(new TreeNodeRef(e)).map(_.toAttribute).getOrElse(e)
+          nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
         }
-        val newChild = Project(p.child.output ++ nondeterministicExprs.values, p.child)
+        val newChild = Project(p.child.output ++ nondeterToAttr.values, p.child)
         Project(p.output, newPlan.withNewChildren(newChild :: Nil))
     }
+
+  private def getNondeterToAttr(exprs: Seq[Expression]): Map[Expression, NamedExpression] = {
+    exprs.filterNot(_.deterministic).flatMap { expr =>
+      val leafNondeterministic = expr.collect { case n: Nondeterministic => n }
--- End diff --

this problem was already there, let's send a new PR to fix it.
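The diff above rewrites operators so that nondeterministic expressions are materialized once in a child projection and then referenced by attribute. Below is a toy, self-contained sketch of that mapping step on simplified expression classes (`Expr`, `Attr`, `Rand` are illustrative stand-ins, not Spark's actual types); the PR's `getNondeterToAttr` helper additionally keys the map by the expression itself rather than by identity-based `TreeNodeRef`s:

```scala
// Toy expression model: only whether an expression is deterministic matters here.
sealed trait Expr { def deterministic: Boolean }
case class Attr(name: String) extends Expr { val deterministic = true }
case class Rand(seed: Long) extends Expr { val deterministic = false }

// Map each nondeterministic expression to a fresh named attribute that a
// child Project would compute exactly once per row.
def nondeterToAttr(exprs: Seq[Expr]): Map[Expr, Attr] =
  exprs.filterNot(_.deterministic)
    .zipWithIndex
    .map { case (e, i) => e -> Attr(s"_nondeterministic$i") }
    .toMap

val grouping: Seq[Expr] = Seq(Attr("key"), Rand(42L))
val mapping = nondeterToAttr(grouping)
// After the rewrite, the grouping keys reference the materialized attribute,
// so Rand(42L) is not re-evaluated by the parent operator.
val rewritten: Seq[Expr] = grouping.map(e => mapping.getOrElse(e, e))
```

This is why the Aggregate case in the diff wraps `a.child` in a `Project` carrying `nondeterToAttr.values`: the nondeterministic value is fixed per input row before grouping sees it.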
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16404 retest this please
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70748/ Test PASSed.