[GitHub] spark issue #16389: [SPARK-18981][Core]The job hang problem when speculation...
Github user zhaorongsheng commented on the issue: https://github.com/apache/spark/pull/16389 @zsxwing I think it may cause other problems. For example, if we get an ExecutorLostFailure while the speculated task is running on that executor, `numRunningTasks` will never reach zero. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16422 After rethinking it: `DESC EXTENDED/FORMATTED COLUMN` discloses data patterns and statistics. This information is quite sensitive, and not all users should be allowed to access it. We might face security-related complaints about this feature. Also cc @rxin @yhuai
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16422 To get the column names and types, we do not need `DESC COLUMN`. For retrieving statistics, each vendor has its own way. Normally, users can access the statistics from the catalog tables/views or data dictionary views. AFAIK, no system offers `DESC COLUMN` except Hive-like systems. [Hive 2.x also has a different syntax from Hive 1.x](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Hive2.0+:SyntaxChange). In this PR, we follow Hive 2.x. Complex types can be supported in an RDBMS via UDTs. For example, in Oracle, the logical mapping of structured types is abstract data types. DB2 also documents how to use structured types at [this link](http://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.structypes.doc/doc/t0006603.html); to access a nested field, it uses double dots (e.g., `col1..field1`). : )
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16341 Merged build finished. Test FAILed.
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16341 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70762/
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16341 **[Test build #70762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70762/testReport)** for PR 16341 at commit [`16b6030`](https://github.com/apache/spark/commit/16b6030ca56a538abd1c35d7949c6fa33a576f3f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94267919

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
+        if (columnName.contains(".")) {
+          throw new ParseException(
+            "DESC TABLE COLUMN for an inner column of a nested type is not supported", ctx)
--- End diff --

This might generate a confusing error message.

```
sql("describe formatted default.tab1.s").show(false)

org.apache.spark.sql.catalyst.parser.ParseException: DESC TABLE COLUMN for an inner column of a nested type is not supported(line 1, pos 0)
```
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70761/
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Merged build finished. Test PASSed.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16417 **[Test build #70761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70761/testReport)** for PR 16417 at commit [`a805b41`](https://github.com/apache/spark/commit/a805b4103d16310bd751588985318e4e2a213660).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16341 **[Test build #70762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70762/testReport)** for PR 16341 at commit [`16b6030`](https://github.com/apache/spark/commit/16b6030ca56a538abd1c35d7949c6fa33a576f3f).
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16441 Thanks for the PR; I do want to get this fixed. However, I don't think this is the right way to make probability predictions for GBTs. I believe it should depend on the loss used. E.g., check out page 8 of Friedman (1999), "Greedy Function Approximation: A Gradient Boosting Machine".
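For concreteness on the loss-dependent mapping mentioned above (a sketch, not the PR's implementation): in Friedman's formulation of two-class gradient boosting with binomial log-loss and labels in {-1, +1}, the raw ensemble margin F(x) maps to a class probability via a logistic transform.

```python
import math

def margin_to_probability(margin: float) -> float:
    """P(y = +1 | x) under Friedman's two-class log-loss: 1 / (1 + exp(-2 F(x))).
    A different loss (e.g. an exponential/AdaBoost-style loss) would need a
    different margin-to-probability mapping, which is the point of the comment."""
    return 1.0 / (1.0 + math.exp(-2.0 * margin))

print(margin_to_probability(0.0))  # 0.5: a zero margin is maximally uncertain
```

A thresholded class prediction (margin > 0) is unchanged by this transform; only the probability calibration depends on the loss.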
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16417 **[Test build #70761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70761/testReport)** for PR 16417 at commit [`a805b41`](https://github.com/apache/spark/commit/a805b4103d16310bd751588985318e4e2a213660).
[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16387 cc @rxin @zsxwing too
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16417 retest this please.
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94264691

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -244,6 +251,18 @@ public static void throwException(Throwable t) {
     LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
     FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
     DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+    // determine whether double access should be aligned.
+    String arch = System.getProperty("os.arch", "");
+    if (arch.matches("^(arm|arm32)")) {
--- End diff --

Thanks for your clarification. I was afraid that ARM 64 may return `arm`.
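For context on the check being discussed: Java's `String.matches` only succeeds when the pattern matches the entire input, so `"^(arm|arm32)"` accepts exactly the values `arm` or `arm32` and nothing longer. A minimal sketch of the same semantics, using Python's `re.fullmatch` to mirror Java's whole-string matching (the sample `os.arch` values are illustrative):

```python
import re

def is_arm32(arch: str) -> bool:
    # Java's String.matches anchors the whole input, so "^(arm|arm32)"
    # accepts only the exact strings "arm" or "arm32"; re.fullmatch
    # reproduces that behavior in Python.
    return re.fullmatch(r"arm|arm32", arch) is not None

for arch in ["arm", "arm32", "aarch64", "armv7l"]:
    print(arch, is_arm32(arch))
```

This is why `aarch64` (the usual 64-bit ARM value) cannot accidentally trip the 32-bit branch, which is the concern raised in the comment.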
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94264468

--- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-    testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
+    testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
--- End diff --

Ah, OK, that makes sense - I was looking at the changes purely from a pep8 perspective, but if the files need to compile under Python 3 to run the py3 pep8 check, that makes sense (of course, a follow-up issue for proper py3 support is the best place to address the issues that are not blocking pep8 testing).
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94263914

--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
--- End diff --

Yeah, I think this is a valid point. Let me first check the actual length against the limit to be sure.
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94263510

--- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-    testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
+    testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
--- End diff --

Using a tuple in a lambda to unpack arguments causes errors in Python 3. It seems http://www.python.org/dev/peps/pep-3113, which removed tuple parameter unpacking, is the related issue.
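To illustrate the point above: `lambda (v, p): ...` is Python 2-only syntax; PEP 3113 removed tuple parameter unpacking, so Python 3 rejects it at compile time. A minimal standalone sketch of the py3-compatible rewrite (plain lists stand in for the RDDs in the example file):

```python
labels_and_predictions = [(3.0, 2.5), (1.0, 1.5), (0.0, 0.5)]

# Python 2 only -- a SyntaxError under Python 3 (PEP 3113):
#   mse = sum(map(lambda (v, p): (v - p) ** 2, labels_and_predictions)) / 3
# Python 3-compatible: index into the tuple instead of unpacking it.
mse = sum(map(lambda lp: (lp[0] - lp[1]) ** 2,
              labels_and_predictions)) / len(labels_and_predictions)
print(mse)  # 0.25
```

The indexing form works unchanged on both Python 2 and 3, which is what the pep8 check over all files requires.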
[GitHub] spark pull request #16424: [SPARK-19016][SQL][DOC] Document scalable partiti...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16424
[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16424 OK, I'm merging this to master and branch-2.1. Thanks for the review!
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16441 **[Test build #70760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70760/testReport)** for PR 16441 at commit [`489e0e6`](https://github.com/apache/spark/commit/489e0e6db1d8c7ae519ee90f852cdfa3b7932e05).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16441 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70760/
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16441 Merged build finished. Test FAILed.
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16441 **[Test build #70760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70760/testReport)** for PR 16441 at commit [`489e0e6`](https://github.com/apache/spark/commit/489e0e6db1d8c7ae519ee90f852cdfa3b7932e05).
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16441 Jenkins, retest this please
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94259691

--- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-    testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
+    testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
--- End diff --

Why did we get rid of the `lambda (v, p)` here and similar elsewhere?
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16405#discussion_r94259548

--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
--- End diff --

I'm slightly concerned that this list might eventually become too long to pass in the shell (on Linux, bash's ARG_MAX is pretty high, but that's not the case everywhere; although we would probably have to double the number of Python files before this started being an issue even in Cygwin).
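One common way around that limit (a sketch of the general technique, not something from the PR): invoke the linter several times over chunks of the file list, the way `xargs` does, keeping each invocation's argument bytes under a budget. The helper below is hypothetical and the budget value is illustrative; real ARG_MAX values are much larger and platform-dependent.

```python
def chunk_args(paths, byte_budget=4096):
    """Split a list of paths into chunks whose space-joined length stays
    under byte_budget, xargs-style. Each chunk would then be passed to
    one linter invocation."""
    chunks, current, size = [], [], 0
    for p in paths:
        extra = len(p) + 1  # +1 for the separating space
        if current and size + extra > byte_budget:
            chunks.append(current)
            current, size = [], 0
        current.append(p)
        size += extra
    if current:
        chunks.append(current)
    return chunks

paths = ["./python/file%03d.py" % i for i in range(500)]
chunks = chunk_args(paths, byte_budget=1024)
print(len(chunks), max(len(" ".join(c)) for c in chunks))
```

In a shell script the same effect is what `find ... -print0 | xargs -0 pycodestyle` gives for free, since `xargs` batches arguments below the system limit.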
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94259189

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -22,10 +22,14 @@
 import java.lang.reflect.Method;
 import java.nio.ByteBuffer;
 
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import sun.misc.Cleaner;
 import sun.misc.Unsafe;
 
 public final class Platform {
--- End diff --

I missed that. I'll address it when we determine the final path here (per my comment below).
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16441 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70759/
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16441 **[Test build #70759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70759/testReport)** for PR 16441 at commit [`4468891`](https://github.com/apache/spark/commit/4468891cd83760a2f97ef257ef176a34bc79e5cd).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16441 Merged build finished. Test FAILed.
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94257179

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -244,6 +251,18 @@ public static void throwException(Throwable t) {
     LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
     FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
     DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+    // determine whether double access should be aligned.
+    String arch = System.getProperty("os.arch", "");
+    if (arch.matches("^(arm|arm32)")) {
--- End diff --

@kiszk I have tested on ARM 64 (`aarch64`). [Any alignment works for double access there](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch08s02.html), though 8-byte-aligned access looks to be about 10% faster than unaligned access. Using an intermediate long buffer (your idea) is about 5% slower than direct access, regardless of alignment. In both cases, I tested on an ODROID C2 using Oracle Java 8.
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70758/ Test PASSed.
[GitHub] spark issue #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to predict pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16441 **[Test build #70759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70759/testReport)** for PR 16441 at commit [`4468891`](https://github.com/apache/spark/commit/4468891cd83760a2f97ef257ef176a34bc79e5cd).
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13077 Merged build finished. Test PASSed.
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13077 **[Test build #70758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70758/testReport)** for PR 13077 at commit [`ea896ef`](https://github.com/apache/spark/commit/ea896efb70b0bf7a78214f5817f83b2251c7bb83).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16441: [SPARK-14975][ML][WIP] Fixed GBTClassifier to pre...
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/16441

[SPARK-14975][ML][WIP] Fixed GBTClassifier to predict probability per training instance and fixed interfaces

## What changes were proposed in this pull request?

All of the classifiers in MLlib can predict probabilities except GBTClassifier. Moreover, all classifiers inherit from ProbabilisticClassifier, but GBTClassifier strangely inherits from Predictor, which is a bug. This change corrects the interface and adds the ability for the classifier to produce a probability vector.

## How was this patch tested?

The basic ML tests were run after making the changes. I've marked this as WIP as I need to add more tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/imatiach-msft/spark ilmat/fix-GBT

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16441.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16441

commit 63a9574a0858ed9e4c27a4b698cb50d2475afc0b
Author: Ilya Matiach
Date: 2016-12-30T20:15:12Z
[SPARK-14975][ML][WIP] Fixed GBTClassifier to predict probability per training instance and fixed interfaces

commit 4468891cd83760a2f97ef257ef176a34bc79e5cd
Author: Ilya Matiach
Date: 2016-12-30T20:20:43Z
Fixed scala style empty line
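For context on what "predict probability" means for a GBT: a boosted ensemble produces a real-valued margin (a weighted sum of per-tree predictions), and for log-loss training that margin is commonly mapped to a class probability with a logistic link. The sketch below uses the `1 / (1 + exp(-2 * margin))` form from Friedman's log-loss formulation; whether the final Spark implementation uses exactly this constant is an assumption here, and all names are illustrative.

```java
public class GbtProbability {
    // Margin of a boosted ensemble: weighted sum of the individual tree
    // predictions for one instance (illustrative, not Spark's internals).
    public static double margin(double[] treePredictions, double[] weights) {
        double m = 0.0;
        for (int i = 0; i < treePredictions.length; i++) {
            m += weights[i] * treePredictions[i];
        }
        return m;
    }

    // Logistic link for a log-loss-trained GBT: maps the unbounded margin
    // to P(y = 1 | x) in (0, 1). The factor 2 follows Friedman's derivation.
    public static double probability(double margin) {
        return 1.0 / (1.0 + Math.exp(-2.0 * margin));
    }

    public static void main(String[] args) {
        double m = margin(new double[] {1.0, 1.0, -1.0},
                          new double[] {0.4, 0.3, 0.3});
        System.out.println(probability(m));
    }
}
```

A margin of zero maps to probability 0.5, and large positive or negative margins saturate toward 1 or 0, which is the behavior a ProbabilisticClassifier contract needs.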
[GitHub] spark issue #13077: [SPARK-10748] [Mesos] Log error instead of crashing Spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13077 **[Test build #70758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70758/testReport)** for PR 13077 at commit [`ea896ef`](https://github.com/apache/spark/commit/ea896efb70b0bf7a78214f5817f83b2251c7bb83).
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94253091

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   }

   /**
+   * Returns the default statistics or statistics estimated by cbo based on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+    if (conf.cboEnabled) {
+      if (estimatedStats == null) {
+        estimatedStats = cboStatistics(conf)
+      }
+      estimatedStats
+    } else {
+      statistics
+    }
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the
+   * default statistics.
+   */
+  def cboStatistics(conf: CatalystConf): Statistics = statistics
--- End diff --

protected?
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70757/ Test PASSed.
[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...
Github user michaelkamprath commented on the issue: https://github.com/apache/spark/pull/16403

@srowen To answer the use case question, it is primarily academic, for learning and testing. Students and researchers build clusters of Raspberry Pi, ODROID, or other SBCs to get cost-effective access to a multi-node hardware cluster. [Here](http://likemagicappears.com/projects/raspberry-pi-cluster/) [are](http://coen.boisestate.edu/ece/research-areas/raspberry-pi/) [some](https://www.raspberrypi.org/magpi/pi-spark-supercomputer/) [examples](http://hackaday.com/2016/05/09/designing-a-high-performance-parallel-personal-cluster/) [of](http://katie.atomicburn.com/2016/06/12/2016-school-gt-exhibition-raspberry-piodroid-c2-supercomputer/) [projects](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4803722/). There is even [a commercial vendor](https://www.picocluster.com/collections/) selling these SBC clusters. [In my own case](http://diybigdata.net/odroid-xu4-cluster/), it's being used to learn economically how to deal with problems of efficiency (it's easier to spot and work through patterns of inefficiency on constrained systems than on full-powered systems).

I am personally not aware of any current server-class CPUs that require double alignment. Alignment-strict SPARC processors used to be the bane of my existence in the early 2000s, but that was over a decade ago. My understanding is that today x86 supports unaligned double access with [a theoretical performance hit](https://developers.redhat.com/blog/2016/06/01/how-to-avoid-wasting-megabytes-of-memory-a-few-bytes-at-a-time/) that [in practice is rarely seen](http://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/). Typically you never concern yourself with alignment in Java because the JVM takes care of it for you, but here we are delving into the world of Unsafe, which bypasses the protections the JVM provides. Admittedly, it took me a long while to even figure out that my problem was related to alignment because, as indicated, I haven't dealt with such issues in over a decade.

With all that said, maybe a better approach here is to create a patch that users can apply to make a Spark build when they want to run Spark on a system that requires double alignment, which to the best of my knowledge is currently just 32-bit ARM CPUs. That would also let the code be more concise, without needing to determine at runtime which method to use. And if a server-class CPU with alignment requirements should ever arise, we know what to do. Given that
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Merged build finished. Test PASSed.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70757/testReport)** for PR 16233 at commit [`4af4a11`](https://github.com/apache/spark/commit/4af4a11caab2d7b777c2f0881c574c0bda703d5d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94251780

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/estimation/EstimationSuite.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.estimation
--- End diff --

estimation? Any better name?
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94251558

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -642,6 +642,13 @@ object SQLConf {
       .doubleConf
       .createWithDefault(0.05)
+
+  val CBO_ENABLED =
+    SQLConfigBuilder("spark.sql.cbo.enabled")
+      .internal()
--- End diff --

Internal?
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94251460

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   }

   /**
+   * Returns the default statistics or statistics estimated by cbo based on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+    if (conf.cboEnabled) {
+      if (estimatedStats == null) {
+        estimatedStats = cboStatistics(conf)
+      }
+      estimatedStats
+    } else {
+      statistics
+    }
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the
+   * default statistics.
+   */
+  def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be computed once. */
+  private var estimatedStats: Statistics = _
--- End diff --

Use `Option` here? Or use `@Nullable` to explicitly mark it nullable
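The reviewer's suggestion here (replace the nullable `estimatedStats` var with an `Option` so that "not yet computed" is explicit) can be sketched as follows, in plain Java with `Optional` standing in for Scala's `Option` and a row count standing in for the real `Statistics` type. All names are hypothetical.

```java
import java.util.Optional;

// Sketch of compute-once plan statistics with an explicit "empty" state
// instead of a null sentinel.
public class PlanStats {
    private Optional<Long> estimatedStats = Optional.empty();
    private int computeCalls = 0; // counts estimations, to show caching works

    // Placeholder for an expensive CBO estimate over the plan.
    private long cboStatistics() {
        computeCalls++;
        return 42L;
    }

    // Mirrors the planStats logic in the diff: use the cached CBO estimate
    // when CBO is enabled, otherwise fall back to the default statistics.
    public long planStats(boolean cboEnabled, long defaultStats) {
        if (!cboEnabled) {
            return defaultStats;
        }
        if (!estimatedStats.isPresent()) {
            estimatedStats = Optional.of(cboStatistics());
        }
        return estimatedStats.get();
    }

    public int calls() {
        return computeCalls;
    }
}
```

The `Optional` makes the uninitialized state impossible to dereference by accident, at the cost of one extra object per plan node, which is the trade-off the review comment raises.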
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15664 Thank you, @gatorsmile . Happy New Year! :)
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16320 LGTM cc @cloud-fan
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94249364

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -244,6 +251,18 @@ public static void throwException(Throwable t) {
     LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
     FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
     DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+    // determine whether double access should be aligned.
+    String arch = System.getProperty("os.arch", "");
+    if (arch.matches("^(arm|arm32)")) {
--- End diff --

What happens on ARM 64-bit?
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16404 LGTM cc @rxin
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15664 Merging to master. Thanks!
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15664
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15664 LGTM
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16371 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70756/ Test PASSed.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16371 Merged build finished. Test PASSed.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70756/testReport)** for PR 16371 at commit [`5c6b02a`](https://github.com/apache/spark/commit/5c6b02af16ed1b960242af74932a050f1c390a6e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70757/testReport)** for PR 16233 at commit [`4af4a11`](https://github.com/apache/spark/commit/4af4a11caab2d7b777c2f0881c574c0bda703d5d).
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70755/ Test PASSed.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Merged build finished. Test PASSed.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #70755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70755/testReport)** for PR 15880 at commit [`821cca6`](https://github.com/apache/spark/commit/821cca6cd836f11ea917c89938f288f126d633ab).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70754/ Test PASSed.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16404 Merged build finished. Test PASSed.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70753/ Test PASSed.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16404 Merged build finished. Test PASSed.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16404 **[Test build #70753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70753/testReport)** for PR 16404 at commit [`f145188`](https://github.com/apache/spark/commit/f1451883df9077ecbf31f3a86d2427b60262f863).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16404 **[Test build #70754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70754/testReport)** for PR 16404 at commit [`f145188`](https://github.com/apache/spark/commit/f1451883df9077ecbf31f3a86d2427b60262f863).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user cjuexuan commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94239683 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { * indicates a timestamp format. Custom date formats follow the formats at * `java.text.SimpleDateFormat`. This applies to timestamp type. * + * `writeEncoding`(default `utf-8`) save dataFrame 2 csv by giving encoding --- End diff -- OK, I will write my unit test and modify this pull request.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94239452 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -573,6 +573,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { * indicates a timestamp format. Custom date formats follow the formats at * `java.text.SimpleDateFormat`. This applies to timestamp type. * + * `writeEncoding`(default `utf-8`) save dataFrame 2 csv by giving encoding --- End diff -- We should also add the same documentation in `readwriter.py`.
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user cjuexuan commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94239100 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -71,7 +71,9 @@ private[csv] class CSVOptions(@transient private val parameters: CaseInsensitive val delimiter = CSVTypeCast.toChar( parameters.getOrElse("sep", parameters.getOrElse("delimiter", ","))) private val parseMode = parameters.getOrElse("mode", "PERMISSIVE") - val charset = parameters.getOrElse("encoding", + val readCharSet = parameters.getOrElse("encoding", +parameters.getOrElse("charset", StandardCharsets.UTF_8.name())) + val writeCharSet = parameters.getOrElse("writeEncoding", --- End diff -- @HyukjinKwon I think so
[GitHub] spark pull request #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16428#discussion_r94238157 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -71,7 +71,9 @@ private[csv] class CSVOptions(@transient private val parameters: CaseInsensitive val delimiter = CSVTypeCast.toChar( parameters.getOrElse("sep", parameters.getOrElse("delimiter", ","))) private val parseMode = parameters.getOrElse("mode", "PERMISSIVE") - val charset = parameters.getOrElse("encoding", + val readCharSet = parameters.getOrElse("encoding", +parameters.getOrElse("charset", StandardCharsets.UTF_8.name())) + val writeCharSet = parameters.getOrElse("writeEncoding", --- End diff -- I think we should not necessarily introduce an additional option. We could just use the `charset` variable, because other options such as `nullValue` are already applied to both reading and writing.
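The single-option approach suggested above can be sketched as follows. This is an illustrative standalone snippet, not Spark's actual `CSVOptions` (the class name and the plain `Map` stand in for Spark's case-insensitive parameter map): one `charset` value, resolved once with an `encoding` → `charset` fallback, would be shared by the read and write paths the same way `nullValue` already is.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for CSVOptions: a single charset used for both
// reading and writing, instead of separate readCharSet/writeCharSet values.
class CsvOptionsSketch(parameters: Map[String, String]) {
  // "encoding" wins over "charset"; both fall back to UTF-8.
  val charset: String = parameters.getOrElse(
    "encoding", parameters.getOrElse("charset", StandardCharsets.UTF_8.name()))
}
```

Both the reader and the writer code paths would then consult this one `charset` field, so users set a single option regardless of direction.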
[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16428 Ah, I meant to add a test there in this PR.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Merged build finished. Test FAILed.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70752/ Test FAILed.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16417 **[Test build #70752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70752/testReport)** for PR 16417 at commit [`a805b41`](https://github.com/apache/spark/commit/a805b4103d16310bd751588985318e4e2a213660). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user cjuexuan commented on the issue: https://github.com/apache/spark/pull/16428 @HyukjinKwon, I already ran `CSVSuite`, and all tests passed.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70756/testReport)** for PR 16371 at commit [`5c6b02a`](https://github.com/apache/spark/commit/5c6b02af16ed1b960242af74932a050f1c390a6e).
[GitHub] spark issue #16428: [SPARK-19018][SQL] ADD csv write charset param
Github user cjuexuan commented on the issue: https://github.com/apache/spark/pull/16428 @HyukjinKwon, I see. Because my version is `2.0.2`, we use `ByteArrayOutputStream` and call its `toString` method, which decodes with `Charset.defaultCharset()` and is therefore bound to the environment. In the master branch this is already fixed, so I agree with @srowen: we should just avoid hard-coding UTF-8, and users can set their own writer encoding.
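The environment dependence described above can be demonstrated in isolation. This is a minimal sketch, not Spark code (the object and method names are invented): `ByteArrayOutputStream.toString()` with no argument decodes via `Charset.defaultCharset()`, so its result varies with the JVM's `file.encoding`, while passing an explicit charset name makes the round trip deterministic.

```scala
import java.io.ByteArrayOutputStream

object CharsetDemo {
  // Encode with an explicit charset and decode the same way: deterministic.
  // Using out.toString() (no argument) instead would decode with
  // Charset.defaultCharset(), which depends on the environment the JVM runs in.
  def roundTrip(s: String, charsetName: String): String = {
    val out = new ByteArrayOutputStream()
    out.write(s.getBytes(charsetName))
    out.toString(charsetName)
  }
}
```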
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94234598 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -510,32 +539,91 @@ class Analyzer( * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog. */ object ResolveRelations extends Rule[LogicalPlan] { -private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = { + +// If the unresolved relation is running directly on files, we just return the original +// UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog +// and change the default database name if it is a view. +// We usually look up a table from the default database if the table identifier has an empty +// database part, for a view the default database should be the currentDb when the view was +// created. When the case comes to resolving a nested view, the view may have different default +// database with that the referenced view has, so we need to use the variable `defaultDatabase` +// to track the current default database. +// When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and +// then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`, +// we look up the relations that the view references using the default database. +// For example: +// |- view1 (defaultDatabase = db1) +// |- operator +// |- table2 (defaultDatabase = db1) +// |- view2 (defaultDatabase = db2) +//|- view3 (defaultDatabase = db3) +// |- view4 (defaultDatabase = db4) +// In this case, the view `view1` is a nested view, it directly references `table2`, `view2` +// and `view4`, the view `view2` references `view3`. On resolving the table, we look up the +// relations `table2`, `view2`, `view4` using the default database `db1`, and look up the +// relation `view3` using the default database `db2`.
+// +// Note this is compatible with the views defined by older versions of Spark(before 2.2), which +// have empty defaultDatabase and all the relations in viewText have database part defined. +def resolveRelation( +plan: LogicalPlan, +defaultDatabase: Option[String] = None): LogicalPlan = plan match { + case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) => +u + case u: UnresolvedRelation => +val defaultDatabase = AnalysisContext.get.defaultDatabase +val relation = lookupTableFromCatalog(u, defaultDatabase) +resolveRelation(relation, defaultDatabase) + // Hive support is required to resolve a persistent view, the logical plan returned by + // catalog.lookupRelation() should be: + // `SubqueryAlias(_, View(desc: CatalogTable, desc.output, child: LogicalPlan), _)`, + // where the child should be a logical plan parsed from `desc.viewText`. + // If the child of a view is empty, we will throw an AnalysisException later in + // `checkAnalysis`. + case view @ View(desc, _, Some(child)) => +val context = AnalysisContext(defaultDatabase = desc.viewDefaultDatabase) +// Resolve all the UnresolvedRelations and Views in the child. +val newChild = AnalysisContext.withAnalysisContext(context) { + execute(child) +} +view.copy(child = Some(newChild)) + case p @ SubqueryAlias(_, view: View, _) => +val newChild = resolveRelation(view, defaultDatabase) +p.copy(child = newChild) + case _ => plan +} + +def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { + case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved => +i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u))) + case u: UnresolvedRelation => resolveRelation(u) +} + +// Look up the table with the given name from catalog. The database we look up the table from +// is decided follow the steps: +// 1. If the database part is defined in the table identifier, use that database name; +// 2. 
Else If the defaultDatabase is defined, use the default database name; +// 3. Else use the currentDb of the SessionCatalog. +private def lookupTableFromCatalog( +u: UnresolvedRelation, +defaultDatabase: Option[String] = None): LogicalPlan = { try { -catalog.lookupRelation(u.tableIdentifier, u.alias) +
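The default-database bookkeeping described in the comment above can be illustrated with a toy model. This is not Spark's implementation; the `Plan` types and the `resolve` function are invented for the example. A view node carries the database it was created under, and relations nested inside it resolve against that database rather than the caller's current one.

```scala
object ViewResolutionSketch {
  sealed trait Plan
  case class Table(db: Option[String], name: String) extends Plan
  case class View(defaultDb: String, children: Seq[Plan]) extends Plan

  // Precedence mirrors the lookup rules quoted above:
  // explicit db in the identifier > enclosing view's default db > current db.
  def resolve(plan: Plan, defaultDb: Option[String]): Seq[String] = plan match {
    case Table(db, name) =>
      Seq(db.orElse(defaultDb).getOrElse("currentDb") + "." + name)
    case View(vdb, children) =>
      // Inside a view, children resolve against that view's default database.
      children.flatMap(resolve(_, Some(vdb)))
  }
}
```

Resolving the example tree from the comment (`view1` created in `db1`, containing `table2`, a `view2` created in `db2`, and `view4`) looks everything up in `db1` except the relations inside `view2`, which resolve against `db2`.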
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94234459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala --- @@ -44,39 +44,48 @@ abstract class Collect extends ImperativeAggregate { override def dataType: DataType = ArrayType(child.dataType) - override def supportsPartial: Boolean = false - - override def aggBufferAttributes: Seq[AttributeReference] = Nil - - override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) - - override def inputAggBufferAttributes: Seq[AttributeReference] = Nil - // Both `CollectList` and `CollectSet` are non-deterministic since their results depend on the // actual order of input rows. override def deterministic: Boolean = false - protected[this] val buffer: Growable[Any] with Iterable[Any] - - override def initialize(b: InternalRow): Unit = { -buffer.clear() + private def generateOutput(results: Iterable[Any]): Any = { +if (results.isEmpty) { --- End diff -- fixed.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94234334 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- I will only remove this test in this PR.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94234295 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- OK, makes sense to me.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94234184 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- If we remove the logic from RewriteDistinctAggregates, we need to make sure that non-partial aggregates do not exist anymore. Let's do this in a follow-up.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94233910 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- Because we don't have any non-partial aggregate functions now?
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94233881 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- We should remove the support for non-partial aggregation in that case.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94233778 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- we can remove that logic in `RewriteDistinctAggregates` too
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94232541 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregatesSuite.scala --- @@ -63,7 +63,7 @@ class RewriteDistinctAggregatesSuite extends PlanTest { val input = testRelation .groupBy('a, 'd)( countDistinct('e, 'c).as('agg1), -CollectSet('b).toAggregateExpression().as('agg2)) +DummpAgg('b).toAggregateExpression().as('agg2)) --- End diff -- I thought about removing it. However, the `RewriteDistinctAggregates` rule has logic for this case, and this test is the only one covering that logic, so I didn't remove it in the end.
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16401 Just one minor question about the config; otherwise LGTM.
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94232049 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -642,6 +642,13 @@ object SQLConf { .doubleConf .createWithDefault(0.05) + val CBO_ENABLED = --- End diff -- Is this meant for enabling the whole CBO framework, or just for controlling how the plan statistics are calculated?
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Merged build finished. Test PASSed.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70750/ Test PASSed.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16233 **[Test build #70750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70750/testReport)** for PR 16233 at commit [`ff9add6`](https://github.com/apache/spark/commit/ff9add61c86af097c33f3ac99cb0839cfe1fdd51). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #70755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70755/testReport)** for PR 15880 at commit [`821cca6`](https://github.com/apache/spark/commit/821cca6cd836f11ea917c89938f288f126d633ab).
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare long and string type column m...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15880 retest this please
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16401 LGTM. What if we just add the conf parameter to the `statistics` method and give it a default value? e.g. `def statistics(conf: CatalystConf = SimpleCatalystConf)`. How much code do we need to update?
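The default-argument idea above is plain Scala; here is a minimal sketch. The names (`ConfSketch`, `PlanSketch`) and the returned values are invented for illustration, and note that in Spark a real default would need a concrete conf instance rather than the bare class name. The point is that existing call sites of `statistics()` keep compiling unchanged, and only CBO-aware callers pass a conf explicitly.

```scala
// Toy model of adding a defaulted conf parameter without touching old call sites.
case class ConfSketch(cboEnabled: Boolean = false)

class PlanSketch(rowCount: Long, cboRowCount: Long) {
  // Old callers: statistics(). New CBO-aware callers: statistics(ConfSketch(true)).
  def statistics(conf: ConfSketch = ConfSketch()): Long =
    if (conf.cboEnabled) cboRowCount else rowCount
}
```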
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16404 **[Test build #70754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70754/testReport)** for PR 16404 at commit [`f145188`](https://github.com/apache/spark/commit/f1451883df9077ecbf31f3a86d2427b60262f863).
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16404 **[Test build #70753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70753/testReport)** for PR 16404 at commit [`f145188`](https://github.com/apache/spark/commit/f1451883df9077ecbf31f3a86d2427b60262f863).
[GitHub] spark pull request #16404: [SPARK-18969][SQL] Support grouping by nondetermi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16404#discussion_r94229396

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1918,28 +1918,37 @@ class Analyzer(
       case p: Project => p
       case f: Filter => f

+      case a: Aggregate if a.groupingExpressions.exists(!_.deterministic) =>
+        val nondeterToAttr = getNondeterToAttr(a.groupingExpressions)
+        val newChild = Project(a.child.output ++ nondeterToAttr.values, a.child)
+        a.transformExpressions { case e =>
+          nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
+        }.copy(child = newChild)
+
       // todo: It's hard to write a general rule to pull out nondeterministic expressions
       // from LogicalPlan, currently we only do it for UnaryNode which has same output
       // schema with its child.
       case p: UnaryNode if p.output == p.child.output && p.expressions.exists(!_.deterministic) =>
-        val nondeterministicExprs = p.expressions.filterNot(_.deterministic).flatMap { expr =>
-          val leafNondeterministic = expr.collect {
-            case n: Nondeterministic => n
-          }
-          leafNondeterministic.map { e =>
-            val ne = e match {
-              case n: NamedExpression => n
-              case _ => Alias(e, "_nondeterministic")(isGenerated = true)
-            }
-            new TreeNodeRef(e) -> ne
-          }
-        }.toMap
+        val nondeterToAttr = getNondeterToAttr(p.expressions)
         val newPlan = p.transformExpressions { case e =>
-          nondeterministicExprs.get(new TreeNodeRef(e)).map(_.toAttribute).getOrElse(e)
+          nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
         }
-        val newChild = Project(p.child.output ++ nondeterministicExprs.values, p.child)
+        val newChild = Project(p.child.output ++ nondeterToAttr.values, p.child)
         Project(p.output, newPlan.withNewChildren(newChild :: Nil))
     }
+
+  private def getNondeterToAttr(exprs: Seq[Expression]): Map[Expression, NamedExpression] = {
+    exprs.filterNot(_.deterministic).flatMap { expr =>
+      val leafNondeterministic = expr.collect { case n: Nondeterministic => n }
--- End diff --

this problem was already there, let's send a new PR to fix it.
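The diff above rewrites operators so that nondeterministic expressions are materialized once in a child projection and then referenced by attribute. Below is a toy, self-contained sketch of that mapping step on simplified expression classes (`Expr`, `Attr`, `Rand` are illustrative stand-ins, not Spark's actual types); the PR's `getNondeterToAttr` helper additionally keys the map by the expression itself rather than by identity-based `TreeNodeRef`s:

```scala
// Toy expression model: only whether an expression is deterministic matters here.
sealed trait Expr { def deterministic: Boolean }
case class Attr(name: String) extends Expr { val deterministic = true }
case class Rand(seed: Long) extends Expr { val deterministic = false }

// Map each nondeterministic expression to a fresh named attribute that a
// child Project would compute exactly once per row.
def nondeterToAttr(exprs: Seq[Expr]): Map[Expr, Attr] =
  exprs.filterNot(_.deterministic)
    .zipWithIndex
    .map { case (e, i) => e -> Attr(s"_nondeterministic$i") }
    .toMap

val grouping: Seq[Expr] = Seq(Attr("key"), Rand(42L))
val mapping = nondeterToAttr(grouping)
// After the rewrite, the grouping keys reference the materialized attribute,
// so Rand(42L) is not re-evaluated by the parent operator.
val rewritten: Seq[Expr] = grouping.map(e => mapping.getOrElse(e, e))
```

This is why the Aggregate case in the diff wraps `a.child` in a `Project` carrying `nondeterToAttr.values`: the nondeterministic value is fixed per input row before grouping sees it.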
[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16404 retest this please
[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70748/ Test PASSed.