[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/21939 got it. Thank you! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/21939 @shaneknapp what was the version of pyarrow in that build? 0.8 or 0.10?
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/21939 @BryanCutler So, for this upgrade, even though the JVM-side dependency is 0.10, can pyspark work with any pyarrow version from 0.8 to 0.10 without problems?
[GitHub] spark issue #22003: [SPARK-25019][BUILD] Fix orc dependency to use the same ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/22003 @dongjoon-hyun no problem. Thank you!
spark git commit: [SPARK-25019][BUILD] Fix orc dependency to use the same exclusion rules
Repository: spark Updated Branches: refs/heads/master 51e2b38d9 -> 278984d5a [SPARK-25019][BUILD] Fix orc dependency to use the same exclusion rules ## What changes were proposed in this pull request? During upgrading Apache ORC to 1.5.2 ([SPARK-24576](https://issues.apache.org/jira/browse/SPARK-24576)), `sql/core` module overrides the exclusion rules of parent pom file and it causes published `spark-sql_2.1X` artifacts have incomplete exclusion rules ([SPARK-25019](https://issues.apache.org/jira/browse/SPARK-25019)). This PR fixes it by moving the newly added exclusion rule to the parent pom. This also fixes the sbt build hack introduced at that time. ## How was this patch tested? Pass the existing dependency check and the tests. Author: Dongjoon Hyun Closes #22003 from dongjoon-hyun/SPARK-25019. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/278984d5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/278984d5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/278984d5 Branch: refs/heads/master Commit: 278984d5a5e56136c9f940f2d0e3d2040fad180b Parents: 51e2b38 Author: Dongjoon Hyun Authored: Mon Aug 6 12:00:39 2018 -0700 Committer: Yin Huai Committed: Mon Aug 6 12:00:39 2018 -0700 -- pom.xml | 4 sql/core/pom.xml | 28 2 files changed, 4 insertions(+), 28 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/278984d5/pom.xml -- diff --git a/pom.xml b/pom.xml index c46eb31..8abdb70 100644 --- a/pom.xml +++ b/pom.xml @@ -1744,6 +1744,10 @@ hadoop-common +org.apache.hadoop +hadoop-hdfs + + org.apache.hive hive-storage-api http://git-wip-us.apache.org/repos/asf/spark/blob/278984d5/sql/core/pom.xml -- diff --git a/sql/core/pom.xml b/sql/core/pom.xml index 68b42a4..ba17f5f 100644 --- a/sql/core/pom.xml +++ b/sql/core/pom.xml @@ -90,39 +90,11 @@ org.apache.orc orc-core ${orc.classifier} - - - org.apache.hadoop - hadoop-hdfs - - - - org.apache.hive - hive-storage-api - - 
org.apache.orc orc-mapreduce ${orc.classifier} - - - org.apache.hadoop - hadoop-hdfs - - - - org.apache.hive - hive-storage-api - - org.apache.parquet - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] spark issue #22003: [SPARK-25019][BUILD] Fix orc dependency to use the same ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/22003 lgtm. Merging to master.
[GitHub] spark pull request #22003: [SPARK-25019][BUILD] Fix orc dependency to use th...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/22003#discussion_r207986831 --- Diff: sql/core/pom.xml --- @@ -90,39 +90,11 @@ org.apache.orc orc-core ${orc.classifier} - - - org.apache.hadoop - hadoop-hdfs - - - - org.apache.hive - hive-storage-api - - org.apache.orc orc-mapreduce ${orc.classifier} - - - org.apache.hadoop - hadoop-hdfs - - - - org.apache.hive - hive-storage-api - - --- End diff -- got it. Thank you.
[GitHub] spark pull request #22003: [SPARK-25019][BUILD] Fix orc dependency to use th...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/22003#discussion_r207962501 --- Diff: sql/core/pom.xml --- @@ -90,39 +90,11 @@ org.apache.orc orc-core ${orc.classifier} - - - org.apache.hadoop - hadoop-hdfs - - - - org.apache.hive - hive-storage-api - - org.apache.orc orc-mapreduce ${orc.classifier} - - - org.apache.hadoop - hadoop-hdfs - - - - org.apache.hive - hive-storage-api - - --- End diff -- Thank you. Just so I understand it better: do you know why defining exclusions in this pom file messed up the published pom? Also, how can I try it out myself? What is the right command to publish locally?
[GitHub] spark pull request #22003: [SPARK-25019][BUILD] Fix orc dependency to use th...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/22003#discussion_r207888608 --- Diff: sql/core/pom.xml --- @@ -90,39 +90,11 @@ org.apache.orc orc-core ${orc.classifier} - - - org.apache.hadoop - hadoop-hdfs - - - - org.apache.hive - hive-storage-api - - org.apache.orc orc-mapreduce ${orc.classifier} - - - org.apache.hadoop - hadoop-hdfs - - - - org.apache.hive - hive-storage-api - - --- End diff -- @dongjoon-hyun when we publish snapshot artifacts or releases, will the pom for spark sql get all of the exclusions defined in the parent pom?
spark git commit: [SPARK-24895] Remove spotbugs plugin
Repository: spark Updated Branches: refs/heads/master d4a277f0c -> fc21f192a [SPARK-24895] Remove spotbugs plugin ## What changes were proposed in this pull request? Spotbugs maven plugin was a recently added plugin before 2.4.0 snapshot artifacts were broken. To ensure it does not affect the maven deploy plugin, this change removes it. ## How was this patch tested? Local build was ran, but this patch will be actually tested by monitoring the apache repo artifacts and making sure metadata is correctly uploaded after this job is ran: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/ Author: Eric Chang Closes #21865 from ericfchang/SPARK-24895. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc21f192 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc21f192 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc21f192 Branch: refs/heads/master Commit: fc21f192a302e48e5c321852e2a25639c5a182b5 Parents: d4a277f Author: Eric Chang Authored: Tue Jul 24 15:53:50 2018 -0700 Committer: Yin Huai Committed: Tue Jul 24 15:53:50 2018 -0700 -- pom.xml | 22 -- 1 file changed, 22 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fc21f192/pom.xml -- diff --git a/pom.xml b/pom.xml index 81a53ee..d75db0f 100644 --- a/pom.xml +++ b/pom.xml @@ -2610,28 +2610,6 @@ - -com.github.spotbugs -spotbugs-maven-plugin -3.1.3 - - ${basedir}/target/scala-${scala.binary.version}/classes - ${basedir}/target/scala-${scala.binary.version}/test-classes - Max - Low - true - FindPuzzlers - true - - - - - check - -compile - - - - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/21865 lgtm. I am merging this PR to master branch. Then, I will kick off https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/.
[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/21865 cc @HyukjinKwon @kiszk I will merge this PR once it passes the test.
svn commit: r25324 - /dev/spark/v2.3.0-rc5-bin/ /release/spark/spark-2.3.0/
Author: yhuai Date: Wed Feb 28 07:25:53 2018 New Revision: 25324 Log: Releasing Apache Spark 2.3.0 Added: release/spark/spark-2.3.0/ - copied from r25323, dev/spark/v2.3.0-rc5-bin/ Removed: dev/spark/v2.3.0-rc5-bin/
[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/20473#discussion_r16362 --- Diff: python/run-tests.py --- @@ -151,6 +151,38 @@ def parse_opts(): return opts +def _check_dependencies(python_exec, modules_to_test): +if "COVERAGE_PROCESS_START" in os.environ: +# Make sure if coverage is installed. +try: +subprocess_check_output( +[python_exec, "-c", "import coverage"], +stderr=open(os.devnull, 'w')) +except: +print_red("Coverage is not installed in Python executable '%s' " + "but 'COVERAGE_PROCESS_START' environment variable is set, " + "exiting." % python_exec) +sys.exit(-1) + +if pyspark_sql in modules_to_test: +# If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and +# explicitly prints out. See SPARK-23300. +try: +subprocess_check_output( +[python_exec, "-c", "import pyarrow"], +stderr=open(os.devnull, 'w')) +except: --- End diff -- Thank you. Appreciate it.
[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165449847 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper { object PhysicalAggregation { // groupingExpressions, aggregateExpressions, resultExpressions, child type ReturnType = -(Seq[NamedExpression], Seq[AggregateExpression], Seq[NamedExpression], LogicalPlan) +(Seq[NamedExpression], Seq[Expression], Seq[NamedExpression], LogicalPlan) --- End diff -- It will be good to try it out soon. But it is not urgent.
[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/20473#discussion_r165445947 --- Diff: python/run-tests.py --- @@ -151,6 +151,38 @@ def parse_opts(): return opts +def _check_dependencies(python_exec, modules_to_test): +if "COVERAGE_PROCESS_START" in os.environ: +# Make sure if coverage is installed. +try: +subprocess_check_output( +[python_exec, "-c", "import coverage"], +stderr=open(os.devnull, 'w')) +except: +print_red("Coverage is not installed in Python executable '%s' " + "but 'COVERAGE_PROCESS_START' environment variable is set, " + "exiting." % python_exec) +sys.exit(-1) + +if pyspark_sql in modules_to_test: +# If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and +# explicitly prints out. See SPARK-23300. +try: +subprocess_check_output( +[python_exec, "-c", "import pyarrow"], +stderr=open(os.devnull, 'w')) +except: --- End diff -- Actually, since we are here, is it possible to do the same thing as https://github.com/apache/spark/blob/ec63e2d0743a4f75e1cce21d0fe2b54407a86a4a/python/pyspark/sql/tests.py#L51-L63 and https://github.com/apache/spark/blob/ec63e2d0743a4f75e1cce21d0fe2b54407a86a4a/python/pyspark/sql/tests.py#L78-L84? It would be nice to use the same logic. Otherwise, even if we do not print the warning here, tests may still get skipped because of the version issue.
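To make the suggestion above concrete, here is a minimal sketch of a check that looks at the installed version as well as at whether the import succeeds. It is not the code from `python/run-tests.py` or `tests.py`; the helper names `parse_version` and `check_module` are hypothetical, and the minimum version is a caller-supplied placeholder rather than a value taken from this thread.

```python
import subprocess
import sys


def parse_version(text):
    """Turn '0.10.0' into (0, 10, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_module(python_exec, module, minimum):
    """Return (ok, message) for `module` as seen by `python_exec`.

    `minimum` is a placeholder supplied by the caller, not a value from Spark.
    """
    probe = "import %s; print(%s.__version__)" % (module, module)
    try:
        out = subprocess.check_output(
            [python_exec, "-c", probe], stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        return False, "%s is not installed; related tests will be skipped." % module
    found = out.decode("utf-8").strip()
    if parse_version(found) < parse_version(minimum):
        return False, "%s %s is older than %s; related tests will be skipped." % (
            module, found, minimum)
    return True, "%s %s found; related tests will run." % (module, found)


if __name__ == "__main__":
    # Hypothetical usage: the minimum below is illustrative only.
    print(check_module(sys.executable, "pyarrow", "0.8.0")[1])
```

Comparing tuples of integers matters here: as plain strings, "0.10.0" sorts before "0.8.0", which would misjudge exactly the versions discussed in this thread.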
[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/20473#discussion_r165445232 --- Diff: python/run-tests.py --- @@ -151,6 +151,38 @@ def parse_opts(): return opts +def _check_dependencies(python_exec, modules_to_test): +if "COVERAGE_PROCESS_START" in os.environ: +# Make sure if coverage is installed. +try: +subprocess_check_output( +[python_exec, "-c", "import coverage"], +stderr=open(os.devnull, 'w')) +except: +print_red("Coverage is not installed in Python executable '%s' " + "but 'COVERAGE_PROCESS_START' environment variable is set, " + "exiting." % python_exec) +sys.exit(-1) + +if pyspark_sql in modules_to_test: +# If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and +# explicitly prints out. See SPARK-23300. +try: +subprocess_check_output( +[python_exec, "-c", "import pyarrow"], +stderr=open(os.devnull, 'w')) +except: --- End diff -- How about we also explicitly mention that pyarrow/pandas related tests will run if they are installed?
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/20465 So, jenkins jobs run those tests with python3? If so, I feel better because those tests are not completely skipped in Jenkins. If it is hard to make them run with python 2, let's have a log that explicitly shows whether we are going to run tests using pandas/pyarrow, which will help us confirm whether they get exercised with python 3 in Jenkins.
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/20465 @felixcheung jenkins is actually skipping those tests (see the failure of this pr). It makes sense to provide a way to allow developers to not run those tests. But, I'd prefer that we run those tests by default. So, we can make sure that jenkins is doing the right thing.
[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165253818 --- Diff: python/pyspark/sql/tests.py --- @@ -4353,6 +4347,446 @@ def test_unsupported_types(self): df.groupby('id').apply(f).collect() +@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed") +class GroupbyAggPandasUDFTests(ReusedSQLTestCase): + +@property +def data(self): +from pyspark.sql.functions import array, explode, col, lit +return self.spark.range(10).toDF('id') \ +.withColumn("vs", array([lit(i * 1.0) + col('id') for i in range(20, 30)])) \ +.withColumn("v", explode(col('vs'))) \ +.drop('vs') \ +.withColumn('w', lit(1.0)) + +@property +def python_plus_one(self): +from pyspark.sql.functions import udf + +@udf('double') +def plus_one(v): +assert isinstance(v, (int, float)) +return v + 1 +return plus_one + +@property +def pandas_scalar_plus_two(self): +import pandas as pd +from pyspark.sql.functions import pandas_udf, PandasUDFType + +@pandas_udf('double', PandasUDFType.SCALAR) +def plus_two(v): +assert isinstance(v, pd.Series) +return v + 2 +return plus_two + +@property +def pandas_agg_mean_udf(self): +from pyspark.sql.functions import pandas_udf, PandasUDFType + +@pandas_udf('double', PandasUDFType.GROUP_AGG) +def avg(v): +return v.mean() +return avg + +@property +def pandas_agg_sum_udf(self): +from pyspark.sql.functions import pandas_udf, PandasUDFType + +@pandas_udf('double', PandasUDFType.GROUP_AGG) +def sum(v): +return v.sum() +return sum + +@property +def pandas_agg_weighted_mean_udf(self): +import numpy as np +from pyspark.sql.functions import pandas_udf, PandasUDFType + +@pandas_udf('double', PandasUDFType.GROUP_AGG) +def weighted_mean(v, w): +return np.average(v, weights=w) +return weighted_mean + +def test_manual(self): +df = self.data +sum_udf = self.pandas_agg_sum_udf +mean_udf = self.pandas_agg_mean_udf + +result1 = df.groupby('id').agg(sum_udf(df.v), mean_udf(df.v)).sort('id') +expected1 = 
self.spark.createDataFrame( +[[0, 245.0, 24.5], + [1, 255.0, 25.5], + [2, 265.0, 26.5], + [3, 275.0, 27.5], + [4, 285.0, 28.5], + [5, 295.0, 29.5], + [6, 305.0, 30.5], + [7, 315.0, 31.5], + [8, 325.0, 32.5], + [9, 335.0, 33.5]], +['id', 'sum(v)', 'avg(v)']) + +self.assertPandasEqual(expected1.toPandas(), result1.toPandas()) + +def test_basic(self): +from pyspark.sql.functions import col, lit, sum, mean + +df = self.data +weighted_mean_udf = self.pandas_agg_weighted_mean_udf + +# Groupby one column and aggregate one UDF with literal +result1 = df.groupby('id').agg(weighted_mean_udf(df.v, lit(1.0))).sort('id') +expected1 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort('id') +self.assertPandasEqual(expected1.toPandas(), result1.toPandas()) + +# Groupby one expression and aggregate one UDF with literal +result2 = df.groupby((col('id') + 1)).agg(weighted_mean_udf(df.v, lit(1.0)))\ +.sort(df.id + 1) +expected2 = df.groupby((col('id') + 1))\ +.agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort(df.id + 1) +self.assertPandasEqual(expected2.toPandas(), result2.toPandas()) + +# Groupby one column and aggregate one UDF without literal +result3 = df.groupby('id').agg(weighted_mean_udf(df.v, df.w)).sort('id') +expected3 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, w)')).sort('id') +self.assertPandasEqual(expected3.toPandas(), result3.toPandas()) + +# Groupby one expression and aggregate one UDF without literal +result4 = df.groupby((col('id') + 1).alias('id'))\ +.agg(weighted_mean_udf(df.v, df.w))\ +.sort('id') +expected4 = df.groupby((col('id') + 1).alias('id'))\ +.agg(mean(df.v).alias('weighted_mean(v, w)'))\ +.sort('id') +
[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165253514 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper { object PhysicalAggregation { // groupingExpressions, aggregateExpressions, resultExpressions, child type ReturnType = -(Seq[NamedExpression], Seq[AggregateExpression], Seq[NamedExpression], LogicalPlan) +(Seq[NamedExpression], Seq[Expression], Seq[NamedExpression], LogicalPlan) --- End diff -- I prefer that we try out using a new rule. We can create a utility function to reuse code. Will you have a chance to try it out?
[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165220142 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper { object PhysicalAggregation { // groupingExpressions, aggregateExpressions, resultExpressions, child type ReturnType = -(Seq[NamedExpression], Seq[AggregateExpression], Seq[NamedExpression], LogicalPlan) +(Seq[NamedExpression], Seq[Expression], Seq[NamedExpression], LogicalPlan) --- End diff -- @icexelloss Thank you for this contribution! I just came across the change in this file. I am not sure that changing the type here is the best option. The reason is that whenever we use this PhysicalAggregation rule, we have to check the instance type of those aggregate expressions and cast them. To me, it seems better to leave this rule untouched and create a new rule just for Python UDAF. What do you think? (Maybe you and the reviewers already discussed it. If so, can you point me to the discussion?) Thank you!
[GitHub] spark pull request #20037: [SPARK-22849] ivy.retrieve pattern should also co...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/20037#discussion_r163463718 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -1271,7 +1271,7 @@ private[spark] object SparkSubmitUtils { // retrieve all resolved dependencies ivy.retrieve(rr.getModuleDescriptor.getModuleRevisionId, packagesDirectory.getAbsolutePath + File.separator + -"[organization]_[artifact]-[revision].[ext]", +"[organization]_[artifact]-[revision](-[classifier]).[ext]", --- End diff -- I tried it today. Somehow, I only got the test jar downloaded. Have you guys seen this issue?
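For readers following the pattern change in the diff above, the sketch below models how the two retrieve patterns expand. It is a simplified stand-in for Ivy's pattern syntax (bracketed tokens, with a parenthesised part dropped when its token has no value), not Ivy's implementation; `expand_pattern` and the `org.example`/`lib` coordinates are made up for illustration.

```python
import re

OLD_PATTERN = "[organization]_[artifact]-[revision].[ext]"
NEW_PATTERN = "[organization]_[artifact]-[revision](-[classifier]).[ext]"


def expand_pattern(pattern, tokens):
    """Expand a retrieve pattern using a simplified model of the syntax."""

    def expand_optional(match):
        inner = match.group(1)
        names = re.findall(r"\[(\w+)\]", inner)
        # Drop the whole optional part if any token inside it is missing.
        if any(not tokens.get(name) for name in names):
            return ""
        return inner

    pattern = re.sub(r"\(([^()]*)\)", expand_optional, pattern)
    return re.sub(r"\[(\w+)\]", lambda m: tokens.get(m.group(1)) or "", pattern)


# Hypothetical coordinates: a main jar and a test jar of the same module.
main_jar = {"organization": "org.example", "artifact": "lib",
            "revision": "1.0", "ext": "jar"}
test_jar = dict(main_jar, classifier="tests")

for name, pattern in (("old", OLD_PATTERN), ("new", NEW_PATTERN)):
    print(name, expand_pattern(pattern, main_jar), expand_pattern(pattern, test_jar))
```

Under this model the old pattern maps both artifacts to the same file name, so one can overwrite the other, while the new pattern keeps them apart. Whether that fully explains the behaviour reported in the comment is not established by this sketch.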
[GitHub] spark issue #20110: [SPARK-22313][PYTHON][FOLLOWUP] Explicitly import warnin...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/20110 Thank you! Let's also check the build result to make sure `pyspark.streaming.tests.FlumePollingStreamTests` is indeed triggered (I hit this issue while running this test).
[GitHub] spark pull request #19535: [SPARK-22313][PYTHON] Mark/print deprecation warn...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/19535#discussion_r159019845 --- Diff: python/pyspark/streaming/flume.py --- @@ -54,8 +54,13 @@ def createStream(ssc, hostname, port, :param bodyDecoder: A function used to decode body (default is utf8_decoder) :return: A DStream object -.. note:: Deprecated in 2.3.0 +.. note:: Deprecated in 2.3.0. Flume support is deprecated as of Spark 2.3.0. +See SPARK-22142. """ +warnings.warn( --- End diff -- thank you :) It will be good to also check why the master build does not fail, since python should complain about it.
[GitHub] spark pull request #19535: [SPARK-22313][PYTHON] Mark/print deprecation warn...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/19535#discussion_r159013024 --- Diff: python/pyspark/streaming/flume.py --- @@ -54,8 +54,13 @@ def createStream(ssc, hostname, port, :param bodyDecoder: A function used to decode body (default is utf8_decoder) :return: A DStream object -.. note:: Deprecated in 2.3.0 +.. note:: Deprecated in 2.3.0. Flume support is deprecated as of Spark 2.3.0. +See SPARK-22142. """ +warnings.warn( --- End diff -- Seems `warnings` is not imported in this file?
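A small sketch of what a missing `import warnings` does in Python may help here. It assumes nothing about the real `flume.py` beyond the `warnings.warn(` call in the diff; the `create_stream` function, the `load` and `call_and_report` helpers, and the `flume_sketch` module name are hypothetical. The point is that Python looks up global names when a function runs, so the module still loads and the error only appears once the function is called.

```python
import warnings

# Source of a function that calls warnings.warn without importing warnings.
BROKEN = '''
def create_stream():
    warnings.warn("Deprecated in 2.3.0.", DeprecationWarning)
    return "stream"
'''
# The same source with the import added at module level.
FIXED = "import warnings\n" + BROKEN


def load(source):
    """Execute `source` as if it were a module and return its create_stream."""
    namespace = {"__name__": "flume_sketch"}
    exec(source, namespace)  # defining the function succeeds in both cases
    return namespace["create_stream"]


def call_and_report(func):
    """Call `func`, returning (result or error text, captured warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            result = func()
        except NameError as exc:
            return "NameError: %s" % exc, []
    return result, [str(w.message) for w in caught]


print(call_and_report(load(BROKEN)))
print(call_and_report(load(FIXED)))
```

This is consistent with the failure showing up only when a test actually exercises the function, and it is one plausible reason a build that skips that test would not fail.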
[GitHub] spark pull request #5604: [SPARK-1442][SQL] Window Function Support for Spar...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5604#discussion_r157933488 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.analysis.UnresolvedException +import org.apache.spark.sql.catalyst.errors.TreeNodeException +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.{NumericType, DataType} + +/** + * The trait of the Window Specification (specified in the OVER clause or WINDOW clause) for + * Window Functions. + */ +sealed trait WindowSpec + +/** + * The specification for a window function. + * @param partitionSpec It defines the way that input rows are partitioned. + * @param orderSpec It defines the ordering of rows in a partition. + * @param frameSpecification It defines the window frame in a partition. 
+ */ +case class WindowSpecDefinition( +partitionSpec: Seq[Expression], +orderSpec: Seq[SortOrder], +frameSpecification: WindowFrame) extends Expression with WindowSpec { + + def validate: Option[String] = frameSpecification match { +case UnspecifiedFrame => + Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + +"during analysis. Please file a bug report.") +case frame: SpecifiedWindowFrame => frame.validate.orElse { + def checkValueBasedBoundaryForRangeFrame(): Option[String] = { +if (orderSpec.length > 1) { + // It is not allowed to have a value-based PRECEDING and FOLLOWING + // as the boundary of a Range Window Frame. + Some("This Range Window Frame only accepts at most one ORDER BY expression.") +} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { + Some("The data type of the expression in the ORDER BY clause should be a numeric type.") +} else { + None +} + } + + (frame.frameType, frame.frameStart, frame.frameEnd) match { +case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() +case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() +case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() +case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() +case (_, _, _) => None + } +} + } + + type EvaluatedType = Any + + override def children: Seq[Expression] = partitionSpec ++ orderSpec + + override lazy val resolved: Boolean = +childrenResolved && frameSpecification.isInstanceOf[SpecifiedWindowFrame] + + + override def toString: String = simpleString + + override def eval(input: Row): EvaluatedType = throw new UnsupportedOperationException + override def nullable: Boolean = true + override def foldable: Boolean = false + override def dataType: DataType = throw new UnsupportedOperationException +} + +/** + * A Window specification reference that refers to the [[WindowSpecDefinition]] defined + 
* under the name `name`. + */ +case class WindowSpecReference(name: String) extends WindowSpec + +/** + * The trait used to represent the type of a Window Frame. + */ +sealed trait FrameType + +/** + * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] + * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered + * as a physical offset. + * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, + *
[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/19448 Thank you :)
[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/19448 I am not really worried about this particular change. It's already merged and it seems to be a small and safe change. I am not planning to revert it. But, in general, let's avoid merging changes that are not bug fixes into a maintenance branch. If there is an exception, it will be better to make it clear earlier.
[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/19448 @HyukjinKwon branch-2.2 is a maintenance branch, and I am not sure it is appropriate to merge this change into branch-2.2 since it is not really a bug fix. If the doc is not accurate, we should fix the doc. For a maintenance branch, we need to be very careful about what we merge, and we should always avoid unnecessary changes.
[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/19149 Can we add a test?
[GitHub] spark pull request #19080: [SPARK-21865][SQL] simplify the distribution sema...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r136214689 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -30,18 +30,43 @@ import org.apache.spark.sql.types.{DataType, IntegerType} * - Intra-partition ordering of data: In this case the distribution describes guarantees made *about how tuples are distributed within a single partition. */ -sealed trait Distribution +sealed trait Distribution { + /** + * The required number of partitions for this distribution. If it's None, then any number of + * partitions is allowed for this distribution. + */ + def requiredNumPartitions: Option[Int] + + /** + * Creates a default partitioning for this distribution, which can satisfy this distribution while + * matching the given number of partitions. + */ + def createPartitioning(numPartitions: Int): Partitioning +} /** * Represents a distribution where no promises are made about co-location of data. */ -case object UnspecifiedDistribution extends Distribution +case object UnspecifiedDistribution extends Distribution { + override def requiredNumPartitions: Option[Int] = None + + override def createPartitioning(numPartitions: Int): Partitioning = { +throw new IllegalStateException("UnspecifiedDistribution does not have default partitioning.") + } +} /** * Represents a distribution that only has a single partition and all tuples of the dataset * are co-located. */ -case object AllTuples extends Distribution +case object AllTuples extends Distribution { --- End diff -- I'd like to keep `AllTuples`. `SingleNodeDistribution` is a special case of `AllTuples` and seems we do not really need the extra information introduced by `SingleNode`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
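The two methods this diff adds to `Distribution` can be read as a small contract: a distribution may pin the partition count, and it can build a default partitioning that satisfies itself. Below is a minimal, self-contained sketch of that contract. The `Partitioning` stand-ins and the body of `AllTuples` are assumptions made for illustration (the quoted diff is cut off before `AllTuples` is implemented); this is not Spark's actual code.

```scala
object DistributionSketch {
  // Hypothetical, simplified stand-ins for Spark's Partitioning hierarchy.
  sealed trait Partitioning { def numPartitions: Int }
  case object SinglePartition extends Partitioning { val numPartitions: Int = 1 }

  sealed trait Distribution {
    // None means any number of partitions is acceptable.
    def requiredNumPartitions: Option[Int]
    // Builds a default partitioning that satisfies this distribution.
    def createPartitioning(numPartitions: Int): Partitioning
  }

  // Mirrors the quoted diff: no requirement and no default partitioning.
  case object UnspecifiedDistribution extends Distribution {
    override def requiredNumPartitions: Option[Int] = None
    override def createPartitioning(numPartitions: Int): Partitioning =
      throw new IllegalStateException(
        "UnspecifiedDistribution does not have default partitioning.")
  }

  // Assumed body: all tuples co-located implies exactly one partition.
  case object AllTuples extends Distribution {
    override def requiredNumPartitions: Option[Int] = Some(1)
    override def createPartitioning(numPartitions: Int): Partitioning = {
      require(numPartitions == 1, "AllTuples needs exactly one partition.")
      SinglePartition
    }
  }
}
```

Under these assumptions, a planner could consult `requiredNumPartitions` first and call `createPartitioning` only when a child's existing partitioning does not already satisfy the distribution.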
[GitHub] spark issue #19080: [SPARK-21865][SQL] simplify the distribution semantic of...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/19080 I have a question after reading the new approach. Let's say that we have a join like `T1 JOIN T2 on T1.a = T2.a`. Also `T1` is hash partitioned by the value of `T1.a` and it has 10 partitions, and `T2` is range partitioned by the value of `T2.a` and it has 10 partitions. Both sides will satisfy the required distribution of the join. However, we need to add an exchange at either side in order to produce the correct result. How will we handle this case with this change? Also, regarding > For multiple children, Spark only guarantees they have the same number of partitions, and it's the operator's responsibility to leverage this guarantee to achieve more complicated requirements. Can you give a concrete example?
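The concern in this comment can be made concrete with a small sketch. It assumes a toy hash function (non-negative modulo of the key) and a toy range scheme (ten contiguous ranges over keys 0 to 99); Spark's real hash and range partitioners work differently, and every name below is hypothetical. The point is only that two sides with the same partition count can still place the same key in different partitions, so joining partition `i` with partition `i` would miss matches.

```scala
object CoPartitioningSketch {
  val numPartitions: Int = 10

  // Toy hash partitioning: non-negative modulo of the key's hash code.
  def hashPartition(key: Int): Int =
    java.lang.Math.floorMod(key.hashCode, numPartitions)

  // Toy range partitioning over keys 0 until 100: ten ranges of width 10.
  def rangePartition(key: Int): Int =
    math.min(key / 10, numPartitions - 1)

  val keys: Seq[Int] = 0 until 100

  // Keys that the two schemes send to different partition indices.
  val misplaced: Seq[Int] =
    keys.filter(k => hashPartition(k) != rangePartition(k))
}
```

With these toy schemes, key 25 lands in partition 5 on the hash side and partition 2 on the range side, and 90 of the 100 keys disagree. A partition-wise join would therefore need an exchange on one side to realign the data, which is the case the comment asks about.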
[3/3] spark-website git commit: Add the news about spark-summit-eu-2017 agenda
Add the news about spark-summit-eu-2017 agenda Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/35eb1471 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/35eb1471 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/35eb1471 Branch: refs/heads/asf-site Commit: 35eb1471704a97c18e96b46f2495a7117565466d Parents: cca972e Author: Yin HuaiAuthored: Mon Aug 28 22:40:10 2017 + Committer: Yin Huai Committed: Mon Aug 28 15:54:26 2017 -0700 -- ...-08-28-spark-summit-eu-2017-agenda-posted.md | 17 ++ site/committers.html| 6 +- site/community.html | 6 +- site/contributing.html | 6 +- site/developer-tools.html | 6 +- site/documentation.html | 6 +- site/downloads.html | 6 +- site/examples.html | 6 +- site/faq.html | 6 +- site/graphx/index.html | 6 +- site/improvement-proposals.html | 6 +- site/index.html | 6 +- site/mailing-lists.html | 6 +- site/mllib/index.html | 6 +- site/news/amp-camp-2013-registration-ope.html | 6 +- .../news/announcing-the-first-spark-summit.html | 6 +- .../news/fourth-spark-screencast-published.html | 6 +- site/news/index.html| 16 +- site/news/nsdi-paper.html | 6 +- site/news/one-month-to-spark-summit-2015.html | 6 +- .../proposals-open-for-spark-summit-east.html | 6 +- ...registration-open-for-spark-summit-east.html | 6 +- .../news/run-spark-and-shark-on-amazon-emr.html | 6 +- site/news/spark-0-6-1-and-0-5-2-released.html | 6 +- site/news/spark-0-6-2-released.html | 6 +- site/news/spark-0-7-0-released.html | 6 +- site/news/spark-0-7-2-released.html | 6 +- site/news/spark-0-7-3-released.html | 6 +- site/news/spark-0-8-0-released.html | 6 +- site/news/spark-0-8-1-released.html | 6 +- site/news/spark-0-9-0-released.html | 6 +- site/news/spark-0-9-1-released.html | 6 +- site/news/spark-0-9-2-released.html | 6 +- site/news/spark-1-0-0-released.html | 6 +- site/news/spark-1-0-1-released.html | 6 +- site/news/spark-1-0-2-released.html | 6 +- 
site/news/spark-1-1-0-released.html | 6 +- site/news/spark-1-1-1-released.html | 6 +- site/news/spark-1-2-0-released.html | 6 +- site/news/spark-1-2-1-released.html | 6 +- site/news/spark-1-2-2-released.html | 6 +- site/news/spark-1-3-0-released.html | 6 +- site/news/spark-1-4-0-released.html | 6 +- site/news/spark-1-4-1-released.html | 6 +- site/news/spark-1-5-0-released.html | 6 +- site/news/spark-1-5-1-released.html | 6 +- site/news/spark-1-5-2-released.html | 6 +- site/news/spark-1-6-0-released.html | 6 +- site/news/spark-1-6-1-released.html | 6 +- site/news/spark-1-6-2-released.html | 6 +- site/news/spark-1-6-3-released.html | 6 +- site/news/spark-2-0-0-released.html | 6 +- site/news/spark-2-0-1-released.html | 6 +- site/news/spark-2-0-2-released.html | 6 +- site/news/spark-2-1-0-released.html | 6 +- site/news/spark-2-1-1-released.html | 6 +- site/news/spark-2-2-0-released.html | 6 +- site/news/spark-2.0.0-preview.html | 6 +- .../spark-accepted-into-apache-incubator.html | 6 +- site/news/spark-and-shark-in-the-news.html | 6 +- site/news/spark-becomes-tlp.html| 6 +- site/news/spark-featured-in-wired.html | 6 +- .../spark-mailing-lists-moving-to-apache.html | 6 +- site/news/spark-meetups.html| 6 +- site/news/spark-screencasts-published.html | 6 +- site/news/spark-summit-2013-is-a-wrap.html | 6 +- site/news/spark-summit-2014-videos-posted.html | 6 +- site/news/spark-summit-2015-videos-posted.html | 6 +- site/news/spark-summit-agenda-posted.html | 6 +- .../spark-summit-east-2015-videos-posted.html | 6 +- .../spark-summit-east-2016-cfp-closing.html | 6 +- .../spark-summit-east-2017-agenda-posted.html | 6 +- site/news/spark-summit-east-agenda-posted.html | 6 +- .../spark-summit-eu-2017-agenda-posted.html | 223 +++
[1/3] spark-website git commit: Add the news about spark-summit-eu-2017 agenda
Repository: spark-website Updated Branches: refs/heads/asf-site cca972e7f -> 35eb14717 http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-3-0.html -- diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html index 10d934b..5e4d302 100644 --- a/site/releases/spark-release-1-3-0.html +++ b/site/releases/spark-release-1-3-0.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 31, 2017) - Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-3-1.html -- diff --git a/site/releases/spark-release-1-3-1.html b/site/releases/spark-release-1-3-1.html index 7df8028..116898f 100644 --- a/site/releases/spark-release-1-3-1.html +++ b/site/releases/spark-release-1-3-1.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 31, 2017) - Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-4-0.html -- diff --git a/site/releases/spark-release-1-4-0.html b/site/releases/spark-release-1-4-0.html index 143cc17..b75a496 100644 --- a/site/releases/spark-release-1-4-0.html +++ b/site/releases/spark-release-1-4-0.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 
31, 2017) - Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-4-1.html -- diff --git a/site/releases/spark-release-1-4-1.html b/site/releases/spark-release-1-4-1.html index ccdd161..30b92fd 100644 --- a/site/releases/spark-release-1-4-1.html +++ b/site/releases/spark-release-1-4-1.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 31, 2017) - Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-5-0.html -- diff --git a/site/releases/spark-release-1-5-0.html b/site/releases/spark-release-1-5-0.html index f73ab5d..6e1411d 100644 --- a/site/releases/spark-release-1-5-0.html +++ b/site/releases/spark-release-1-5-0.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 31, 2017) - Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-5-1.html -- diff --git a/site/releases/spark-release-1-5-1.html b/site/releases/spark-release-1-5-1.html index 3af892e..b447dd7 100644 --- a/site/releases/spark-release-1-5-1.html +++
[2/3] spark-website git commit: Add the news about spark-summit-eu-2017 agenda
http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-accepted-into-apache-incubator.html -- diff --git a/site/news/spark-accepted-into-apache-incubator.html b/site/news/spark-accepted-into-apache-incubator.html index 62638f2..a4a913f 100644 --- a/site/news/spark-accepted-into-apache-incubator.html +++ b/site/news/spark-accepted-into-apache-incubator.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 31, 2017) - Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-and-shark-in-the-news.html -- diff --git a/site/news/spark-and-shark-in-the-news.html b/site/news/spark-and-shark-in-the-news.html index 4a0c4fc..55d2ade 100644 --- a/site/news/spark-and-shark-in-the-news.html +++ b/site/news/spark-and-shark-in-the-news.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 31, 2017) - Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-becomes-tlp.html -- diff --git a/site/news/spark-becomes-tlp.html b/site/news/spark-becomes-tlp.html index 6c76d20..0f17857 100644 --- a/site/news/spark-becomes-tlp.html +++ b/site/news/spark-becomes-tlp.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 31, 2017) - Spark Summit East (Feb 
7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-featured-in-wired.html -- diff --git a/site/news/spark-featured-in-wired.html b/site/news/spark-featured-in-wired.html index 1d35e40..1c0b69a 100644 --- a/site/news/spark-featured-in-wired.html +++ b/site/news/spark-featured-in-wired.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 31, 2017) - Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-mailing-lists-moving-to-apache.html -- diff --git a/site/news/spark-mailing-lists-moving-to-apache.html b/site/news/spark-mailing-lists-moving-to-apache.html index b586b65..4e12162 100644 --- a/site/news/spark-mailing-lists-moving-to-apache.html +++ b/site/news/spark-mailing-lists-moving-to-apache.html @@ -161,6 +161,9 @@ Latest News + Spark Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted + (Aug 28, 2017) + Spark 2.2.0 released (Jul 11, 2017) @@ -170,9 +173,6 @@ Spark Summit (June 5-7th, 2017, San Francisco) agenda posted (Mar 31, 2017) - Spark Summit East (Feb 7-9th, 2017, Boston) agenda posted - (Jan 04, 2017) - Archive http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-meetups.html -- diff --git a/site/news/spark-meetups.html b/site/news/spark-meetups.html index 4de6525..92da537 100644 --- a/site/news/spark-meetups.html +++ b/site/news/spark-meetups.html @@ -161,6
[GitHub] spark issue #18944: [SPARK-21732][SQL]Lazily init hive metastore client
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/18944 lgtm
spark git commit: [SPARK-21111][TEST][2.2] Fix the test failure of describe.sql
Repository: spark Updated Branches: refs/heads/branch-2.2 76ee41fd7 -> a585c870a [SPARK-21111][TEST][2.2] Fix the test failure of describe.sql ## What changes were proposed in this pull request? Test failed in `describe.sql`. We need to fix the related bug introduced in (https://github.com/apache/spark/pull/17649) in the follow-up PR to master. ## How was this patch tested? N/A Author: gatorsmile Closes #18316 from gatorsmile/fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a585c870 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a585c870 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a585c870 Branch: refs/heads/branch-2.2 Commit: a585c870a066fa94d97462cefbaa4057a7a0ed44 Parents: 76ee41f Author: gatorsmile Authored: Thu Jun 15 18:25:39 2017 -0700 Committer: Yin Huai Committed: Thu Jun 15 18:25:39 2017 -0700 -- sql/core/src/test/resources/sql-tests/results/describe.sql.out | 2 ++ 1 file changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a585c870/sql/core/src/test/resources/sql-tests/results/describe.sql.out -- diff --git a/sql/core/src/test/resources/sql-tests/results/describe.sql.out b/sql/core/src/test/resources/sql-tests/results/describe.sql.out index 329532c..ab9f278 100644 --- a/sql/core/src/test/resources/sql-tests/results/describe.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/describe.sql.out @@ -127,6 +127,7 @@ Provider parquet Num Buckets 2 Bucket Columns [`a`] Sort Columns [`b`] +Comment table_comment Table Properties [e=3] Location [not included in comparison]sql/core/spark-warehouse/t Storage Properties [a=1, b=2] @@ -157,6 +158,7 @@ Provider parquet Num Buckets 2 Bucket Columns [`a`] Sort Columns [`b`] +Comment table_comment Table Properties [e=3] Location [not included in comparison]sql/core/spark-warehouse/t Storage Properties [a=1, b=2] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] spark issue #18316: [SPARK-21111] [TEST] [2.2] Fix the test failure of descr...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/18316 Thanks! I have merged this pr to branch-2.2.
[GitHub] spark issue #18316: [SPARK-21111] [TEST] [2.2] Fix the test failure of descr...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/18316 thanks! merging to branch-2.2
[GitHub] spark issue #18316: [SPARK-21111] [TEST] [2.2] Fix the test failure of descr...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/18316 lgtm
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/18064 My suggestion was to move the interface changes to ExecutedCommandExec and SaveIntoDataSourceCommand into separate PRs. That will help code review (both speed and quality).
[GitHub] spark issue #18148: [SPARK-20926][SQL] Removing exposures to guava library c...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/18148 @vanzin Seems merging to branch-2.2 was an accident? Since it is not really a bug fix, should we revert it from branch-2.2 and just keep it in the master?
[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/18064 I just came across this PR and have one piece of general feedback: it would be great if each PR had a single purpose. This PR contains different kinds of changes in order to fix the UI. If refactoring is needed, I'd recommend a separate PR for the refactoring, and then a different PR for the actual fix.
spark git commit: Revert "[SPARK-20946][SQL] simplify the config setting logic in SparkSession.getOrCreate"
Repository: spark Updated Branches: refs/heads/branch-2.2 6c628e75e -> b560c975b Revert "[SPARK-20946][SQL] simplify the config setting logic in SparkSession.getOrCreate" This reverts commit e11d90bf8deb553fd41b8837e3856c11486c2503. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b560c975 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b560c975 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b560c975 Branch: refs/heads/branch-2.2 Commit: b560c975b7cdc8828fc9e27cbca740c5e550b9cd Parents: 6c628e7 Author: Yin Huai Authored: Fri Jun 2 15:36:21 2017 -0700 Committer: Yin Huai Committed: Fri Jun 2 15:37:38 2017 -0700 -- .../spark/ml/recommendation/ALSSuite.scala | 4 +++- .../apache/spark/ml/tree/impl/TreeTests.scala | 2 ++ .../org/apache/spark/sql/SparkSession.scala | 25 +--- 3 files changed, 21 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b560c975/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala index 23f2256..701040f 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala @@ -820,13 +820,15 @@ class ALSCleanerSuite extends SparkFunSuite { FileUtils.listFiles(localDir, TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE).asScala.toSet try { conf.set("spark.local.dir", localDir.getAbsolutePath) - val sc = new SparkContext("local[2]", "ALSCleanerSuite", conf) + val sc = new SparkContext("local[2]", "test", conf) try { sc.setCheckpointDir(checkpointDir.getAbsolutePath) // Generate test data val (training, _) = ALSSuite.genImplicitTestData(sc, 20, 5, 1, 0.2, 0) // Implicitly test the cleaning of parents during ALS training val spark = SparkSession.builder + 
.master("local[2]") + .appName("ALSCleanerSuite") .sparkContext(sc) .getOrCreate() import spark.implicits._ http://git-wip-us.apache.org/repos/asf/spark/blob/b560c975/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala -- diff --git a/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala b/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala index b6894b3..92a2369 100644 --- a/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala +++ b/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala @@ -43,6 +43,8 @@ private[ml] object TreeTests extends SparkFunSuite { categoricalFeatures: Map[Int, Int], numClasses: Int): DataFrame = { val spark = SparkSession.builder() + .master("local[2]") + .appName("TreeTests") .sparkContext(data.sparkContext) .getOrCreate() import spark.implicits._ http://git-wip-us.apache.org/repos/asf/spark/blob/b560c975/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala index bf37b76..d2bf350 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -757,8 +757,6 @@ object SparkSession { private[this] var userSuppliedContext: Option[SparkContext] = None -// The `SparkConf` inside the given `SparkContext` may get changed if you specify some options -// for this builder. private[spark] def sparkContext(sparkContext: SparkContext): Builder = synchronized { userSuppliedContext = Option(sparkContext) this @@ -856,7 +854,7 @@ object SparkSession { * * @since 2.2.0 */ -def withExtensions(f: SparkSessionExtensions => Unit): Builder = synchronized { +def withExtensions(f: SparkSessionExtensions => Unit): Builder = { f(extensions) this } @@ -901,14 +899,22 @@ object SparkSession { // No active nor global default session. Create a new one. 
val sparkContext = userSuppliedContext.getOrElse { + // set app name if not given + val randomAppName = java.util.UUID.randomUUID().toString val sparkConf = new SparkConf() - options.get("spark.master").foreach(sparkConf.setMaster) - // set a random app
[GitHub] spark issue #18172: [SPARK-20946][SQL] simplify the config setting logic in ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/18172 Reverting this because it breaks repl tests.
[GitHub] spark pull request #17617: [SPARK-20244][Core] Handle incorrect bytesRead me...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/17617#discussion_r119938185 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -143,14 +144,29 @@ class SparkHadoopUtil extends Logging { * Returns a function that can be called to find Hadoop FileSystem bytes read. If * getFSBytesReadOnThreadCallback is called from thread r at time t, the returned callback will * return the bytes read on r since t. - * - * @return None if the required method can't be found. --- End diff -- Why removing this line instead of the doc?
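The doc comment quoted in this diff describes a callback that reports bytes read on thread r since time t. One way to picture that behaviour is a closure that captures a baseline when it is created. The sketch below is an assumption-laden illustration: the per-thread counter, `recordRead`, and `getBytesReadCallback` are hypothetical names, and this is not how `SparkHadoopUtil` actually reads Hadoop FileSystem statistics.

```scala
import java.util.concurrent.atomic.AtomicLong

object BytesReadCallbackSketch {
  // Hypothetical stand-in for per-thread FileSystem read statistics.
  private val bytesReadOnThread = new ThreadLocal[AtomicLong] {
    override def initialValue(): AtomicLong = new AtomicLong(0L)
  }

  // Simulates the calling thread reading n bytes.
  def recordRead(n: Long): Unit = {
    bytesReadOnThread.get().addAndGet(n)
    ()
  }

  // Captures the calling thread's counter and its current value (time t);
  // the returned function reports bytes read on that thread since then.
  def getBytesReadCallback(): () => Long = {
    val counter = bytesReadOnThread.get()
    val baseline = counter.get()
    () => counter.get() - baseline
  }
}
```

In this sketch, reads recorded before the callback is created are excluded from its result, which matches the "since t" wording of the doc comment.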
[GitHub] spark issue #17763: [SPARK-13747][Core]Add ThreadUtils.awaitReady and disall...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17763 lgtm
[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17666 I have reverted this change from both master and branch-2.2. I have reopened the jira.
spark git commit: Revert "[SPARK-20311][SQL] Support aliases for table value functions"
Repository: spark Updated Branches: refs/heads/branch-2.2 9e8d23b3a -> d191b962d Revert "[SPARK-20311][SQL] Support aliases for table value functions" This reverts commit 714811d0b5bcb5d47c39782ff74f898d276ecc59. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d191b962 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d191b962 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d191b962 Branch: refs/heads/branch-2.2 Commit: d191b962dc81c015fa92a38d882a8c7ea620ef06 Parents: 9e8d23b Author: Yin Huai Authored: Tue May 9 14:47:45 2017 -0700 Committer: Yin Huai Committed: Tue May 9 14:49:02 2017 -0700 -- .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 20 ++ .../analysis/ResolveTableValuedFunctions.scala | 22 +++- .../sql/catalyst/analysis/unresolved.scala | 10 ++--- .../spark/sql/catalyst/parser/AstBuilder.scala | 17 --- .../sql/catalyst/analysis/AnalysisSuite.scala | 14 + .../sql/catalyst/parser/PlanParserSuite.scala | 13 +--- 6 files changed, 17 insertions(+), 79 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d191b962/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 -- diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 15e4dd4..1ecb3d1 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -472,23 +472,15 @@ identifierComment ; relationPrimary -: tableIdentifier sample? (AS? strictIdentifier)? #tableName -| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery -| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation -| inlineTable #inlineTableDefault2 -| functionTable#tableValuedFunction +: tableIdentifier sample? (AS? strictIdentifier)? 
#tableName +| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery +| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation +| inlineTable #inlineTableDefault2 +| identifier '(' (expression (',' expression)*)? ')' #tableValuedFunction ; inlineTable -: VALUES expression (',' expression)* tableAlias -; - -functionTable -: identifier '(' (expression (',' expression)*)? ')' tableAlias -; - -tableAlias -: (AS? identifier identifierList?)? +: VALUES expression (',' expression)* (AS? identifier identifierList?)? ; rowFormat http://git-wip-us.apache.org/repos/asf/spark/blob/d191b962/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala index dad1340..de6de24 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala @@ -19,8 +19,8 @@ package org.apache.spark.sql.catalyst.analysis import java.util.Locale -import org.apache.spark.sql.catalyst.expressions.{Alias, Expression} -import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, Range} +import org.apache.spark.sql.catalyst.expressions.Expression +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Range} import org.apache.spark.sql.catalyst.rules._ import org.apache.spark.sql.types.{DataType, IntegerType, LongType} @@ -105,7 +105,7 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { case u: UnresolvedTableValuedFunction if u.functionArgs.forall(_.resolved) => - val resolvedFunc = builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match { + 
builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match { case Some(tvf) => val resolved = tvf.flatMap { case (argList, resolver) => argList.implicitCast(u.functionArgs) match { @@ -125,21 +125,5 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan]
spark git commit: Revert "[SPARK-20311][SQL] Support aliases for table value functions"
Repository: spark
Updated Branches:
  refs/heads/master ac1ab6b9d -> f79aa285c

Revert "[SPARK-20311][SQL] Support aliases for table value functions"

This reverts commit 714811d0b5bcb5d47c39782ff74f898d276ecc59.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f79aa285
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f79aa285
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f79aa285

Branch: refs/heads/master
Commit: f79aa285cf115963ba06a9cacb3dbd7e3cbf7728
Parents: ac1ab6b
Author: Yin Huai
Authored: Tue May 9 14:47:45 2017 -0700
Committer: Yin Huai
Committed: Tue May 9 14:47:45 2017 -0700

--
 .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 20 ++
 .../analysis/ResolveTableValuedFunctions.scala | 22 +++-
 .../sql/catalyst/analysis/unresolved.scala | 10 ++---
 .../spark/sql/catalyst/parser/AstBuilder.scala | 17 ---
 .../sql/catalyst/analysis/AnalysisSuite.scala | 14 +
 .../sql/catalyst/parser/PlanParserSuite.scala | 13 +---
 6 files changed, 17 insertions(+), 79 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/f79aa285/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
--
diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 41daf58..14c511f 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -472,23 +472,15 @@ identifierComment
     ;

 relationPrimary
-    : tableIdentifier sample? (AS? strictIdentifier)? #tableName
-    | '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery
-    | '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation
-    | inlineTable #inlineTableDefault2
-    | functionTable #tableValuedFunction
+    : tableIdentifier sample? (AS? strictIdentifier)? #tableName
+    | '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery
+    | '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation
+    | inlineTable #inlineTableDefault2
+    | identifier '(' (expression (',' expression)*)? ')' #tableValuedFunction
     ;

 inlineTable
-    : VALUES expression (',' expression)* tableAlias
-    ;
-
-functionTable
-    : identifier '(' (expression (',' expression)*)? ')' tableAlias
-    ;
-
-tableAlias
-    : (AS? identifier identifierList?)?
+    : VALUES expression (',' expression)* (AS? identifier identifierList?)?
     ;

 rowFormat

http://git-wip-us.apache.org/repos/asf/spark/blob/f79aa285/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
index dad1340..de6de24 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
@@ -19,8 +19,8 @@ package org.apache.spark.sql.catalyst.analysis

 import java.util.Locale

-import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}
-import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, Range}
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Range}
 import org.apache.spark.sql.catalyst.rules._
 import org.apache.spark.sql.types.{DataType, IntegerType, LongType}

@@ -105,7 +105,7 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] {

   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case u: UnresolvedTableValuedFunction if u.functionArgs.forall(_.resolved) =>
-      val resolvedFunc = builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match {
+      builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match {
         case Some(tvf) =>
           val resolved = tvf.flatMap { case (argList, resolver) =>
             argList.implicitCast(u.functionArgs) match {
@@ -125,21 +125,5 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] {
[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17666 I am going to revert this PR from master and branch-2.2. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17666

@maropu Sorry. I think this PR introduces a regression.

```
scala> spark.sql("select * from range(1, 10) cross join range(1, 10)").explain
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
Range (1, 10, step=1, splits=None)
and
Range (1, 10, step=1, splits=None)
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
```

I think we are taking `cross` as the alias of the first `range(1, 10)`, which leaves a plain `join` with no condition. I reverted your change locally and the query worked. I am attaching the expected analyzed plan below.

```
scala> spark.sql("select * from range(1, 10) cross join range(1, 10)").queryExecution.analyzed
res1: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Project [id#8L, id#9L]
+- Join Cross
   :- Range (1, 10, step=1, splits=None)
   +- Range (1, 10, step=1, splits=None)
```
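The suspected cause above (an optional alias swallowing the keyword `cross`) can be illustrated with a toy parser. This is a minimal sketch and not Spark's ANTLR grammar: the token list, the `parseRelation` helper and the `strictReserved` set are all hypothetical names made up for the illustration.

```scala
// Toy model of "relation followed by an optional alias".
// NOT Spark's parser; every name here is hypothetical.
object AliasAmbiguity {
  // Words the strict variant refuses to accept as an alias.
  val strictReserved: Set[String] = Set("cross", "join", "on", "where")

  // Parses `name [alias]` and returns (name, alias, remaining tokens).
  // `isIdentifier` decides whether a token may be consumed as the alias.
  def parseRelation(
      tokens: List[String],
      isIdentifier: String => Boolean): (String, Option[String], List[String]) =
    tokens match {
      case name :: maybeAlias :: rest if isIdentifier(maybeAlias) =>
        (name, Some(maybeAlias), rest)
      case name :: rest =>
        (name, None, rest)
      case Nil =>
        throw new IllegalArgumentException("expected a relation")
    }

  val tokens: List[String] = List("range(1,10)", "cross", "join", "range(1,10)")

  // Permissive rule: any word may be an alias, so `cross` is consumed
  // and only a plain `join` is left for the rest of the parse.
  val permissive: (String, Option[String], List[String]) =
    parseRelation(tokens, _ => true)

  // Strict rule: reserved words cannot be aliases, so `cross join` survives.
  val strict: (String, Option[String], List[String]) =
    parseRelation(tokens, t => !strictReserved.contains(t.toLowerCase))
}
```

In the permissive case the remaining tokens start with `join`, which would match the "cartesian product for INNER join" error reported above; in the strict case `cross join` is still intact.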
[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17905 I see. I think https://github.com/apache/spark/pull/17905/commits/d4c1a9db25ee7386f7b12e4dabb54210a9892510 is good. How about we get it checked in first (after Jenkins passes)?
[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17905 lgtm
[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17905 @falaki's PR did not actually trigger that test.
[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17905 @felixcheung you are right. That is the problem.
[GitHub] spark issue #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() test fai...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17903 I do not think https://github.com/apache/spark/pull/17649 caused the problem. I saw failures without that internally.
spark git commit: [SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 23681e9ca -> 4179ffc03

[SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails

## What changes were proposed in this pull request?

Cleaning existing temp tables before running tableNames tests

## How was this patch tested?

SparkR Unit tests

Author: Hossein

Closes #17903 from falaki/SPARK-20661.

(cherry picked from commit 2abfee18b6511482b916c36f00bf3abf68a59e19)
Signed-off-by: Yin Huai

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4179ffc0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4179ffc0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4179ffc0

Branch: refs/heads/branch-2.2
Commit: 4179ffc031a0dbca6a93255c673de800ce7393fe
Parents: 23681e9
Author: Hossein
Authored: Mon May 8 14:48:11 2017 -0700
Committer: Yin Huai
Committed: Mon May 8 14:48:29 2017 -0700

--
 R/pkg/inst/tests/testthat/test_sparkSQL.R | 2 ++
 1 file changed, 2 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4179ffc0/R/pkg/inst/tests/testthat/test_sparkSQL.R
--
diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R
index 3f445e2..58cd259 100644
--- a/R/pkg/inst/tests/testthat/test_sparkSQL.R
+++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -668,6 +668,8 @@ test_that("jsonRDD() on a RDD with json string", {
 })

 test_that("test tableNames and tables", {
+  # Making sure there are no registered temp tables from previous tests
+  suppressWarnings(sapply(tableNames(), function(tname) { dropTempTable(tname) }))
   df <- read.json(jsonPath)
   createOrReplaceTempView(df, "table1")
   expect_equal(length(tableNames()), 1)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails
Repository: spark
Updated Branches:
  refs/heads/master 829cd7b8b -> 2abfee18b

[SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails

## What changes were proposed in this pull request?

Cleaning existing temp tables before running tableNames tests

## How was this patch tested?

SparkR Unit tests

Author: Hossein

Closes #17903 from falaki/SPARK-20661.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2abfee18
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2abfee18
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2abfee18

Branch: refs/heads/master
Commit: 2abfee18b6511482b916c36f00bf3abf68a59e19
Parents: 829cd7b
Author: Hossein
Authored: Mon May 8 14:48:11 2017 -0700
Committer: Yin Huai
Committed: Mon May 8 14:48:11 2017 -0700

--
 R/pkg/inst/tests/testthat/test_sparkSQL.R | 2 ++
 1 file changed, 2 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/2abfee18/R/pkg/inst/tests/testthat/test_sparkSQL.R
--
diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R b/R/pkg/inst/tests/testthat/test_sparkSQL.R
index f517ce6..ab6888e 100644
--- a/R/pkg/inst/tests/testthat/test_sparkSQL.R
+++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -677,6 +677,8 @@ test_that("jsonRDD() on a RDD with json string", {
 })

 test_that("test tableNames and tables", {
+  # Making sure there are no registered temp tables from previous tests
+  suppressWarnings(sapply(tableNames(), function(tname) { dropTempTable(tname) }))
   df <- read.json(jsonPath)
   createOrReplaceTempView(df, "table1")
   expect_equal(length(tableNames()), 1)
[GitHub] spark issue #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() test fai...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17903 Thanks @falaki. Merging to master and branch-2.2.
[GitHub] spark issue #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() test fai...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17903 The 2.2 build seems fine, but I'd like to get this merged into branch-2.2 since this test will fail if any previous test leaks tables.
[GitHub] spark issue #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() test fai...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17903 @felixcheung FYI. I think the main problem with this test is that it breaks if any test executed before it leaks a table. I think this change makes sense. I will merge it once it passes Jenkins.
[GitHub] spark issue #17892: [SPARK-20626][SPARKR] address date test warning with tim...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17892 @felixcheung The master build seems to be broken because the R tests are failing (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/2844/console). I am not sure whether this PR caused that. Could you help take a look?
[GitHub] spark issue #17746: [SPARK-20449][ML] Upgrade breeze version to 0.13.1
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17746 @dbtsai Thanks for the explanation and the context :)
[GitHub] spark issue #17746: [SPARK-20449][ML] Upgrade breeze version to 0.13.1
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17746 Can I ask how we decided to merge this dependency change after the release branch was cut (especially since this change affects user code)?
[GitHub] spark issue #17659: [SPARK-20358] [core] Executors failing stage on interrup...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17659 lgtm. Merging to master and branch-2.2.
spark git commit: [SPARK-20217][CORE] Executor should not fail stage if killed task throws non-interrupted exception
Repository: spark
Updated Branches:
  refs/heads/master 4000f128b -> 5142e5d4e

[SPARK-20217][CORE] Executor should not fail stage if killed task throws non-interrupted exception

## What changes were proposed in this pull request?

If tasks throw non-interrupted exceptions on kill (e.g. java.nio.channels.ClosedByInterruptException), their death is reported back as TaskFailed instead of TaskKilled. This causes stage failure in some cases.

This is reproducible as follows. Run the following, and then use SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will fail since we threw a RuntimeException instead of InterruptedException.

```
spark.range(100).repartition(100).foreach { i =>
  try {
    Thread.sleep(1000)
  } catch {
    case t: InterruptedException =>
      throw new RuntimeException(t)
  }
}
```

Based on the code in TaskSetManager, I think this also affects kills of speculative tasks. However, since the number of speculated tasks is small, and usually you need to fail a task a few times before the stage is cancelled, it is unlikely this would be noticed in production unless both speculation was enabled and the number of allowed task failures was 1.

We should probably unconditionally return TaskKilled instead of TaskFailed if the task was killed by the driver, regardless of the actual exception thrown.

## How was this patch tested?

Unit test. The test fails before the change in Executor.scala

cc JoshRosen

Author: Eric Liang

Closes #17531 from ericl/fix-task-interrupt.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5142e5d4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5142e5d4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5142e5d4

Branch: refs/heads/master
Commit: 5142e5d4e09c7cb36cf1d792934a21c5305c6d42
Parents: 4000f12
Author: Eric Liang
Authored: Wed Apr 5 19:37:21 2017 -0700
Committer: Yin Huai
Committed: Wed Apr 5 19:37:21 2017 -0700

--
 core/src/main/scala/org/apache/spark/executor/Executor.scala | 2 +-
 core/src/test/scala/org/apache/spark/SparkContextSuite.scala | 8 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/5142e5d4/core/src/main/scala/org/apache/spark/executor/Executor.scala
--
diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index 99b1608..83469c5 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -432,7 +432,7 @@ private[spark] class Executor(
           setTaskFinishedAndClearInterruptStatus()
           execBackend.statusUpdate(taskId, TaskState.KILLED, ser.serialize(TaskKilled(t.reason)))

-        case _: InterruptedException if task.reasonIfKilled.isDefined =>
+        case NonFatal(_) if task != null && task.reasonIfKilled.isDefined =>
           val killReason = task.reasonIfKilled.getOrElse("unknown reason")
           logInfo(s"Executor interrupted and killed $taskName (TID $taskId), reason: $killReason")
           setTaskFinishedAndClearInterruptStatus()

http://git-wip-us.apache.org/repos/asf/spark/blob/5142e5d4/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index 2c94755..735f445 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -572,7 +572,13 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         // first attempt will hang
         if (!SparkContextSuite.isTaskStarted) {
           SparkContextSuite.isTaskStarted = true
-          Thread.sleep(999)
+          try {
+            Thread.sleep(999)
+          } catch {
+            case t: Throwable =>
+              // SPARK-20217 should not fail stage if task throws non-interrupted exception
+              throw new RuntimeException("killed")
+          }
         }
         // second attempt succeeds immediately
       }
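The pattern change in Executor.scala above can be tried out in isolation. The following is a minimal sketch under stated assumptions: `Outcome`, `Killed`, `Failed` and the three functions are hypothetical stand-ins, not Spark classes, and `reasonIfKilled` is modelled as a plain `Option[String]`. One detail worth noting is that `scala.util.control.NonFatal` does not match `InterruptedException`, so the sketch also shows a combined pattern as one possible way to cover both cases.

```scala
import scala.util.control.NonFatal

// Hypothetical model of "was this task killed or did it fail?".
object KillClassification {
  sealed trait Outcome
  final case class Killed(reason: String) extends Outcome
  final case class Failed(error: Throwable) extends Outcome

  // Pattern before the change: only InterruptedException counts as a kill.
  def interruptedOnly(t: Throwable, reasonIfKilled: Option[String]): Outcome = t match {
    case _: InterruptedException if reasonIfKilled.isDefined =>
      Killed(reasonIfKilled.getOrElse("unknown reason"))
    case other => Failed(other)
  }

  // Pattern as written in the diff: any NonFatal exception counts as a kill
  // when a kill reason is set. NonFatal excludes InterruptedException.
  def nonFatalOnly(t: Throwable, reasonIfKilled: Option[String]): Outcome = t match {
    case NonFatal(_) if reasonIfKilled.isDefined =>
      Killed(reasonIfKilled.getOrElse("unknown reason"))
    case other => Failed(other)
  }

  // One possible combined pattern (an assumption, not taken from this commit).
  def interruptedOrNonFatal(t: Throwable, reasonIfKilled: Option[String]): Outcome = t match {
    case _: InterruptedException | NonFatal(_) if reasonIfKilled.isDefined =>
      Killed(reasonIfKilled.getOrElse("unknown reason"))
    case other => Failed(other)
  }
}
```

With a kill reason set, a `RuntimeException` is classified as `Failed` by the first function and as `Killed` by the other two, while a bare `InterruptedException` is classified as `Killed` only by the first and the third.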
[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17531 Thanks. Merging to master.
[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17531 test this please
[GitHub] spark issue #17423: [SPARK-20088] Do not create new SparkContext in SparkR c...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17423 got it. Thanks :)
[GitHub] spark issue #17423: [SPARK-20088] Do not create new SparkContext in SparkR c...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17423 @felixcheung `SparkContext.getOrCreate` is the preferred way to create a SparkContext. So, even if we have a check, it is still better to use `getOrCreate`.
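The get-or-create idiom referred to above can be sketched without Spark on the classpath. This is only an illustration of the general pattern: `Context` and `GetOrCreateDemo` are hypothetical names, and nothing here is claimed about how `SparkContext.getOrCreate` is implemented.

```scala
// Hypothetical one-per-process resource with a get-or-create accessor.
object GetOrCreateDemo {
  final class Context(val appName: String)

  // The currently registered context, if any.
  private var active: Option[Context] = None

  // Returns the registered context when there is one; otherwise creates
  // and registers a new one. The lock keeps check-and-create atomic.
  def getOrCreate(appName: String): Context = synchronized {
    active.getOrElse {
      val created = new Context(appName)
      active = Some(created)
      created
    }
  }
}
```

A caller that does a separate "does one exist?" check followed by a constructor call leaves a gap between the two steps; folding both into one call is what makes the idiom attractive even when a check already exists.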
spark git commit: [SPARK-19620][SQL] Fix incorrect exchange coordinator id in the physical plan
Repository: spark
Updated Branches:
  refs/heads/master fcb68e0f5 -> dd9049e04

[SPARK-19620][SQL] Fix incorrect exchange coordinator id in the physical plan

## What changes were proposed in this pull request?

When adaptive execution is enabled, an exchange coordinator is used in the Exchange operators. For Join, the same exchange coordinator is used for its two Exchanges. But the physical plan shows two different coordinator ids, which is confusing.

This PR fixes the incorrect exchange coordinator id in the physical plan. The coordinator object, instead of the `Option[ExchangeCoordinator]`, should be used to generate the identity hash code of the same coordinator.

## How was this patch tested?

Before the patch, the physical plan shows two different exchange coordinator ids for Join.

```
== Physical Plan ==
*Project [key1#3L, value2#12L]
+- *SortMergeJoin [key1#3L], [key2#11L], Inner
   :- *Sort [key1#3L ASC NULLS FIRST], false, 0
   :  +- Exchange(coordinator id: 1804587700) hashpartitioning(key1#3L, 10), coordinator[target post-shuffle partition size: 67108864]
   :     +- *Project [(id#0L % 500) AS key1#3L]
   :        +- *Filter isnotnull((id#0L % 500))
   :           +- *Range (0, 1000, step=1, splits=Some(10))
   +- *Sort [key2#11L ASC NULLS FIRST], false, 0
      +- Exchange(coordinator id: 793927319) hashpartitioning(key2#11L, 10), coordinator[target post-shuffle partition size: 67108864]
         +- *Project [(id#8L % 500) AS key2#11L, id#8L AS value2#12L]
            +- *Filter isnotnull((id#8L % 500))
               +- *Range (0, 1000, step=1, splits=Some(10))
```

After the patch, the two exchange coordinator ids are the same.

Author: Carson Wang

Closes #16952 from carsonwang/FixCoordinatorId.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dd9049e0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dd9049e0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dd9049e0

Branch: refs/heads/master
Commit: dd9049e0492cc70b629518fee9b3d1632374c612
Parents: fcb68e0
Author: Carson Wang
Authored: Fri Mar 10 11:13:26 2017 -0800
Committer: Yin Huai
Committed: Fri Mar 10 11:13:26 2017 -0800

--
 .../org/apache/spark/sql/execution/exchange/ShuffleExchange.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/dd9049e0/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala
index 125a493..f06544e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala
@@ -46,7 +46,7 @@ case class ShuffleExchange(
   override def nodeName: String = {
     val extraInfo = coordinator match {
       case Some(exchangeCoordinator) =>
-        s"(coordinator id: ${System.identityHashCode(coordinator)})"
+        s"(coordinator id: ${System.identityHashCode(exchangeCoordinator)})"
       case None => ""
     }
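The one-line fix above comes down to which object is passed to `System.identityHashCode`. A minimal sketch, assuming each exchange wraps the shared coordinator in its own `Some(...)`; `Coordinator` here is a hypothetical stand-in for `ExchangeCoordinator`:

```scala
object CoordinatorIdDemo {
  // Hypothetical stand-in for ExchangeCoordinator; not the Spark class.
  final class Coordinator

  val shared = new Coordinator

  // Two exchanges, each holding its own Option wrapper around the same coordinator.
  val left: Option[Coordinator] = Some(shared)
  val right: Option[Coordinator] = Some(shared)

  // Hashing the wrappers identifies the two distinct Some instances,
  // so these two values will usually differ.
  val wrapperIds: (Int, Int) =
    (System.identityHashCode(left), System.identityHashCode(right))

  // Hashing the wrapped object identifies the coordinator itself,
  // so these two values are always equal.
  val coordinatorIds: (Int, Int) =
    (System.identityHashCode(left.get), System.identityHashCode(right.get))
}
```

Because `left` and `right` are different `Some` instances, their identity hash codes are unrelated to each other, which is consistent with the two different ids in the "before" plan.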
[GitHub] spark issue #16952: [SPARK-19620][SQL]Fix incorrect exchange coordinator id ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16952 LGTM. Merging to master.
[GitHub] spark issue #17156: [SPARK-19816][SQL][Tests] Fix an issue that DataFrameCal...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17156 merged to branch-2.1
spark git commit: [SPARK-19816][SQL][TESTS] Fix an issue that DataFrameCallbackSuite doesn't recover the log level
Repository: spark Updated Branches: refs/heads/branch-2.1 da04d45c2 -> 664c9795c [SPARK-19816][SQL][TESTS] Fix an issue that DataFrameCallbackSuite doesn't recover the log level ## What changes were proposed in this pull request? "DataFrameCallbackSuite.execute callback functions when a DataFrame action failed" sets the log level to "fatal" but doesn't recover it. Hence, tests running after it won't output any logs except fatal logs. This PR uses `testQuietly` instead to avoid changing the log level. ## How was this patch tested? Jenkins Author: Shixiong ZhuCloses #17156 from zsxwing/SPARK-19816. (cherry picked from commit fbc4058037cf5b0be9f14a7dd28105f7f8151bed) Signed-off-by: Yin Huai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/664c9795 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/664c9795 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/664c9795 Branch: refs/heads/branch-2.1 Commit: 664c9795c94d3536ff9fe54af06e0fb6c0012862 Parents: da04d45 Author: Shixiong Zhu Authored: Fri Mar 3 19:00:35 2017 -0800 Committer: Yin Huai Committed: Fri Mar 3 19:09:38 2017 -0800 -- .../scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/664c9795/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala index 3ae5ce6..f372e94 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala @@ -58,7 +58,7 @@ class DataFrameCallbackSuite extends QueryTest with SharedSQLContext { spark.listenerManager.unregister(listener) } - test("execute callback functions when a DataFrame 
action failed") { + testQuietly("execute callback functions when a DataFrame action failed") { val metrics = ArrayBuffer.empty[(String, QueryExecution, Exception)] val listener = new QueryExecutionListener { override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = { @@ -73,8 +73,6 @@ class DataFrameCallbackSuite extends QueryTest with SharedSQLContext { val errorUdf = udf[Int, Int] { _ => throw new RuntimeException("udf error") } val df = sparkContext.makeRDD(Seq(1 -> "a")).toDF("i", "j") -// Ignore the log when we are expecting an exception. -sparkContext.setLogLevel("FATAL") val e = intercept[SparkException](df.select(errorUdf($"i")).collect()) assert(metrics.length == 1) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
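The problem described in this commit (a test raises a global log level and never puts it back) is a general one. As a minimal Python sketch of the save-and-restore pattern, assuming the standard `logging` module: `temporary_log_level` and the logger name `demo_suite` are hypothetical names introduced for illustration, and this is not how Spark's `testQuietly` is implemented.

```python
import logging
from contextlib import contextmanager


@contextmanager
def temporary_log_level(logger, level):
    """Set `logger` to `level` inside the block, then restore the old level."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Runs even when the block raises, so later tests see the old level.
        logger.setLevel(previous)


logger = logging.getLogger("demo_suite")  # hypothetical logger name
logger.setLevel(logging.INFO)

level_inside_block = None
try:
    with temporary_log_level(logger, logging.CRITICAL):
        level_inside_block = logger.level
        raise RuntimeError("udf error")  # the failure the test expects
except RuntimeError:
    pass

restored_level = logger.level
```

Because the restore sits in `finally`, an expected exception inside the block cannot leak the quieter level into whatever runs next.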
[GitHub] spark issue #17156: [SPARK-19816][SQL][Tests] Fix an issue that DataFrameCal...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/17156 Let's also merge this to branch-2.1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16917: [SPARK-19529][BRANCH-1.6] Backport PR #16866 to branch-1...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16917 Let's use a meaningful title in future :)
[GitHub] spark issue #16935: [SPARK-19604] [TESTS] Log the start of every Python test
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16935 cool. It has been merged.
spark git commit: [SPARK-19604][TESTS] Log the start of every Python test
Repository: spark Updated Branches: refs/heads/branch-2.1 88c43f4fb -> b9ab4c0e9 [SPARK-19604][TESTS] Log the start of every Python test ## What changes were proposed in this pull request? Right now, we only have info level log after we finish the tests of a Python test file. We should also log the start of a test. So, if a test is hanging, we can tell which test file is running. ## How was this patch tested? This is a change for python tests. Author: Yin Huai <yh...@databricks.com> Closes #16935 from yhuai/SPARK-19604. (cherry picked from commit f6c3bba22501ee7753d85c6e51ffe851d43869c1) Signed-off-by: Yin Huai <yh...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9ab4c0e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9ab4c0e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9ab4c0e Branch: refs/heads/branch-2.1 Commit: b9ab4c0e983df463232f1adbe6e5982b0d7d497d Parents: 88c43f4 Author: Yin Huai <yh...@databricks.com> Authored: Wed Feb 15 14:41:15 2017 -0800 Committer: Yin Huai <yh...@databricks.com> Committed: Wed Feb 15 18:43:57 2017 -0800 -- python/run-tests.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9ab4c0e/python/run-tests.py -- diff --git a/python/run-tests.py b/python/run-tests.py index 38b3bb8..53a0aef 100755 --- a/python/run-tests.py +++ b/python/run-tests.py @@ -72,7 +72,7 @@ def run_individual_python_test(test_name, pyspark_python): 'PYSPARK_PYTHON': which(pyspark_python), 'PYSPARK_DRIVER_PYTHON': which(pyspark_python) }) -LOGGER.debug("Starting test(%s): %s", pyspark_python, test_name) +LOGGER.info("Starting test(%s): %s", pyspark_python, test_name) start_time = time.time() try: per_test_output = tempfile.TemporaryFile()
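To see why the one-word change from `LOGGER.debug` to `LOGGER.info` matters, here is a small self-contained sketch. It assumes the logger's effective level is INFO (the commit message implies info-level messages are the ones that get shown); the logger name `run_tests_demo`, the `ListHandler` class, and the sample interpreter and test names are hypothetical and only serve the illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the filtering is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


LOGGER = logging.getLogger("run_tests_demo")  # hypothetical logger name
LOGGER.setLevel(logging.INFO)                 # assumed level for this sketch
LOGGER.propagate = False
handler = ListHandler()
LOGGER.addHandler(handler)

pyspark_python, test_name = "python3", "pyspark.sql.tests"  # illustrative values

# Below the INFO threshold: dropped before it reaches any handler.
LOGGER.debug("Starting test(%s): %s", pyspark_python, test_name)
# At the INFO threshold: recorded, so a hanging test can be identified.
LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
```

Only the second call leaves a trace in `handler.messages`, which is the behavior the patch relies on to show which test file started last.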
[GitHub] spark issue #16935: [SPARK-19604] [TESTS] Log the start of every Python test
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16935 Seems I cannot merge now... Will try again later.
[GitHub] spark issue #16935: [SPARK-19604] [TESTS] Log the start of every Python test
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16935 ok. Nothing new to add. I will merge this to master and branch-2.1 (in case we want to debug any python test hanging issue in branch-2.1).
[GitHub] spark issue #16935: [SPARK-19604] [TESTS] Log the start of every Python test
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16935 Let's not merge it right now. I may need to log more.
[GitHub] spark pull request #16935: [SPARK-19604] [TESTS] Log the start of every Pyth...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/16935 [SPARK-19604] [TESTS] Log the start of every Python test ## What changes were proposed in this pull request? Right now, we only have info level log after we finish the tests of a Python test file. We should also log the start of a test. So, if a test is hanging, we can tell which test file is running. ## How was this patch tested? This is a change for python tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark SPARK-19604 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16935.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16935 commit 1181cc3be7bcf21fbe7e88b35ac662353fb2f366 Author: Yin Huai <yh...@databricks.com> Date: 2017-02-15T04:19:28Z Right now, we only have info level log after we finish the tests of a Python test file. We should also log the start of a test. So, if a test is hanging, we can tell which test file is running.
[GitHub] spark issue #16894: [SPARK-17897] [SQL] [BACKPORT-2.0] Fixed IsNotNull Const...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16894 thanks!
[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16067 @gatorsmile can we also add it in branch-2.0? Thanks!
spark git commit: [SPARK-19295][SQL] IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars
Repository: spark Updated Branches: refs/heads/master 640f94233 -> 63d839028 [SPARK-19295][SQL] IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars ## What changes were proposed in this pull request? This will help the users to know the location of those downloaded jars when `spark.sql.hive.metastore.jars` is set to `maven`. ## How was this patch tested? jenkins Author: Yin Huai <yh...@databricks.com> Closes #16649 from yhuai/SPARK-19295. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/63d83902 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/63d83902 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/63d83902 Branch: refs/heads/master Commit: 63d839028a6e03644febc360519fa8e01c5534cf Parents: 640f942 Author: Yin Huai <yh...@databricks.com> Authored: Thu Jan 19 14:23:36 2017 -0800 Committer: Yin Huai <yh...@databricks.com> Committed: Thu Jan 19 14:23:36 2017 -0800 -- .../org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/63d83902/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala index 26b2de8..63fdd6b 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala @@ -122,6 +122,7 @@ private[hive] object IsolatedClientLoader extends Logging { // TODO: Remove copy logic. 
val tempDir = Utils.createTempDir(namePrefix = s"hive-${version}") allFiles.foreach(f => FileUtils.copyFileToDirectory(f, tempDir)) +logInfo(s"Downloaded metastore jars to ${tempDir.getCanonicalPath}") tempDir.listFiles().map(_.toURI.toURL) }
[GitHub] spark issue #16649: [SPARK-19295] [SQL] IsolatedClientLoader's downloadVersi...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16649 Cool I am merging this to master.
[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16645 My main concern with this PR is that people may think it is recommended to add new batches to force those rules to run in a certain order. For these resolution rules, we can also use conditions to control when they fire, right? If we always replace one logical plan with another in the analysis phase, it seems we should use `resolved` to control whether a rule fires.
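The idea in this comment (let each rule check a condition such as `resolved` rather than depend on batch ordering) can be sketched with a toy model. This is not Catalyst's API: `Plan`, `resolve_relation`, `resolve_columns`, and `run_to_fixed_point` are hypothetical names, and the two boolean fields stand in for real plan state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Plan:
    relation_resolved: bool = False
    columns_resolved: bool = False

    @property
    def resolved(self):
        return self.relation_resolved and self.columns_resolved


def resolve_relation(plan):
    # Guard: do nothing once the relation is already resolved.
    if plan.relation_resolved:
        return plan
    return replace(plan, relation_resolved=True)


def resolve_columns(plan):
    # Guard: fire only when the relation is known and columns are pending.
    if not plan.relation_resolved or plan.columns_resolved:
        return plan
    return replace(plan, columns_resolved=True)


def run_to_fixed_point(plan, rules, max_iterations=10):
    """Apply all rules repeatedly until a full pass changes nothing."""
    for _ in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan
        plan = new_plan
    raise RuntimeError("rules did not converge")


forward = run_to_fixed_point(Plan(), [resolve_relation, resolve_columns])
backward = run_to_fixed_point(Plan(), [resolve_columns, resolve_relation])
```

Because each rule guards itself, listing the rules in either order reaches the same resolved plan; the reversed order just needs one extra pass.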
[GitHub] spark pull request #16649: [SPARK-19295] [SQL] IsolatedClientLoader's downlo...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/16649 [SPARK-19295] [SQL] IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars ## What changes were proposed in this pull request? This will help the users to know the location of those downloaded jars. ## How was this patch tested? jenkins You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark SPARK-19295 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16649.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16649 commit 6c67582d85473d123053a45aa051578232c32dad Author: Yin Huai <yh...@databricks.com> Date: 2017-01-19T20:30:08Z [SPARK-19295] IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars
spark git commit: Update known_translations for contributor names
Repository: spark Updated Branches: refs/heads/master fe409f31d -> 0c9231858 Update known_translations for contributor names ## What changes were proposed in this pull request? Update known_translations per https://github.com/apache/spark/pull/16423#issuecomment-269739634 Author: Yin Huai <yh...@databricks.com> Closes #16628 from yhuai/known_translations. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0c923185 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0c923185 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0c923185 Branch: refs/heads/master Commit: 0c9231858866eff16f97df073d22811176fb6b36 Parents: fe409f3 Author: Yin Huai <yh...@databricks.com> Authored: Wed Jan 18 18:18:51 2017 -0800 Committer: Yin Huai <yh...@databricks.com> Committed: Wed Jan 18 18:18:51 2017 -0800 -- dev/create-release/known_translations | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0c923185/dev/create-release/known_translations -- diff --git a/dev/create-release/known_translations b/dev/create-release/known_translations index 0f30990..87bf2f2 100644 --- a/dev/create-release/known_translations +++ b/dev/create-release/known_translations @@ -177,7 +177,7 @@ anabranch - Bill Chambers ashangit - Nicolas Fraison avulanov - Alexander Ulanov biglobster - Liang Ke -cenyuhai - Cen Yu Hai +cenyuhai - Yuhai Cen codlife - Jianfei Wang david-weiluo-ren - Weiluo (David) Ren dding3 - Ding Ding @@ -198,7 +198,8 @@ petermaxlee - Peter Lee phalodi - Sandeep Purohit pkch - pkch priyankagargnitk - Priyanka Garg -sharkdtu - Sharkd Tu +sharkdtu - Xiaogang Tu shenh062326 - Shen Hong aokolnychyi - Anton Okolnychyi linbojin - Linbo Jin +lw-lin - Liwei Lin
[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16613 nvm. On second thought, the feature flag does not really buy us anything. We just store the original view definition and the column mapping in the metastore. So, I think it is fine to just do the switch.
[GitHub] spark issue #16628: Update known_translations for contributor names
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16628 I am merging this to master.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14204 ok I agree. Originally, I thought it would be helpful for figuring out the worker that an executor belongs to. But if it does not provide very useful information, I am fine with dropping it.
[GitHub] spark issue #16628: Update known_translations for contributor names
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16628 done
[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16613 Is there a feature flag that determines whether we use this new approach? I feel it would be good to have an internal feature flag to select the code path. Then, if something goes wrong that is hard to fix quickly before the release, we can still switch back to the old code path, and we can remove the feature flag in the next release. What do you think? Also, @jiangxb1987 can you take a look at the SQLViewSuite and see if we have enough test coverage?
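The kind of internal feature flag proposed here can be sketched as follows. Everything in it is hypothetical: the key `example.view.useNewWritePath` is not a Spark configuration, and the two `write_view_*` functions are placeholders for the old and new code paths. Note that another comment on this same PR in the archive concludes the flag is not needed.

```python
# All names below are hypothetical; none are Spark configuration keys or APIs.
FLAG_KEY = "example.view.useNewWritePath"


def write_view_old(definition):
    """Placeholder for the old code path."""
    return ("old", definition)


def write_view_new(definition):
    """Placeholder for the new code path."""
    return ("new", definition)


def write_view(definition, conf):
    """Select the code path from an internal flag; the new path is the default."""
    use_new = conf.get(FLAG_KEY, "true").lower() == "true"
    return write_view_new(definition) if use_new else write_view_old(definition)


default_path = write_view("SELECT 1", {})
fallback_path = write_view("SELECT 1", {FLAG_KEY: "false"})
```

Defaulting to the new path means the flag only matters as an escape hatch, and deleting it in a later release is a small change confined to `write_view`.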
[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16517 Looks good to me. @gatorsmile can you explain your concerns? I am wondering what kinds of cases you think HiveFileFormat may not be able to handle.
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96566857 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -276,40 +276,31 @@ case class InsertIntoHiveTable( } } -val jobConf = new JobConf(hadoopConf) -val jobConfSer = new SerializableJobConf(jobConf) - -// When speculation is on and output committer class name contains "Direct", we should warn -// users that they may loss data if they are using a direct output committer. -val speculationEnabled = sqlContext.sparkContext.conf.getBoolean("spark.speculation", false) -val outputCommitterClass = jobConf.get("mapred.output.committer.class", "") -if (speculationEnabled && outputCommitterClass.contains("Direct")) { --- End diff -- Seems this change is unnecessary, and users may still use a direct output committer (they can still find the code on the Internet). Let's keep the warning.
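The check that the reviewer asks to keep can be restated as a small pure function. This is a Python paraphrase of the condition visible in the removed Scala lines, using the same two configuration keys; the function name, the plain-dict configs, the warning wording, and the `com.example.DirectOutputCommitter` class name are assumptions made for illustration.

```python
def direct_committer_warning(spark_conf, job_conf):
    """Return a warning string when speculation is combined with a committer
    whose class name contains "Direct"; otherwise return None."""
    speculation_enabled = spark_conf.get("spark.speculation", "false").lower() == "true"
    committer_class = job_conf.get("mapred.output.committer.class", "")
    if speculation_enabled and "Direct" in committer_class:
        # Illustrative wording, not the message used in Spark.
        return (
            "Speculation is enabled and the output committer is "
            + committer_class
            + "; data may be lost with a direct output committer."
        )
    return None


risky = direct_committer_warning(
    {"spark.speculation": "true"},
    {"mapred.output.committer.class": "com.example.DirectOutputCommitter"},  # hypothetical class
)
no_speculation = direct_committer_warning(
    {},
    {"mapred.output.committer.class": "com.example.DirectOutputCommitter"},
)
default_committer = direct_committer_warning({"spark.speculation": "true"}, {})
```

Both conditions have to hold for the warning to fire, which matches the `speculationEnabled && outputCommitterClass.contains("Direct")` test in the diff.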
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96566523 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -276,40 +276,31 @@ case class InsertIntoHiveTable( } } -val jobConf = new JobConf(hadoopConf) -val jobConfSer = new SerializableJobConf(jobConf) - -// When speculation is on and output committer class name contains "Direct", we should warn -// users that they may loss data if they are using a direct output committer. -val speculationEnabled = sqlContext.sparkContext.conf.getBoolean("spark.speculation", false) -val outputCommitterClass = jobConf.get("mapred.output.committer.class", "") -if (speculationEnabled && outputCommitterClass.contains("Direct")) { --- End diff -- Do we still need this?
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96566290 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hive.execution + +import scala.collection.JavaConverters._ + +import org.apache.hadoop.fs.{FileStatus, Path} +import org.apache.hadoop.hive.ql.exec.Utilities +import org.apache.hadoop.hive.ql.io.{HiveFileFormatUtils, HiveOutputFormat} +import org.apache.hadoop.hive.serde2.Serializer +import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspectorUtils, StructObjectInspector} +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.ObjectInspectorCopyOption +import org.apache.hadoop.io.Writable +import org.apache.hadoop.mapred.{JobConf, Reporter} +import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext} + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.execution.datasources.{FileFormat, OutputWriter, OutputWriterFactory} +import org.apache.spark.sql.hive.{HiveInspectors, HiveTableUtil} +import org.apache.spark.sql.hive.HiveShim.{ShimFileSinkDesc => FileSinkDesc} +import org.apache.spark.sql.types.StructType +import org.apache.spark.util.SerializableJobConf + +/** + * `FileFormat` for writing Hive tables. + * + * TODO: implement the read logic. 
+ */ +class HiveFileFormat(fileSinkConf: FileSinkDesc) extends FileFormat { + override def inferSchema( + sparkSession: SparkSession, + options: Map[String, String], + files: Seq[FileStatus]): Option[StructType] = { +throw new UnsupportedOperationException(s"inferSchema is not supported for hive data source.") + } + + override def prepareWrite( + sparkSession: SparkSession, + job: Job, + options: Map[String, String], + dataSchema: StructType): OutputWriterFactory = { +val conf = job.getConfiguration +val tableDesc = fileSinkConf.getTableInfo +conf.set("mapred.output.format.class", tableDesc.getOutputFileFormatClassName) + +// Add table properties from storage handler to hadoopConf, so any custom storage +// handler settings can be set to hadoopConf +HiveTableUtil.configureJobPropertiesForStorageHandler(tableDesc, conf, false) +Utilities.copyTableJobPropertiesToConf(tableDesc, conf) --- End diff -- Will tableDesc be null?