
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-17 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/21939
  
got it. Thank you!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-17 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/21939
  
@shaneknapp what was the version of pyarrow in that build? 0.8 or 0.10?



[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-17 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/21939
  
@BryanCutler So, for this upgrade, even though the JVM-side dependency is 0.10, 
can pyspark work with any pyarrow version from 0.8 to 0.10 without problems?



[GitHub] spark issue #22003: [SPARK-25019][BUILD] Fix orc dependency to use the same ...

2018-08-06 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/22003
  
@dongjoon-hyun  no problem. Thank you!



spark git commit: [SPARK-25019][BUILD] Fix orc dependency to use the same exclusion rules

2018-08-06 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/master 51e2b38d9 -> 278984d5a


[SPARK-25019][BUILD] Fix orc dependency to use the same exclusion rules

## What changes were proposed in this pull request?

While upgrading Apache ORC to 1.5.2 
([SPARK-24576](https://issues.apache.org/jira/browse/SPARK-24576)), the `sql/core` 
module overrode the exclusion rules of the parent pom file, which caused the 
published `spark-sql_2.1X` artifacts to have incomplete exclusion rules 
([SPARK-25019](https://issues.apache.org/jira/browse/SPARK-25019)). This PR 
fixes it by moving the newly added exclusion rule to the parent pom. This also 
fixes the sbt build hack introduced at that time.

## How was this patch tested?

Pass the existing dependency check and the tests.

Author: Dongjoon Hyun 

Closes #22003 from dongjoon-hyun/SPARK-25019.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/278984d5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/278984d5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/278984d5

Branch: refs/heads/master
Commit: 278984d5a5e56136c9f940f2d0e3d2040fad180b
Parents: 51e2b38
Author: Dongjoon Hyun 
Authored: Mon Aug 6 12:00:39 2018 -0700
Committer: Yin Huai 
Committed: Mon Aug 6 12:00:39 2018 -0700

--
 pom.xml          |  4 ++++
 sql/core/pom.xml | 28 ----------------------------
 2 files changed, 4 insertions(+), 28 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/278984d5/pom.xml
--
diff --git a/pom.xml b/pom.xml
index c46eb31..8abdb70 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1744,6 +1744,10 @@
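             <artifactId>hadoop-common</artifactId>
           </exclusion>
           <exclusion>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-hdfs</artifactId>
+          </exclusion>
+          <exclusion>
             <groupId>org.apache.hive</groupId>
             <artifactId>hive-storage-api</artifactId>
           </exclusion>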

http://git-wip-us.apache.org/repos/asf/spark/blob/278984d5/sql/core/pom.xml
--
diff --git a/sql/core/pom.xml b/sql/core/pom.xml
index 68b42a4..ba17f5f 100644
--- a/sql/core/pom.xml
+++ b/sql/core/pom.xml
@@ -90,39 +90,11 @@
   org.apache.orc
   orc-core
   ${orc.classifier}
-  
-
-  org.apache.hadoop
-  hadoop-hdfs
-
-
-
-  org.apache.hive
-  hive-storage-api
-
-  
 
 
   org.apache.orc
   orc-mapreduce
   ${orc.classifier}
-  
-
-  org.apache.hadoop
-  hadoop-hdfs
-
-
-
-  org.apache.hive
-  hive-storage-api
-
-  
 
 
   org.apache.parquet


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] spark issue #22003: [SPARK-25019][BUILD] Fix orc dependency to use the same ...

2018-08-06 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/22003
  
lgtm. Merging to master.



[GitHub] spark pull request #22003: [SPARK-25019][BUILD] Fix orc dependency to use th...

2018-08-06 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22003#discussion_r207986831
  
--- Diff: sql/core/pom.xml ---
@@ -90,39 +90,11 @@
   org.apache.orc
   orc-core
   ${orc.classifier}
-  
-
-  org.apache.hadoop
-  hadoop-hdfs
-
-
-
-  org.apache.hive
-  hive-storage-api
-
-  
 
 
   org.apache.orc
   orc-mapreduce
   ${orc.classifier}
-  
-
-  org.apache.hadoop
-  hadoop-hdfs
-
-
-
-  org.apache.hive
-  hive-storage-api
-
-  
--- End diff --

got it. Thank you.



[GitHub] spark pull request #22003: [SPARK-25019][BUILD] Fix orc dependency to use th...

2018-08-06 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22003#discussion_r207962501
  
--- Diff: sql/core/pom.xml ---
@@ -90,39 +90,11 @@
   org.apache.orc
   orc-core
   ${orc.classifier}
-  
-
-  org.apache.hadoop
-  hadoop-hdfs
-
-
-
-  org.apache.hive
-  hive-storage-api
-
-  
 
 
   org.apache.orc
   orc-mapreduce
   ${orc.classifier}
-  
-
-  org.apache.hadoop
-  hadoop-hdfs
-
-
-
-  org.apache.hive
-  hive-storage-api
-
-  
--- End diff --

Thank you. Just for me to understand it better. Do you know why defining 
exclusions in this pom file messed up the pom?

Also, how should I try it out myself? What is the right command to publish 
locally?
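
The publish command is not spelled out in this thread. Once a pom has been produced by whatever means, one way to see which exclusions actually ended up in it is a short Python script. This is only a sketch: `list_exclusions` and `SAMPLE_POM` are hypothetical names, and the sample reuses the `orc-core` exclusions from the diff above rather than a real published pom.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Maven POM 4.0.0 files.
POM_URI = "http://maven.apache.org/POM/4.0.0"
POM_NS = {"m": POM_URI}

# Hypothetical sample; a real check would read the generated pom from disk.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.apache.orc</groupId>
      <artifactId>orc-core</artifactId>
      <exclusions>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-hdfs</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hive</groupId>
          <artifactId>hive-storage-api</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-column</artifactId>
    </dependency>
  </dependencies>
</project>
"""


def coordinates(element):
    """Return 'groupId:artifactId' for a dependency or exclusion element."""
    return "%s:%s" % (element.findtext("m:groupId", namespaces=POM_NS),
                      element.findtext("m:artifactId", namespaces=POM_NS))


def list_exclusions(pom_xml):
    """Map each dependency in the pom text to the list of its exclusions."""
    root = ET.fromstring(pom_xml)
    result = {}
    for dep in root.iter("{%s}dependency" % POM_URI):
        result[coordinates(dep)] = [
            coordinates(ex) for ex in dep.findall("m:exclusions/m:exclusion", POM_NS)
        ]
    return result


if __name__ == "__main__":
    for dependency, exclusions in sorted(list_exclusions(SAMPLE_POM).items()):
        print(dependency, exclusions)
```

Comparing the output for the `spark-sql` pom before and after the change should show whether the exclusions from the parent pom are present.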



[GitHub] spark pull request #22003: [SPARK-25019][BUILD] Fix orc dependency to use th...

2018-08-06 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22003#discussion_r207888608
  
--- Diff: sql/core/pom.xml ---
@@ -90,39 +90,11 @@
   org.apache.orc
   orc-core
   ${orc.classifier}
-  
-
-  org.apache.hadoop
-  hadoop-hdfs
-
-
-
-  org.apache.hive
-  hive-storage-api
-
-  
 
 
   org.apache.orc
   orc-mapreduce
   ${orc.classifier}
-  
-
-  org.apache.hadoop
-  hadoop-hdfs
-
-
-
-  org.apache.hive
-  hive-storage-api
-
-  
--- End diff --

@dongjoon-hyun when we publish snapshot artifacts or releases, will the pom 
for spark sql get all of the exclusions defined in the parent pom?



spark git commit: [SPARK-24895] Remove spotbugs plugin

2018-07-24 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/master d4a277f0c -> fc21f192a


[SPARK-24895] Remove spotbugs plugin

## What changes were proposed in this pull request?

The SpotBugs Maven plugin was added shortly before the 2.4.0 snapshot 
artifacts broke. To make sure it is not interfering with the Maven deploy 
plugin, this change removes it.

## How was this patch tested?

A local build was run, but this patch will actually be tested by monitoring the 
Apache repo artifacts and making sure the metadata is uploaded correctly after 
this job runs: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/

Author: Eric Chang 

Closes #21865 from ericfchang/SPARK-24895.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fc21f192
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fc21f192
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fc21f192

Branch: refs/heads/master
Commit: fc21f192a302e48e5c321852e2a25639c5a182b5
Parents: d4a277f
Author: Eric Chang 
Authored: Tue Jul 24 15:53:50 2018 -0700
Committer: Yin Huai 
Committed: Tue Jul 24 15:53:50 2018 -0700

--
 pom.xml | 22 ----------------------
 1 file changed, 22 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fc21f192/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 81a53ee..d75db0f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2610,28 +2610,6 @@
   
 
   
-  
-com.github.spotbugs
-spotbugs-maven-plugin
-3.1.3
-
-  
${basedir}/target/scala-${scala.binary.version}/classes
-  
${basedir}/target/scala-${scala.binary.version}/test-classes
-  Max
-  Low
-  true
-  FindPuzzlers
-  true
-
-
-  
-
-  check
-
-compile
-  
-
-  
 
   
 



[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin

2018-07-24 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/21865
  
lgtm. I am merging this PR to master branch. Then, I will kick off 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/.



[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin

2018-07-24 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/21865
  
cc @HyukjinKwon @kiszk 

I will merge this PR once it passes the test.



svn commit: r25324 - /dev/spark/v2.3.0-rc5-bin/ /release/spark/spark-2.3.0/

2018-02-27 Thread yhuai

Author: yhuai
Date: Wed Feb 28 07:25:53 2018
New Revision: 25324

Log:
Releasing Apache Spark 2.3.0

Added:
release/spark/spark-2.3.0/
  - copied from r25323, dev/spark/v2.3.0-rc5-bin/
Removed:
dev/spark/v2.3.0-rc5-bin/



[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...

2018-02-01 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20473#discussion_r16362
  
--- Diff: python/run-tests.py ---
@@ -151,6 +151,38 @@ def parse_opts():
     return opts
 
 
+def _check_dependencies(python_exec, modules_to_test):
+    if "COVERAGE_PROCESS_START" in os.environ:
+        # Make sure if coverage is installed.
+        try:
+            subprocess_check_output(
+                [python_exec, "-c", "import coverage"],
+                stderr=open(os.devnull, 'w'))
+        except:
+            print_red("Coverage is not installed in Python executable '%s' "
+                      "but 'COVERAGE_PROCESS_START' environment variable is set, "
+                      "exiting." % python_exec)
+            sys.exit(-1)
+
+    if pyspark_sql in modules_to_test:
+        # If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and
+        # explicitly prints out. See SPARK-23300.
+        try:
+            subprocess_check_output(
+                [python_exec, "-c", "import pyarrow"],
+                stderr=open(os.devnull, 'w'))
+        except:
--- End diff --

Thank you. Appreciate it.



[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-02-01 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r165449847
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper {
 object PhysicalAggregation {
   // groupingExpressions, aggregateExpressions, resultExpressions, child
   type ReturnType =
-    (Seq[NamedExpression], Seq[AggregateExpression], Seq[NamedExpression], LogicalPlan)
+    (Seq[NamedExpression], Seq[Expression], Seq[NamedExpression], LogicalPlan)
--- End diff --

It will be good to try it out soon. But it is not urgent.



[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...

2018-02-01 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20473#discussion_r165445947
  
--- Diff: python/run-tests.py ---
@@ -151,6 +151,38 @@ def parse_opts():
     return opts
 
 
+def _check_dependencies(python_exec, modules_to_test):
+    if "COVERAGE_PROCESS_START" in os.environ:
+        # Make sure if coverage is installed.
+        try:
+            subprocess_check_output(
+                [python_exec, "-c", "import coverage"],
+                stderr=open(os.devnull, 'w'))
+        except:
+            print_red("Coverage is not installed in Python executable '%s' "
+                      "but 'COVERAGE_PROCESS_START' environment variable is set, "
+                      "exiting." % python_exec)
+            sys.exit(-1)
+
+    if pyspark_sql in modules_to_test:
+        # If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and
+        # explicitly prints out. See SPARK-23300.
+        try:
+            subprocess_check_output(
+                [python_exec, "-c", "import pyarrow"],
+                stderr=open(os.devnull, 'w'))
+        except:
--- End diff --

Actually, since we are here, is it possible to do the same thing as 
https://github.com/apache/spark/blob/ec63e2d0743a4f75e1cce21d0fe2b54407a86a4a/python/pyspark/sql/tests.py#L51-L63
 and 
https://github.com/apache/spark/blob/ec63e2d0743a4f75e1cce21d0fe2b54407a86a4a/python/pyspark/sql/tests.py#L78-L84?

It would be nice to use the same logic. Otherwise, even if we do not print the 
warning here, tests may still get skipped because of the version issue.
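
A minimal sketch of what sharing that logic could look like in `run-tests.py`: probe the target interpreter for a module, compare its version against a minimum, and say explicitly whether the related tests will run. The helper names (`installed_version`, `meets_minimum`, `describe`) and the minimum versions passed in are assumptions for illustration, not the values used in `tests.py`.

```python
import os
import subprocess
import sys


def installed_version(python_exec, module):
    """Return module.__version__ as seen by python_exec, or None if the import fails."""
    code = "import %s; print(%s.__version__)" % (module, module)
    try:
        with open(os.devnull, "w") as devnull:
            out = subprocess.check_output([python_exec, "-c", code], stderr=devnull)
    except (subprocess.CalledProcessError, OSError):
        return None
    return out.decode("utf-8").strip()


def meets_minimum(installed, minimum):
    """Compare dotted versions numerically, so that '0.10.0' >= '0.8.0'."""
    def key(version):
        # Keep purely numeric components and pad to three, e.g. '0.8' -> (0, 8, 0).
        numbers = [int(p) for p in version.split(".") if p.isdigit()]
        return tuple(numbers + [0] * (3 - len(numbers)))
    return key(installed) >= key(minimum)


def describe(python_exec, module, minimum):
    """Build an explicit message saying whether tests needing `module` will run."""
    version = installed_version(python_exec, module)
    if version is None:
        return "%s is not installed; related tests will be skipped." % module
    if not meets_minimum(version, minimum):
        return "%s %s is older than %s; related tests will be skipped." % (
            module, version, minimum)
    return "%s %s found; related tests will run." % (module, version)


if __name__ == "__main__":
    # The minimum versions here are placeholders, not the ones required by Spark.
    print(describe(sys.executable, "pyarrow", "0.8.0"))
    print(describe(sys.executable, "pandas", "0.19.2"))
```

Printing the message in all three cases also covers the situation where the module imports fine but is too old, which is the case the plain `import pyarrow` check misses.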



[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...

2018-02-01 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20473#discussion_r165445232
  
--- Diff: python/run-tests.py ---
@@ -151,6 +151,38 @@ def parse_opts():
     return opts
 
 
+def _check_dependencies(python_exec, modules_to_test):
+    if "COVERAGE_PROCESS_START" in os.environ:
+        # Make sure if coverage is installed.
+        try:
+            subprocess_check_output(
+                [python_exec, "-c", "import coverage"],
+                stderr=open(os.devnull, 'w'))
+        except:
+            print_red("Coverage is not installed in Python executable '%s' "
+                      "but 'COVERAGE_PROCESS_START' environment variable is set, "
+                      "exiting." % python_exec)
+            sys.exit(-1)
+
+    if pyspark_sql in modules_to_test:
+        # If we should test 'pyspark-sql', it checks if PyArrow and Pandas are installed and
+        # explicitly prints out. See SPARK-23300.
+        try:
+            subprocess_check_output(
+                [python_exec, "-c", "import pyarrow"],
+                stderr=open(os.devnull, 'w'))
+        except:
--- End diff --

How about we also explicitly mention that pyarrow/pandas related tests will 
run if they are installed?





[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/20465
  
So, the Jenkins jobs run those tests with Python 3? If so, I feel better, 
because those tests are not completely skipped in Jenkins. If it is hard to make 
them run with Python 2, let's have a log that explicitly shows whether we are 
going to run the tests that use pandas/pyarrow. That will help us confirm whether 
they get exercised with Python 3 in Jenkins.




[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-01-31 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/20465
  
@felixcheung Jenkins is actually skipping those tests (see the failure of 
this PR). It makes sense to provide a way for developers to not run those 
tests. But I'd prefer that we run those tests by default, so we can make sure 
that Jenkins is doing the right thing.
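
One way to get both behaviours, sketched under assumptions: the tests run by default, a missing dependency is reported as a failure instead of a silent skip, and a developer has to opt out explicitly. The environment variable name `SKIP_PANDAS_ARROW_TESTS` and the `decide` helper are hypothetical, not existing Spark settings.

```python
import os

# Hypothetical opt-out switch; not an existing Spark setting.
OPT_OUT_VAR = "SKIP_PANDAS_ARROW_TESTS"


def decide(have_pandas, have_arrow, environ=None):
    """Return (action, reason), where action is 'run', 'skip' or 'fail'."""
    environ = os.environ if environ is None else environ
    # An explicit opt-out is the only way to skip without an error.
    if environ.get(OPT_OUT_VAR) == "1":
        return "skip", "%s=1 was set explicitly" % OPT_OUT_VAR
    missing = [name
               for name, present in (("pandas", have_pandas), ("pyarrow", have_arrow))
               if not present]
    # Without an opt-out, a missing dependency is treated as a failure.
    if missing:
        return "fail", "missing %s and no opt-out was requested" % ", ".join(missing)
    return "run", "pandas and pyarrow are available"


if __name__ == "__main__":
    print(decide(have_pandas=True, have_arrow=False))
```

With this shape, a Jenkins worker that lacks pyarrow fails loudly, which makes it easy to tell whether the tests are really being exercised.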



[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-01-31 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r165253818
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4353,6 +4347,446 @@ def test_unsupported_types(self):
                 df.groupby('id').apply(f).collect()
 
 
+@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed")
+class GroupbyAggPandasUDFTests(ReusedSQLTestCase):
+
+    @property
+    def data(self):
+        from pyspark.sql.functions import array, explode, col, lit
+        return self.spark.range(10).toDF('id') \
+            .withColumn("vs", array([lit(i * 1.0) + col('id') for i in range(20, 30)])) \
+            .withColumn("v", explode(col('vs'))) \
+            .drop('vs') \
+            .withColumn('w', lit(1.0))
+
+    @property
+    def python_plus_one(self):
+        from pyspark.sql.functions import udf
+
+        @udf('double')
+        def plus_one(v):
+            assert isinstance(v, (int, float))
+            return v + 1
+        return plus_one
+
+    @property
+    def pandas_scalar_plus_two(self):
+        import pandas as pd
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.SCALAR)
+        def plus_two(v):
+            assert isinstance(v, pd.Series)
+            return v + 2
+        return plus_two
+
+    @property
+    def pandas_agg_mean_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def avg(v):
+            return v.mean()
+        return avg
+
+    @property
+    def pandas_agg_sum_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def sum(v):
+            return v.sum()
+        return sum
+
+    @property
+    def pandas_agg_weighted_mean_udf(self):
+        import numpy as np
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
+
+        @pandas_udf('double', PandasUDFType.GROUP_AGG)
+        def weighted_mean(v, w):
+            return np.average(v, weights=w)
+        return weighted_mean
+
+    def test_manual(self):
+        df = self.data
+        sum_udf = self.pandas_agg_sum_udf
+        mean_udf = self.pandas_agg_mean_udf
+
+        result1 = df.groupby('id').agg(sum_udf(df.v), mean_udf(df.v)).sort('id')
+        expected1 = self.spark.createDataFrame(
+            [[0, 245.0, 24.5],
+             [1, 255.0, 25.5],
+             [2, 265.0, 26.5],
+             [3, 275.0, 27.5],
+             [4, 285.0, 28.5],
+             [5, 295.0, 29.5],
+             [6, 305.0, 30.5],
+             [7, 315.0, 31.5],
+             [8, 325.0, 32.5],
+             [9, 335.0, 33.5]],
+            ['id', 'sum(v)', 'avg(v)'])
+
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+    def test_basic(self):
+        from pyspark.sql.functions import col, lit, sum, mean
+
+        df = self.data
+        weighted_mean_udf = self.pandas_agg_weighted_mean_udf
+
+        # Groupby one column and aggregate one UDF with literal
+        result1 = df.groupby('id').agg(weighted_mean_udf(df.v, lit(1.0))).sort('id')
+        expected1 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort('id')
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+        # Groupby one expression and aggregate one UDF with literal
+        result2 = df.groupby((col('id') + 1)).agg(weighted_mean_udf(df.v, lit(1.0)))\
+            .sort(df.id + 1)
+        expected2 = df.groupby((col('id') + 1))\
+            .agg(mean(df.v).alias('weighted_mean(v, 1.0)')).sort(df.id + 1)
+        self.assertPandasEqual(expected2.toPandas(), result2.toPandas())
+
+        # Groupby one column and aggregate one UDF without literal
+        result3 = df.groupby('id').agg(weighted_mean_udf(df.v, df.w)).sort('id')
+        expected3 = df.groupby('id').agg(mean(df.v).alias('weighted_mean(v, w)')).sort('id')
+        self.assertPandasEqual(expected3.toPandas(), result3.toPandas())
+
+        # Groupby one expression and aggregate one UDF without literal
+        result4 = df.groupby((col('id') + 1).alias('id'))\
+            .agg(weighted_mean_udf(df.v, df.w))\
+            .sort('id')
+        expected4 = df.groupby((col('id') + 1).alias('id'))\
+            .agg(mean(df.v).alias('weighted_mean(v, w)'))\
+            .sort('id')
+

[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-01-31 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r165253514
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper {
 object PhysicalAggregation {
   // groupingExpressions, aggregateExpressions, resultExpressions, child
   type ReturnType =
-    (Seq[NamedExpression], Seq[AggregateExpression], Seq[NamedExpression], LogicalPlan)
+    (Seq[NamedExpression], Seq[Expression], Seq[NamedExpression], LogicalPlan)
--- End diff --

I'd prefer that we try using a new rule. We can create a utility function 
to reuse code. Will you have a chance to try it out?



[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-01-31 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19872#discussion_r165220142
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper {
 object PhysicalAggregation {
   // groupingExpressions, aggregateExpressions, resultExpressions, child
   type ReturnType =
-    (Seq[NamedExpression], Seq[AggregateExpression], Seq[NamedExpression], LogicalPlan)
+    (Seq[NamedExpression], Seq[Expression], Seq[NamedExpression], LogicalPlan)
--- End diff --

@icexelloss Thank you for this contribution! I just came across the change 
in this file. I am not sure that changing the type here is the best option. 
The reason is that whenever we use this PhysicalAggregation rule, we have to 
check the instance type of those aggregate expressions and cast them. To me, 
it seems better to leave this rule untouched and create a new rule just for 
Python UDAFs. What do you think?

(Maybe you and the reviewers have already discussed it. If so, can you point 
me to the discussion?)

Thank you!



[GitHub] spark pull request #20037: [SPARK-22849] ivy.retrieve pattern should also co...

2018-01-23 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20037#discussion_r163463718
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -1271,7 +1271,7 @@ private[spark] object SparkSubmitUtils {
         // retrieve all resolved dependencies
         ivy.retrieve(rr.getModuleDescriptor.getModuleRevisionId,
           packagesDirectory.getAbsolutePath + File.separator +
-            "[organization]_[artifact]-[revision].[ext]",
+            "[organization]_[artifact]-[revision](-[classifier]).[ext]",
--- End diff --

I tried it today. Somehow, I only got the test jar downloaded. Have you 
guys seen this issue?
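
For reference, the effect of the optional `(-[classifier])` part can be modelled in a few lines of Python. This is a toy expansion, assuming Ivy's convention that a parenthesised part is kept only when the token inside it has a value; `expand` is a hypothetical helper, not Ivy's implementation, and the coordinates are made up.

```python
import re

OLD_PATTERN = "[organization]_[artifact]-[revision].[ext]"
NEW_PATTERN = "[organization]_[artifact]-[revision](-[classifier]).[ext]"


def expand(pattern, tokens):
    """Expand an Ivy-style pattern: [name] is a token, (...) is an optional part."""
    def fill(text):
        # Replace every [token] with its value, or with nothing if it is unset.
        return re.sub(r"\[(\w+)\]", lambda m: tokens.get(m.group(1)) or "", text)

    def optional(match):
        inner = match.group(1)
        names = re.findall(r"\[(\w+)\]", inner)
        # Keep the optional part only if every token inside it has a value.
        return fill(inner) if all(tokens.get(n) for n in names) else ""

    return fill(re.sub(r"\(([^()]*)\)", optional, pattern))


# Made-up coordinates: a main jar and a "tests" jar of the same module.
main_jar = {"organization": "org.example", "artifact": "demo",
            "revision": "1.0", "ext": "jar"}
tests_jar = dict(main_jar, classifier="tests")

if __name__ == "__main__":
    for pattern in (OLD_PATTERN, NEW_PATTERN):
        print(expand(pattern, main_jar), expand(pattern, tests_jar))
```

In this model the old pattern maps both artifacts to the same file name, while the new one keeps them apart. It does not by itself explain why only the test jar was downloaded.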



[GitHub] spark issue #20110: [SPARK-22313][PYTHON][FOLLOWUP] Explicitly import warnin...

2017-12-28 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/20110
  
Thank you! Let's also check the build result to make sure 
`pyspark.streaming.tests.FlumePollingStreamTests` is indeed triggered (I hit 
this issue while running this test). 



[GitHub] spark pull request #19535: [SPARK-22313][PYTHON] Mark/print deprecation warn...

2017-12-28 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19535#discussion_r159019845
  
--- Diff: python/pyspark/streaming/flume.py ---
@@ -54,8 +54,13 @@ def createStream(ssc, hostname, port,
         :param bodyDecoder:  A function used to decode body (default is utf8_decoder)
         :return: A DStream object
 
-        .. note:: Deprecated in 2.3.0
+        .. note:: Deprecated in 2.3.0. Flume support is deprecated as of Spark 2.3.0.
+            See SPARK-22142.
         """
+        warnings.warn(
--- End diff --

Thank you :) It would be good to also check why the master build does not 
fail, since Python should complain about it.
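
A possible explanation, offered as an assumption rather than a diagnosis of the master build: Python looks up a global name such as `warnings` only when the line that uses it executes, so importing a module with a missing import succeeds and the error appears only if a test actually calls the function. The sketch uses a stand-in `create_stream`, not the real `flume.py`; `load_module` and `call_and_report` are hypothetical helpers.

```python
# Stand-in for a module that calls warnings.warn without importing warnings.
SOURCE = '''
def create_stream():
    warnings.warn("Deprecated in 2.3.0.", DeprecationWarning)
    return "stream"
'''


def load_module(source):
    """Execute module source in a fresh namespace, roughly what an import does."""
    namespace = {"__name__": "flume_like"}
    exec(compile(source, "flume_like.py", "exec"), namespace)
    return namespace


def call_and_report(func):
    """Call func and describe what happened."""
    try:
        return "ok: %r" % (func(),)
    except NameError as exc:
        return "NameError: %s" % exc


# Loading succeeds: the missing import is not noticed at this point.
module = load_module(SOURCE)

if __name__ == "__main__":
    # The error shows up only once the function body runs.
    print(call_and_report(module["create_stream"]))
```

If that is what happened, the build would stay green whenever the test that calls the function is skipped or not triggered.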



[GitHub] spark pull request #19535: [SPARK-22313][PYTHON] Mark/print deprecation warn...

2017-12-28 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19535#discussion_r159013024
  
--- Diff: python/pyspark/streaming/flume.py ---
@@ -54,8 +54,13 @@ def createStream(ssc, hostname, port,
         :param bodyDecoder:  A function used to decode body (default is utf8_decoder)
         :return: A DStream object
 
-        .. note:: Deprecated in 2.3.0
+        .. note:: Deprecated in 2.3.0. Flume support is deprecated as of Spark 2.3.0.
+            See SPARK-22142.
         """
+        warnings.warn(
--- End diff --

Seems `warnings` is not imported in this file?



[GitHub] spark pull request #5604: [SPARK-1442][SQL] Window Function Support for Spar...

2017-12-19 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5604#discussion_r157933488
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.analysis.UnresolvedException
+import org.apache.spark.sql.catalyst.errors.TreeNodeException
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.{NumericType, DataType}
+
+/**
+ * The trait of the Window Specification (specified in the OVER clause or WINDOW clause) for
+ * Window Functions.
+ */
+sealed trait WindowSpec
+
+/**
+ * The specification for a window function.
+ * @param partitionSpec It defines the way that input rows are partitioned.
+ * @param orderSpec It defines the ordering of rows in a partition.
+ * @param frameSpecification It defines the window frame in a partition.
+ */
+case class WindowSpecDefinition(
+    partitionSpec: Seq[Expression],
+    orderSpec: Seq[SortOrder],
+    frameSpecification: WindowFrame) extends Expression with WindowSpec {
+
+  def validate: Option[String] = frameSpecification match {
+    case UnspecifiedFrame =>
+      Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " +
+        "during analysis. Please file a bug report.")
+    case frame: SpecifiedWindowFrame => frame.validate.orElse {
+      def checkValueBasedBoundaryForRangeFrame(): Option[String] = {
+        if (orderSpec.length > 1)  {
+          // It is not allowed to have a value-based PRECEDING and FOLLOWING
+          // as the boundary of a Range Window Frame.
+          Some("This Range Window Frame only accepts at most one ORDER BY expression.")
+        } else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) {
+          Some("The data type of the expression in the ORDER BY clause should be a numeric type.")
+        } else {
+          None
+        }
+      }
+
+      (frame.frameType, frame.frameStart, frame.frameEnd) match {
+        case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame()
+        case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame()
+        case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame()
+        case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame()
+        case (_, _, _) => None
+      }
+    }
+  }
+
+  type EvaluatedType = Any
+
+  override def children: Seq[Expression]  = partitionSpec ++ orderSpec
+
+  override lazy val resolved: Boolean =
+    childrenResolved && frameSpecification.isInstanceOf[SpecifiedWindowFrame]
+
+
+  override def toString: String = simpleString
+
+  override def eval(input: Row): EvaluatedType = throw new UnsupportedOperationException
+  override def nullable: Boolean = true
+  override def foldable: Boolean = false
+  override def dataType: DataType = throw new UnsupportedOperationException
+}
+
+/**
+ * A Window specification reference that refers to the [[WindowSpecDefinition]] defined
+ * under the name `name`.
+ */
+case class WindowSpecReference(name: String) extends WindowSpec
+
+/**
+ * The trait used to represent the type of a Window Frame.
+ */
+sealed trait FrameType
+
+/**
+ * RowFrame treats rows in a partition individually. When a [[ValuePreceding]]
+ * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered
+ * as a physical offset.
+ * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame,
+ *

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/19448
  
Thank you :)



[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/19448
  
I am not really worried about this particular change. It's already merged 
and it seems a small and safe change. I am not planning to revert it.

But, in general, let's avoid merging changes that are not bug fixes into a 
maintenance branch. If there is an exception, it would be better to make it 
clear earlier. 



[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-13 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/19448
  
@HyukjinKwon branch-2.2 is a maintenance branch, and I am not sure it is 
appropriate to merge this change to branch-2.2 since it is not really a bug 
fix. If the doc is not accurate, we should fix the doc. For a maintenance 
branch, we need to be very careful about what we merge, and we should always 
avoid unnecessary changes. 



[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...

2017-09-29 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/19149
  
Can we add a test?



[GitHub] spark pull request #19080: [SPARK-21865][SQL] simplify the distribution sema...

2017-08-30 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/19080#discussion_r136214689
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
@@ -30,18 +30,43 @@ import org.apache.spark.sql.types.{DataType, IntegerType}
  *  - Intra-partition ordering of data: In this case the distribution describes guarantees made
  *    about how tuples are distributed within a single partition.
  */
-sealed trait Distribution
+sealed trait Distribution {
+  /**
+   * The required number of partitions for this distribution. If it's None, then any number of
+   * partitions is allowed for this distribution.
+   */
+  def requiredNumPartitions: Option[Int]
+
+  /**
+   * Creates a default partitioning for this distribution, which can satisfy this distribution while
+   * matching the given number of partitions.
+   */
+  def createPartitioning(numPartitions: Int): Partitioning
+}
 
 /**
  * Represents a distribution where no promises are made about co-location of data.
  */
-case object UnspecifiedDistribution extends Distribution
+case object UnspecifiedDistribution extends Distribution {
+  override def requiredNumPartitions: Option[Int] = None
+
+  override def createPartitioning(numPartitions: Int): Partitioning = {
+    throw new IllegalStateException("UnspecifiedDistribution does not have default partitioning.")
+  }
+}
 
 /**
  * Represents a distribution that only has a single partition and all tuples of the dataset
  * are co-located.
  */
-case object AllTuples extends Distribution
+case object AllTuples extends Distribution {
--- End diff --

I'd like to keep `AllTuples`. `SingleNodeDistribution` is a special case of 
`AllTuples`, and it seems we do not really need the extra information 
introduced by `SingleNode`.
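
For readers following the diff, the shape of the proposed `Distribution` interface can be sketched as a standalone toy model. The trait and the `UnspecifiedDistribution` body mirror the quoted diff. The body of `AllTuples` is cut off in the quoted diff, so the one below is an assumption, as are the simplified `Partitioning` and `SinglePartition` types and the wrapper object `DistributionSketch`.

```scala
// Standalone toy model; the real classes live in Spark's catalyst module.
object DistributionSketch {
  // Simplified stand-ins for Spark's Partitioning hierarchy (assumption).
  sealed trait Partitioning { def numPartitions: Int }
  case object SinglePartition extends Partitioning { val numPartitions: Int = 1 }

  sealed trait Distribution {
    // None means any number of partitions is acceptable.
    def requiredNumPartitions: Option[Int]
    // Default partitioning that satisfies this distribution.
    def createPartitioning(numPartitions: Int): Partitioning
  }

  // As in the quoted diff: no requirement and no default partitioning.
  case object UnspecifiedDistribution extends Distribution {
    override def requiredNumPartitions: Option[Int] = None
    override def createPartitioning(numPartitions: Int): Partitioning =
      throw new IllegalStateException(
        "UnspecifiedDistribution does not have default partitioning.")
  }

  // Assumed body: all tuples co-located means exactly one partition.
  case object AllTuples extends Distribution {
    override def requiredNumPartitions: Option[Int] = Some(1)
    override def createPartitioning(numPartitions: Int): Partitioning = {
      require(numPartitions == 1, "AllTuples requires a single partition.")
      SinglePartition
    }
  }
}
```

In this reading, `AllTuples` already expresses "one partition holding everything", which is the sense in which a single-node variant would add no information the planner needs.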


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark issue #19080: [SPARK-21865][SQL] simplify the distribution semantic of...

2017-08-30 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/19080
  
I have a question after reading the new approach. Let's say that we have a 
join like `T1 JOIN T2 ON T1.a = T2.a`. `T1` is hash partitioned by the value 
of `T1.a` into 10 partitions, and `T2` is range partitioned by the value of 
`T2.a` into 10 partitions. Both sides satisfy the required distribution of 
the join and have the same number of partitions. However, we still need to 
add an exchange on either side in order to produce the correct result. How 
will we handle this case with this change?

Also, regarding
> For multiple children, Spark only guarantees they have the same number of 
partitions, and it's the operator's responsibility to leverage this guarantee 
to achieve more complicated requirements. 

Can you give a concrete example?
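
The concern in the first question can be made concrete with a toy model in plain Scala. Nothing here is Spark code: `CoPartitioningSketch`, `hashPartition`, and `rangePartition` are hypothetical stand-ins, and the key domain 0 until 100 is an assumption chosen so that both sides get exactly 10 partitions.

```scala
// Toy model: partitioners are plain functions from key to partition index.
object CoPartitioningSketch {
  val numPartitions = 10

  // Hypothetical hash partitioner for T1: key modulo partition count.
  def hashPartition(key: Int): Int = Math.floorMod(key, numPartitions)

  // Hypothetical range partitioner for T2: ten contiguous ranges of width 10.
  def rangePartition(key: Int): Int = math.min(key / 10, numPartitions - 1)

  // Join partition i of T1 only with partition i of T2, with no exchange.
  def partitionWiseJoin(t1: Seq[Int], t2: Seq[Int]): Seq[Int] = {
    val left = t1.groupBy(k => hashPartition(k))
    val right = t2.groupBy(k => rangePartition(k))
    (0 until numPartitions).flatMap { p =>
      val rightKeys = right.getOrElse(p, Seq.empty[Int]).toSet
      left.getOrElse(p, Seq.empty[Int]).filter(k => rightKeys.contains(k))
    }
  }

  val keys: Seq[Int] = 0 until 100
  // Both tables hold every key, so a correct equi-join returns all 100 keys.
  val naiveJoin: Seq[Int] = partitionWiseJoin(keys, keys).sorted
}
```

Only the keys whose hash index and range index happen to coincide (0, 11, 22, and so on) are matched, so 90 of the 100 expected rows are lost. Having the same number of partitions on both sides is therefore not enough; the two sides must also agree on how keys map to partitions.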



[3/3] spark-website git commit: Add the news about spark-summit-eu-2017 agenda

2017-08-28 Thread yhuai

Add the news about spark-summit-eu-2017 agenda


Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/35eb1471
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/35eb1471
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/35eb1471

Branch: refs/heads/asf-site
Commit: 35eb1471704a97c18e96b46f2495a7117565466d
Parents: cca972e
Author: Yin Huai 
Authored: Mon Aug 28 22:40:10 2017 +
Committer: Yin Huai 
Committed: Mon Aug 28 15:54:26 2017 -0700

--
 ...-08-28-spark-summit-eu-2017-agenda-posted.md |  17 ++
 site/committers.html|   6 +-
 site/community.html |   6 +-
 site/contributing.html  |   6 +-
 site/developer-tools.html   |   6 +-
 site/documentation.html |   6 +-
 site/downloads.html |   6 +-
 site/examples.html  |   6 +-
 site/faq.html   |   6 +-
 site/graphx/index.html  |   6 +-
 site/improvement-proposals.html |   6 +-
 site/index.html |   6 +-
 site/mailing-lists.html |   6 +-
 site/mllib/index.html   |   6 +-
 site/news/amp-camp-2013-registration-ope.html   |   6 +-
 .../news/announcing-the-first-spark-summit.html |   6 +-
 .../news/fourth-spark-screencast-published.html |   6 +-
 site/news/index.html|  16 +-
 site/news/nsdi-paper.html   |   6 +-
 site/news/one-month-to-spark-summit-2015.html   |   6 +-
 .../proposals-open-for-spark-summit-east.html   |   6 +-
 ...registration-open-for-spark-summit-east.html |   6 +-
 .../news/run-spark-and-shark-on-amazon-emr.html |   6 +-
 site/news/spark-0-6-1-and-0-5-2-released.html   |   6 +-
 site/news/spark-0-6-2-released.html |   6 +-
 site/news/spark-0-7-0-released.html |   6 +-
 site/news/spark-0-7-2-released.html |   6 +-
 site/news/spark-0-7-3-released.html |   6 +-
 site/news/spark-0-8-0-released.html |   6 +-
 site/news/spark-0-8-1-released.html |   6 +-
 site/news/spark-0-9-0-released.html |   6 +-
 site/news/spark-0-9-1-released.html |   6 +-
 site/news/spark-0-9-2-released.html |   6 +-
 site/news/spark-1-0-0-released.html |   6 +-
 site/news/spark-1-0-1-released.html |   6 +-
 site/news/spark-1-0-2-released.html |   6 +-
 site/news/spark-1-1-0-released.html |   6 +-
 site/news/spark-1-1-1-released.html |   6 +-
 site/news/spark-1-2-0-released.html |   6 +-
 site/news/spark-1-2-1-released.html |   6 +-
 site/news/spark-1-2-2-released.html |   6 +-
 site/news/spark-1-3-0-released.html |   6 +-
 site/news/spark-1-4-0-released.html |   6 +-
 site/news/spark-1-4-1-released.html |   6 +-
 site/news/spark-1-5-0-released.html |   6 +-
 site/news/spark-1-5-1-released.html |   6 +-
 site/news/spark-1-5-2-released.html |   6 +-
 site/news/spark-1-6-0-released.html |   6 +-
 site/news/spark-1-6-1-released.html |   6 +-
 site/news/spark-1-6-2-released.html |   6 +-
 site/news/spark-1-6-3-released.html |   6 +-
 site/news/spark-2-0-0-released.html |   6 +-
 site/news/spark-2-0-1-released.html |   6 +-
 site/news/spark-2-0-2-released.html |   6 +-
 site/news/spark-2-1-0-released.html |   6 +-
 site/news/spark-2-1-1-released.html |   6 +-
 site/news/spark-2-2-0-released.html |   6 +-
 site/news/spark-2.0.0-preview.html  |   6 +-
 .../spark-accepted-into-apache-incubator.html   |   6 +-
 site/news/spark-and-shark-in-the-news.html  |   6 +-
 site/news/spark-becomes-tlp.html|   6 +-
 site/news/spark-featured-in-wired.html  |   6 +-
 .../spark-mailing-lists-moving-to-apache.html   |   6 +-
 site/news/spark-meetups.html|   6 +-
 site/news/spark-screencasts-published.html  |   6 +-
 site/news/spark-summit-2013-is-a-wrap.html  |   6 +-
 site/news/spark-summit-2014-videos-posted.html  |   6 +-
 site/news/spark-summit-2015-videos-posted.html  |   6 +-
 site/news/spark-summit-agenda-posted.html   |   6 +-
 .../spark-summit-east-2015-videos-posted.html   |   6 +-
 .../spark-summit-east-2016-cfp-closing.html |   6 +-
 .../spark-summit-east-2017-agenda-posted.html   |   6 +-
 site/news/spark-summit-east-agenda-posted.html  |   6 +-
 .../spark-summit-eu-2017-agenda-posted.html | 223 +++

[1/3] spark-website git commit: Add the news about spark-summit-eu-2017 agenda

2017-08-28 Thread yhuai

Repository: spark-website
Updated Branches:
  refs/heads/asf-site cca972e7f -> 35eb14717


http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-3-0.html
--
diff --git a/site/releases/spark-release-1-3-0.html 
b/site/releases/spark-release-1-3-0.html
index 10d934b..5e4d302 100644
--- a/site/releases/spark-release-1-3-0.html
+++ b/site/releases/spark-release-1-3-0.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-3-1.html
--
diff --git a/site/releases/spark-release-1-3-1.html 
b/site/releases/spark-release-1-3-1.html
index 7df8028..116898f 100644
--- a/site/releases/spark-release-1-3-1.html
+++ b/site/releases/spark-release-1-3-1.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-4-0.html
--
diff --git a/site/releases/spark-release-1-4-0.html 
b/site/releases/spark-release-1-4-0.html
index 143cc17..b75a496 100644
--- a/site/releases/spark-release-1-4-0.html
+++ b/site/releases/spark-release-1-4-0.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-4-1.html
--
diff --git a/site/releases/spark-release-1-4-1.html 
b/site/releases/spark-release-1-4-1.html
index ccdd161..30b92fd 100644
--- a/site/releases/spark-release-1-4-1.html
+++ b/site/releases/spark-release-1-4-1.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-5-0.html
--
diff --git a/site/releases/spark-release-1-5-0.html 
b/site/releases/spark-release-1-5-0.html
index f73ab5d..6e1411d 100644
--- a/site/releases/spark-release-1-5-0.html
+++ b/site/releases/spark-release-1-5-0.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/releases/spark-release-1-5-1.html
--
diff --git a/site/releases/spark-release-1-5-1.html 
b/site/releases/spark-release-1-5-1.html
index 3af892e..b447dd7 100644
--- a/site/releases/spark-release-1-5-1.html
+++

[2/3] spark-website git commit: Add the news about spark-summit-eu-2017 agenda

2017-08-28 Thread yhuai

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-accepted-into-apache-incubator.html
--
diff --git a/site/news/spark-accepted-into-apache-incubator.html 
b/site/news/spark-accepted-into-apache-incubator.html
index 62638f2..a4a913f 100644
--- a/site/news/spark-accepted-into-apache-incubator.html
+++ b/site/news/spark-accepted-into-apache-incubator.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-and-shark-in-the-news.html
--
diff --git a/site/news/spark-and-shark-in-the-news.html 
b/site/news/spark-and-shark-in-the-news.html
index 4a0c4fc..55d2ade 100644
--- a/site/news/spark-and-shark-in-the-news.html
+++ b/site/news/spark-and-shark-in-the-news.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-becomes-tlp.html
--
diff --git a/site/news/spark-becomes-tlp.html b/site/news/spark-becomes-tlp.html
index 6c76d20..0f17857 100644
--- a/site/news/spark-becomes-tlp.html
+++ b/site/news/spark-becomes-tlp.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-featured-in-wired.html
--
diff --git a/site/news/spark-featured-in-wired.html 
b/site/news/spark-featured-in-wired.html
index 1d35e40..1c0b69a 100644
--- a/site/news/spark-featured-in-wired.html
+++ b/site/news/spark-featured-in-wired.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-mailing-lists-moving-to-apache.html
--
diff --git a/site/news/spark-mailing-lists-moving-to-apache.html 
b/site/news/spark-mailing-lists-moving-to-apache.html
index b586b65..4e12162 100644
--- a/site/news/spark-mailing-lists-moving-to-apache.html
+++ b/site/news/spark-mailing-lists-moving-to-apache.html
@@ -161,6 +161,9 @@
   Latest News
   
 
+  Spark 
Summit Europe (October 24-26th, 2017, Dublin, Ireland) agenda posted
+  (Aug 28, 2017)
+
   Spark 2.2.0 
released
   (Jul 11, 2017)
 
@@ -170,9 +173,6 @@
   Spark 
Summit (June 5-7th, 2017, San Francisco) agenda posted
   (Mar 31, 2017)
 
-  Spark 
Summit East (Feb 7-9th, 2017, Boston) agenda posted
-  (Jan 04, 2017)
-
   
   Archive
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/35eb1471/site/news/spark-meetups.html
--
diff --git a/site/news/spark-meetups.html b/site/news/spark-meetups.html
index 4de6525..92da537 100644
--- a/site/news/spark-meetups.html
+++ b/site/news/spark-meetups.html
@@ -161,6

[GitHub] spark issue #18944: [SPARK-21732][SQL]Lazily init hive metastore client

2017-08-14 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/18944
  
lgtm



spark git commit: [SPARK-21111][TEST][2.2] Fix the test failure of describe.sql

2017-06-15 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/branch-2.2 76ee41fd7 -> a585c870a


[SPARK-21111][TEST][2.2] Fix the test failure of describe.sql

## What changes were proposed in this pull request?
Test failed in `describe.sql`.

We need to fix the related bug introduced in 
(https://github.com/apache/spark/pull/17649) in the follow-up PR to master.

## How was this patch tested?
N/A

Author: gatorsmile 

Closes #18316 from gatorsmile/fix.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a585c870
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a585c870
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a585c870

Branch: refs/heads/branch-2.2
Commit: a585c870a066fa94d97462cefbaa4057a7a0ed44
Parents: 76ee41f
Author: gatorsmile 
Authored: Thu Jun 15 18:25:39 2017 -0700
Committer: Yin Huai 
Committed: Thu Jun 15 18:25:39 2017 -0700

--
 sql/core/src/test/resources/sql-tests/results/describe.sql.out | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a585c870/sql/core/src/test/resources/sql-tests/results/describe.sql.out
--
diff --git a/sql/core/src/test/resources/sql-tests/results/describe.sql.out 
b/sql/core/src/test/resources/sql-tests/results/describe.sql.out
index 329532c..ab9f278 100644
--- a/sql/core/src/test/resources/sql-tests/results/describe.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/describe.sql.out
@@ -127,6 +127,7 @@ Providerparquet
 Num Buckets2   
 Bucket Columns [`a`]   
 Sort Columns   [`b`]   
+Commenttable_comment   
 Table Properties   [e=3]   
 Location [not included in comparison]sql/core/spark-warehouse/t

 Storage Properties [a=1, b=2]  
@@ -157,6 +158,7 @@ Providerparquet
 Num Buckets2   
 Bucket Columns [`a`]   
 Sort Columns   [`b`]   
+Commenttable_comment   
 Table Properties   [e=3]   
 Location [not included in comparison]sql/core/spark-warehouse/t

 Storage Properties [a=1, b=2]  


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] spark issue #18316: [SPARK-21111] [TEST] [2.2] Fix the test failure of descr...

2017-06-15 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/18316
  
Thanks! I have merged this pr to branch-2.2.



[GitHub] spark issue #18316: [SPARK-21111] [TEST] [2.2] Fix the test failure of descr...

2017-06-15 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/18316
  
thanks! merging to branch-2.2



[GitHub] spark issue #18316: [SPARK-21111] [TEST] [2.2] Fix the test failure of descr...

2017-06-15 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/18316
  
lgtm



[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-06-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/18064
  
My suggestion was to move the changes to the interfaces of 
ExecutedCommandExec and SaveIntoDataSourceCommand into separate PRs. It will 
help code review (both speed and quality).




[GitHub] spark issue #18148: [SPARK-20926][SQL] Removing exposures to guava library c...

2017-06-07 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/18148
  
@vanzin Seems merging to branch-2.2 was an accident? Since it is not really 
a bug fix, should we revert it from branch-2.2 and just keep it in the master?



[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-06-07 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/18064
  
I just came across this PR. I have one piece of general feedback. It would 
be great if a PR had a single purpose. This PR contains different kinds of 
changes in order to fix the UI. If refactoring is needed, I'd recommend 
having a separate PR for the refactoring, and then using a different PR for 
the actual fix.



spark git commit: Revert "[SPARK-20946][SQL] simplify the config setting logic in SparkSession.getOrCreate"

2017-06-02 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/branch-2.2 6c628e75e -> b560c975b


Revert "[SPARK-20946][SQL] simplify the config setting logic in 
SparkSession.getOrCreate"

This reverts commit e11d90bf8deb553fd41b8837e3856c11486c2503.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b560c975
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b560c975
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b560c975

Branch: refs/heads/branch-2.2
Commit: b560c975b7cdc8828fc9e27cbca740c5e550b9cd
Parents: 6c628e7
Author: Yin Huai 
Authored: Fri Jun 2 15:36:21 2017 -0700
Committer: Yin Huai 
Committed: Fri Jun 2 15:37:38 2017 -0700

--
 .../spark/ml/recommendation/ALSSuite.scala  |  4 +++-
 .../apache/spark/ml/tree/impl/TreeTests.scala   |  2 ++
 .../org/apache/spark/sql/SparkSession.scala | 25 +---
 3 files changed, 21 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b560c975/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala
--
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala 
b/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala
index 23f2256..701040f 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala
@@ -820,13 +820,15 @@ class ALSCleanerSuite extends SparkFunSuite {
   FileUtils.listFiles(localDir, TrueFileFilter.INSTANCE, 
TrueFileFilter.INSTANCE).asScala.toSet
 try {
   conf.set("spark.local.dir", localDir.getAbsolutePath)
-  val sc = new SparkContext("local[2]", "ALSCleanerSuite", conf)
+  val sc = new SparkContext("local[2]", "test", conf)
   try {
 sc.setCheckpointDir(checkpointDir.getAbsolutePath)
 // Generate test data
 val (training, _) = ALSSuite.genImplicitTestData(sc, 20, 5, 1, 0.2, 0)
 // Implicitly test the cleaning of parents during ALS training
 val spark = SparkSession.builder
+  .master("local[2]")
+  .appName("ALSCleanerSuite")
   .sparkContext(sc)
   .getOrCreate()
 import spark.implicits._

http://git-wip-us.apache.org/repos/asf/spark/blob/b560c975/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala
--
diff --git a/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala 
b/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala
index b6894b3..92a2369 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala
@@ -43,6 +43,8 @@ private[ml] object TreeTests extends SparkFunSuite {
   categoricalFeatures: Map[Int, Int],
   numClasses: Int): DataFrame = {
 val spark = SparkSession.builder()
+  .master("local[2]")
+  .appName("TreeTests")
   .sparkContext(data.sparkContext)
   .getOrCreate()
 import spark.implicits._

http://git-wip-us.apache.org/repos/asf/spark/blob/b560c975/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
index bf37b76..d2bf350 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -757,8 +757,6 @@ object SparkSession {
 
 private[this] var userSuppliedContext: Option[SparkContext] = None
 
-// The `SparkConf` inside the given `SparkContext` may get changed if you 
specify some options
-// for this builder.
 private[spark] def sparkContext(sparkContext: SparkContext): Builder = 
synchronized {
   userSuppliedContext = Option(sparkContext)
   this
@@ -856,7 +854,7 @@ object SparkSession {
  *
  * @since 2.2.0
  */
-def withExtensions(f: SparkSessionExtensions => Unit): Builder = 
synchronized {
+def withExtensions(f: SparkSessionExtensions => Unit): Builder = {
   f(extensions)
   this
 }
@@ -901,14 +899,22 @@ object SparkSession {
 
 // No active nor global default session. Create a new one.
 val sparkContext = userSuppliedContext.getOrElse {
+  // set app name if not given
+  val randomAppName = java.util.UUID.randomUUID().toString
   val sparkConf = new SparkConf()
-  options.get("spark.master").foreach(sparkConf.setMaster)
-  // set a random app

[GitHub] spark issue #18172: [SPARK-20946][SQL] simplify the config setting logic in ...

2017-06-02 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/18172
  
Reverting this because it breaks repl tests.



[GitHub] spark pull request #17617: [SPARK-20244][Core] Handle incorrect bytesRead me...

2017-06-02 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/17617#discussion_r119938185
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -143,14 +144,29 @@ class SparkHadoopUtil extends Logging {
    * Returns a function that can be called to find Hadoop FileSystem bytes read. If
    * getFSBytesReadOnThreadCallback is called from thread r at time t, the returned callback will
    * return the bytes read on r since t.
-   *
-   * @return None if the required method can't be found.
--- End diff --

Why are we removing this line instead of the doc?
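
As background for the doc comment under discussion, the "bytes read on thread r since time t" contract can be sketched without Hadoop. This is a minimal illustration only: `BytesReadCallbackSketch`, `recordRead`, and `bytesReadOnThreadCallback` are hypothetical names, and the thread-local counter stands in for the FileSystem statistics that the real method consults.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy model of a "bytes read since now" callback; not Spark's implementation.
object BytesReadCallbackSketch {
  // Hypothetical per-thread counter standing in for FileSystem statistics.
  private val bytesRead: ThreadLocal[AtomicLong] = new ThreadLocal[AtomicLong] {
    override def initialValue(): AtomicLong = new AtomicLong(0L)
  }

  // Simulates the calling thread reading numBytes from a file system.
  def recordRead(numBytes: Long): Unit = {
    bytesRead.get().addAndGet(numBytes)
    ()
  }

  // Captures the calling thread's counter and its current value (time t).
  // The returned function reports bytes read on that thread since then.
  def bytesReadOnThreadCallback(): () => Long = {
    val counter = bytesRead.get()
    val baseline = counter.get()
    () => counter.get() - baseline
  }
}
```

A caller would create the callback at task start and invoke it later; reads recorded before the callback was created are excluded by the baseline.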



[GitHub] spark issue #17763: [SPARK-13747][Core]Add ThreadUtils.awaitReady and disall...

2017-05-17 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17763
  
lgtm



[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...

2017-05-09 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17666
  
I have reverted this change from both master and branch-2.2. I have 
reopened the jira.



spark git commit: Revert "[SPARK-20311][SQL] Support aliases for table value functions"

2017-05-09 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/branch-2.2 9e8d23b3a -> d191b962d


Revert "[SPARK-20311][SQL] Support aliases for table value functions"

This reverts commit 714811d0b5bcb5d47c39782ff74f898d276ecc59.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d191b962
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d191b962
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d191b962

Branch: refs/heads/branch-2.2
Commit: d191b962dc81c015fa92a38d882a8c7ea620ef06
Parents: 9e8d23b
Author: Yin Huai 
Authored: Tue May 9 14:47:45 2017 -0700
Committer: Yin Huai 
Committed: Tue May 9 14:49:02 2017 -0700

--
 .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 20 ++
 .../analysis/ResolveTableValuedFunctions.scala  | 22 +++-
 .../sql/catalyst/analysis/unresolved.scala  | 10 ++---
 .../spark/sql/catalyst/parser/AstBuilder.scala  | 17 ---
 .../sql/catalyst/analysis/AnalysisSuite.scala   | 14 +
 .../sql/catalyst/parser/PlanParserSuite.scala   | 13 +---
 6 files changed, 17 insertions(+), 79 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d191b962/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
--
diff --git 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 15e4dd4..1ecb3d1 100644
--- 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -472,23 +472,15 @@ identifierComment
 ;
 
 relationPrimary
-: tableIdentifier sample? (AS? strictIdentifier)?  #tableName
-| '(' queryNoWith ')' sample? (AS? strictIdentifier)?  #aliasedQuery
-| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation
-| inlineTable  #inlineTableDefault2
-| functionTable#tableValuedFunction
+: tableIdentifier sample? (AS? strictIdentifier)?   #tableName
+| '(' queryNoWith ')' sample? (AS? strictIdentifier)?   
#aliasedQuery
+| '(' relation ')' sample? (AS? strictIdentifier)?  
#aliasedRelation
+| inlineTable   
#inlineTableDefault2
+| identifier '(' (expression (',' expression)*)? ')'
#tableValuedFunction
 ;
 
 inlineTable
-: VALUES expression (',' expression)*  tableAlias
-;
-
-functionTable
-: identifier '(' (expression (',' expression)*)? ')' tableAlias
-;
-
-tableAlias
-: (AS? identifier identifierList?)?
+: VALUES expression (',' expression)*  (AS? identifier identifierList?)?
 ;
 
 rowFormat

http://git-wip-us.apache.org/repos/asf/spark/blob/d191b962/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
index dad1340..de6de24 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
@@ -19,8 +19,8 @@ package org.apache.spark.sql.catalyst.analysis
 
 import java.util.Locale
 
-import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}
-import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, 
Range}
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Range}
 import org.apache.spark.sql.catalyst.rules._
 import org.apache.spark.sql.types.{DataType, IntegerType, LongType}
 
@@ -105,7 +105,7 @@ object ResolveTableValuedFunctions extends 
Rule[LogicalPlan] {
 
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
 case u: UnresolvedTableValuedFunction if u.functionArgs.forall(_.resolved) 
=>
-  val resolvedFunc = 
builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match {
+  builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match {
 case Some(tvf) =>
   val resolved = tvf.flatMap { case (argList, resolver) =>
 argList.implicitCast(u.functionArgs) match {
@@ -125,21 +125,5 @@ object ResolveTableValuedFunctions extends 
Rule[LogicalPlan]

spark git commit: Revert "[SPARK-20311][SQL] Support aliases for table value functions"

2017-05-09 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/master ac1ab6b9d -> f79aa285c


Revert "[SPARK-20311][SQL] Support aliases for table value functions"

This reverts commit 714811d0b5bcb5d47c39782ff74f898d276ecc59.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f79aa285
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f79aa285
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f79aa285

Branch: refs/heads/master
Commit: f79aa285cf115963ba06a9cacb3dbd7e3cbf7728
Parents: ac1ab6b
Author: Yin Huai 
Authored: Tue May 9 14:47:45 2017 -0700
Committer: Yin Huai 
Committed: Tue May 9 14:47:45 2017 -0700

--
 .../apache/spark/sql/catalyst/parser/SqlBase.g4 | 20 ++
 .../analysis/ResolveTableValuedFunctions.scala  | 22 +++-
 .../sql/catalyst/analysis/unresolved.scala  | 10 ++---
 .../spark/sql/catalyst/parser/AstBuilder.scala  | 17 ---
 .../sql/catalyst/analysis/AnalysisSuite.scala   | 14 +
 .../sql/catalyst/parser/PlanParserSuite.scala   | 13 +---
 6 files changed, 17 insertions(+), 79 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f79aa285/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
--
diff --git 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 41daf58..14c511f 100644
--- 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -472,23 +472,15 @@ identifierComment
 ;
 
 relationPrimary
-: tableIdentifier sample? (AS? strictIdentifier)?  #tableName
-| '(' queryNoWith ')' sample? (AS? strictIdentifier)?  #aliasedQuery
-| '(' relation ')' sample? (AS? strictIdentifier)? #aliasedRelation
-| inlineTable  #inlineTableDefault2
-| functionTable#tableValuedFunction
+: tableIdentifier sample? (AS? strictIdentifier)?   #tableName
+| '(' queryNoWith ')' sample? (AS? strictIdentifier)?   
#aliasedQuery
+| '(' relation ')' sample? (AS? strictIdentifier)?  
#aliasedRelation
+| inlineTable   
#inlineTableDefault2
+| identifier '(' (expression (',' expression)*)? ')'
#tableValuedFunction
 ;
 
 inlineTable
-: VALUES expression (',' expression)*  tableAlias
-;
-
-functionTable
-: identifier '(' (expression (',' expression)*)? ')' tableAlias
-;
-
-tableAlias
-: (AS? identifier identifierList?)?
+: VALUES expression (',' expression)*  (AS? identifier identifierList?)?
 ;
 
 rowFormat

http://git-wip-us.apache.org/repos/asf/spark/blob/f79aa285/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
index dad1340..de6de24 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
@@ -19,8 +19,8 @@ package org.apache.spark.sql.catalyst.analysis
 
 import java.util.Locale
 
-import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}
-import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, 
Range}
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Range}
 import org.apache.spark.sql.catalyst.rules._
 import org.apache.spark.sql.types.{DataType, IntegerType, LongType}
 
@@ -105,7 +105,7 @@ object ResolveTableValuedFunctions extends 
Rule[LogicalPlan] {
 
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
 case u: UnresolvedTableValuedFunction if u.functionArgs.forall(_.resolved) 
=>
-  val resolvedFunc = 
builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match {
+  builtinFunctions.get(u.functionName.toLowerCase(Locale.ROOT)) match {
 case Some(tvf) =>
   val resolved = tvf.flatMap { case (argList, resolver) =>
 argList.implicitCast(u.functionArgs) match {
@@ -125,21 +125,5 @@ object ResolveTableValuedFunctions extends 
Rule[LogicalPlan] {

[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...

2017-05-09 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17666
  
I am going to revert this PR from master and branch-2.2. 



[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...

2017-05-09 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17666
  
@maropu Sorry. I think this PR introduces a regression.

```
scala> spark.sql("select * from range(1, 10) cross join range(1, 10)").explain
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
Range (1, 10, step=1, splits=None)
and
Range (1, 10, step=1, splits=None)
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
```

I think we are taking `cross` as the alias of the first `range(1, 10)`, so the join is analyzed as an INNER join.

I reverted your change locally and the query worked. I am attaching the 
expected analyzed plan below.
```
scala> spark.sql("select * from range(1, 10) cross join range(1, 10)").queryExecution.analyzed
res1: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Project [id#8L, id#9L]
+- Join Cross
   :- Range (1, 10, step=1, splits=None)
   +- Range (1, 10, step=1, splits=None)

```
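
To illustrate the suspected failure mode, here is a minimal sketch in plain Scala, not the ANTLR grammar itself. In the diff of the revert, the removed `tableAlias` rule accepts any `identifier`, while the other relation alternatives use `strictIdentifier`. The toy parser below assumes that this difference is what lets `cross` be consumed as an alias; the names `AliasSketch`, `Relation`, `reserved` and `parse` are hypothetical and exist only for this example.

```scala
object AliasSketch {
  // Hypothetical result: the relation, what was parsed as its alias, and the leftover tokens.
  final case class Relation(name: String, alias: Option[String], rest: List[String])

  // Words that a stricter identifier rule would refuse to treat as an alias (assumed list).
  val reserved: Set[String] = Set("cross", "join", "inner", "on")

  // Parse `name [alias] rest...` from pre-split tokens.
  // With strict = false, any following token is accepted as the optional alias.
  def parse(tokens: List[String], strict: Boolean): Relation = tokens match {
    case name :: next :: rest if !strict || !reserved(next.toLowerCase) =>
      Relation(name, Some(next), rest)
    case name :: rest =>
      Relation(name, None, rest)
    case Nil =>
      throw new IllegalArgumentException("empty input")
  }
}
```

With the tokens `range(1,10)`, `cross`, `join`, `range(1,10)`, the non-strict variant returns `cross` as the alias and leaves a plain `join` behind, which matches the INNER join reported in the error above. The strict variant leaves `cross join` intact for the join rule.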



[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...

2017-05-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17905
  
I see. I think 
https://github.com/apache/spark/pull/17905/commits/d4c1a9db25ee7386f7b12e4dabb54210a9892510
is good. How about we get it checked in first (after Jenkins passes)?



[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...

2017-05-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17905
  
lgtm



[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...

2017-05-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17905
  
@falaki's PR did not actually trigger that test. 



[GitHub] spark issue #17905: [SPARK-20661][SPARKR][TEST][FOLLOWUP] SparkR tableNames(...

2017-05-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17905
  
@felixcheung  you are right. That is the problem.



[GitHub] spark issue #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() test fai...

2017-05-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17903
  
I do not think https://github.com/apache/spark/pull/17649 caused the 
problem. I saw failures without that internally. 



spark git commit: [SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails

2017-05-08 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/branch-2.2 23681e9ca -> 4179ffc03


[SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails

## What changes were proposed in this pull request?
Cleaning existing temp tables before running tableNames tests

## How was this patch tested?
SparkR Unit tests

Author: Hossein 

Closes #17903 from falaki/SPARK-20661.

(cherry picked from commit 2abfee18b6511482b916c36f00bf3abf68a59e19)
Signed-off-by: Yin Huai 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4179ffc0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4179ffc0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4179ffc0

Branch: refs/heads/branch-2.2
Commit: 4179ffc031a0dbca6a93255c673de800ce7393fe
Parents: 23681e9
Author: Hossein 
Authored: Mon May 8 14:48:11 2017 -0700
Committer: Yin Huai 
Committed: Mon May 8 14:48:29 2017 -0700

--
 R/pkg/inst/tests/testthat/test_sparkSQL.R | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4179ffc0/R/pkg/inst/tests/testthat/test_sparkSQL.R
--
diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R 
b/R/pkg/inst/tests/testthat/test_sparkSQL.R
index 3f445e2..58cd259 100644
--- a/R/pkg/inst/tests/testthat/test_sparkSQL.R
+++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -668,6 +668,8 @@ test_that("jsonRDD() on a RDD with json string", {
 })
 
 test_that("test tableNames and tables", {
+  # Making sure there are no registered temp tables from previous tests
+  suppressWarnings(sapply(tableNames(), function(tname) { dropTempTable(tname) 
}))
   df <- read.json(jsonPath)
   createOrReplaceTempView(df, "table1")
   expect_equal(length(tableNames()), 1)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

spark git commit: [SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails

2017-05-08 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/master 829cd7b8b -> 2abfee18b


[SPARK-20661][SPARKR][TEST] SparkR tableNames() test fails

## What changes were proposed in this pull request?
Cleaning existing temp tables before running tableNames tests

## How was this patch tested?
SparkR Unit tests

Author: Hossein 

Closes #17903 from falaki/SPARK-20661.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2abfee18
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2abfee18
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2abfee18

Branch: refs/heads/master
Commit: 2abfee18b6511482b916c36f00bf3abf68a59e19
Parents: 829cd7b
Author: Hossein 
Authored: Mon May 8 14:48:11 2017 -0700
Committer: Yin Huai 
Committed: Mon May 8 14:48:11 2017 -0700

--
 R/pkg/inst/tests/testthat/test_sparkSQL.R | 2 ++
 1 file changed, 2 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2abfee18/R/pkg/inst/tests/testthat/test_sparkSQL.R
--
diff --git a/R/pkg/inst/tests/testthat/test_sparkSQL.R 
b/R/pkg/inst/tests/testthat/test_sparkSQL.R
index f517ce6..ab6888e 100644
--- a/R/pkg/inst/tests/testthat/test_sparkSQL.R
+++ b/R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -677,6 +677,8 @@ test_that("jsonRDD() on a RDD with json string", {
 })
 
 test_that("test tableNames and tables", {
+  # Making sure there are no registered temp tables from previous tests
+  suppressWarnings(sapply(tableNames(), function(tname) { dropTempTable(tname) 
}))
   df <- read.json(jsonPath)
   createOrReplaceTempView(df, "table1")
   expect_equal(length(tableNames()), 1)



[GitHub] spark issue #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() test fai...

2017-05-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17903
  
Thanks @falaki. Merging to master and branch-2.2.



[GitHub] spark issue #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() test fai...

2017-05-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17903
  
The 2.2 build seems fine, but I'd like to get this merged into branch-2.2 
since this test will fail if any previous test leaks tables.



[GitHub] spark issue #17903: [SPARK-20661][SparkR][Test] SparkR tableNames() test fai...

2017-05-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17903
  
@felixcheung FYI. I think the main problem with this test is that it breaks 
if any test executed before it leaks a table. I think this change makes 
sense. I will merge it once it passes Jenkins.



[GitHub] spark issue #17892: [SPARK-20626][SPARKR] address date test warning with tim...

2017-05-08 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17892
  
@felixcheung The master build seems to be broken because the R tests are failing 
(https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/2844/console).
I am not sure whether this PR caused that. Could you take a look?



[GitHub] spark issue #17746: [SPARK-20449][ML] Upgrade breeze version to 0.13.1

2017-05-01 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17746
  
@dbtsai Thanks for the explanation and the context :) 



[GitHub] spark issue #17746: [SPARK-20449][ML] Upgrade breeze version to 0.13.1

2017-05-01 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17746
  
Can I ask how we decided to merge this dependency change after the release 
branch was cut, especially since this change affects user code?



[GitHub] spark issue #17659: [SPARK-20358] [core] Executors failing stage on interrup...

2017-04-20 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17659
  
lgtm. Merging to master and branch-2.2.



spark git commit: [SPARK-20217][CORE] Executor should not fail stage if killed task throws non-interrupted exception

2017-04-05 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/master 4000f128b -> 5142e5d4e


[SPARK-20217][CORE] Executor should not fail stage if killed task throws 
non-interrupted exception

## What changes were proposed in this pull request?

If tasks throw non-interrupted exceptions on kill (e.g. 
java.nio.channels.ClosedByInterruptException), their death is reported back as 
TaskFailed instead of TaskKilled. This causes stage failure in some cases.

This is reproducible as follows. Run the following, and then use 
SparkContext.killTaskAttempt to kill one of the tasks. The entire stage will 
fail since we threw a RuntimeException instead of InterruptedException.

```
spark.range(100).repartition(100).foreach { i =>
  try {
    Thread.sleep(1000)
  } catch {
    case t: InterruptedException =>
      throw new RuntimeException(t)
  }
}
```
Based on the code in TaskSetManager, I think this also affects kills of 
speculative tasks. However, since the number of speculated tasks is small, and 
you usually need to fail a task a few times before the stage is cancelled, it 
is unlikely this would be noticed in production unless speculation was 
enabled and the number of allowed task failures was 1.

We should probably unconditionally return TaskKilled instead of TaskFailed if 
the task was killed by the driver, regardless of the actual exception thrown.
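
As a rough illustration of that rule, here is a self-contained sketch in plain Scala. It is not the code in Executor.scala; `KillSketch`, `Outcome` and `classify` are hypothetical names, and the two result types only mimic the TaskKilled/TaskFailed distinction described above. One deliberate difference from the one-line change in this commit: `scala.util.control.NonFatal` does not match `InterruptedException`, so the sketch lists it explicitly alongside `NonFatal`.

```scala
import scala.util.control.NonFatal

object KillSketch {
  // Hypothetical stand-ins for the two ways a task's end can be reported.
  sealed trait Outcome
  final case class TaskKilled(reason: String) extends Outcome
  final case class TaskFailed(error: String) extends Outcome

  // reasonIfKilled is assumed to be set when the driver asked for the task to be killed.
  def classify(t: Throwable, reasonIfKilled: Option[String]): Outcome = t match {
    // Any ordinary exception, or an interrupt, counts as a kill once a kill was requested.
    case _: InterruptedException | NonFatal(_) if reasonIfKilled.isDefined =>
      TaskKilled(reasonIfKilled.getOrElse("unknown reason"))
    // Everything else is a genuine failure.
    case other =>
      TaskFailed(Option(other.getMessage).getOrElse(other.getClass.getName))
  }
}
```

With this shape, a `RuntimeException` thrown by a task that was asked to stop is reported as killed, while the same exception from a task that was never killed is still reported as a failure.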

## How was this patch tested?

Unit test. The test fails before the change in Executor.scala

cc JoshRosen

Author: Eric Liang 

Closes #17531 from ericl/fix-task-interrupt.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5142e5d4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5142e5d4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5142e5d4

Branch: refs/heads/master
Commit: 5142e5d4e09c7cb36cf1d792934a21c5305c6d42
Parents: 4000f12
Author: Eric Liang 
Authored: Wed Apr 5 19:37:21 2017 -0700
Committer: Yin Huai 
Committed: Wed Apr 5 19:37:21 2017 -0700

--
 core/src/main/scala/org/apache/spark/executor/Executor.scala | 2 +-
 core/src/test/scala/org/apache/spark/SparkContextSuite.scala | 8 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5142e5d4/core/src/main/scala/org/apache/spark/executor/Executor.scala
--
diff --git a/core/src/main/scala/org/apache/spark/executor/Executor.scala 
b/core/src/main/scala/org/apache/spark/executor/Executor.scala
index 99b1608..83469c5 100644
--- a/core/src/main/scala/org/apache/spark/executor/Executor.scala
+++ b/core/src/main/scala/org/apache/spark/executor/Executor.scala
@@ -432,7 +432,7 @@ private[spark] class Executor(
   setTaskFinishedAndClearInterruptStatus()
   execBackend.statusUpdate(taskId, TaskState.KILLED, 
ser.serialize(TaskKilled(t.reason)))
 
-case _: InterruptedException if task.reasonIfKilled.isDefined =>
+case NonFatal(_) if task != null && task.reasonIfKilled.isDefined =>
   val killReason = task.reasonIfKilled.getOrElse("unknown reason")
   logInfo(s"Executor interrupted and killed $taskName (TID $taskId), 
reason: $killReason")
   setTaskFinishedAndClearInterruptStatus()

http://git-wip-us.apache.org/repos/asf/spark/blob/5142e5d4/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala 
b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index 2c94755..735f445 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -572,7 +572,13 @@ class SparkContextSuite extends SparkFunSuite with 
LocalSparkContext with Eventu
 // first attempt will hang
 if (!SparkContextSuite.isTaskStarted) {
   SparkContextSuite.isTaskStarted = true
-  Thread.sleep(999)
+  try {
+Thread.sleep(999)
+  } catch {
+case t: Throwable =>
+  // SPARK-20217 should not fail stage if task throws 
non-interrupted exception
+  throw new RuntimeException("killed")
+  }
 }
 // second attempt succeeds immediately
   }



[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17531
  
Thanks. Merging to master.



[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17531
  
test this please



[GitHub] spark issue #17423: [SPARK-20088] Do not create new SparkContext in SparkR c...

2017-03-26 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17423
  
got it. Thanks :)



[GitHub] spark issue #17423: [SPARK-20088] Do not create new SparkContext in SparkR c...

2017-03-25 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17423
  
@felixcheung `SparkContext.getOrCreate` is the preferred way to create a 
SparkContext. So even if we have a check, it is still better to use `getOrCreate`.
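
For context, the get-or-create pattern can be sketched in plain Scala as below. This is only a toy model under stated assumptions, not Spark's implementation: `ContextSketch` and `Context` are hypothetical names, and the real `SparkContext.getOrCreate` works with a `SparkConf` rather than a bare application name.

```scala
object ContextSketch {
  // Hypothetical stand-in for a heavyweight, one-per-process context.
  final class Context(val appName: String)

  // The currently active context, if any; guarded by the object's monitor.
  private var active: Option[Context] = None

  // Return the active context, creating it only if none exists yet.
  def getOrCreate(appName: String): Context = synchronized {
    active.getOrElse {
      val created = new Context(appName)
      active = Some(created)
      created
    }
  }
}
```

The point of the pattern is that the existence check and the creation happen in one place, so callers cannot end up with a second context by getting a separate check wrong.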



spark git commit: [SPARK-19620][SQL] Fix incorrect exchange coordinator id in the physical plan

2017-03-10 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/master fcb68e0f5 -> dd9049e04


[SPARK-19620][SQL] Fix incorrect exchange coordinator id in the physical plan

## What changes were proposed in this pull request?
When adaptive execution is enabled, an exchange coordinator is used in the 
Exchange operators. For a Join, the same exchange coordinator is used for both 
of its Exchanges, but the physical plan shows two different coordinator ids, 
which is confusing.

This PR fixes the incorrect exchange coordinator id in the physical plan. 
The identity hash code should be computed from the coordinator object itself 
rather than from the `Option[ExchangeCoordinator]` that wraps it.
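
A small plain-Scala sketch of why the wrapper matters, assuming each Exchange holds its own `Option` around the shared coordinator (the names `CoordinatorIdSketch`, `Coordinator` and `coordinatorId` are hypothetical and only serve the example):

```scala
object CoordinatorIdSketch {
  // Hypothetical stand-in for the coordinator shared by both sides of a join.
  final class Coordinator

  // Identity of the wrapped object; the same no matter which Option instance wraps it.
  def coordinatorId(c: Option[Coordinator]): Option[Int] =
    c.map(inner => System.identityHashCode(inner))
}
```

Two separate `Some(shared)` wrappers are distinct objects, so hashing the wrappers generally yields two different ids, whereas `coordinatorId` returns the same value for both because it looks at the shared object inside.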

## How was this patch tested?
Before the patch, the physical plan shows two different exchange coordinator 
ids for the Join.
```
== Physical Plan ==
*Project [key1#3L, value2#12L]
+- *SortMergeJoin [key1#3L], [key2#11L], Inner
   :- *Sort [key1#3L ASC NULLS FIRST], false, 0
   :  +- Exchange(coordinator id: 1804587700) hashpartitioning(key1#3L, 10), coordinator[target post-shuffle partition size: 67108864]
   :     +- *Project [(id#0L % 500) AS key1#3L]
   :        +- *Filter isnotnull((id#0L % 500))
   :           +- *Range (0, 1000, step=1, splits=Some(10))
   +- *Sort [key2#11L ASC NULLS FIRST], false, 0
      +- Exchange(coordinator id: 793927319) hashpartitioning(key2#11L, 10), coordinator[target post-shuffle partition size: 67108864]
         +- *Project [(id#8L % 500) AS key2#11L, id#8L AS value2#12L]
            +- *Filter isnotnull((id#8L % 500))
               +- *Range (0, 1000, step=1, splits=Some(10))
```
After the patch, two exchange coordinator id are the same.

Author: Carson Wang 

Closes #16952 from carsonwang/FixCoordinatorId.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dd9049e0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dd9049e0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dd9049e0

Branch: refs/heads/master
Commit: dd9049e0492cc70b629518fee9b3d1632374c612
Parents: fcb68e0
Author: Carson Wang 
Authored: Fri Mar 10 11:13:26 2017 -0800
Committer: Yin Huai 
Committed: Fri Mar 10 11:13:26 2017 -0800

--
 .../org/apache/spark/sql/execution/exchange/ShuffleExchange.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/dd9049e0/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala
index 125a493..f06544e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchange.scala
@@ -46,7 +46,7 @@ case class ShuffleExchange(
   override def nodeName: String = {
     val extraInfo = coordinator match {
       case Some(exchangeCoordinator) =>
-        s"(coordinator id: ${System.identityHashCode(coordinator)})"
+        s"(coordinator id: ${System.identityHashCode(exchangeCoordinator)})"
       case None => ""
     }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] spark issue #16952: [SPARK-19620][SQL]Fix incorrect exchange coordinator id ...

2017-03-10 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16952
  
LGTM. Merging to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17156: [SPARK-19816][SQL][Tests] Fix an issue that DataFrameCal...

2017-03-03 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17156
  
merged to branch-2.1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

spark git commit: [SPARK-19816][SQL][TESTS] Fix an issue that DataFrameCallbackSuite doesn't recover the log level

2017-03-03 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/branch-2.1 da04d45c2 -> 664c9795c


[SPARK-19816][SQL][TESTS] Fix an issue that DataFrameCallbackSuite doesn't 
recover the log level

## What changes were proposed in this pull request?

"DataFrameCallbackSuite.execute callback functions when a DataFrame action 
failed" sets the log level to "fatal" but doesn't recover it. Hence, tests 
running after it won't output any logs except fatal logs.

This PR uses `testQuietly` instead to avoid changing the log level.
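
The underlying problem, a test that changes global logging state and never restores it, can be illustrated in Python. This sketch does not reproduce how `testQuietly` works; it only shows a scoped change with a guaranteed restore, and `temporary_level` is a hypothetical helper name.

```python
import logging
from contextlib import contextmanager


@contextmanager
def temporary_level(logger, level):
    """Set `logger` to `level` for the duration of the block, then restore it."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Runs even if the block raises, so later tests see the old level.
        logger.setLevel(previous)


log = logging.getLogger("callback_suite_demo")
log.setLevel(logging.INFO)

try:
    with temporary_level(log, logging.CRITICAL):
        level_inside = log.level
        raise RuntimeError("udf error")  # the failure the test expects
except RuntimeError:
    pass

level_after = log.level
```

Without the `finally` clause, every test that runs afterwards would inherit the quieter level, which is the symptom described above.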

## How was this patch tested?

Jenkins

Author: Shixiong Zhu 

Closes #17156 from zsxwing/SPARK-19816.

(cherry picked from commit fbc4058037cf5b0be9f14a7dd28105f7f8151bed)
Signed-off-by: Yin Huai 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/664c9795
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/664c9795
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/664c9795

Branch: refs/heads/branch-2.1
Commit: 664c9795c94d3536ff9fe54af06e0fb6c0012862
Parents: da04d45
Author: Shixiong Zhu 
Authored: Fri Mar 3 19:00:35 2017 -0800
Committer: Yin Huai 
Committed: Fri Mar 3 19:09:38 2017 -0800

--
 .../scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/664c9795/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
index 3ae5ce6..f372e94 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
@@ -58,7 +58,7 @@ class DataFrameCallbackSuite extends QueryTest with SharedSQLContext {
     spark.listenerManager.unregister(listener)
   }
 
-  test("execute callback functions when a DataFrame action failed") {
+  testQuietly("execute callback functions when a DataFrame action failed") {
     val metrics = ArrayBuffer.empty[(String, QueryExecution, Exception)]
     val listener = new QueryExecutionListener {
       override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
@@ -73,8 +73,6 @@ class DataFrameCallbackSuite extends QueryTest with SharedSQLContext {
     val errorUdf = udf[Int, Int] { _ => throw new RuntimeException("udf error") }
     val df = sparkContext.makeRDD(Seq(1 -> "a")).toDF("i", "j")
 
-    // Ignore the log when we are expecting an exception.
-    sparkContext.setLogLevel("FATAL")
     val e = intercept[SparkException](df.select(errorUdf($"i")).collect())
 
     assert(metrics.length == 1)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] spark issue #17156: [SPARK-19816][SQL][Tests] Fix an issue that DataFrameCal...

2017-03-03 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/17156
  
Let's also merge this to branch-2.1.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16917: [SPARK-19529][BRANCH-1.6] Backport PR #16866 to branch-1...

2017-02-27 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16917
  
Let's use a meaningful title in future :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16935: [SPARK-19604] [TESTS] Log the start of every Python test

2017-02-15 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16935
  
cool. It has been merged.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

spark git commit: [SPARK-19604][TESTS] Log the start of every Python test

2017-02-15 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/branch-2.1 88c43f4fb -> b9ab4c0e9


[SPARK-19604][TESTS] Log the start of every Python test

## What changes were proposed in this pull request?
Right now, we only have info level log after we finish the tests of a Python 
test file. We should also log the start of a test. So, if a test is hanging, we 
can tell which test file is running.
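
A minimal Python sketch of why the level matters, assuming the runner's logger is configured at INFO (an assumption; the logger name, the "Finished" message, and the test name below are made up for illustration):

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the effect of each level is visible."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("run_tests_demo")
logger.setLevel(logging.INFO)   # assumed runner default: INFO and above
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.debug("Starting test(%s): %s", "python3", "pyspark.tests")   # dropped
logger.info("Starting test(%s): %s", "python3", "pyspark.tests")    # kept
logger.info("Finished test(%s): %s", "python3", "pyspark.tests")
```

With the start message at debug level, only the "Finished" line would be recorded, so a hanging test would leave no trace of which file was running.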

## How was this patch tested?
This is a change for python tests.

Author: Yin Huai <yh...@databricks.com>

Closes #16935 from yhuai/SPARK-19604.

(cherry picked from commit f6c3bba22501ee7753d85c6e51ffe851d43869c1)
Signed-off-by: Yin Huai <yh...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9ab4c0e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9ab4c0e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9ab4c0e

Branch: refs/heads/branch-2.1
Commit: b9ab4c0e983df463232f1adbe6e5982b0d7d497d
Parents: 88c43f4
Author: Yin Huai <yh...@databricks.com>
Authored: Wed Feb 15 14:41:15 2017 -0800
Committer: Yin Huai <yh...@databricks.com>
Committed: Wed Feb 15 18:43:57 2017 -0800

--
 python/run-tests.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b9ab4c0e/python/run-tests.py
--
diff --git a/python/run-tests.py b/python/run-tests.py
index 38b3bb8..53a0aef 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -72,7 +72,7 @@ def run_individual_python_test(test_name, pyspark_python):
         'PYSPARK_PYTHON': which(pyspark_python),
         'PYSPARK_DRIVER_PYTHON': which(pyspark_python)
     })
-    LOGGER.debug("Starting test(%s): %s", pyspark_python, test_name)
+    LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
     start_time = time.time()
     try:
         per_test_output = tempfile.TemporaryFile()


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] spark issue #16935: [SPARK-19604] [TESTS] Log the start of every Python test

2017-02-15 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16935
  
Seems I cannot merge now... Will try again later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16935: [SPARK-19604] [TESTS] Log the start of every Python test

2017-02-15 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16935
  
ok. Nothing new to add. I will merge this to master and branch-2.1 (in case 
we want to debug any python test hanging issue in branch-2.1).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16935: [SPARK-19604] [TESTS] Log the start of every Python test

2017-02-14 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16935
  
Let's not merge it right now. I may need to log more.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16935: [SPARK-19604] [TESTS] Log the start of every Pyth...

2017-02-14 Thread yhuai

GitHub user yhuai opened a pull request:

https://github.com/apache/spark/pull/16935

[SPARK-19604] [TESTS] Log the start of every Python test


## What changes were proposed in this pull request?
Right now, we only have info level log after we finish the tests of a 
Python test file. We should also log the start of a test. So, if a test is 
hanging, we can tell which test file is running.

## How was this patch tested?
This is a change for python tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yhuai/spark SPARK-19604

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16935.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16935


commit 1181cc3be7bcf21fbe7e88b35ac662353fb2f366
Author: Yin Huai <yh...@databricks.com>
Date:   2017-02-15T04:19:28Z

Right now, we only have info level log after we finish the tests of a 
Python test file. We should also log the start of a test. So, if a test is 
hanging, we can tell which test file is running.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16894: [SPARK-17897] [SQL] [BACKPORT-2.0] Fixed IsNotNull Const...

2017-02-12 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16894
  
thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...

2017-02-10 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16067
  
@gatorsmile can we also add it in branch-2.0? Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

spark git commit: [SPARK-19295][SQL] IsolatedClientLoader's downloadVersion should log the location of downloaded metastore client jars

2017-01-19 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/master 640f94233 -> 63d839028


[SPARK-19295][SQL] IsolatedClientLoader's downloadVersion should log the 
location of downloaded metastore client jars

## What changes were proposed in this pull request?
This will help the users to know the location of those downloaded jars when 
`spark.sql.hive.metastore.jars` is set to `maven`.
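
The copy-then-log pattern can be sketched in Python. This is an analogue of the idea only, not the Scala code: `copy_to_temp_dir`, the placeholder jar, and the version string are hypothetical.

```python
import logging
import shutil
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("metastore_jars_demo")


def copy_to_temp_dir(files, version):
    """Copy `files` into a fresh temp directory and log where they ended up."""
    temp_dir = Path(tempfile.mkdtemp(prefix=f"hive-{version}"))
    for f in files:
        shutil.copy(f, temp_dir)
    # Without this line the caller has no easy way to find the directory.
    log.info("Downloaded metastore jars to %s", temp_dir.resolve())
    return sorted(p.resolve().as_uri() for p in temp_dir.iterdir())


# Usage with one placeholder file standing in for a resolved jar.
source_dir = Path(tempfile.mkdtemp())
jar = source_dir / "example.jar"
jar.write_bytes(b"")
uris = copy_to_temp_dir([jar], "1.2.1")
```

The temp directory name is generated at run time, so the log line is the only place a user can read the actual location.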

## How was this patch tested?
jenkins

Author: Yin Huai <yh...@databricks.com>

Closes #16649 from yhuai/SPARK-19295.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/63d83902
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/63d83902
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/63d83902

Branch: refs/heads/master
Commit: 63d839028a6e03644febc360519fa8e01c5534cf
Parents: 640f942
Author: Yin Huai <yh...@databricks.com>
Authored: Thu Jan 19 14:23:36 2017 -0800
Committer: Yin Huai <yh...@databricks.com>
Committed: Thu Jan 19 14:23:36 2017 -0800

--
 .../org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/63d83902/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
index 26b2de8..63fdd6b 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
@@ -122,6 +122,7 @@ private[hive] object IsolatedClientLoader extends Logging {
     // TODO: Remove copy logic.
     val tempDir = Utils.createTempDir(namePrefix = s"hive-${version}")
     allFiles.foreach(f => FileUtils.copyFileToDirectory(f, tempDir))
+    logInfo(s"Downloaded metastore jars to ${tempDir.getCanonicalPath}")
     tempDir.listFiles().map(_.toURI.toURL)
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] spark issue #16649: [SPARK-19295] [SQL] IsolatedClientLoader's downloadVersi...

2017-01-19 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16649
  
Cool I am merging this to master. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...

2017-01-19 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16645
  
My main concern with this PR is that people may think it is recommended 
to add new batches to force rules to run in a certain order. For 
these resolution rules, we can also use conditions to control when they 
fire, right? If we always replace one logical plan with another in the 
analysis phase, it seems we should use `resolved` to control whether a rule fires. 
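
The idea of guarding rules with conditions instead of ordering them through separate batches can be sketched in Python. Everything here is hypothetical and greatly simplified: `Plan`, the two rules, and `run_to_fixed_point` are illustrative names, not Catalyst classes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Plan:
    """Hypothetical, much-simplified logical plan node."""
    name: str
    resolved: bool = False
    rewritten: bool = False


def resolve_relation(plan):
    # Fires only on unresolved plans.
    return replace(plan, resolved=True) if not plan.resolved else plan


def rewrite_resolved(plan):
    # Guarded by `resolved`: it is a no-op until resolution has happened.
    if plan.resolved and not plan.rewritten:
        return replace(plan, name=plan.name + "_rewritten", rewritten=True)
    return plan


def run_to_fixed_point(plan, rules, max_iterations=10):
    # Apply all rules repeatedly until one full pass changes nothing.
    for _ in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan
        plan = new_plan
    return plan


start = Plan("t")
# The dependent rule is listed first, yet the result is the same either way.
a = run_to_fixed_point(start, [rewrite_resolved, resolve_relation])
b = run_to_fixed_point(start, [resolve_relation, rewrite_resolved])
```

Because each rule checks its own precondition, the outcome does not depend on the order in which the rules are listed within the batch.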


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16649: [SPARK-19295] [SQL] IsolatedClientLoader's downlo...

2017-01-19 Thread yhuai

GitHub user yhuai opened a pull request:

https://github.com/apache/spark/pull/16649

[SPARK-19295] [SQL] IsolatedClientLoader's downloadVersion should log the 
location of downloaded metastore client jars

## What changes were proposed in this pull request?
This will help the users to know the location of those downloaded jars.

## How was this patch tested?
jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yhuai/spark SPARK-19295

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16649.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16649


commit 6c67582d85473d123053a45aa051578232c32dad
Author: Yin Huai <yh...@databricks.com>
Date:   2017-01-19T20:30:08Z

[SPARK-19295] IsolatedClientLoader's downloadVersion should log the 
location of downloaded metastore client jars




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

spark git commit: Update known_translations for contributor names

2017-01-18 Thread yhuai

Repository: spark
Updated Branches:
  refs/heads/master fe409f31d -> 0c9231858


Update known_translations for contributor names

## What changes were proposed in this pull request?
Update known_translations per 
https://github.com/apache/spark/pull/16423#issuecomment-269739634

Author: Yin Huai <yh...@databricks.com>

Closes #16628 from yhuai/known_translations.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0c923185
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0c923185
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0c923185

Branch: refs/heads/master
Commit: 0c9231858866eff16f97df073d22811176fb6b36
Parents: fe409f3
Author: Yin Huai <yh...@databricks.com>
Authored: Wed Jan 18 18:18:51 2017 -0800
Committer: Yin Huai <yh...@databricks.com>
Committed: Wed Jan 18 18:18:51 2017 -0800

--
 dev/create-release/known_translations | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0c923185/dev/create-release/known_translations
--
diff --git a/dev/create-release/known_translations b/dev/create-release/known_translations
index 0f30990..87bf2f2 100644
--- a/dev/create-release/known_translations
+++ b/dev/create-release/known_translations
@@ -177,7 +177,7 @@ anabranch - Bill Chambers
 ashangit - Nicolas Fraison
 avulanov - Alexander Ulanov
 biglobster - Liang Ke
-cenyuhai - Cen Yu Hai
+cenyuhai - Yuhai Cen
 codlife - Jianfei Wang
 david-weiluo-ren - Weiluo (David) Ren
 dding3 - Ding Ding
@@ -198,7 +198,8 @@ petermaxlee - Peter Lee
 phalodi - Sandeep Purohit
 pkch - pkch
 priyankagargnitk - Priyanka Garg
-sharkdtu - Sharkd Tu
+sharkdtu - Xiaogang Tu
 shenh062326 - Shen Hong
 aokolnychyi - Anton Okolnychyi
 linbojin - Linbo Jin
+lw-lin - Liwei Lin


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

2017-01-18 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16613
  
nvm. On second thought, the feature flag does not really buy us 
anything. We just store the original view definition and the column mapping in 
the metastore. So, I think it is fine to just do the switch. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16628: Update known_translations for contributor names

2017-01-18 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16628
  
I am merging this to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2017-01-18 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/14204
  
ok I agree. Originally, I thought it would be helpful for figuring out the 
worker that an executor belongs to.

But if it does not provide very useful information, I am fine with dropping it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16628: Update known_translations for contributor names

2017-01-18 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16628
  
done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

2017-01-18 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16613
  
Is there a feature flag that determines whether we use this new 
approach? I feel it would be good to have an internal feature flag to select 
the code path. Then, if something goes wrong that is hard to fix quickly 
before the release, we can still switch back to the old code path. In the 
next release, we can remove the feature flag. What do you think?

Also, @jiangxb1987 can you take a look at the SQLViewSuite and see if we 
have enough test coverage?
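
The feature-flag suggestion above can be sketched in Python. All names are hypothetical: the two writer functions are placeholders, and `demo.view.useNewApproach` is a made-up key, not a real Spark setting.

```python
def write_view_legacy(definition):
    # Placeholder for the old code path.
    return {"path": "legacy", "definition": definition}


def write_view_new(definition):
    # Placeholder for the new code path.
    return {"path": "new", "definition": definition}


def write_view(definition, conf):
    """Pick the code path from an internal flag; the new path is the default."""
    # "demo.view.useNewApproach" is a made-up key, not a real Spark setting.
    if conf.get("demo.view.useNewApproach", True):
        return write_view_new(definition)
    return write_view_legacy(definition)


default_result = write_view("SELECT 1", {})
fallback_result = write_view("SELECT 1", {"demo.view.useNewApproach": False})
```

Defaulting the flag to the new path keeps the switch invisible to users while still leaving a way back to the old path for one release.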


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...

2017-01-17 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16517
  
Looks good to me. @gatorsmile can you explain your concerns? I am wondering 
what kinds of cases you think HiveFileFormat may not be able to handle.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...

2017-01-17 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16517#discussion_r96566857
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -276,40 +276,31 @@ case class InsertIntoHiveTable(
   }
 }
 
-    val jobConf = new JobConf(hadoopConf)
-    val jobConfSer = new SerializableJobConf(jobConf)
-
-    // When speculation is on and output committer class name contains "Direct", we should warn
-    // users that they may loss data if they are using a direct output committer.
-    val speculationEnabled = sqlContext.sparkContext.conf.getBoolean("spark.speculation", false)
-    val outputCommitterClass = jobConf.get("mapred.output.committer.class", "")
-    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
--- End diff --

It seems this change is unnecessary, and users may still use a direct output 
committer (they can still find the code on the Internet). Let's keep the warning.
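
The check under discussion can be sketched in Python. The two configuration keys are taken from the quoted diff; the function name, the warning text, and the committer class name are hypothetical.

```python
import warnings


def warn_if_direct_committer(conf):
    """Warn when speculation is on and the committer class name contains "Direct"."""
    speculation = str(conf.get("spark.speculation", "false")).lower() == "true"
    committer = conf.get("mapred.output.committer.class", "")
    if speculation and "Direct" in committer:
        warnings.warn(
            f"{committer} may lose data when spark.speculation is enabled",
            RuntimeWarning,
        )
        return True
    return False


risky = {
    "spark.speculation": "true",
    # Hypothetical class name used only for this example.
    "mapred.output.committer.class": "com.example.DirectOutputCommitter",
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_direct_committer(risky)
    not_flagged = warn_if_direct_committer({"spark.speculation": "false"})
```

The check is cheap and only fires for the risky combination, which is the argument for keeping it.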


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...

2017-01-17 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16517#discussion_r96566523
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -276,40 +276,31 @@ case class InsertIntoHiveTable(
   }
 }
 
-    val jobConf = new JobConf(hadoopConf)
-    val jobConfSer = new SerializableJobConf(jobConf)
-
-    // When speculation is on and output committer class name contains "Direct", we should warn
-    // users that they may loss data if they are using a direct output committer.
-    val speculationEnabled = sqlContext.sparkContext.conf.getBoolean("spark.speculation", false)
-    val outputCommitterClass = jobConf.get("mapred.output.committer.class", "")
-    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
--- End diff --

Do we still need this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...

2017-01-17 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16517#discussion_r96566290
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala ---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.hive.ql.exec.Utilities
+import org.apache.hadoop.hive.ql.io.{HiveFileFormatUtils, HiveOutputFormat}
+import org.apache.hadoop.hive.serde2.Serializer
+import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspectorUtils, StructObjectInspector}
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.ObjectInspectorCopyOption
+import org.apache.hadoop.io.Writable
+import org.apache.hadoop.mapred.{JobConf, Reporter}
+import org.apache.hadoop.mapreduce.{Job, TaskAttemptContext}
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources.{FileFormat, OutputWriter, OutputWriterFactory}
+import org.apache.spark.sql.hive.{HiveInspectors, HiveTableUtil}
+import org.apache.spark.sql.hive.HiveShim.{ShimFileSinkDesc => FileSinkDesc}
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.SerializableJobConf
+
+/**
+ * `FileFormat` for writing Hive tables.
+ *
+ * TODO: implement the read logic.
+ */
+class HiveFileFormat(fileSinkConf: FileSinkDesc) extends FileFormat {
+  override def inferSchema(
+  sparkSession: SparkSession,
+  options: Map[String, String],
+  files: Seq[FileStatus]): Option[StructType] = {
+throw new UnsupportedOperationException(s"inferSchema is not supported for hive data source.")
+  }
+
+  override def prepareWrite(
+  sparkSession: SparkSession,
+  job: Job,
+  options: Map[String, String],
+  dataSchema: StructType): OutputWriterFactory = {
+val conf = job.getConfiguration
+val tableDesc = fileSinkConf.getTableInfo
+conf.set("mapred.output.format.class", tableDesc.getOutputFileFormatClassName)
+
+// Add table properties from storage handler to hadoopConf, so any custom storage
+// handler settings can be set to hadoopConf
+HiveTableUtil.configureJobPropertiesForStorageHandler(tableDesc, conf, false)
+Utilities.copyTableJobPropertiesToConf(tableDesc, conf)
--- End diff --

Will tableDesc be null?
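
If it can be null, one possible guard is to wrap the lookup in `Option` and fail with a clear message before the value is dereferenced. The sketch below is an assumption-laden illustration: `TableDescStub` and `FileSinkDescStub` are hypothetical stand-ins for the Hive classes, and `TableDescGuard` is not part of the pull request.

```scala
// Hypothetical stand-ins for the Hive types; these are NOT the real classes.
final case class TableDescStub(outputFileFormatClassName: String)

final class FileSinkDescStub(tableInfo: TableDescStub) {
  def getTableInfo: TableDescStub = tableInfo // may return null
}

object TableDescGuard {
  // Option(...) turns a null table descriptor into None, so the failure
  // is an explicit exception rather than a NullPointerException later on.
  def outputFormatClass(sink: FileSinkDescStub): String =
    Option(sink.getTableInfo)
      .map(_.outputFileFormatClassName)
      .getOrElse(throw new IllegalStateException("FileSinkDesc has no table info"))
}
```

Whether such a guard is needed depends on whether callers can ever construct the sink descriptor without table info, which the quoted diff does not show.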


