[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20649
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88166/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20649
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20649
  
**[Test build #88166 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88166/testReport)**
 for PR 20649 at commit 
[`78e037d`](https://github.com/apache/spark/commit/78e037d06d2972062cac14fe8b503e0802a4c32e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-11 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20795#discussion_r173692751
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1192,10 +1193,18 @@ class Analyzer(
* @see https://issues.apache.org/jira/browse/SPARK-19737
*/
   object LookupFunctions extends Rule[LogicalPlan] {
-override def apply(plan: LogicalPlan): LogicalPlan = 
plan.transformAllExpressions {
-  case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-withPosition(f) {
-  throw new 
NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
+override def apply(plan: LogicalPlan): LogicalPlan = {
+  val catalogFunctionNameSet = new 
mutable.HashSet[FunctionIdentifier]()
+plan.transformAllExpressions {
--- End diff --

changed, thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20782: [SPARK-23627][SQL] Provide isEmpty in DataSet

2018-03-11 Thread goungoun
Github user goungoun commented on the issue:

https://github.com/apache/spark/pull/20782
  
As unnecessary information is included, I closed this pull request. Please 
refer request #20800 instead of #20782. I am sorry for your inconvenience.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20799
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20799
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88168/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20799
  
**[Test build #88168 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88168/testReport)**
 for PR 20799 at commit 
[`afd88e1`](https://github.com/apache/spark/commit/afd88e1a6585851f307413fd39a72167b9d8224c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in DataSet

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20800
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20800: [SPARK-23627][SQL] Provide isEmpty in DataSet

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20800
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20800: isEmpty in Dataset and its testSuite

2018-03-11 Thread goungoun
GitHub user goungoun opened a pull request:

https://github.com/apache/spark/pull/20800

isEmpty in Dataset and its testSuite

## What changes were proposed in this pull request?

This PR adds isEmpty() in DataSet

## How was this patch tested?

Unit tests added

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/goungoun/spark SPARK-23627

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20800.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20800


commit d3fc620e49ae07f6ebc0ee20976c12da49f2e0af
Author: Goun Na 
Date:   2018-03-12T02:57:19Z

isEmpty in Dataset and its testSuite




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20796: [SPARK-23649][SQL] Prevent crashes on schema inferring o...

2018-03-11 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20796
  
Could you add more tests for all the invalid range `253-255` in 
`UTF8StringSuite`?

@HyukjinKwon Could you trigger this test? Also, kindly pinging cuz this is 
probably your domain?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20799
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1459/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20799
  
**[Test build #88168 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88168/testReport)**
 for PR 20799 at commit 
[`afd88e1`](https://github.com/apache/spark/commit/afd88e1a6585851f307413fd39a72167b9d8224c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20799: [SPARK-23635][YARN] AM env variable should not overwrite...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20799
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20799: [SPARK-23635][YARN] AM env variable should not ov...

2018-03-11 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/20799

[SPARK-23635][YARN] AM env variable should not overwrite same name env 
variable set through spark.executorEnv.

## What changes were proposed in this pull request?

In the current Spark on YARN code, AM always will copy and overwrite its 
env variables to executors, so we cannot set different values for executors.

To reproduce issue, user could start spark-shell like:

```
./bin/spark-shell --master yarn-client --conf 
spark.executorEnv.SPARK_ABC=executor_val --conf  
spark.yarn.appMasterEnv.SPARK_ABC=am_val
```

Then check executor env variables by

```
sc.parallelize(1 to 1).flatMap \{ i => sys.env.toSeq 
}.collect.foreach(println)
```

We will always get `am_val` instead of `executor_val`. So we should not let 
AM to overwrite specifically set executor env variables.

## How was this patch tested?

Added UT and tested in local cluster.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-23635

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20799.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20799


commit afd88e1a6585851f307413fd39a72167b9d8224c
Author: jerryshao 
Date:   2018-03-12T03:21:40Z

AM env variable should not overwrite same name env variable set through 
spark.executorEnv.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1458/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20705
  
**[Test build #88167 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88167/testReport)**
 for PR 20705 at commit 
[`159489c`](https://github.com/apache/spark/commit/159489c884cd381584ade0734d6e215a2a67103c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20705
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20705
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20757: [SPARK-23595][SQL] ValidateExternalType should su...

2018-03-11 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20757#discussion_r173683323
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -1440,7 +1463,7 @@ case class ValidateExternalType(child: Expression, 
expected: DataType)
 Seq(classOf[java.math.BigDecimal], classOf[scala.math.BigDecimal], 
classOf[Decimal])
   .map(cls => s"$obj instanceof ${cls.getName}").mkString(" || ")
   case _: ArrayType =>
-s"$obj instanceof ${classOf[Seq[_]].getName} || 
$obj.getClass().isArray()"
+s"$obj.getClass().isArray() || $obj instanceof 
${classOf[Seq[_]].getName}"
--- End diff --

Based on the @rednaxelafx 's comment, I changed this codegen, too. 
https://github.com/apache/spark/pull/20757#discussion_r173006281


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20649
  
**[Test build #88166 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88166/testReport)**
 for PR 20649 at commit 
[`78e037d`](https://github.com/apache/spark/commit/78e037d06d2972062cac14fe8b503e0802a4c32e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...

2018-03-11 Thread xysun
Github user xysun commented on the issue:

https://github.com/apache/spark/pull/20649
  
Latest code review fixes pushed. cc @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-03-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r173682841
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
 if not isinstance(name, str):
 raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
 
+allowed_types = (basestring, list, float, int)
--- End diff --

Should it be `basestring` or `str` as previously here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20649: [SPARK-23462][SQL] improve missing field error me...

2018-03-11 Thread xysun
Github user xysun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20649#discussion_r173682838
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -271,7 +271,9 @@ case class StructType(fields: Array[StructField]) 
extends DataType with Seq[Stru
*/
   def apply(name: String): StructField = {
 nameToField.getOrElse(name,
-  throw new IllegalArgumentException(s"""Field "$name" does not 
exist."""))
+  throw new IllegalArgumentException(
+s"""Field "$name" does not exist.
+   |Available fields: ${fieldNamesSet.mkString(", 
")}""".stripMargin))
--- End diff --

done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20797: [SPARK-23583][SQL] Invoke should support interpre...

2018-03-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20797#discussion_r173682156
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -266,8 +266,26 @@ case class Invoke(
   override def nullable: Boolean = targetObject.nullable || needNullCheck 
|| returnNullable
   override def children: Seq[Expression] = targetObject +: arguments
 
-  override def eval(input: InternalRow): Any =
-throw new UnsupportedOperationException("Only code-generated 
evaluation is supported.")
+  override def eval(input: InternalRow): Any = {
+val obj = targetObject.eval(input)
+val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
+val argClasses = 
CallMethodViaReflection.expressionJavaClasses(arguments)
+val method = obj.getClass.getDeclaredMethod(functionName, argClasses : 
_*)
--- End diff --

We should also check if `obj` is `null` too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20797: [SPARK-23583][SQL] Invoke should support interpre...

2018-03-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20797#discussion_r173682087
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -266,8 +266,26 @@ case class Invoke(
   override def nullable: Boolean = targetObject.nullable || needNullCheck 
|| returnNullable
   override def children: Seq[Expression] = targetObject +: arguments
 
-  override def eval(input: InternalRow): Any =
-throw new UnsupportedOperationException("Only code-generated 
evaluation is supported.")
+  override def eval(input: InternalRow): Any = {
+val obj = targetObject.eval(input)
+val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
+val argClasses = 
CallMethodViaReflection.expressionJavaClasses(arguments)
+val method = obj.getClass.getDeclaredMethod(functionName, argClasses : 
_*)
+if (needNullCheck && args.exists(_ == null)) {
+  // return null if one of arguments is null
+  null
+} else {
+  val ret = method.invoke(obj, args: _*)
+
+  if (CodeGenerator.defaultValue(dataType) == "null") {
+ret
+  } else {
+// cast a primitive value using Boxed class
+val boxedClass = 
CallMethodViaReflection.typeBoxedJavaMapping(dataType)
+boxedClass.cast(ret)
--- End diff --

Do we need to do explicitly cast? Doesn't Scala cast it if it's needed?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88164/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20797
  
**[Test build #88164 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88164/testReport)**
 for PR 20797 at commit 
[`3dbea12`](https://github.com/apache/spark/commit/3dbea1252f8e43bf0f9a29d515b17935b58582e8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20798
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20798
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88165/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20798
  
**[Test build #88165 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88165/testReport)**
 for PR 20798 at commit 
[`5ec810a`](https://github.com/apache/spark/commit/5ec810a7c36691df1877ffc11e6f06392d438485).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20798
  
**[Test build #88165 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88165/testReport)**
 for PR 20798 at commit 
[`5ec810a`](https://github.com/apache/spark/commit/5ec810a7c36691df1877ffc11e6f06392d438485).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...

2018-03-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20798
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20763
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20763
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88163/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20763
  
**[Test build #88163 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88163/testReport)**
 for PR 20763 at commit 
[`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20798
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20798
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...

2018-03-11 Thread mstewart141
Github user mstewart141 commented on the issue:

https://github.com/apache/spark/pull/20798
  
[WIP]
cc @HyukjinKwon
👍

i'd love to run tests here to make sure i haven't broken something. i will 
update pr with new tests once i set up testing better on my local box.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20798: [SPARK-23645][PYTHON] Allow python udfs to be cal...

2018-03-11 Thread mstewart141
GitHub user mstewart141 opened a pull request:

https://github.com/apache/spark/pull/20798

[SPARK-23645][PYTHON] Allow python udfs to be called with keyword arguments

## [WIP]

## What changes were proposed in this pull request?

Currently one can not pass keyword arguments in python UDFs. This patch 
allows keyword arguments to be mixed arbitrarily with positional arguments, as 
seen in normal python functions. 

UDFs accepting an arbitrary (undefined) number of columns are a different 
matter, and not addressed here.

## How was this patch tested?

I will add unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mstewart141/spark udfkw

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20798.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20798


commit 5ec810a7c36691df1877ffc11e6f06392d438485
Author: Michael (Stu) Stewart 
Date:   2018-03-11T20:38:29Z

[SPARK-23645][PYTHON] Allow python udfs to be called with keyword arguments




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...

2018-03-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20331
  
Please don't format [this 
comment](https://github.com/apache/spark/pull/20331#pullrequestreview-90358792) 
during next update, @gatorsmile . Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20797
  
**[Test build #88164 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88164/testReport)**
 for PR 20797 at commit 
[`3dbea12`](https://github.com/apache/spark/commit/3dbea1252f8e43bf0f9a29d515b17935b58582e8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-03-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20208
  
Thank you so much for review and approval, @HyukjinKwon !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1457/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

2018-03-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20687#discussion_r173667671
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
 ---
@@ -22,54 +22,34 @@ import 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule
 
 /**
-* push down operations into [[CreateNamedStructLike]].
-*/
-object SimplifyCreateStructOps extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = {
-plan.transformExpressionsUp {
-  // push down field extraction
+ * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and 
[[CreateMap]] expressions.
+ */
+object SimplifyExtractValueOps extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { 
case p =>
+p.transformExpressionsUp {
--- End diff --

Sorry for late response, @gatorsmile .
These are expression-level optimization rules. If the original expressions 
exists in `SELECT`, `GROUP BY`, and `HAVING`, those are simplified in the same 
way together. Do you have any concerning cases?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20797: [SPARK-23583][SQL] Invoke should support interpre...

2018-03-11 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/20797

[SPARK-23583][SQL] Invoke should support interpreted execution

## What changes were proposed in this pull request?

This pr added interpreted execution for `Invoke`.

## How was this patch tested?

Added tests in `ObjectExpressionsSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-28583

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20797.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20797


commit 3dbea1252f8e43bf0f9a29d515b17935b58582e8
Author: Kazuaki Ishizaki 
Date:   2018-03-11T20:07:32Z

initial commit




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20763
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1456/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20763
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20763
  
**[Test build #88163 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88163/testReport)**
 for PR 20763 at commit 
[`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20763
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...

2018-03-11 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/20669#discussion_r173665102
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigPropertiesStep.scala
 ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit.steps
+
+import java.io.StringWriter
+import java.util.Properties
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
+
+/**
+ * Create a config map with the driver configuration and attach it to the 
pod. This needs to
+ * come at the end of the driver configuration so that all modifications 
to the Spark config
+ * are reflected in the generated config map.
+ */
+private[spark] class DriverConfigPropertiesStep(resourceNamePrefix: String)
+extends DriverConfigurationStep {
+
+  override def configureDriver(spec: KubernetesDriverSpec): 
KubernetesDriverSpec = {
+val configMapName = s"$resourceNamePrefix-driver-conf-map"
+val configMap = buildConfigMap(configMapName, spec.driverSparkConf)
--- End diff --

I will also include `--conf spark.driver.extraJavaOptions=` in the 
spark-submit launched in the driver


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20763
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20763
  
**[Test build #88162 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88162/testReport)**
 for PR 20763 at commit 
[`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20763
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88162/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...

2018-03-11 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/20683
  
@jerryshao  In Spark Streaming, I think ```.tmp``` is used as a suffix to 
indicate that the object was a file, although I do not know if this is 
universal.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...

2018-03-11 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/20669#discussion_r173663339
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverConfigPropertiesStep.scala
 ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.submit.steps
+
+import java.io.StringWriter
+import java.util.Properties
+
+import io.fabric8.kubernetes.api.model._
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.KubernetesDriverSpec
+
+/**
+ * Create a config map with the driver configuration and attach it to the 
pod. This needs to
+ * come at the end of the driver configuration so that all modifications 
to the Spark config
+ * are reflected in the generated config map.
+ */
+private[spark] class DriverConfigPropertiesStep(resourceNamePrefix: String)
+extends DriverConfigurationStep {
+
+  override def configureDriver(spec: KubernetesDriverSpec): 
KubernetesDriverSpec = {
+val configMapName = s"$resourceNamePrefix-driver-conf-map"
+val configMap = buildConfigMap(configMapName, spec.driverSparkConf)
--- End diff --

I agree in that it should be in any order. I will restructure it as such. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20796: [SPARK-23649][SQL] Prevent crashes on schema inferring o...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20796
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20796: [SPARK-23649][SQL] Prevent crashes on schema inferring o...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20796
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-11 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20795#discussion_r173662663
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1192,10 +1193,18 @@ class Analyzer(
* @see https://issues.apache.org/jira/browse/SPARK-19737
*/
   object LookupFunctions extends Rule[LogicalPlan] {
-override def apply(plan: LogicalPlan): LogicalPlan = 
plan.transformAllExpressions {
-  case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-withPosition(f) {
-  throw new 
NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
+override def apply(plan: LogicalPlan): LogicalPlan = {
+  val catalogFunctionNameSet = new 
mutable.HashSet[FunctionIdentifier]()
+plan.transformAllExpressions {
+  case f: UnresolvedFunction if 
catalogFunctionNameSet.contains(f.name) => f
+  case f: UnresolvedFunction if catalog.functionExists(f.name) =>
+catalogFunctionNameSet.add(f.name)
--- End diff --

sure, I will do that. Thanks.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20796: [SPARK-23649][SQL] Prevent crashes on schema infe...

2018-03-11 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/20796

[SPARK-23649][SQL] Prevent crashes on schema inferring of CSV containing 
wrong UTF-8 chars

## What changes were proposed in this pull request?

The mapping of UTF-8 char's first byte to char's size doesn't cover whole 
range 0-255. It is defined only for 0-253:

https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java#L60-L65

https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java#L190

If the first byte of a char is 253-255, IndexOutOfBoundsException is 
thrown. Besides of that values for 244-252 are not correct according to recent 
unicode standard for UTF-8: 
http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf

As a consequence of the exception above, the length of input string in 
UTF-8 encoding cannot be calculated if the string contains chars started from 
253 code. It is visible on user's side as for example crashing of schema 
inferring of csv file which contains such chars but the file can be read if the 
schema is specified explicitly or if the mode set to multiline.

The proposed changes build correct mapping of first byte of UTF-8 char to 
its size (now it covers all cases) and skip disallowed chars (counts it as one 
octet).

## How was this patch tested?

Added a test and a file with a char which is disallowed in UTF-8 - 0xFF.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 skip-wrong-utf8-chars

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20796.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20796


commit 6d6b3ca8eedc7bf6381cb8d746fe8a3ee0a281b2
Author: Maxim Gekk 
Date:   2018-02-27T17:41:01Z

Test: skip bytes disallowed in UTF-8

commit 0f474a033f152340519b10676026b80b7593c2f5
Author: Maxim Gekk 
Date:   2018-02-27T17:43:53Z

Making correct map of first byte to char size in UTF-8 and skip bytes 
disallowed in UTF-8

commit 2ee661618a40544a8cbf3ac0794a1a6b86b6b62d
Author: Maxim Gekk 
Date:   2018-02-28T13:26:21Z

Check inferred schema and returned bad string

commit d6c5f02ea1a08513a54ea9f3b30986dd92188b3e
Author: Maxim Gekk 
Date:   2018-02-28T14:04:22Z

The test csv was simplified




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20795#discussion_r173660285
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1192,10 +1193,18 @@ class Analyzer(
* @see https://issues.apache.org/jira/browse/SPARK-19737
*/
   object LookupFunctions extends Rule[LogicalPlan] {
-override def apply(plan: LogicalPlan): LogicalPlan = 
plan.transformAllExpressions {
-  case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-withPosition(f) {
-  throw new 
NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
+override def apply(plan: LogicalPlan): LogicalPlan = {
+  val catalogFunctionNameSet = new 
mutable.HashSet[FunctionIdentifier]()
+plan.transformAllExpressions {
+  case f: UnresolvedFunction if 
catalogFunctionNameSet.contains(f.name) => f
+  case f: UnresolvedFunction if catalog.functionExists(f.name) =>
+catalogFunctionNameSet.add(f.name)
--- End diff --

Normalize the name before adding it to the cache? This can cover more cases


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread kevinyu98
Github user kevinyu98 commented on the issue:

https://github.com/apache/spark/pull/20795
  
@viirya Thanks a lot. I will create a new test file LookupFunctionsSuite 
under sql/catalyst/analysis. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20763
  
**[Test build #88162 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88162/testReport)**
 for PR 20763 at commit 
[`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20763
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1455/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20763
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20763
  
test this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20793: [WIP][SPARK-23643] Shrinking the buffer in hashSeed up t...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20793
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88160/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20793: [WIP][SPARK-23643] Shrinking the buffer in hashSeed up t...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20793
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20793: [WIP][SPARK-23643] Shrinking the buffer in hashSeed up t...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20793
  
**[Test build #88160 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88160/testReport)**
 for PR 20793 at commit 
[`177afcc`](https://github.com/apache/spark/commit/177afcc4277b604b783aef40d86d93d6a9add6fc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20795
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88161/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20795
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20795
  
**[Test build #88161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88161/testReport)**
 for PR 20795 at commit 
[`701100c`](https://github.com/apache/spark/commit/701100c11126d7437dc03ef20b484e84e2f9cb2a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20795
  
I've added a test like this:

```scala
class LookupFunctionsSuite extends PlanTest {

  test("SPARK-23486: LookupFunctions should not check the same function 
name more than once") {
val externalCatalog = new CustomInMemoryCatalog
val analyzer = {
  val conf = new SQLConf()
  val catalog = new SessionCatalog(externalCatalog, 
FunctionRegistry.builtin, conf)
  catalog.createDatabase(
CatalogDatabase("default", "", new URI("loc"), Map.empty),
ignoreIfExists = false)
  new Analyzer(catalog, conf)
}

val unresolvedFunc = UnresolvedFunction("func", Seq.empty, false)
val plan = Project(
  Seq(Alias(unresolvedFunc, "call1")(), Alias(unresolvedFunc, 
"call2")()),
  table("TaBlE"))
analyzer.LookupFunctions.apply(plan)
assert(externalCatalog.getFunctionExistsCalledTimes == 1)
  }
}

class CustomInMemoryCatalog extends InMemoryCatalog {

  private var functionExistsCalledTimes: Int = 0

  override def functionExists(db: String, funcName: String): Boolean = 
synchronized {
functionExistsCalledTimes = functionExistsCalledTimes + 1
true
  }

  def getFunctionExistsCalledTimes: Int = functionExistsCalledTimes
}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20795#discussion_r173655309
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1192,10 +1193,18 @@ class Analyzer(
* @see https://issues.apache.org/jira/browse/SPARK-19737
*/
   object LookupFunctions extends Rule[LogicalPlan] {
-override def apply(plan: LogicalPlan): LogicalPlan = 
plan.transformAllExpressions {
-  case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-withPosition(f) {
-  throw new 
NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
+override def apply(plan: LogicalPlan): LogicalPlan = {
+  val catalogFunctionNameSet = new 
mutable.HashSet[FunctionIdentifier]()
+plan.transformAllExpressions {
--- End diff --

style: indent.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hint in p...

2018-03-11 Thread DylanGuedes
Github user DylanGuedes commented on the issue:

https://github.com/apache/spark/pull/20788
  
Hi, I added two subtasks (one for Python and one for R) 
[here](https://issues.apache.org/jira/browse/SPARK-21030)

Should I close this PR and open a new one or it is ok to just rename it to 
the new JIRA id?

@felixcheung I don't know a good way to test it, do you have any suggestion?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20795
  
**[Test build #88161 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88161/testReport)**
 for PR 20795 at commit 
[`701100c`](https://github.com/apache/spark/commit/701100c11126d7437dc03ef20b484e84e2f9cb2a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20795
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20795
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20795
  
test this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20795
  
ok to test


(seems uppercase not working somehow with Jenkins .. )


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20793
  
**[Test build #88160 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88160/testReport)**
 for PR 20793 at commit 
[`177afcc`](https://github.com/apache/spark/commit/177afcc4277b604b783aef40d86d93d6a9add6fc).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-11 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20795
  
Ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-11 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20793
  
The question is that existing output of pseudo random/sample is guaranteed 
by public API. It seems it doesn't. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-11 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20793
  
At least some tests expect that particular values would be result of 
sample/random: 
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L550-L564
 . 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org