[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...

2018-10-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22812


---




[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22812
  
thanks, merging to master!


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22309
  
According to the second allocation rule, a value class inside an Array needs to be instantiated. In the current code, I think it should go through the special handling case for value classes. Can you also add a test to verify that it works?
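
For reference, a minimal sketch of that second allocation rule (the wrapper class below is illustrative, not this PR's code):

```scala
// Allocation rule 2: assigning a value class to an Array forces the wrapper
// to be instantiated, so the elements are boxed wrappers, not bare Strings.
final class Wrapped(val s: String) extends AnyVal

object ArrayAllocationSketch {
  def main(args: Array[String]): Unit = {
    val arr: Array[Wrapped] = Array(new Wrapped("a"), new Wrapped("b"))
    println(arr.head.s) // prints "a"
  }
}
```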


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22309
  
What about a value class inside an Array? Do we support it?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228734242
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection {
             "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
         }

+        // as a field, value class is represented by its underlying type
+        val trueFieldType =
+          if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
+
         val fieldValue = Invoke(
-          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType),
-          returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
-        val clsName = getClassNameFromType(fieldType)
+          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType),
+          returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive)
+        val clsName = getClassNameFromType(trueFieldType)
--- End diff --

Thanks for explaining! To cover both the instantiated and the non-instantiated value class cases, I think we do need this special handling.
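
For readers following the thread, a standalone sketch of what that substitution amounts to (my own simplified version; `isValueClass`/`getUnderlyingTypeOf` from the PR are not shown here, and the real ScalaReflection code differs):

```scala
import scala.reflect.runtime.universe._

// Simplified stand-in for the "use the underlying type for value class
// fields" idea discussed above.
object ValueClassFieldSketch {
  final class Id(val value: Int) extends AnyVal

  def underlyingIfValueClass(tpe: Type): Type = {
    val cls = tpe.typeSymbol.asClass
    if (cls.isDerivedValueClass) {
      // the single constructor parameter of a value class is its underlying type
      cls.primaryConstructor.asMethod.paramLists.head.head.typeSignature
    } else {
      tpe
    }
  }

  def main(args: Array[String]): Unit = {
    println(underlyingIfValueClass(typeOf[Id]))     // Int
    println(underlyingIfValueClass(typeOf[String])) // String
  }
}
```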


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228734075
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)

+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

yes @viirya 


---




[GitHub] spark issue #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluste...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22867
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluste...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22867
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluste...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22867
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN...

2018-10-27 Thread gss2002
GitHub user gss2002 opened a pull request:

https://github.com/apache/spark/pull/22867

[SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails …

…due lack of access to tmpDir from $PWD to HDFS

WriteAheadLogBackedBlockRDD's use of java.io.tmpdir fails if $PWD resolves to a folder in HDFS and the Spark YARN cluster job does not have the right access to that folder for the dummy directory it creates there. This patch therefore adds an option, spark.streaming.receiver.blockStore.tmpdir, to override java.io.tmpdir, which YARN cluster mode sets to $PWD.

## What changes were proposed in this pull request?
This change provides an option to override java.io.tmpdir so that, when $PWD resolves to an HDFS path in YARN cluster mode, Spark does not attempt to use that folder and instead uses the folder given by spark.streaming.receiver.blockStore.tmpdir.

## How was this patch tested?
The patch was manually tested on a Spark Streaming job with write-ahead logs enabled in YARN cluster mode.
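
A rough sketch of how such an option could be consumed; the configuration key comes from this PR's description, while the helper below is purely illustrative and not the actual patch:

```scala
import org.apache.spark.SparkConf

// Illustrative only: choose the temp dir for the WAL-backed block store,
// preferring the proposed option over java.io.tmpdir (which YARN cluster
// mode points at $PWD, so it may resolve to an HDFS path).
object BlockStoreTmpDirSketch {
  def blockStoreTmpDir(conf: SparkConf): String =
    conf.get("spark.streaming.receiver.blockStore.tmpdir",
      System.getProperty("java.io.tmpdir"))
}
```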



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gss2002/spark SPARK-25778

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22867.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22867


commit 4970827038fb18f74c1b7975d7bfc00609dc9405
Author: gss2002 
Date:   2018-10-28T04:22:31Z

[SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due 
lack of access to tmpDir from $PWD to HDFS
WriteAheadLogBackedBlockRDD usage of java.io.tmpdir will fail if $PWD 
resolves to a folder in HDFS and the Spark YARN Cluster job does not have the 
correct access to this folder in regards to the dummy folder. So this patch 
provides an option to set spark.streaming.receiver.blockStore.tmpdir to 
override java.io.tmpdir which sets $PWD from YARN Cluster mode.




---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733929
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

So for Dataset, can we conclude that if a value class is top-level it will be instantiated, but if it is a nested field it will just be its underlying type?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733771
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

No, in that case `Id` will be `Int`.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733678
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

If `T` is `User`, will the `Id` class be instantiated?


---




[GitHub] spark pull request #22797: [SPARK-19851][SQL] Add support for EVERY and ANY ...

2018-10-27 Thread dilipbiswal
Github user dilipbiswal closed the pull request at:

https://github.com/apache/spark/pull/22797


---




[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-27 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22809
  
Thanks a lot @cloud-fan @mgaido91 


---




[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-10-27 Thread dilipbiswal
Github user dilipbiswal closed the pull request at:

https://github.com/apache/spark/pull/22047


---




[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22812
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98147/
Test PASSed.


---




[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22812
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22812
  
**[Test build #98147 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98147/testReport)**
 for PR 22812 at commit 
[`517bebf`](https://github.com/apache/spark/commit/517bebfb1e49f2315019696a50b657dcf715778c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22309
  
**[Test build #98155 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98155/testReport)**
 for PR 22309 at commit 
[`1eab8b5`](https://github.com/apache/spark/commit/1eab8b5c30695fdd25d02961e40d6ca9c0fda601).


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733525
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

Oh, sorry, I didn't see the `identity` case.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread mt40
Github user mt40 commented on the issue:

https://github.com/apache/spark/pull/22309
  
@cloud-fan @viirya I was silly @_@, I forgot to push the change for that 
part and missed that error message from Jenkins as well


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733511
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

Hmm, I think the first rule is about a value class being treated as another type. Isn't `T` just `Id` here?


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22309
  
```
[error] 
/home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:401:
 value isDefined is not a member of 
org.apache.spark.sql.catalyst.expressions.Expression
[error] if (path.isDefined) {
[error]  ^
```

You need to rebase onto the current master. `path` is not an `Option` anymore.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733409
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

It is the 1st rule, where `Id` is treated as a type parameter; see the `def identity[T]` example.
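
For readers of the thread, the example being referred to is roughly the following (a sketch mirroring the Scala value class documentation; `Id` here is illustrative):

```scala
// Allocation rule 1 in action: passing Id where a generic T is expected forces
// the compiler to allocate a real Id instance instead of using the bare Int.
final class Id(val value: Int) extends AnyVal

object IdentitySketch {
  def identity[T](t: T): T = t

  def main(args: Array[String]): Unit = {
    val boxed: Id = identity(new Id(42)) // Id is boxed for the generic call
    println(boxed.value)                 // 42
  }
}
```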


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22309
  
try to build sql/core?


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread mt40
Github user mt40 commented on the issue:

https://github.com/apache/spark/pull/22309
  
Hi @viirya, may I ask whether Jenkins is having an issue? It reports
```
[error] (catalyst/compile:compileIncremental) Compilation failed
[error] Total time: 175 s, completed Oct 27, 2018 8:33:41 PM
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/build/sbt 
-Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn 
-Pspark-ganglia-lgpl -Phive -Pmesos test:package 
streaming-kinesis-asl-assembly/assembly ; received return code 1
```
but I can build the `catalyst` project fine on my laptop.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733287
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

> For example, in method encodeDecodeTest, if we pass an instance of Id as 
input, it will not be converted to Int.

Why won't `Id` be an `Int` in this case? It doesn't seem to be covered by the three rules.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22309
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98154/
Test FAILed.


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22309
  
**[Test build #98154 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98154/testReport)**
 for PR 22309 at commit 
[`bf8a6de`](https://github.com/apache/spark/commit/bf8a6de0bd6029c7a94d00d2c0f04e502d440e9f).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  encodeDecodeTest(ValueContainer(1, StringWrapper(null)), \"nested 
value class with null\")`


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22309
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733161
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -379,6 +388,28 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)

+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for a top-level value class, if it is used as another type
+        // (e.g. as its parent trait or generic), the compiler keeps the class,
+        // so we must provide an instance of the class too. In other cases,
+        // the compiler will handle wrapping/unwrapping for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
--- End diff --

No, I haven't rebased since last week.
Hmm, now that `path` is not an `Option` anymore, I think I have to use `if (walkedTypePath.length > 1)` to keep the same logic. But that seems a little hacky. Do you have any suggestions?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733107
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

Yes, but there are some cases where an instance of the value class must be created; see the [allocation details](https://docs.scala-lang.org/overviews/core/value-classes.html#allocation-details).
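
A compact sketch of those allocation cases, following the linked documentation (class name illustrative):

```scala
// Situations from the linked "allocation details" section in which a value
// class is actually instantiated: (1) treated as another type, (2) assigned
// to an array, (3) runtime type tests / pattern matching.
final class Wrapper(val s: String) extends AnyVal

object AllocationRulesSketch {
  def main(args: Array[String]): Unit = {
    val asAny: Any = new Wrapper("a")                  // (1) boxed
    val arr: Array[Wrapper] = Array(new Wrapper("b"))  // (2) elements boxed
    val matched = asAny match {                        // (3) runtime type test
      case _: Wrapper => true
      case _          => false
    }
    println((arr.length, matched)) // (1,true)
  }
}
```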


---




[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22309
  
**[Test build #98154 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98154/testReport)**
 for PR 22309 at commit 
[`bf8a6de`](https://github.com/apache/spark/commit/bf8a6de0bd6029c7a94d00d2c0f04e502d440e9f).


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228733087
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala ---
@@ -297,11 +307,16 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTes
     ExpressionEncoder.tuple(intEnc, ExpressionEncoder.tuple(intEnc, longEnc))
   }

+  // test for Scala value class
   encodeDecodeTest(
     PrimitiveValueClass(42), "primitive value class")
-
   encodeDecodeTest(
     ReferenceValueClass(ReferenceValueClass.Container(1)), "reference value class")
+  encodeDecodeTest(StringWrapper("a"), "value class string")
+  encodeDecodeTest(ValueContainer(1, StringWrapper("b")), "value class nested")
+  encodeDecodeTest(
--- End diff --

sure, I added a test with `StringWrapper(null)`


---




[GitHub] spark issue #21402: SPARK-24355 Spark external shuffle server improvement to...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21402
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228732860
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

So after Scala compilation, a value class becomes its underlying type?


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22858: [SPARK-24709][SQL][2.4] map basestring to str for...

2018-10-27 Thread cloud-fan
Github user cloud-fan closed the pull request at:

https://github.com/apache/spark/pull/22858


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] map basestring to str for python...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22858
  
title updated


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98146/
Test PASSed.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228732805
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

@viirya we cannot, because after compilation the field `id: Id` becomes `id: Int`.
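
A quick illustration of that erasure (sketch; `Id` and `User` are reconstructed here for illustration rather than copied from the PR):

```scala
// After compilation a value-class field is stored as its underlying type,
// so the generated constructor takes an Int rather than an Id.
final class Id(val value: Int) extends AnyVal
case class User(id: Id, name: String)

object ErasedFieldSketch {
  def main(args: Array[String]): Unit = {
    val paramTypes = classOf[User].getConstructors.head.getParameterTypes
    println(paramTypes.mkString(", ")) // int, class java.lang.String
  }
}
```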


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #98146 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98146/testReport)**
 for PR 21732 at commit 
[`fec1cac`](https://github.com/apache/spark/commit/fec1cac2c5f8fa5226001820c24fe5fc8304fe3f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22858
  
Oops, mind fixing PR title too?


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22858
  
@cloud-fan, thanks for doing this backport!


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22858
  
Merged to branch-2.4.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98153/
Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98153 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98153/testReport)**
 for PR 22866 at commit 
[`00ae00a`](https://github.com/apache/spark/commit/00ae00ae4b365089a41e76e36e18955f62c48480).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98153 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98153/testReport)**
 for PR 22866 at commit 
[`00ae00a`](https://github.com/apache/spark/commit/00ae00ae4b365089a41e76e36e18955f62c48480).


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4567/
Test PASSed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22817: [SPARK-25816][SQL] Fix attribute resolution in ne...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22817#discussion_r228731805
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2578,4 +2578,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       Row ("abc", 1))
     }
   }
+
+  test("SPARK-25816 ResolveReferences works with nested extractors") {
+    val df0 = Seq((1, Map(1 -> "a")), (2, Map(2 -> "b"))).toDF("1", "2")
+    val df1 = df0.select($"1".as("2"), $"2".as("1"))
+    val df2 = df1.filter($"1"(map_keys($"1")(0)) > "a")
--- End diff --

+1, I think the test can be simplified
```
val df = Seq((1, Map(1 -> "a")), (2, Map(2 -> "b"))).toDF("key", "map")
checkAnswer(df.select($"map"($"key")), Row("a") :: Row("b") :: Nil)
```


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22858
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98151/
Test PASSed.


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22858
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22858
  
**[Test build #98151 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98151/testReport)**
 for PR 22858 at commit 
[`1837449`](https://github.com/apache/spark/commit/18374490b689e88cdca1f0e191efbadc0fc28d46).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22817: [SPARK-25816][SQL] Fix attribute resolution in ne...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22817#discussion_r228731772
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2578,4 +2578,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       Row ("abc", 1))
     }
   }
+
+  test("SPARK-25816 ResolveReferences works with nested extractors") {
+    val df0 = Seq((1, Map(1 -> "a")), (2, Map(2 -> "b"))).toDF("1", "2")
--- End diff --

Can we use normal names like `i` and `j` instead of 1 and 2?
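
e.g. something like this (a sketch with the same data, assuming it stays inside DataFrameSuite where the `toDF`/`$` implicits are in scope):

```scala
// Suggested rename: keep the data, use readable column names.
val df0 = Seq((1, Map(1 -> "a")), (2, Map(2 -> "b"))).toDF("i", "j")
val df1 = df0.select($"i".as("j"), $"j".as("i"))
```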


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98152/
Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98152 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98152/testReport)**
 for PR 22866 at commit 
[`6fc37f2`](https://github.com/apache/spark/commit/6fc37f2c872c769fd52f74495f0e1d8abc80f49c).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

2018-10-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22809


---




[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22809
  
thanks, merging to master!


---




[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22865
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98142/
Test PASSed.


---




[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228731568
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = buildConf("spark.sql.parquet.recordLevelFilter.enabled")
     .doc("If true, enables Parquet's native record-level filtering using the pushed down " +
       "filters. This configuration only has an effect when 'spark.sql.parquet.filterPushdown' " +
-      "is enabled.")
+      "is enabled and spark.sql.parquet.enableVectorizedReader is disabled.")
--- End diff --

Btw, a vectorized reader on the Parquet side is in progress. Ideally 3.0 would be a good target to switch to it.


---




[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22865
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22865
  
**[Test build #98142 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98142/testReport)**
 for PR 22865 at commit 
[`af8a85a`](https://github.com/apache/spark/commit/af8a85ae4a1e477801bf104af6d4909cd822ba01).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22865
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98143/
Test PASSed.


---




[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22865
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22865
  
**[Test build #98143 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98143/testReport)**
 for PR 22865 at commit 
[`af8a85a`](https://github.com/apache/spark/commit/af8a85ae4a1e477801bf104af6d4909cd822ba01).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228731507
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+// nested value class is treated as its underlying type
+// top level value class must be treated as a product
+val underlyingType = getUnderlyingTypeOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = t.typeSymbol.asClass.fullName
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
+  arg
--- End diff --

I have the same question. When we deserialize a value class, don't we always have `NewInstance` to construct it back? When constructing `User`, can't we pass in an `Id`?


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4566/
Test PASSed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98152 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98152/testReport)**
 for PR 22866 at commit 
[`6fc37f2`](https://github.com/apache/spark/commit/6fc37f2c872c769fd52f74495f0e1d8abc80f49c).


---




[GitHub] spark issue #22853: [SPARK-25845][SQL] Fix MatchError for calendar interval ...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22853
  
LGTM. Do we target this for 2.4? The API is deprecated in 2.4, so I'm not sure we still need to backport bug fixes.


---




[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228731385
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = buildConf("spark.sql.parquet.recordLevelFilter.enabled")
     .doc("If true, enables Parquet's native record-level filtering using the pushed down " +
       "filters. This configuration only has an effect when 'spark.sql.parquet.filterPushdown' " +
-      "is enabled.")
+      "is enabled and spark.sql.parquet.enableVectorizedReader is disabled.")
--- End diff --

I intentionally didn't document this, because the vectorized reader can effectively be disabled by other conditions (such as complex types) even when `spark.sql.parquet.enableVectorizedReader` is enabled.

Shall we document this explicitly?
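
For context, a sketch of how the three settings interact (the configuration keys are the ones from SQLConf quoted above; `spark` is assumed to be a SparkSession):

```scala
// Record-level Parquet filtering only takes effect when filter pushdown is on
// and the scan does not go through the vectorized reader (e.g. because it is
// disabled here, or unsupported for the schema as noted above).
spark.conf.set("spark.sql.parquet.filterPushdown", "true")
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.conf.set("spark.sql.parquet.recordLevelFilter.enabled", "true")
```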


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4564/
Test PASSed.


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22858
  
**[Test build #98151 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98151/testReport)**
 for PR 22858 at commit 
[`1837449`](https://github.com/apache/spark/commit/18374490b689e88cdca1f0e191efbadc0fc28d46).


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22858
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98150 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98150/testReport)**
 for PR 22866 at commit 
[`ba37a8a`](https://github.com/apache/spark/commit/ba37a8aefaf52132f3046702fe3d808455c8d85d).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22858
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4565/
Test PASSed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98150 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98150/testReport)**
 for PR 22866 at commit 
[`ba37a8a`](https://github.com/apache/spark/commit/ba37a8aefaf52132f3046702fe3d808455c8d85d).


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98150/
Test FAILed.


---




[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22858
  
Yup, strictly speaking I think we should change it. Looks like there are two occurrences of 
`isinstance(..., str)`, at `udf` and `pandas_udf`.

Another problem in PySpark is inconsistent type comparison, like `type(...) 
== t` vs `isinstance(..., t)`. For instance, `type(...) == dict` vs 
`isinstance(..., dict)`: the former does not allow `OrderedDict`, but the latter 
does.

Another problem is that some types, like `bool` in Python, inherit from `int`. In 
this case, `isinstance(...)` can produce unexpected results, for instance:

```python
>>> isinstance(True, int)
True
```

I was nervous about the cases above, so I haven't made those changes so far. 


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4563/
Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98149 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98149/testReport)**
 for PR 22866 at commit 
[`78f47e8`](https://github.com/apache/spark/commit/78f47e899745b2bbd81c9aba49cc9a9fb9f10f2f).


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread mt40
Github user mt40 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228731280
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 ---
@@ -358,4 +368,20 @@ class ScalaReflectionSuite extends SparkFunSuite {
 assert(numberOfCheckedArguments(deserializerFor[(java.lang.Double, 
Int)]) == 1)
 assert(numberOfCheckedArguments(deserializerFor[(java.lang.Integer, 
java.lang.Integer)]) == 0)
   }
+
+  test("schema for case class that is a value class") {
+val schema = schemaFor[TestingValueClass.IntWrapper]
+assert(schema === Schema(IntegerType, nullable = false))
+  }
+
+  test("schema for case class that contains value class fields") {
+val schema = schemaFor[TestingValueClass.ValueClassData]
+assert(schema === Schema(
+  StructType(Seq(
+StructField("intField", IntegerType, nullable = false),
+StructField("wrappedInt", IntegerType, nullable = false),
--- End diff --

Yup, a value class in general cannot be null since it is a subtype of `AnyVal`.
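
For illustration, a minimal self-contained sketch of that point, assuming `IntWrapper` and 
`ValueClassData` have roughly the shape the test diff above suggests (the real 
`TestingValueClass` definitions are not shown in this thread):

```scala
// Sketch only: a value class wraps exactly one field and extends AnyVal,
// so there is no null value of type IntWrapper, and the schema can map
// wrappedInt straight to IntegerType with nullable = false.
final case class IntWrapper(i: Int) extends AnyVal

final case class ValueClassData(intField: Int, wrappedInt: IntWrapper)

object NonNullDemo extends App {
  // val bad: IntWrapper = null   // does not compile: null is not assignable to an AnyVal subtype
  println(ValueClassData(1, IntWrapper(2)))
}
```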


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98149/
Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98149 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98149/testReport)**
 for PR 22866 at commit 
[`78f47e8`](https://github.com/apache/spark/commit/78f47e899745b2bbd81c9aba49cc9a9fb9f10f2f).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228731247
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -379,6 +388,28 @@ object ScalaReflection extends ScalaReflection {
   dataType = ObjectType(udt.getClass))
 Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: 
Nil)
 
+  case t if isValueClass(t) =>
+val (_, underlyingType) = getUnderlyingParameterOf(t)
+val underlyingClsName = getClassNameFromType(underlyingType)
+val clsName = getUnerasedClassNameFromType(t)
+val newTypePath = s"""- Scala value class: 
$clsName($underlyingClsName)""" +:
+  walkedTypePath
+
+// Nested value class is treated as its underlying type
+// because the compiler will convert value class in the schema to
+// its underlying type.
+// However, for top-level value class, if it is used as another 
type
+// (e.g. as its parent trait or generic), the compiler keeps the 
class
+// so we must provide an instance of the class too. In other cases,
+// the compiler will handle wrapping/unwrapping for us 
automatically.
+val arg = deserializerFor(underlyingType, path, newTypePath)
+if (path.isDefined) {
--- End diff --

did you rebase? I think `path` is not `Option` anymore.
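
As background for the wrapping/unwrapping note in the diff, a small plain-Scala sketch of the 
allocation rules it relies on, reusing the `StringWrapper` value class from the encoder test 
suite as an assumed shape:

```scala
// Sketch only: the compiler erases a value class to its underlying type in
// most positions, but allocates a real wrapper instance when the value is
// treated as another type (a supertype) or sits in a generic position.
final case class StringWrapper(s: String) extends AnyVal

object AllocationDemo extends App {
  val underlying: String = StringWrapper("a").s            // erased: only the String exists at runtime
  val asAny: Any = StringWrapper("a")                      // used as a supertype: a wrapper is instantiated
  val inSeq: Seq[StringWrapper] = Seq(StringWrapper("b"))  // generic position: elements are boxed wrappers
  println((underlying, asAny, inSeq))
}
```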


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98148/
Test FAILed.


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98148 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98148/testReport)**
 for PR 22866 at commit 
[`b16de30`](https://github.com/apache/spark/commit/b16de300715749120957c8c185284fe9d72d5749).


---




[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228731209
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
 ---
@@ -297,11 +307,16 @@ class ExpressionEncoderSuite extends 
CodegenInterpretedPlanTest with AnalysisTes
 ExpressionEncoder.tuple(intEnc, ExpressionEncoder.tuple(intEnc, 
longEnc))
   }
 
+  // test for Scala value class
   encodeDecodeTest(
 PrimitiveValueClass(42), "primitive value class")
-
   encodeDecodeTest(
 ReferenceValueClass(ReferenceValueClass.Container(1)), "reference 
value class")
+  encodeDecodeTest(StringWrapper("a"), "value class string")
+  encodeDecodeTest(ValueContainer(1, StringWrapper("b")), "value class 
nested")
+  encodeDecodeTest(
--- End diff --

can we also test with null values?


---




[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22866
  
**[Test build #98148 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98148/testReport)**
 for PR 22866 at commit 
[`b16de30`](https://github.com/apache/spark/commit/b16de300715749120957c8c185284fe9d72d5749).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22866: [SPARK-12172][SPARKR] Remove internal-only RDD me...

2018-10-27 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/22866

[SPARK-12172][SPARKR] Remove internal-only RDD methods

## What changes were proposed in this pull request?

Remove non-public, internal-only RDD methods from SparkR

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rrddapi2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22866


commit b16de300715749120957c8c185284fe9d72d5749
Author: Felix Cheung 
Date:   2018-10-28T01:05:13Z

remove methods




---




[GitHub] spark pull request #22858: [SPARK-24709][SQL][2.4] use str instead of basest...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22858#discussion_r228731178
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2326,7 +2326,7 @@ def schema_of_json(json):
 >>> df.select(schema_of_json('{"a": 0}').alias("json")).collect()
 [Row(json=u'struct')]
 """
-if isinstance(json, basestring):
+if isinstance(json, str):
--- End diff --

Yeah, we should. They are put in only when needed, because there are so many 
cases like that (for instance, `imap` in Python 2 and `map` in Python 3).

Looks like that's added in another PR, in the master branch only.


---



