[GitHub] spark pull request #22812: [SPARK-25817][SQL] Dataset encoder should support...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22812 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22812 thanks, merging to master!
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22309 According to the second rule, a value class in an Array needs to be instantiated. In the current code, I think it should go to the special-handling case for value classes. Can you also add a test to verify it is working?
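The Array case asked about here can be checked outside Spark with a minimal sketch. `Id` below is an illustrative value class, not a class from this PR:

```scala
// Sketch of the Array allocation rule for Scala value classes:
// unlike a plain field, an Array of a value class stores real,
// instantiated (boxed) instances at runtime.
final class Id(val value: Int) extends AnyVal

object ArrayAllocationDemo {
  def main(args: Array[String]): Unit = {
    val ids: Array[Id] = Array(new Id(1), new Id(2))
    // Each element is an actual Id instance, not a bare Int.
    assert(ids(0).isInstanceOf[Id])
    assert(ids.map(_.value).sum == 3)
  }
}
```

This is why the Array case has to go through whatever special handling instantiates the value class, rather than the underlying-type path.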
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22309 How about a value class in an Array? Do we support it?
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228734242

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -635,13 +675,17 @@ object ScalaReflection extends ScalaReflection {
             "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
         }
 
+        // as a field, value class is represented by its underlying type
+        val trueFieldType =
+          if (isValueClass(fieldType)) getUnderlyingTypeOf(fieldType) else fieldType
+
         val fieldValue = Invoke(
-          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(fieldType),
-          returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
-        val clsName = getClassNameFromType(fieldType)
+          AssertNotNull(inputObject, walkedTypePath), fieldName, dataTypeFor(trueFieldType),
+          returnNullable = !trueFieldType.typeSymbol.asClass.isPrimitive)
+        val clsName = getClassNameFromType(trueFieldType)
--- End diff --

Thanks for explaining! To cover both the instantiated and the non-instantiated value class cases, I think we need this special handling.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228734075

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

yes @viirya
[GitHub] spark issue #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluste...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22867 Can one of the admins verify this patch?
[GitHub] spark pull request #22867: [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN...
GitHub user gss2002 opened a pull request: https://github.com/apache/spark/pull/22867

[SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS

WriteAheadLogBackedBlockRDD usage of java.io.tmpdir will fail if $PWD resolves to a folder in HDFS and the Spark YARN Cluster job does not have the correct access to this folder in regards to the dummy folder. So this patch provides an option to set spark.streaming.receiver.blockStore.tmpdir to override java.io.tmpdir, which is set to $PWD in YARN Cluster mode.

## What changes were proposed in this pull request?

This change provides an option to override the java.io.tmpdir option so that when $PWD is resolved in YARN Cluster mode, Spark does not attempt to use that folder and instead uses the folder provided with the following option: spark.streaming.receiver.blockStore.tmpdir

## How was this patch tested?

The patch was manually tested on a Spark Streaming job with write-ahead logs in cluster mode.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gss2002/spark SPARK-25778

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22867.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22867

commit 4970827038fb18f74c1b7975d7bfc00609dc9405
Author: gss2002
Date: 2018-10-28T04:22:31Z

    [SPARK-25778] WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS

    WriteAheadLogBackedBlockRDD usage of java.io.tmpdir will fail if $PWD resolves to a folder in HDFS and the Spark YARN Cluster job does not have the correct access to this folder in regards to the dummy folder. So this patch provides an option to set spark.streaming.receiver.blockStore.tmpdir to override java.io.tmpdir, which is set to $PWD in YARN Cluster mode.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733929

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

So for Dataset, can we conclude that if a value class is top-level, it will be instantiated, but if it is a nested field, it will just be its underlying type?
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733771

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

No, in that case `Id` will be `Int`.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733678

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

If `T` is `User`, will the `Id` class be instantiated?
[GitHub] spark pull request #22797: [SPARK-19851][SQL] Add support for EVERY and ANY ...
Github user dilipbiswal closed the pull request at: https://github.com/apache/spark/pull/22797
[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22809 Thanks a lot @cloud-fan @mgaido91
[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...
Github user dilipbiswal closed the pull request at: https://github.com/apache/spark/pull/22047
[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22812 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98147/ Test PASSed.
[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22812 Merged build finished. Test PASSed.
[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22812

**[Test build #98147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98147/testReport)** for PR 22812 at commit [`517bebf`](https://github.com/apache/spark/commit/517bebfb1e49f2315019696a50b657dcf715778c).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22309 **[Test build #98155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98155/testReport)** for PR 22309 at commit [`1eab8b5`](https://github.com/apache/spark/commit/1eab8b5c30695fdd25d02961e40d6ca9c0fda601).
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733525

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

Oh, sorry, I didn't see the identity case.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user mt40 commented on the issue: https://github.com/apache/spark/pull/22309 @cloud-fan @viirya I was silly @_@, I forgot to push the change for that part and missed that error message from Jenkins as well
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733511

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

Hmm, I think the first rule is that a value class is treated as another type. Isn't `T` just `Id`?
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22309

```
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:401: value isDefined is not a member of org.apache.spark.sql.catalyst.expressions.Expression
[error]     if (path.isDefined) {
[error]              ^
```

You need to rebase with current master. `path` is not an `Option` now.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733409

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

It is the 1st rule, where `Id` is treated as a type parameter; you can see the `def identity[T]` example.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22309 try to build sql/core?
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user mt40 commented on the issue: https://github.com/apache/spark/pull/22309

hi @viirya, may I ask whether Jenkins is having an issue? It reports

```
[error] (catalyst/compile:compileIncremental) Compilation failed
[error] Total time: 175 s, completed Oct 27, 2018 8:33:41 PM
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/build/sbt -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly ; received return code 1
```

but I can build the `catalyst` project fine on my laptop.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733287

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

> For example, in method encodeDecodeTest, if we pass an instance of Id as input, it will not be converted to Int.

Why won't `Id` be an `Int` in this case? It doesn't seem to be covered by the three rules.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22309 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98154/ Test FAILed.
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22309

**[Test build #98154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98154/testReport)** for PR 22309 at commit [`bf8a6de`](https://github.com/apache/spark/commit/bf8a6de0bd6029c7a94d00d2c0f04e502d440e9f).

 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * ` encodeDecodeTest(ValueContainer(1, StringWrapper(null)), \"nested value class with null\")`
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22309 Merged build finished. Test FAILed.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733161

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -379,6 +388,28 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for top-level value class, if it is used as another type
+        // (e.g. as its parent trait or generic), the compiler keeps the class
+        // so we must provide an instance of the class too. In other cases,
+        // the compiler will handle wrapping/unwrapping for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
--- End diff --

No, I haven't rebased since last week. Hmm, now that `path` is no longer an `Option`, I think I have to use `if (walkedTypePath.length > 1)` to get the same logic. But this seems a little bit hacky. Do you have any suggestion on this?
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733107

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

Yes, but there are some cases where an instance of a value class must be created. You can see this [allocation rule](https://docs.scala-lang.org/overviews/core/value-classes.html#allocation-details).
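The allocation rules referenced above can be demonstrated with a small, self-contained sketch. `Wrapper` and `identityFn` are illustrative names, not from the PR; the generic case mirrors the `def identity[T]` example in the Scala docs:

```scala
// Sketch of when a Scala value class is actually instantiated (boxed).
final class Wrapper(val underlying: Int) extends AnyVal

object AllocationRulesDemo {
  // Using a value class as a type parameter forces allocation.
  def identityFn[T](t: T): T = t

  def main(args: Array[String]): Unit = {
    val w = new Wrapper(42)        // no allocation: compiles down to a bare Int
    val asAny: Any = w             // allocation: treated as another type (Any)
    val viaGeneric = identityFn(w) // allocation: boxed in the generic call
    assert(asAny.isInstanceOf[Wrapper])
    assert(viaGeneric.underlying == 42)
  }
}
```

These are exactly the cases the deserializer must handle by producing a real instance, while plain fields can stay as the underlying type.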
[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22309 **[Test build #98154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98154/testReport)** for PR 22309 at commit [`bf8a6de`](https://github.com/apache/spark/commit/bf8a6de0bd6029c7a94d00d2c0f04e502d440e9f).
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228733087

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala ---
@@ -297,11 +307,16 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTes
     ExpressionEncoder.tuple(intEnc, ExpressionEncoder.tuple(intEnc, longEnc))
   }
 
+  // test for Scala value class
   encodeDecodeTest(
     PrimitiveValueClass(42), "primitive value class")
-
   encodeDecodeTest(
     ReferenceValueClass(ReferenceValueClass.Container(1)), "reference value class")
+  encodeDecodeTest(StringWrapper("a"), "value class string")
+  encodeDecodeTest(ValueContainer(1, StringWrapper("b")), "value class nested")
+  encodeDecodeTest(
--- End diff --

sure, I added a test with `StringWrapper(null)`
[GitHub] spark issue #21402: SPARK-24355 Spark external shuffle server improvement to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21402 Can one of the admins verify this patch?
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228732860

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

So after Scala compilation, a value class will be its underlying type?
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.
[GitHub] spark pull request #22858: [SPARK-24709][SQL][2.4] map basestring to str for...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/22858
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] map basestring to str for python...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22858 title updated
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98146/ Test PASSed.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228732805

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
 
+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +:
+          walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
--- End diff --

@viirya we cannot, because after compilation the field `id: Id` becomes `id: Int`.
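The point that a value-class field is compiled down to its underlying type can be verified with Java reflection. `User` and `Id` below follow the names used in this discussion but are defined here purely for illustration:

```scala
// Sketch: a field typed as a value class is erased to its underlying type,
// which is why the encoder must treat nested value classes as that type.
final class Id(val value: Int) extends AnyVal
case class User(id: Id, name: String)

object FieldErasureDemo {
  def main(args: Array[String]): Unit = {
    // After compilation the `id` field is stored as a plain int, not as Id.
    val fieldType = classOf[User].getDeclaredField("id").getType
    assert(fieldType == classOf[Int])
    println(fieldType)
  }
}
```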
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732

**[Test build #98146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98146/testReport)** for PR 21732 at commit [`fec1cac`](https://github.com/apache/spark/commit/fec1cac2c5f8fa5226001820c24fe5fc8304fe3f).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22858 Oops, mind fixing the PR title too?
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22858 @cloud-fan, thanks for doing this backport!
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22858 Merged to branch-2.4.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98153/ Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Merged build finished. Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866

**[Test build #98153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98153/testReport)** for PR 22866 at commit [`00ae00a`](https://github.com/apache/spark/commit/00ae00ae4b365089a41e76e36e18955f62c48480).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866 **[Test build #98153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98153/testReport)** for PR 22866 at commit [`00ae00a`](https://github.com/apache/spark/commit/00ae00ae4b365089a41e76e36e18955f62c48480).
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4567/ Test PASSed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Merged build finished. Test PASSed.
[GitHub] spark pull request #22817: [SPARK-25816][SQL] Fix attribute resolution in ne...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22817#discussion_r228731805

Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

```diff
@@ -2578,4 +2578,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       Row ("abc", 1))
     }
   }
+
+  test("SPARK-25816 ResolveReferences works with nested extractors") {
+    val df0 = Seq((1, Map(1 -> "a")), (2, Map(2 -> "b"))).toDF("1", "2")
+    val df1 = df0.select($"1".as("2"), $"2".as("1"))
+    val df2 = df1.filter($"1"(map_keys($"1")(0)) > "a")
```

+1, I think the test can be simplified:

```scala
val df = Seq((1, Map(1 -> "a")), (2, Map(2 -> "b"))).toDF("key", "map")
checkAnswer(df.select($"map"($"key")), Row("a") :: Row("b") :: Nil)
```
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98151/ Test PASSed.
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22858 Merged build finished. Test PASSed.
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22858

**[Test build #98151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98151/testReport)** for PR 22858 at commit [`1837449`](https://github.com/apache/spark/commit/18374490b689e88cdca1f0e191efbadc0fc28d46).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22817: [SPARK-25816][SQL] Fix attribute resolution in ne...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22817#discussion_r228731772

Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

```diff
@@ -2578,4 +2578,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       Row ("abc", 1))
     }
   }
+
+  test("SPARK-25816 ResolveReferences works with nested extractors") {
+    val df0 = Seq((1, Map(1 -> "a")), (2, Map(2 -> "b"))).toDF("1", "2")
```

Can we use normal names like `i`, `j` instead of 1, 2?
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98152/ Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866

**[Test build #98152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98152/testReport)** for PR 22866 at commit [`6fc37f2`](https://github.com/apache/spark/commit/6fc37f2c872c769fd52f74495f0e1d8abc80f49c).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Merged build finished. Test FAILed.
[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22809
[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22809 thanks, merging to master!
[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22865 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98142/ Test PASSed.
[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22865#discussion_r228731568

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

```diff
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = buildConf("spark.sql.parquet.recordLevelFilter.enabled")
     .doc("If true, enables Parquet's native record-level filtering using the pushed down " +
       "filters. This configuration only has an effect when 'spark.sql.parquet.filterPushdown' " +
-      "is enabled.")
+      "is enabled and spark.sql.parquet.enableVectorizedReader is disabled.")
```

Btw, a vectorized reader is in progress on the Parquet side. Ideally 3.0 is a good target for the replacement.
[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22865 Merged build finished. Test PASSed.
[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22865

**[Test build #98142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98142/testReport)** for PR 22865 at commit [`af8a85a`](https://github.com/apache/spark/commit/af8a85ae4a1e477801bf104af6d4909cd822ba01).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22865 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98143/ Test PASSed.
[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22865 Merged build finished. Test PASSed.
[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22865

**[Test build #98143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98143/testReport)** for PR 22865 at commit [`af8a85a`](https://github.com/apache/spark/commit/af8a85ae4a1e477801bf104af6d4909cd822ba01).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228731507

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala

```diff
@@ -376,6 +387,23 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)

+      case t if isValueClass(t) =>
+        // nested value class is treated as its underlying type
+        // top level value class must be treated as a product
+        val underlyingType = getUnderlyingTypeOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = t.typeSymbol.asClass.fullName
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: walkedTypePath
+
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
+          arg
```

I have the same question. When we deserialize a value class, don't we always have `NewInstance` to construct it back? When constructing `User`, can't we pass in an `Id`?
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4566/ Test PASSed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Merged build finished. Test PASSed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866 **[Test build #98152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98152/testReport)** for PR 22866 at commit [`6fc37f2`](https://github.com/apache/spark/commit/6fc37f2c872c769fd52f74495f0e1d8abc80f49c).
[GitHub] spark issue #22853: [SPARK-25845][SQL] Fix MatchError for calendar interval ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22853 LGTM. Are we targeting 2.4? The API in 2.4 is deprecated, so I'm not sure we still need to backport bug fixes.
[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22865#discussion_r228731385

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

```diff
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = buildConf("spark.sql.parquet.recordLevelFilter.enabled")
     .doc("If true, enables Parquet's native record-level filtering using the pushed down " +
       "filters. This configuration only has an effect when 'spark.sql.parquet.filterPushdown' " +
-      "is enabled.")
+      "is enabled and spark.sql.parquet.enableVectorizedReader is disabled.")
```

I didn't document this intentionally, because the vectorized reader can be disabled by other conditions even when `spark.sql.parquet.enableVectorizedReader` is enabled (for complex types, for instance). Shall we clearly document this?
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Merged build finished. Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4564/ Test PASSed.
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22858 **[Test build #98151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98151/testReport)** for PR 22858 at commit [`1837449`](https://github.com/apache/spark/commit/18374490b689e88cdca1f0e191efbadc0fc28d46).
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22858 Merged build finished. Test PASSed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Merged build finished. Test PASSed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866

**[Test build #98150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98150/testReport)** for PR 22866 at commit [`ba37a8a`](https://github.com/apache/spark/commit/ba37a8aefaf52132f3046702fe3d808455c8d85d).
* This patch **fails some tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4565/ Test PASSed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866 **[Test build #98150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98150/testReport)** for PR 22866 at commit [`ba37a8a`](https://github.com/apache/spark/commit/ba37a8aefaf52132f3046702fe3d808455c8d85d).
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98150/ Test FAILed.
[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22858

Yup, strictly speaking I think we should change it. Looks like there are two occurrences, at `udf` and `pandas_udf` (`isinstance(..., str)`). Another problem in PySpark is inconsistent type comparison, like `type(...) == t` vs `isinstance(..., t)`. For instance, with `type(...) == dict` vs `isinstance(..., dict)`, the former does not allow `OrderedDict` but the latter does. Another problem is that some Python types like `bool` inherit from `int`. In such cases, `isinstance(...)` might produce unexpected results, for instance:

```python
>>> isinstance(True, int)
True
```

I was nervous about the cases above and didn't fix those so far.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4563/ Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Merged build finished. Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866 **[Test build #98149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98149/testReport)** for PR 22866 at commit [`78f47e8`](https://github.com/apache/spark/commit/78f47e899745b2bbd81c9aba49cc9a9fb9f10f2f).
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user mt40 commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228731280

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala

```diff
@@ -358,4 +368,20 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Double, Int)]) == 1)
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Integer, java.lang.Integer)]) == 0)
   }
+
+  test("schema for case class that is a value class") {
+    val schema = schemaFor[TestingValueClass.IntWrapper]
+    assert(schema === Schema(IntegerType, nullable = false))
+  }
+
+  test("schema for case class that contains value class fields") {
+    val schema = schemaFor[TestingValueClass.ValueClassData]
+    assert(schema === Schema(
+      StructType(Seq(
+        StructField("intField", IntegerType, nullable = false),
+        StructField("wrappedInt", IntegerType, nullable = false),
```

Yup, a value class in general cannot be null since it is a subtype of `AnyVal`.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98149/ Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Merged build finished. Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866

**[Test build #98149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98149/testReport)** for PR 22866 at commit [`78f47e8`](https://github.com/apache/spark/commit/78f47e899745b2bbd81c9aba49cc9a9fb9f10f2f).
* This patch **fails some tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228731247

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala

```diff
@@ -379,6 +388,28 @@ object ScalaReflection extends ScalaReflection {
           dataType = ObjectType(udt.getClass))
         Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)

+      case t if isValueClass(t) =>
+        val (_, underlyingType) = getUnderlyingParameterOf(t)
+        val underlyingClsName = getClassNameFromType(underlyingType)
+        val clsName = getUnerasedClassNameFromType(t)
+        val newTypePath = s"""- Scala value class: $clsName($underlyingClsName)""" +: walkedTypePath
+
+        // Nested value class is treated as its underlying type
+        // because the compiler will convert value class in the schema to
+        // its underlying type.
+        // However, for top-level value class, if it is used as another type
+        // (e.g. as its parent trait or generic), the compiler keeps the class
+        // so we must provide an instance of the class too. In other cases,
+        // the compiler will handle wrapping/unwrapping for us automatically.
+        val arg = deserializerFor(underlyingType, path, newTypePath)
+        if (path.isDefined) {
```

Did you rebase? I think `path` is not an `Option` anymore.
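The rule described in the inline comment above — the compiler keeps the wrapper class only when a value class is used as another type — can be observed directly. A small sketch; `Wrap` is a hypothetical value class for illustration, not part of the PR:

```scala
// Hypothetical value class illustrating when scalac allocates the wrapper.
class Wrap(val v: Int) extends AnyVal

object BoxingDemo {
  def main(args: Array[String]): Unit = {
    val w = new Wrap(1) // used monomorphically: stored as a plain Int

    // Widened to a supertype (Any), the value is boxed into a real Wrap
    // instance, so a runtime type test against Wrap succeeds.
    val asAny: Any = w
    println(asAny.isInstanceOf[Wrap]) // true

    // As a generic type argument the boxed form is used as well.
    val xs: Seq[Wrap] = Seq(w)
    println(xs.head.v) // 1
  }
}
```

This is the distinction the deserializer has to honor: a nested value-class field can be rebuilt as its underlying type, but a top-level (or generically used) value class needs an actual instance.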
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Merged build finished. Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98148/ Test FAILed.
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866 **[Test build #98148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98148/testReport)** for PR 22866 at commit [`b16de30`](https://github.com/apache/spark/commit/b16de300715749120957c8c185284fe9d72d5749).
[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22309#discussion_r228731209

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala

```diff
@@ -297,11 +307,16 @@ class ExpressionEncoderSuite extends CodegenInterpretedPlanTest with AnalysisTest {
     ExpressionEncoder.tuple(intEnc, ExpressionEncoder.tuple(intEnc, longEnc))
   }

+  // test for Scala value class
   encodeDecodeTest(
     PrimitiveValueClass(42), "primitive value class")
-
   encodeDecodeTest(
     ReferenceValueClass(ReferenceValueClass.Container(1)), "reference value class")
+  encodeDecodeTest(StringWrapper("a"), "value class string")
+  encodeDecodeTest(ValueContainer(1, StringWrapper("b")), "value class nested")
+  encodeDecodeTest(
```

Can we also test with null values?
[GitHub] spark issue #22866: [SPARK-12172][SPARKR] Remove internal-only RDD methods
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22866

**[Test build #98148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98148/testReport)** for PR 22866 at commit [`b16de30`](https://github.com/apache/spark/commit/b16de300715749120957c8c185284fe9d72d5749).
* This patch **fails some tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22866: [SPARK-12172][SPARKR] Remove internal-only RDD me...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/22866

[SPARK-12172][SPARKR] Remove internal-only RDD methods

## What changes were proposed in this pull request?
Remove non-public internal only methods for RDD in SparkR

## How was this patch tested?
Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rrddapi2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22866.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22866

commit b16de300715749120957c8c185284fe9d72d5749
Author: Felix Cheung
Date: 2018-10-28T01:05:13Z

    remove methods
[GitHub] spark pull request #22858: [SPARK-24709][SQL][2.4] use str instead of basest...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22858#discussion_r228731178

Diff: python/pyspark/sql/functions.py

```diff
@@ -2326,7 +2326,7 @@ def schema_of_json(json):
     >>> df.select(schema_of_json('{"a": 0}').alias("json")).collect()
     [Row(json=u'struct')]
     """
-    if isinstance(json, basestring):
+    if isinstance(json, str):
```

Yea, we should. They are put in only when needed, because there are so many cases like that (for instance, `imap` in Python 2 vs `map` in Python 3). Looks like that's added in another PR, in the master branch only.