[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 @michalsenkyr please create 2 more tickets for the optimization you mentioned in https://github.com/apache/spark/pull/16240#issuecomment-266318016 and the nested custom collection problem. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 thanks, merging to master!
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70943/ Test PASSed.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Merged build finished. Test PASSed.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #70943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70943/testReport)** for PR 16240 at commit [`68810c4`](https://github.com/apache/spark/commit/68810c4efb445d237604b661511b161a1c42a9bd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #70943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70943/testReport)** for PR 16240 at commit [`68810c4`](https://github.com/apache/spark/commit/68810c4efb445d237604b661511b161a1c42a9bd).
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16240 ok to test
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16240 For future reference: https://github.com/apache/spark/blob/master/dev/mima (script to run mima)
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Not sure how to run MiMa tests locally so I tried my best to figure out what was necessary. Hope this fixes it. The downside of the fix is that I had to restore the original methods in `SQLImplicits`. I removed the `implicit` keyword and added deprecation annotations as only the new methods should be used from now on. Old code importing the methods explicitly should be fine now.
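The compatibility fix described in this comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Spark source: `SketchEncoder`, `SketchImplicits`, `newGenericSeqEncoder`, and the version string are hypothetical stand-ins. Only the method name `newIntSeqEncoder` comes from the thread (it appears in the MiMa report).

```scala
// Hypothetical stand-in for org.apache.spark.sql.Encoder.
class SketchEncoder[T](val name: String)

// Hypothetical stand-in for SQLImplicits.
trait SketchImplicits {
  // Old method restored so previously compiled code still links.
  // It is no longer implicit and is marked deprecated.
  @deprecated("use the generic sequence encoder", "hypothetical-version")
  def newIntSeqEncoder: SketchEncoder[Seq[Int]] =
    new SketchEncoder[Seq[Int]]("old-int-seq")

  // New generic implicit (the name is an assumption) covering any Seq subtype.
  implicit def newGenericSeqEncoder[T <: Seq[_]]: SketchEncoder[T] =
    new SketchEncoder[T]("generic-seq")
}

object CompatDemo extends SketchImplicits {
  // Implicit search only considers the new method.
  val resolved: String = implicitly[SketchEncoder[Seq[Int]]].name

  // Explicit calls to the old method still compile.
  val explicitOld: String = newIntSeqEncoder.name
}
```

Because the old method keeps its name and signature, a binary compatibility check has nothing to flag, while removing `implicit` keeps it out of implicit resolution.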
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 you need to fix mima:
```
[error] * method newDoubleSeqEncoder()org.apache.spark.sql.Encoder in class org.apache.spark.sql.SQLImplicits does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLImplicits.newDoubleSeqEncoder")
[error] * method newFloatSeqEncoder()org.apache.spark.sql.Encoder in class org.apache.spark.sql.SQLImplicits does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLImplicits.newFloatSeqEncoder")
[error] * method newByteSeqEncoder()org.apache.spark.sql.Encoder in class org.apache.spark.sql.SQLImplicits does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLImplicits.newByteSeqEncoder")
[error] * method newLongSeqEncoder()org.apache.spark.sql.Encoder in class org.apache.spark.sql.SQLImplicits does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLImplicits.newLongSeqEncoder")
[error] * method newStringSeqEncoder()org.apache.spark.sql.Encoder in class org.apache.spark.sql.SQLImplicits does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLImplicits.newStringSeqEncoder")
[error] * method newIntSeqEncoder()org.apache.spark.sql.Encoder in class org.apache.spark.sql.SQLImplicits does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLImplicits.newIntSeqEncoder")
[error] * method newBooleanSeqEncoder()org.apache.spark.sql.Encoder in class org.apache.spark.sql.SQLImplicits does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLImplicits.newBooleanSeqEncoder")
[error] * method newShortSeqEncoder()org.apache.spark.sql.Encoder in class org.apache.spark.sql.SQLImplicits does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLImplicits.newShortSeqEncoder")
```
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Merged build finished. Test FAILed.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #70859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70859/testReport)** for PR 16240 at commit [`efd0801`](https://github.com/apache/spark/commit/efd0801e24088b90c1157de0cb0bfe8159aeaac5).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class SeqCC(s: Seq[Int])`
  * `case class ListCC(l: List[Int])`
  * `case class QueueCC(q: Queue[Int])`
  * `case class ComplexCC(seq: SeqCC, list: ListCC, queue: QueueCC)`
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70859/ Test FAILed.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #70859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70859/testReport)** for PR 16240 at commit [`efd0801`](https://github.com/apache/spark/commit/efd0801e24088b90c1157de0cb0bfe8159aeaac5).
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 LGTM, please create 2 more tickets for the optimization you mentioned in https://github.com/apache/spark/pull/16240#issuecomment-266318016 and the nested custom collection.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 retest this please
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 I actually read that but IDEA complained when I tried to place the `Product` encoder into a separate trait. So I opted for specificity. However, I tried it again right now and even though IDEA still complains, scalac compiles it, all the tests pass and it all works. So I will change it. I will also test whether this fixes the `toDS`-specific problems before pushing the changes.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 How about we assign priorities to the implicits, as in http://stackoverflow.com/questions/1886953/is-there-a-way-to-control-which-implicit-conversion-will-be-the-default-used ? I think we should prefer the `Seq` encoder over the `Product` encoder for `Seq with Product`.
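The prioritization suggested here is usually done by moving the less preferred implicit into a parent trait. The sketch below illustrates that pattern only; `Enc`, `LowPriorityEncoders`, `Encoders`, and `PriorityDemo` are hypothetical names, not Spark's actual classes, and `Enc` merely stands in for an encoder type class.

```scala
// Hypothetical type class standing in for an encoder.
class Enc[T](val name: String)

// Lower-priority candidates live in a parent trait.
trait LowPriorityEncoders {
  implicit def productEnc[T <: Product]: Enc[T] = new Enc[T]("product")
}

// Candidates defined in the derived object are preferred
// when both would otherwise be equally applicable.
object Encoders extends LowPriorityEncoders {
  implicit def seqEnc[T <: Seq[_]]: Enc[T] = new Enc[T]("seq")
}

object PriorityDemo {
  import Encoders._

  // Reports which implicit instance was selected for the argument's type.
  def encoderName[T](t: T)(implicit e: Enc[T]): String = e.name

  // A sequence resolves to the Seq instance, even if its type is also a Product.
  val forList: String = encoderName(List("abc"))

  // A tuple is a Product but not a Seq, so only the fallback applies.
  val forTuple: String = encoderName(("a", 1))
}
```

With both implicits declared at the same level, a type satisfying both bounds would be ambiguous. Declaring the `Product` instance in the parent trait lets the compiler break the tie in favour of the `Seq` instance.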
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 None of them. The compilation will fail. That is why I had to provide those additional implicits.
```
scala> class Test[T]
defined class Test

scala> implicit def test1[T <: Seq[String]]: Test[T] = null
test1: [T <: Seq[String]]=> Test[T]

scala> implicit def test2[T <: Product]: Test[T] = null
test2: [T <: Product]=> Test[T]

scala> def test[T : Test](t: T) = null
test: [T](t: T)(implicit evidence$1: Test[T])Null

scala> test(List("abc"))
<console>:31: error: ambiguous implicit values:
 both method test1 of type [T <: Seq[String]]=> Test[T]
 and method test2 of type [T <: Product]=> Test[T]
 match expected type Test[List[String]]
       test(List("abc"))
```
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 The overall strategy LGTM.
> I had to alter and add new implicit encoders into SQLImplicits. The new encoders are for Seq with Product combination (essentially only List) to disambiguate between Seq and Product encoders.

Does Scala have a clear definition for this case? That is, if we have implicits for both type `A` and type `B`, which implicit will be picked for a value of type `A with B`? As for the optimization, we can do it in a follow-up.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16240 /cc @cloud-fan
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Possible optimization: Instead of conversions using `to`, we can use `Builder`s. This way we could get rid of the conversion overhead. This would require adding a new codegen method that would operate similarly to `MapObjects` but use a provided `Builder` to build the collection directly. I will wait for a response to this PR before attempting any more modifications.
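The idea behind this proposal can be illustrated outside of codegen. The sketch below shows only the general technique of filling a target collection through a supplied `Builder` instead of materializing an intermediate sequence and converting it afterwards. `BuilderDemo` and `buildWith` are hypothetical names, and nothing here reflects how `MapObjects` or Spark's generated code actually works.

```scala
import scala.collection.mutable

object BuilderDemo {
  // Appends every element to the supplied builder and returns the finished
  // collection, so no intermediate collection has to be converted afterwards.
  def buildWith[A, C](elems: Iterator[A], builder: mutable.Builder[A, C]): C = {
    builder.clear()
    elems.foreach(builder += _)
    builder.result()
  }

  // The same routine produces different target collections,
  // depending only on which builder is passed in.
  val asList: List[Int] = buildWith(Iterator(1, 2, 3), List.newBuilder[Int])
  val asQueue: mutable.Queue[Int] =
    buildWith(Iterator(1, 2, 3), mutable.Queue.newBuilder[Int])
}
```

The caller chooses the collection type by choosing the builder, which is what would allow one code path to serve `List`, `Queue`, and similar sequence types.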
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Added support for arbitrary sequences. Now also Queues, ArrayBuffers and such can be used in datasets (all are serialized into ArrayType). I had to alter and add new implicit encoders into `SQLImplicits`. The new encoders are for `Seq` with `Product` combination (essentially only `List`) to disambiguate between `Seq` and `Product` encoders. However, I encountered a problem with implicits. When constructing a complex Dataset using `Seq.toDS` that includes a `Product` (like a case class) and a sequence, the encoder doesn't seem to be created. When constructed with `spark.createDataset` or when transforming an existing dataset, there is no problem. I added a workaround by defining a specific implicit just for `Seq`s. This makes the problem go away for existing usages, however other collections cannot be constructed by `Seq.toDS` unless `newProductSeqEncoder[A, T]` is created with the correct type parameters. If anybody knows how to fix this, let me know.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #3488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3488/consoleFull)** for PR 16240 at commit [`8c15b47`](https://github.com/apache/spark/commit/8c15b475fb053aef19906d6a465309d299ca7b4d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16240 **[Test build #3488 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3488/consoleFull)** for PR 16240 at commit [`8c15b47`](https://github.com/apache/spark/commit/8c15b475fb053aef19906d6a465309d299ca7b4d).
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 I would like to add that the conversion is specific to `List[_]`. I can add support for arbitrary sequence types through the use of `CanBuildFrom` if it is desirable. We can also support `Set`s in this way by serializing into arrays. See SPARK-17414
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16240 Can one of the admins verify this patch?