[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user lokkju commented on the issue: https://github.com/apache/spark/pull/16479 So it turns out just copying the conversion code doesn't work, as seen in spark-avro/#240 - and now I'm running into the same thing writing my own datasource. As a datasource in the end requires implementing a class that extends OutputWriter, and the OutputWriter interface changed, a datasource plugin doesn't seem to be able to support both pre and post versions of 2.2.x in the same plugin. Any suggestions on how to handle this, without requiring users to match the Spark version to the new datasource version? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16479 This is a common issue with data source v1: it's not powerful enough, so you have to use some Spark internal APIs and then hit compatibility problems. AFAIK a workable solution is to create different branches for different Spark versions, or to use some dirty reflection workarounds.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user lokkju commented on the issue: https://github.com/apache/spark/pull/16479 I'd be interested in the "dirty reflection workarounds", if you have examples. Not sure how I'd use reflection to handle conflicting interface definitions, but I'd love to know how.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16479 Here is a better solution I found: https://github.com/databricks/spark-avro/pull/217/files#diff-3086eddba29f4034c324541695a2357b It implements a different `OutputWriterFactory` per Spark version and switches between them with build files.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user lokkju commented on the issue: https://github.com/apache/spark/pull/16479 So it essentially compiles each implementation against a different Spark version, and then *both* sets of bytecode are included in the final jar? Then it uses reflection to instantiate the right one. That works, without too much pain. Might go that route, thanks.
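A minimal sketch of the reflection step described above, written in plain Java so it runs without Spark on the classpath. Everything named here is hypothetical: `ShimLoader`, `WriterShim`, `LegacyShim` and `CurrentShim` are illustration-only stand-ins for the two version-specific implementations, and the 2.2 cutoff is an assumption taken from the "pre and post versions of 2.2.x" wording in this thread. It is not the code from the linked spark-avro pull request.

```java
import java.lang.reflect.Constructor;

public class ShimLoader {
    /** Version-neutral interface the rest of the plugin codes against (hypothetical). */
    public interface WriterShim {
        String describe();
    }

    /** Stand-in for the implementation compiled against the older Spark API. */
    public static class LegacyShim implements WriterShim {
        public LegacyShim() {}
        public String describe() { return "legacy (Row-based) writer"; }
    }

    /** Stand-in for the implementation compiled against the newer Spark API. */
    public static class CurrentShim implements WriterShim {
        public CurrentShim() {}
        public String describe() { return "current (InternalRow-based) writer"; }
    }

    /** Picks an implementation class name from a "major.minor[.patch]" version string. */
    static String shimClassFor(String sparkVersion) {
        String[] parts = sparkVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        // Assumed cutoff: the interface change is treated as landing in 2.2.
        boolean newApi = major > 2 || (major == 2 && minor >= 2);
        return ShimLoader.class.getName() + (newApi ? "$CurrentShim" : "$LegacyShim");
    }

    /** Loads the chosen class by name only, so the other one is never linked. */
    static WriterShim load(String sparkVersion) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(shimClassFor(sparkVersion));
        Constructor<?> ctor = cls.getDeclaredConstructor();
        return (WriterShim) ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(load("2.1.0").describe());
        System.out.println(load("2.2.0").describe());
    }
}
```

The point of going through `Class.forName` is that the caller never refers to either implementation class directly, so the class built against the other Spark version is not loaded at runtime. In a real plugin the two implementations would live in separate source directories, each compiled against its own Spark version, and the version string would come from the running Spark rather than a literal.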
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16479 **[Test build #70932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70932/testReport)** for PR 16479 at commit [`1de8e49`](https://github.com/apache/spark/commit/1de8e4946ed1f0c1ae5738b872acb6b995a8295f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16479 cc @liancheng @gatorsmile @yhuai
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16479 **[Test build #70932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70932/testReport)** for PR 16479 at commit [`1de8e49`](https://github.com/apache/spark/commit/1de8e4946ed1f0c1ae5738b872acb6b995a8295f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16479 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70932/ Test FAILed.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16479 Merged build finished. Test FAILed.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16479 What is the benefit of making these changes?
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16479 **[Test build #70947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70947/testReport)** for PR 16479 at commit [`79bb30c`](https://github.com/apache/spark/commit/79bb30cf222c43c98d4d52ab207d65fdca1f83b5).
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16479 @yhuai It removes unnecessary code to make the codebase easier to maintain. Besides, the libsvm relation should be a little faster as it doesn't need to go through a converter.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16479 **[Test build #70947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70947/testReport)** for PR 16479 at commit [`79bb30c`](https://github.com/apache/spark/commit/79bb30cf222c43c98d4d52ab207d65fdca1f83b5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70947/ Test PASSed.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16479 Merged build finished. Test PASSed.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16479 **[Test build #70971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70971/testReport)** for PR 16479 at commit [`110bcdf`](https://github.com/apache/spark/commit/110bcdfd81bd74379ef67c57c287e25fcb5b48ca).
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16479 Merged build finished. Test FAILed.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16479 **[Test build #70971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70971/testReport)** for PR 16479 at commit [`110bcdf`](https://github.com/apache/spark/commit/110bcdfd81bd74379ef67c57c287e25fcb5b48ca). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16479 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70971/ Test FAILed.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16479 **[Test build #70973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70973/testReport)** for PR 16479 at commit [`902f17a`](https://github.com/apache/spark/commit/902f17abd1d076ef08347f383f4dd79e9a28e792).
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16479 **[Test build #70973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70973/testReport)** for PR 16479 at commit [`902f17a`](https://github.com/apache/spark/commit/902f17abd1d076ef08347f383f4dd79e9a28e792). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70973/ Test PASSed.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16479 Merged build finished. Test PASSed.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16479 LGTM
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16479 thanks for the review, merging to master!
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/16479 how "internal" are these interfaces really? every time a change like this is made spark-avro breaks
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16479 Everything in package `org.apache.spark.sql.execution` should be internal to Spark SQL. Technically you can still implement `OutputWriter` outside of Spark, but there is no guarantee of stability. Ideally we should not change any interface unless necessary, but this change is reasonable. As an internal interface, it's more efficient to use `InternalRow` directly, instead of converting `InternalRow` to `Row` and then operating on `Row`. I'm sorry that this breaks spark-avro, but we can make spark-avro more efficient by switching to the new interface. Or we can just copy the previous conversion code to spark-avro, so that we can still convert `InternalRow` to `Row` and operate on `Row` in spark-avro.
[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/16479 i will just copy the conversion code over for now thx