[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19113 We didn't accept Parquet 1.9.0 because it has a known performance regression. I think this one is fine. Merging to master, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19113 If we need 2.5.x for the fix, then we need 2.5.x. It's worth picking up an update if it solves a real problem. And if we're going to update minor versions, it's generally good practice to pick the latest maintenance release unless there's a specific reason not to. I don't think we have any general policy against using the latest version of something; on the contrary. Parquet is more critical and perhaps less reliable about maintaining the exact behavior, so maybe deserves more caution, but this change seems fine.
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19113 Since our next version, Spark 2.3, is expected to be released at the end of this year, we can still revert to 2.2.1 if we find that release 2.5.4 introduces new bugs or a performance regression. I am fine with merging it now. Let @rxin @marmbrus @cloud-fan give the final confirmation.
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19113 This release of Univocity came out just a few days ago. To me, this sounds risky. We normally do not upgrade to the latest version. This is why we are not using Parquet 1.9.0. Instead, we are asking the Parquet community to release 1.8.2. cc @rxin @marmbrus @cloud-fan
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19113 With 2.7G of data, I ran a simple Java program using `CsvParser` with both 2.5.4 and 2.2.1, plus simple e2e read tests. The elapsed time difference was roughly -1.7% to +1.2%. I think there is virtually no difference (or a 0.5 improvement). I think we generally trust the other communities and libraries we decided to adopt, such as ORC, Parquet, and Jackson, and rely on community support to avoid duplicating such efforts. I think we discussed this before.
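The elapsed-time comparison described in the comment above could be sketched roughly as follows. This is only an illustration, not the program that was actually run: the class `ElapsedDiff`, its methods, and the placeholder workload are hypothetical names, and the sign convention (negative means the candidate is faster) is an assumption, since the comment does not state one. In a real comparison, each workload would parse the same CSV file with a different Univocity version on the classpath.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Spark or Univocity.
final class ElapsedDiff {

    /** Median wall-clock time, in nanoseconds, of `runs` executions of `task`. */
    static long medianNanos(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    /**
     * Relative change of `candidate` versus `baseline`, in percent.
     * Assumed convention: a negative value means the candidate took less time.
     */
    static double relativeDiffPercent(double baseline, double candidate) {
        if (baseline <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (candidate - baseline) / baseline * 100.0;
    }

    /** Placeholder workload standing in for "parse the CSV input once". */
    static void placeholderWorkload() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50_000; i++) {
            sb.append(i).append(',');
        }
        sb.toString().split(",");
    }

    public static void main(String[] args) {
        // Both sides run the same placeholder here, so the printed value is only noise.
        long baseline = medianNanos(ElapsedDiff::placeholderWorkload, 11);
        long candidate = medianNanos(ElapsedDiff::placeholderWorkload, 11);
        System.out.printf("elapsed time diff: %+.1f%%%n",
                relativeDiffPercent(baseline, candidate));
    }
}
```

Taking the median of several runs is one way to reduce the effect of JIT warm-up and other noise; differences of a percent or two, as reported above, are typically within that noise.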
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19113 How about other popular open source projects? Do you know which projects are using Univocity 2.5? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19113 Are there any performance measurements comparing 2.2 and 2.5?
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19113 Merged build finished. Test PASSed.
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19113 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81368/ Test PASSed.
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19113 **[Test build #81368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81368/testReport)** for PR 19113 at commit [`fa7eb51`](https://github.com/apache/spark/commit/fa7eb514cfd91bb405fd74680b08d5865911e3f0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19113 **[Test build #81368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81368/testReport)** for PR 19113 at commit [`fa7eb51`](https://github.com/apache/spark/commit/fa7eb514cfd91bb405fd74680b08d5865911e3f0).