[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-919308510 #33989 seems a promising direction. Close this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-875177916 Hmm, I looked at `isSharedClass`; it looks like `commons-lang3`, orc, etc. are already non-shared classes.
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-875060042

> Oh I didn't even realize that Spark is using `hive-exec-core` jar. Does that mean it doesn't take advantage of the Guava shading from Hive 2.3.8+ at all?

Yea, I'm afraid that is true. If we want to completely isolate dependencies from Hive, we may need to relocate all included (but not relocated) dependencies in `hive-exec` w/o classifier.

> One idea is to have Spark use [`hadoop-shaded-guava`](https://github.com/apache/hadoop-thirdparty) which is also 30.1.1-jre. It also makes sure that Spark always use the same Guava version as Hadoop.

Even if Spark uses `hadoop-shaded-guava`, `hive-exec` still needs the older Guava if we cannot use the version w/o classifier (due to other dependencies, e.g. commons-lang3, orc, parquet).
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-873359259 Encountered some issues. Although we can switch to hive-exec without classifier (the shaded version) to get rid of the above Guava version issue, the shaded hive-exec contains (without relocation) some dependencies, like commons-lang3 and orc, whose versions differ from Spark's, so they conflict. Because the shaded hive-exec jar already bundles these dependencies, it seems dependency exclusions in the pom cannot exclude them. For now it seems the only option is to go back to Hive and shade every included dependency? Any other thoughts?
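One way to confirm which third-party packages a shaded jar bundles without relocation is to scan its entry names for the original package prefixes. The sketch below is only an illustration: `BundledPackageScan`, its method names, and the sample prefixes in `main` are hypothetical and not part of Spark or Hive, and the jar path is whatever hive-exec jar is at hand.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for illustration; not part of Spark or Hive.
class BundledPackageScan {

    // Returns the subset of `prefixes` (e.g. "org/apache/commons/lang3/")
    // under which at least one .class entry exists in `entryNames`.
    // A hit means the package is bundled under its original, un-relocated name.
    static List<String> bundledPrefixes(Iterable<String> entryNames, List<String> prefixes) {
        TreeSet<String> found = new TreeSet<>();
        for (String name : entryNames) {
            if (!name.endsWith(".class")) {
                continue;
            }
            for (String prefix : prefixes) {
                if (name.startsWith(prefix)) {
                    found.add(prefix);
                }
            }
        }
        return new ArrayList<>(found);
    }

    // Collects every entry name from a jar file on disk.
    static List<String> entryNames(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            List<String> names = new ArrayList<>();
            for (JarEntry entry : Collections.list(jar.entries())) {
                names.add(entry.getName());
            }
            return names;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed prefixes, chosen only because the thread mentions these libraries.
        List<String> prefixes = Arrays.asList(
            "org/apache/commons/lang3/", "org/apache/orc/", "com/google/common/");
        System.out.println(bundledPrefixes(entryNames(args[0]), prefixes));
    }
}
```

Running `main` with the path to a jar prints the prefixes that appear un-relocated. Classes bundled this way live inside the jar itself, which is why a pom-level `<exclusion>` has no effect on them.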
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-872735918 Hmm, from the failed tests below:

org.apache.spark.sql.hive.DataSourceWithHiveMetastoreCatalogSuite
org.apache.spark.sql.hive.HiveExternalCatalogSuite
org.apache.spark.sql.hive.StatisticsSuite

Since Guava 20, `com.google.common.collect.Iterators.emptyIterator()` is not public anymore. But I don't get it, because Hive 2.3.8/2.3.9 shades Guava. Why would it use the newer Guava upgraded here?

```
java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; from class org.apache.hadoop.hive.ql.exec.FetchOperator
	at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
	at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$runHive$1(HiveClientImpl.scala:831)
```
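When it is unclear which jar a class such as `FetchOperator` or Guava's `Iterators` was actually loaded from, a common diagnostic is to print the class's code source location. This is a minimal sketch using only JDK APIs; the `ClassOrigin` class and its method names are made up for illustration and are not Spark or Hive code.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Spark or Hive.
class ClassOrigin {

    // Returns the location (jar or directory URL) a class was loaded from,
    // or a placeholder when the JVM does not record one.
    static String describe(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    // Same as above, but looks the class up by name without initializing it.
    static String describe(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            return describe(cls);
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Pass fully qualified class names as arguments, for example
        // org.apache.hadoop.hive.ql.exec.FetchOperator com.google.common.collect.Iterators
        for (String name : args) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

If the Hive class resolves to a `hive-exec` jar with the `core` classifier while the Guava class resolves to Spark's own Guava jar, the Hive code is linking against the unshaded Guava on the classpath.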
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-871990967 retest this please
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-871122684 I'm not against this point. I can change to the latest Guava and see what CI tells us.
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-871101530 retest this please
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-870850274 retest this please
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre and Hadoop to 3.2.1
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-870811211 try this again.
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre and Hadoop to 3.2.1
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-694355931 Isn't HADOOP-14284 resolved as Invalid?
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre and Hadoop to 3.2.1
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-670608710 @dongjoon-hyun Thanks for the comment. Yeah, it doesn't make sense to upgrade to Hive 4 in the short or mid term. I'm working on upgrading to Guava 27 and shading Guava in Hive too. I hope it can be part of Hive 2.3.8. I will close this for now. Once the work on the Hive side makes progress, I can reopen this. Thanks.
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre and Hadoop to 3.2.1
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-668878065 I did some tests. A few changes are required to pass the failed Hive tests:

1. Shading Guava at hive-exec packaging and a few code changes to hive-common and hive-exec regarding Guava usage
2. Don't use `core` classifier for hive dependencies in Spark

But this just upgrades the Guava version used in Spark. Hive dependencies still use the older Guava with the reported CVE.
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre and Hadoop to 3.2.1
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-668116815 Opened https://issues.apache.org/jira/browse/HIVE-23980 to see if the Hive people have some ideas.
[GitHub] [spark] viirya commented on pull request #29326: [WIP][SPARK-32502][BUILD] Upgrade Guava to 27.0-jre and Hadoop to 3.2.1
viirya commented on pull request #29326: URL: https://github.com/apache/spark/pull/29326#issuecomment-667801138 The trouble is that hive-exec uses a method that became package-private in Guava 20, so it is incompatible with Guava versions > 19.0.

```
sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; from class org.apache.hadoop.hive.ql.exec.FetchOperator
	at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
	at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
```

hive-exec doesn't shade Guava until https://issues.apache.org/jira/browse/HIVE-22126, which targets 4.0.0. This seems a dead end for upgrading Guava in Spark for now.
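The visibility change behind this `IllegalAccessError` can be checked against whatever Guava version is on the classpath with plain reflection. The following is a rough sketch, not code from Spark or Hive: `MethodVisibility` and its `of` method are hypothetical names, and it only inspects zero-argument methods.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper for illustration; not part of Spark, Hive, or Guava.
class MethodVisibility {

    // Reports the access level of the first zero-argument method named
    // `methodName` declared on `className`, without initializing the class.
    static String of(String className, String methodName) {
        Class<?> cls;
        try {
            cls = Class.forName(className, false, MethodVisibility.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return "class not found";
        }
        for (Method method : cls.getDeclaredMethods()) {
            if (method.getName().equals(methodName) && method.getParameterCount() == 0) {
                int modifiers = method.getModifiers();
                if (Modifier.isPublic(modifiers)) {
                    return "public";
                }
                if (Modifier.isProtected(modifiers)) {
                    return "protected";
                }
                if (Modifier.isPrivate(modifiers)) {
                    return "private";
                }
                return "package-private";
            }
        }
        return "method not found";
    }

    public static void main(String[] args) {
        // Prints "class not found" when Guava is absent from the classpath.
        System.out.println(of("com.google.common.collect.Iterators", "emptyIterator"));
        // JDK method with the same name, shown here only for comparison.
        System.out.println(of("java.util.Collections", "emptyIterator"));
    }
}
```

Going by the comment above, one would expect the Guava line to differ between Guava 19.0 and 20 or later. Code compiled against the older, public method still links by name and descriptor at runtime, which is why the failure surfaces as an `IllegalAccessError` rather than a compile error.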