[GitHub] [hudi] a0x commented on issue #4442: [SUPPORT] PySpark(3.1.2) with Hudi(0.10.0) failed when querying spark sql
a0x commented on issue #4442: URL: https://github.com/apache/hudi/issues/4442#issuecomment-1005363447

Finally I fixed this problem by removing the AWS deps in `packaging/hudi-spark-bundle/pom.xml` and recompiling it myself:

```xml
com.amazonaws:dynamodb-lock-client
com.amazonaws:aws-java-sdk-dynamodb
com.amazonaws:aws-java-sdk-core
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
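For reference, in the hudi-spark-bundle pom those coordinates sit in the maven-shade-plugin `artifactSet`; a minimal sketch of the section the fix trims (element layout assumed from the standard shade-plugin schema, not copied from the exact file):

```xml
<!-- packaging/hudi-spark-bundle/pom.xml (sketch; assumed layout) -->
<artifactSet>
  <includes>
    <!-- deleting these three entries keeps the AWS classes out of the bundle -->
    <include>com.amazonaws:dynamodb-lock-client</include>
    <include>com.amazonaws:aws-java-sdk-dynamodb</include>
    <include>com.amazonaws:aws-java-sdk-core</include>
    <!-- other includes unchanged -->
  </includes>
</artifactSet>
```

After editing, the bundle can be rebuilt with the usual Maven invocation, e.g. `mvn clean package -DskipTests` from the repo root (exact profiles depend on your Spark/Scala versions).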
a0x commented on issue #4442: URL: https://github.com/apache/hudi/issues/4442#issuecomment-1004539542

@kazdy I did recompile the Hudi packages with the mentioned config, yet the error remains.

This is an interesting problem, because everything works in `spark-shell`, yet the problem occurs **only in PySpark**. So I think the library conflict is hidden in the difference between `spark-shell` and `pyspark`.
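One way to pin down that difference is to ask the driver JVM, from inside each shell, which jar a conflicting AWS class was actually loaded from. A hedged sketch for the PySpark side (the class name `com.amazonaws.ClientConfiguration` is only an illustrative pick, and `spark` is the session object the shell provides, so this is not a standalone script):

```python
# Run inside a live pyspark shell; `spark` is the provided SparkSession.
# Prints the jar the driver JVM loaded the given AWS class from, which
# exposes classpath differences between spark-shell and pyspark.
jvm = spark.sparkContext._jvm
cls = jvm.java.lang.Class.forName("com.amazonaws.ClientConfiguration")
print(cls.getProtectionDomain().getCodeSource().getLocation())
```

The Scala equivalent in `spark-shell` (`classOf[...].getProtectionDomain.getCodeSource.getLocation`) lets you compare the two shells directly.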
a0x commented on issue #4442: URL: https://github.com/apache/hudi/issues/4442#issuecomment-1004486507

> I have the same issue when running hudi on emr. This issue seems to have the same root cause as this one: #4474. The solution is to shade and relocate the aws dependencies introduced in hudi-aws:
>
> > For our internal hudi version, we shade aws dependencies; you can add a new relocation and build a new bundle package. For example, to shade aws dependencies in spark, add the following in **packaging/hudi-spark-bundle/pom.xml**:
> >
> > ```xml
> > <relocation>
> >   <pattern>com.amazonaws.</pattern>
> >   <shadedPattern>${spark.bundle.spark.shade.prefix}com.amazonaws.</shadedPattern>
> > </relocation>
> > ```
>
> @xushiyan should this relocation be added to the official hudi release to avoid such conflicts? Thank you!

This should work. But should we shade all the aws deps in Spark? I'm worried about side effects, but let me have a try before replying in that issue.
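For anyone applying the suggestion above: the relocation belongs inside the maven-shade-plugin's `<relocations>` list in `packaging/hudi-spark-bundle/pom.xml`. A hedged sketch of the placement (surrounding structure assumed from the standard shade-plugin configuration, not from the actual bundle pom):

```xml
<!-- Sketch of where the relocation goes; assumed shade-plugin layout -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- rewrite com.amazonaws.* references to the shaded prefix,
           so the bundle's AWS SDK no longer clashes with EMR's copy -->
      <relocation>
        <pattern>com.amazonaws.</pattern>
        <shadedPattern>${spark.bundle.spark.shade.prefix}com.amazonaws.</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Relocation rewrites both the class files and the bytecode references, which is why it avoids the conflict where a plain exclusion would just drop the classes.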
a0x commented on issue #4442: URL: https://github.com/apache/hudi/issues/4442#issuecomment-1004485623

@xushiyan Thanks for your reply. Do you mean not to replace the Hudi 0.8.0 bundled in EMR, and instead start the Spark session with Hudi 0.10.0 from a separate dir? To be honest, I don't think that's a good idea. When I dug into the error, I realized the problem was inside the AWS Java SDK bundled with the EMR Spark library.