[GitHub] [hudi] a0x commented on issue #4442: [SUPPORT] PySpark(3.1.2) with Hudi(0.10.0) failed when querying spark sql

2022-01-04 Thread GitBox


a0x commented on issue #4442:
URL: https://github.com/apache/hudi/issues/4442#issuecomment-1005363447


   Finally I fixed this problem by removing the AWS dependencies in `packaging/hudi-spark-bundle/pom.xml` and recompiling the bundle myself.
   
   ```xml
   <!-- entries removed from the maven-shade-plugin <artifactSet> includes -->
   <include>com.amazonaws:dynamodb-lock-client</include>
   <include>com.amazonaws:aws-java-sdk-dynamodb</include>
   <include>com.amazonaws:aws-java-sdk-core</include>
   ```
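   
   For completeness, a minimal sketch of starting a PySpark session against the rebuilt bundle (the jar path is an assumption for a Spark 3 / Scala 2.12 build of Hudi 0.10.0):
   
   ```python
   from pyspark.sql import SparkSession

   # Point Spark at the locally rebuilt bundle. The jar name below is an
   # assumption for a Spark 3 / Scala 2.12 build of Hudi 0.10.0.
   bundle = "packaging/hudi-spark-bundle/target/hudi-spark3-bundle_2.12-0.10.0.jar"

   spark = (
       SparkSession.builder
       .config("spark.jars", bundle)
       # Kryo serialization, as recommended in the Hudi quickstart.
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       .getOrCreate()
   )
   ```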






[GitHub] [hudi] a0x commented on issue #4442: [SUPPORT] PySpark(3.1.2) with Hudi(0.10.0) failed when querying spark sql

2022-01-03 Thread GitBox


a0x commented on issue #4442:
URL: https://github.com/apache/hudi/issues/4442#issuecomment-1004539542


   @kazdy I did recompile the Hudi packages with the mentioned config, yet the error remains.
   
   This is an interesting problem, because everything works fine in `spark-shell`, yet the error occurs **only in PySpark**.
   
   So I think the library conflict is hidden in the difference between `spark-shell` and `pyspark`.
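   
   A quick way to confirm where the conflicting classes come from in each shell is to ask the driver JVM for the code source of a suspect class (a sketch; the class name is just an example of an AWS SDK entry point):
   
   ```python
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.getOrCreate()

   # Resolve a suspect AWS SDK class through the driver JVM and print the jar
   # it was loaded from. Running the same lookup in spark-shell and in pyspark
   # shows whether the two shells pick the class up from different jars.
   clazz = spark._jvm.java.lang.Class.forName(
       "com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient")
   source = clazz.getProtectionDomain().getCodeSource()
   print(source.getLocation().toString() if source else "bootstrap classloader")
   ```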






[GitHub] [hudi] a0x commented on issue #4442: [SUPPORT] PySpark(3.1.2) with Hudi(0.10.0) failed when querying spark sql

2022-01-03 Thread GitBox


a0x commented on issue #4442:
URL: https://github.com/apache/hudi/issues/4442#issuecomment-1004486507


   > I have the same issue when running Hudi on EMR. This issue seems to have the same root cause as this one: #4474. The solution is to shade and relocate the AWS dependencies introduced in hudi-aws:
   > 
   > > For our internal Hudi version, we shade the AWS dependencies; you can add a new relocation and build a new bundle package:
   > > For example, to shade the AWS dependencies in Spark, add the following to **packaging/hudi-spark-bundle/pom.xml**:
   > > ```xml
   > > <relocation>
   > >   <pattern>com.amazonaws.</pattern>
   > >   <shadedPattern>${spark.bundle.spark.shade.prefix}com.amazonaws.</shadedPattern>
   > > </relocation>
   > > ```
   > 
   > @xushiyan should this relocation be added to the official Hudi release to avoid such conflicts?
   
   Thank you! This should work.
   
   But shall we shade all the AWS deps in Spark? I'm worried about the side effects, but let me have a try before replying in that issue.
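   
   One way to check the result after rebuilding (a sketch; the jar path and the exact shade prefix are assumptions) is to list where the `com/amazonaws` classes ended up inside the bundle:
   
   ```python
   import zipfile

   # Assumed output path for a Spark 3 / Scala 2.12 build of Hudi 0.10.0.
   jar = "packaging/hudi-spark-bundle/target/hudi-spark3-bundle_2.12-0.10.0.jar"

   with zipfile.ZipFile(jar) as bundle:
       names = bundle.namelist()
       # Classes still at the original package root would clash with the SDK
       # that EMR ships; after relocation this list should be empty.
       unshaded = [n for n in names if n.startswith("com/amazonaws/")]
       # Relocated copies live under the shade prefix, i.e. somewhere like
       # <prefix>/com/amazonaws/... inside the jar.
       relocated = [n for n in names if "/com/amazonaws/" in n]
       print(f"unshaded com.amazonaws entries:  {len(unshaded)}")
       print(f"relocated com.amazonaws entries: {len(relocated)}")
   ```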






[GitHub] [hudi] a0x commented on issue #4442: [SUPPORT] PySpark(3.1.2) with Hudi(0.10.0) failed when querying spark sql

2022-01-03 Thread GitBox


a0x commented on issue #4442:
URL: https://github.com/apache/hudi/issues/4442#issuecomment-1004485623


   @xushiyan Thanks for your reply.
   
   Do you mean not to replace the Hudi 0.8.0 bundled in EMR, but to start the Spark session with Hudi 0.10.0 from another, separate directory?
   
   To be honest, I don't think that's a good idea.
   
   When I dug into the error, I realized the problem lies inside the AWS Java SDK bundled with the EMR Spark library.
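   
   A quick way to see which AWS SDK jars the EMR Spark distribution already ships (a sketch; `/usr/lib/spark/jars` is the usual location on an EMR node, but that is an assumption about your setup):
   
   ```python
   import glob

   # Default Spark jars directory on an EMR node (assumption; adjust per release).
   spark_jars = "/usr/lib/spark/jars"

   # Any aws-java-sdk jar listed here overlaps with the AWS classes compiled
   # into hudi-spark-bundle, which is the conflict described above.
   for jar in sorted(glob.glob(f"{spark_jars}/aws-java-sdk*.jar")):
       print(jar)
   ```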
   

