hudi-bot opened a new issue, #15458: URL: https://github.com/apache/hudi/issues/15458
Currently, the GCS ingestion work (HUDI-4850) expects recent versions of jars such as protobuf and Guava to be provided to `spark-submit` explicitly, to override the older versions shipped with Spark. These jars are used by the gcs-connector, a library from Google that helps connect to GCS. For more details see https://docs.google.com/document/d/1VfvtdvhXw6oEHPgZ_4Be2rkPxIzE0kBCNUiVDsXnSAA/edit# (section titled "Configure Spark to use newer versions of some Jars").

See if it's possible to create a shaded fat jar of gcs-connector for this use case instead, and avoid specifying jars to `spark-submit` on the command line. An alternate approach to consider for the long term is HUDI-4930 (slim bundles).

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-4931
- Type: Task
- Epic: https://issues.apache.org/jira/browse/HUDI-1896

---

## Comments

**28/Sep/22 08:14 — pramodbiligiri:**

Some useful references regarding this:

- GCP docs on the Cloud Storage connector: https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
- Hudi docs on GCS connectivity: https://hudi.apache.org/docs/gcs_hoodie/

---

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
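For context, the workaround this issue wants to eliminate might look roughly like the sketch below. This is a hypothetical invocation, not taken from the issue: the jar paths, version numbers, and the DeltaStreamer entry point are illustrative assumptions, and the exact jars required are described in the Google Doc linked above.

```sh
# Hypothetical sketch of the current workaround: newer protobuf/Guava jars
# are placed ahead of Spark's bundled copies on both driver and executor
# classpaths, and the gcs-connector shaded jar is shipped alongside the job.
# All paths and versions below are illustrative, not from the issue.
GCS_JARS=/path/to/extra-jars

spark-submit \
  --driver-class-path "$GCS_JARS/protobuf-java-3.x.jar:$GCS_JARS/guava-31.x-jre.jar" \
  --conf spark.executor.extraClassPath="$GCS_JARS/protobuf-java-3.x.jar:$GCS_JARS/guava-31.x-jre.jar" \
  --jars "$GCS_JARS/gcs-connector-hadoop3-shaded.jar" \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  ...
```

A shaded fat jar of gcs-connector, as proposed here, would relocate the conflicting protobuf and Guava packages inside the connector jar itself, so none of the extra classpath flags above would be needed.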
