[ https://issues.apache.org/jira/browse/SPARK-48417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849399#comment-17849399 ]
Ravi Dalal commented on SPARK-48417:
------------------------------------

For anyone facing this issue, use the following configuration to read files from GCS when spark.jars.packages is used:
{code:java}
config("spark.jars", "https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-2.2.22.jar")
config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
{code}
When spark.jars.packages is not used, the following configuration alone works:
{code:java}
config("spark.jars", "https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-2.2.22.jar")
config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
{code}

> Filesystems do not load with spark.jars.packages configuration
> --------------------------------------------------------------
>
> Key: SPARK-48417
> URL: https://issues.apache.org/jira/browse/SPARK-48417
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 3.5.1
> Reporter: Ravi Dalal
> Priority: Major
> Attachments: pyspark_mleap.py, pyspark_spark_jar_package_config_logs.txt, pyspark_without_spark_jar_package_config_logs.txt
>
> When we use the spark.jars.packages configuration parameter in the Python SparkSession builder (PySpark), it appears that the filesystems are not loaded when the session starts. Because of this, Spark fails to read files from a Google Cloud Storage (GCS) bucket (with the GCS connector).
>
> I tested this with different packages, so it does not appear specific to a particular package. I will attach the sample code and debug logs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
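For context, a minimal PySpark sketch of the workaround above might look like the following. The extra package coordinates and the bucket path are illustrative placeholders (not taken from the report), and this assumes the environment has PySpark, a JVM, and GCS credentials available:

```python
from pyspark.sql import SparkSession

# Sketch of the workaround: when spark.jars.packages is set, both the
# AbstractFileSystem and FileSystem implementations for gs:// must be
# registered explicitly alongside the GCS connector jar.
spark = (
    SparkSession.builder
    .appName("gcs-read-example")
    # Hypothetical extra package; any spark.jars.packages value triggers the issue.
    .config("spark.jars.packages", "some.group:some-artifact:1.0.0")
    .config("spark.jars",
            "https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-2.2.22.jar")
    .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .getOrCreate()
)

# Illustrative path; reading from GCS now resolves the gs:// scheme.
df = spark.read.csv("gs://example-bucket/data.csv", header=True)
```

If spark.jars.packages is omitted, the spark.hadoop.fs.gs.impl line can be dropped, per the comment above.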