[ https://issues.apache.org/jira/browse/SPARK-44883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dipayan Dev updated SPARK-44883:
--------------------------------
    Description:

In our organisation, we use a GCS bucket root location as the path of our Hive table. Dataproc's latest 2.1 image uses *Spark 3.3.0*, so this needs to be fixed there.

Spark Scala code to reproduce this issue:
{noformat}
val DF = Seq(("test1", 123)).toDF("name", "num")
DF.write.option("path", "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("schema_name.table_name")

val DF1 = Seq(("test2", 125)).toDF("name", "num")
DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("schema_name.table_name")

java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:141)
  at org.apache.hadoop.fs.Path.<init>(Path.java:120)
  at org.apache.hadoop.fs.Path.suffix(Path.java:441)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
{noformat}

The issue appears to come from Hadoop's Path:
{noformat}
scala> import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.Path

scala> val path: Path = new Path("gs://test_dd123/")
path: org.apache.hadoop.fs.Path = gs://test_dd123/

scala> path.suffix("/num=123")
java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:150)
  at org.apache.hadoop.fs.Path.<init>(Path.java:129)
  at org.apache.hadoop.fs.Path.suffix(Path.java:450)
{noformat}

Path.suffix throws an NPE when writing into a GCS bucket root.

> Spark insertInto with location GCS bucket root causes NPE
> ---------------------------------------------------------
>
>                 Key: SPARK-44883
>                 URL: https://issues.apache.org/jira/browse/SPARK-44883
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Dipayan Dev
>            Priority: Minor
>
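The failure mode can be illustrated without Hadoop on the classpath. The sketch below is a simplified, hypothetical model of org.apache.hadoop.fs.Path, not Hadoop's actual code: Path.suffix(s) is implemented roughly as new Path(getParent, getName + s), and getParent returns null when the path is a filesystem root such as "gs://test_dd123/", so the Path(parent, child) constructor dereferences null.

```scala
// Hypothetical, simplified model of org.apache.hadoop.fs.Path for illustration only.
// Hadoop implements suffix(s) roughly as new Path(getParent, getName + s);
// getParent returns null for a root like "gs://test_dd123/", and the
// Path(parent, child) constructor then dereferences it -> NullPointerException.
object PathSuffixSketch {
  // Parent of a path, or null for a bucket root (mirroring Hadoop's contract).
  def getParent(path: String): String = {
    val schemeEnd = path.indexOf("://") + 3
    val rest = path.substring(schemeEnd).stripSuffix("/")
    val slash = rest.lastIndexOf('/')
    if (slash < 0) null // bucket root: no parent
    else path.substring(0, schemeEnd) + rest.substring(0, slash)
  }

  // Last path component ("dir" in "gs://bucket/dir").
  def getName(path: String): String = {
    val rest = path.substring(path.indexOf("://") + 3).stripSuffix("/")
    rest.substring(rest.lastIndexOf('/') + 1)
  }

  // Models Path.suffix: rebuild the path from its parent plus the suffixed name.
  def suffix(path: String, s: String): String = {
    val parent = getParent(path)                 // null when path is a bucket root
    parent.concat("/" + getName(path) + s)       // NPE here, like new Path(parent, child)
  }
}
```

On a non-root path the sketch behaves as expected (e.g. suffixing "/num=123" onto "gs://test_dd123/dir" yields "gs://test_dd123/dir/num=123"), while on the bucket root it throws the same NullPointerException as the stack traces above.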
--
This message was sent by Atlassian Jira
(v8.20.10#820010)