[ https://issues.apache.org/jira/browse/SPARK-44883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dipayan Dev updated SPARK-44883:
--------------------------------
    Description:

In our organisation, we use a GCS bucket root location as the path of our Hive table. Dataproc's latest 2.1 image uses *Spark 3.3.0*, so this needs to be fixed there.

Spark Scala code to reproduce this issue:
{noformat}
val DF = Seq(("test1", 123)).toDF("name", "num")
DF.write.option("path", "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("schema_name.table_name")

val DF1 = Seq(("test2", 125)).toDF("name", "num")
DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("schema_name.table_name")

java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:141)
  at org.apache.hadoop.fs.Path.<init>(Path.java:120)
  at org.apache.hadoop.fs.Path.suffix(Path.java:441)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
{noformat}

The issue appears to come from Hadoop's Path:
{noformat}
scala> import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.Path

scala> val path: Path = new Path("gs://test_dd123/")
path: org.apache.hadoop.fs.Path = gs://test_dd123/

scala> path.suffix("/num=123")
java.lang.NullPointerException
  at org.apache.hadoop.fs.Path.<init>(Path.java:150)
  at org.apache.hadoop.fs.Path.<init>(Path.java:129)
  at org.apache.hadoop.fs.Path.suffix(Path.java:450)
{noformat}

Path.suffix throws an NPE when writing into a GCS bucket root.

> Spark insertInto with location GCS bucket root causes NPE
> ---------------------------------------------------------
>
>                 Key: SPARK-44883
>                 URL: https://issues.apache.org/jira/browse/SPARK-44883
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Dipayan Dev
>            Priority: Minor
>
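The failure mode can be illustrated without Hadoop on the classpath. The sketch below is a simplified, hypothetical model of org.apache.hadoop.fs.Path, not Hadoop's actual code: Path.suffix(s) is implemented roughly as new Path(getParent, getName + s), and getParent returns null when the path is a filesystem root such as "gs://test_dd123/", so the Path(parent, child) constructor dereferences null.

```scala
// Hypothetical, simplified model of org.apache.hadoop.fs.Path for illustration only.
// Hadoop implements suffix(s) roughly as new Path(getParent, getName + s);
// getParent returns null for a root like "gs://test_dd123/", and the
// Path(parent, child) constructor then dereferences it -> NullPointerException.
object PathSuffixSketch {
  // Parent of a path, or null for a bucket root (mirroring Hadoop's contract).
  def getParent(path: String): String = {
    val schemeEnd = path.indexOf("://") + 3
    val rest = path.substring(schemeEnd).stripSuffix("/")
    val slash = rest.lastIndexOf('/')
    if (slash < 0) null // bucket root: no parent
    else path.substring(0, schemeEnd) + rest.substring(0, slash)
  }

  // Last path component ("dir" in "gs://bucket/dir").
  def getName(path: String): String = {
    val rest = path.substring(path.indexOf("://") + 3).stripSuffix("/")
    rest.substring(rest.lastIndexOf('/') + 1)
  }

  // Models Path.suffix: rebuild the path from its parent plus the suffixed name.
  def suffix(path: String, s: String): String = {
    val parent = getParent(path)                 // null when path is a bucket root
    parent.concat("/" + getName(path) + s)       // NPE here, like new Path(parent, child)
  }
}
```

On a non-root path the sketch behaves as expected (e.g. suffixing "/num=123" onto "gs://test_dd123/dir" yields "gs://test_dd123/dir/num=123"), while on the bucket root it throws the same NullPointerException as the stack traces above.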
--
This message was sent by Atlassian Jira
(v8.20.10#820010)