[ https://issues.apache.org/jira/browse/SPARK-34955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34955:
------------------------------------

    Assignee: Kousuke Saruta  (was: Apache Spark)

> ADD JAR command cannot add jar files which contains whitespaces in the path
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-34955
>                 URL: https://issues.apache.org/jira/browse/SPARK-34955
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.7, 3.0.2, 3.2.0, 3.1.1
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Major
>
> The ADD JAR command cannot add JAR files whose paths contain whitespace.
> If we have `/some/path/test file.jar` and execute the following command:
> {code}
> ADD JAR "/some/path/test file.jar";
> {code}
> The following exception is thrown.
> {code}
> 21/04/05 10:40:38 ERROR SparkSQLDriver: Failed in [add jar "/some/path/test file.jar"]
> java.lang.IllegalArgumentException: Illegal character in path at index 9: /some/path/test file.jar
>       at java.net.URI.create(URI.java:852)
>       at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:129)
>       at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:34)
>       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>       at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> {code}
> This is because `HiveSessionStateBuilder` and `SessionStateBuilder` don't
> check whether the given path is a URI or a plain path; they always treat it
> as a URI.
> In a URI, whitespace must be encoded as `%20`, so `/some/path/test file.jar`
> is rejected.
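As a quick illustration of this rejection using only the JDK (the class and method names below are made up for the example), `java.net.URI` refuses the raw path containing a space, accepts the `%20`-encoded form, and `java.io.File.toURI()` produces that encoded form from a plain path:

```java
import java.io.File;
import java.net.URI;

// Illustration only: how java.net.URI treats a plain path containing a space.
class UriSpaceDemo {
    // Returns true if URI.create accepts the string, false if it throws.
    static boolean parsesAsUri(String s) {
        try {
            URI.create(s);
            return true;
        } catch (IllegalArgumentException e) {
            // URI.create wraps URISyntaxException in IllegalArgumentException,
            // which is the exception seen in the stack trace above.
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "/some/path/test file.jar";
        System.out.println(parsesAsUri(plain));                             // false
        System.out.println(parsesAsUri("file:/some/path/test%20file.jar")); // true
        // File.toURI() percent-encodes the space; on a Unix-like system this
        // prints file:/some/path/test%20file.jar
        System.out.println(new File(plain).toURI());
    }
}
```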
> We can fix this part by checking whether the given path is in URI form or
> not.
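A minimal sketch of such a check, assuming a hypothetical helper `toJarUri` (this is not Spark's actual code or API): treat the input as a URI only if it parses and has a scheme; otherwise convert it as a plain local path with `java.io.File`, which percent-encodes the space.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not Spark's API: accept either a URI or a plain
// local path and always return a properly encoded URI.
class JarPathResolver {
    static URI toJarUri(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                // Already in URI form (e.g. file:/... or hdfs://...); use as-is.
                return uri;
            }
        } catch (URISyntaxException e) {
            // Not a valid URI (for example, it contains a raw space):
            // fall through and treat it as a plain local path.
        }
        // Plain path: File.toURI() makes it absolute and percent-encodes
        // characters such as spaces.
        return new File(path).toURI();
    }

    public static void main(String[] args) {
        System.out.println(toJarUri("/some/path/test file.jar"));
        System.out.println(toJarUri("file:/some/path/test%20file.jar"));
    }
}
```

With this approach, on a Unix-like system both `/some/path/test file.jar` and `file:/some/path/test%20file.jar` map to the same encoded URI, while scheme-qualified inputs such as `hdfs://...` pass through unchanged.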
> Unfortunately, fixing this part exposes another problem.
> When we execute the `ADD JAR` command, Hive's `ADD JAR` command is executed in
> `HiveClientImpl.addJar`, which transitively invokes `AddResourceProcessor.run`.
> In `AddResourceProcessor.run`, the command line is simply split on `\\s+`, so
> the path is split into `/some/path/test` and `file.jar`, and both are passed
> to `ss.add_resources` as separate resources.
> https://github.com/apache/hive/blob/f1e87137034e4ecbe39a859d4ef44319800016d7/ql/src/java/org/apache/hadoop/hive/ql/processors/AddResourceProcessor.java#L56-L75
> So the command still fails.
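The splitting problem can be reproduced in isolation. The snippet below is a simplified model of the tokenization described above, not Hive's code; it assumes the processor receives the text after `ADD` (i.e. `JAR /some/path/test file.jar`) with the quotes already removed, and the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of whitespace tokenization (not Hive's actual code).
class AddResourceSplitDemo {
    // Splits the command on runs of whitespace and returns everything after
    // the first token (the resource type, e.g. "JAR").
    static List<String> resourceArgs(String command) {
        String[] tokens = command.trim().split("\\s+");
        return Arrays.asList(tokens).subList(1, tokens.length);
    }

    public static void main(String[] args) {
        // Prints [/some/path/test, file.jar]: one path became two resources.
        System.out.println(resourceArgs("JAR /some/path/test file.jar"));
    }
}
```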
> Even if we convert the path to URI form, like
> `file:/some/path/test%20file.jar`, and execute the following command:
> {code}
> ADD JAR "file:/some/path/test%20file.jar";
> {code}
> The following exception is thrown.
> {code}
> 21/04/05 10:40:53 ERROR SessionState: file:/some/path/test%20file.jar does not exist
> java.lang.IllegalArgumentException: file:/some/path/test%20file.jar does not exist
>       at org.apache.hadoop.hive.ql.session.SessionState.validateFiles(SessionState.java:1168)
>       at org.apache.hadoop.hive.ql.session.SessionState$ResourceType.preHook(SessionState.java:1289)
>       at org.apache.hadoop.hive.ql.session.SessionState$ResourceType$1.preHook(SessionState.java:1278)
>       at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1378)
>       at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1336)
>       at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:74)
> {code}
> The reason is that `Utilities.realFile`, which is invoked in
> `SessionState.validateFiles`, returns `null` because `fs.exists(path)`
> returns `false`.
> https://github.com/apache/hive/blob/f1e87137034e4ecbe39a859d4ef44319800016d7/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L1052-L1064
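To make the failure mode concrete, here is a simplified stand-in for an existence check that returns `null` when the file is missing. It uses `java.nio.file` rather than Hadoop's `FileSystem`, and `existingFileOrNull` is a hypothetical name, not Hive's `Utilities.realFile`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical, simplified stand-in (not Hive's code): return the absolute
// path if the file exists, otherwise null.
class RealFileModel {
    static String existingFileOrNull(String path) {
        Path p = Paths.get(path);
        return Files.exists(p) ? p.toAbsolutePath().toString() : null;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jars");
        Path jar = Files.createFile(dir.resolve("test file.jar"));
        // The name with a real space matches the file on disk.
        System.out.println(existingFileOrNull(jar.toString()) != null);           // true
        // The literal "%20" name refers to a different, nonexistent file.
        System.out.println(existingFileOrNull(dir + "/test%20file.jar") != null); // false
        Files.delete(jar);
        Files.delete(dir);
    }
}
```

The file on disk is named `test file.jar`, so a lookup for the literal name `test%20file.jar` misses it and the helper returns `null`.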
> `fs.exists` checks the existence of the given path by comparing the string 
> representation of Hadoop's `Path`.
> The string representation of `Path` looks similar to a URI, but it is
> different: `Path` doesn't encode the given path.
> For example, the URI form of `/some/path/jar file.jar` is
> `file:/some/path/jar%20file.jar`, but its `Path` form is
> `file:/some/path/jar file.jar`. So `fs.exists` returns `false`.
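The same encoded/decoded mismatch can be seen with `java.net.URI` alone. This is an analogy using the JDK, not Hadoop's `Path`: `getRawPath()` keeps `%20`, while `getPath()` and `new File(URI)` decode it back to a space (the helper names are hypothetical):

```java
import java.io.File;
import java.net.URI;

// Analogy with the JDK (not Hadoop's Path): encoded vs decoded path forms.
class EncodedVsDecodedDemo {
    // Path component exactly as written in the URI string (still encoded).
    static String rawPath(String uriString) {
        return URI.create(uriString).getRawPath();
    }

    // Path component after percent-decoding.
    static String decodedPath(String uriString) {
        return URI.create(uriString).getPath();
    }

    public static void main(String[] args) {
        String s = "file:/some/path/test%20file.jar";
        System.out.println(rawPath(s));     // /some/path/test%20file.jar
        System.out.println(decodedPath(s)); // /some/path/test file.jar
        // new File(URI) also decodes, so it names the file with a real space
        // (prints /some/path/test file.jar on a Unix-like system).
        System.out.println(new File(URI.create(s)).getPath());
    }
}
```

Code that uses the raw, still-encoded string as a file name therefore looks for a file literally named `test%20file.jar`.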



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
