[ https://issues.apache.org/jira/browse/SPARK-25694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-25694:
----------------------------------
    Affects Version/s: 3.0.0
                       2.4.4

> URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-25694
>                 URL: https://issues.apache.org/jira/browse/SPARK-25694
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.4, 3.0.0
>            Reporter: Bo Yang
>            Priority: Minor
>
> URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection()
> to return an FsUrlConnection object, which is not compatible with
> HttpURLConnection. This causes an exception when using some third-party HTTP
> libraries (e.g. scalaj.http).
> The following code, introduced in Spark 2.3.0, causes the issue:
> sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala:
> {code}
> object SharedState extends Logging {
>   ...
>   URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
>   ...
> }
> {code}
> Here is an example of the exception thrown when using scalaj.http in Spark:
> {code}
> StackTrace: scala.MatchError:
> org.apache.hadoop.fs.FsUrlConnection:http://wwww.example.com
> (of class org.apache.hadoop.fs.FsUrlConnection)
>   at scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343)
>   at scalaj.http.HttpRequest.exec(Http.scala:335)
>   at scalaj.http.HttpRequest.asString(Http.scala:455)
> {code}
>
> One option to fix the issue is to return null from
> URLStreamHandlerFactory.createURLStreamHandler when the protocol is
> http or https, so that the JVM falls back to its default handler and stays
> compatible with scalaj.http.
> The following code is an example:
> {code}
> import java.net.{URLStreamHandler, URLStreamHandlerFactory}
>
> import org.apache.hadoop.fs.FsUrlStreamHandlerFactory
> import org.apache.spark.internal.Logging
>
> class SparkUrlStreamHandlerFactory
>     extends URLStreamHandlerFactory with Logging {
>   private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory()
>
>   override def createURLStreamHandler(protocol: String): URLStreamHandler = {
>     val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol)
>     val isHttp = protocol != null &&
>       (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https"))
>     if (handler == null || isHttp) {
>       // Return null so the JVM uses its default URLStreamHandler.
>       null
>     } else {
>       handler
>     }
>   }
> }
> {code}
> I would like to get some discussion here before submitting a pull request.
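
For discussion purposes, the proposed fallback can also be tried outside Spark. The following is only a minimal sketch, not the Spark or Hadoop implementation: CatchAllFactory is a hypothetical stand-in for FsUrlStreamHandlerFactory that claims every protocol, and HttpPassThroughFactory and HttpPassThroughDemo are illustrative names. Unlike the snippet in the report, it checks the protocol before consulting the delegate, which should give the same result as long as the delegate call has no side effects that matter.

```scala
import java.net.{URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}
import java.util.Locale

// Hypothetical stand-in for Hadoop's FsUrlStreamHandlerFactory: it returns a
// handler for every protocol, including http and https.
class CatchAllFactory extends URLStreamHandlerFactory {
  override def createURLStreamHandler(protocol: String): URLStreamHandler =
    new URLStreamHandler {
      override protected def openConnection(u: URL): URLConnection =
        throw new UnsupportedOperationException(s"stand-in handler for $u")
    }
}

// Wrapper that declines http/https. Returning null tells the JVM to fall back
// to its built-in handler for that protocol.
class HttpPassThroughFactory(delegate: URLStreamHandlerFactory)
    extends URLStreamHandlerFactory {
  private val passThrough = Set("http", "https")

  override def createURLStreamHandler(protocol: String): URLStreamHandler =
    if (protocol != null &&
        passThrough.contains(protocol.toLowerCase(Locale.ROOT))) {
      null
    } else {
      delegate.createURLStreamHandler(protocol)
    }
}

object HttpPassThroughDemo {
  def main(args: Array[String]): Unit = {
    val factory = new HttpPassThroughFactory(new CatchAllFactory)
    // Report, per protocol, whether the wrapper defers to the JVM default.
    Seq("http", "HTTPS", "hdfs").foreach { p =>
      val deferred = factory.createURLStreamHandler(p) == null
      println(s"$p -> ${if (deferred) "JVM default handler" else "delegate handler"}")
    }
  }
}
```

The sketch deliberately does not call URL.setURLStreamHandlerFactory, because the JVM allows that method to be called at most once per process; calling createURLStreamHandler directly is enough to check which protocols are passed through.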