[ 
https://issues.apache.org/jira/browse/SPARK-25694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Yang updated SPARK-25694:
----------------------------
    Description: 
URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() 
to return an FsUrlConnection object, which is not compatible with 
HttpURLConnection. This causes an exception when using some third-party HTTP 
libraries (e.g. scalaj.http).

The following code in Spark 2.3.0 introduced the issue:

sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
{quote}object SharedState extends Logging {
  ...
  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
  ...
}
{quote}
 

Example exception when using scalaj.http in Spark:
StackTrace: scala.MatchError: org.apache.hadoop.fs.FsUrlConnection:http://wwww.example.com (of class org.apache.hadoop.fs.FsUrlConnection)
    at scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343)
    at scalaj.http.HttpRequest.exec(Http.scala:335)
    at scalaj.http.HttpRequest.asString(Http.scala:455)
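
The failure mode can be sketched without Hadoop or Spark on the classpath. The sketch below is an illustration under assumptions, not the actual Hadoop or scalaj code: `StubConnection`, `ClaimEverythingFactory`, and `Repro` are hypothetical names, with the first two standing in for FsUrlConnection and FsUrlStreamHandlerFactory. The URL is built with an explicit handler rather than through URL.setURLStreamHandlerFactory(), because that method can only be called once per JVM. The non-exhaustive match in `requireHttp` is presumably similar in spirit to what the library does, judging from the MatchError in the trace.

```scala
import java.net.{HttpURLConnection, URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}

// Hypothetical stand-in for FsUrlConnection: a URLConnection that is NOT an HttpURLConnection.
class StubConnection(url: URL) extends URLConnection(url) {
  override def connect(): Unit = () // no network access in this sketch
}

// Hypothetical stand-in for FsUrlStreamHandlerFactory: claims every protocol, including http.
class ClaimEverythingFactory extends URLStreamHandlerFactory {
  override def createURLStreamHandler(protocol: String): URLStreamHandler =
    new URLStreamHandler {
      override protected def openConnection(u: URL): URLConnection = new StubConnection(u)
    }
}

object Repro {
  // Uses the handler produced by the factory directly, instead of installing the factory globally.
  def open(factory: URLStreamHandlerFactory, spec: String): URLConnection = {
    val protocol = spec.takeWhile(_ != ':')
    new URL(null, spec, factory.createURLStreamHandler(protocol)).openConnection()
  }

  // Non-exhaustive match: throws scala.MatchError for any other URLConnection subclass.
  def requireHttp(conn: URLConnection): HttpURLConnection = conn match {
    case http: HttpURLConnection => http
  }

  // Calling this throws scala.MatchError, because the connection is a StubConnection.
  def demo(): HttpURLConnection =
    requireHttp(open(new ClaimEverythingFactory, "http://wwww.example.com"))
}
```

Once a factory claims the http protocol, every caller in the JVM that expects an HttpURLConnection from URL.openConnection() is affected, not only the code that installed the factory.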
  

 

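One option to fix the issue is to wrap FsUrlStreamHandlerFactory in a factory 
that returns null for http and https, so that the JVM falls back to its 
default handlers for those protocols:
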
{quote}import java.net.{URLStreamHandler, URLStreamHandlerFactory}

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

import org.apache.spark.internal.Logging

class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory with Logging {

  private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory()

  override def createURLStreamHandler(protocol: String): URLStreamHandler = {
    val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol)
    if (handler == null) {
      // Hadoop has no handler for this protocol; let the JVM pick its default.
      null
    } else if (protocol != null &&
        (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https"))) {
      // Return null to use the system default URLStreamHandler.
      logDebug("Use system default URLStreamHandler for " + protocol)
      null
    } else {
      handler
    }
  }
}{quote}
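
To check the behaviour of the option above without Hadoop on the classpath, the same delegation can be sketched against an arbitrary delegate factory. This is a minimal sketch under assumptions: `HttpBypassingFactory` and `StubFactory` are hypothetical names, and `StubFactory` merely stands in for FsUrlStreamHandlerFactory by claiming every protocol.

```scala
import java.io.IOException
import java.net.{URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}
import java.util.Locale

// Hypothetical wrapper: same idea as the proposed factory, with the delegate passed in.
class HttpBypassingFactory(delegate: URLStreamHandlerFactory) extends URLStreamHandlerFactory {
  private val bypassed = Set("http", "https")

  override def createURLStreamHandler(protocol: String): URLStreamHandler =
    if (protocol != null && bypassed.contains(protocol.toLowerCase(Locale.ROOT))) {
      null // null tells the JVM to fall back to its built-in handler
    } else {
      delegate.createURLStreamHandler(protocol)
    }
}

// Hypothetical delegate standing in for FsUrlStreamHandlerFactory: claims every protocol.
object StubFactory extends URLStreamHandlerFactory {
  override def createURLStreamHandler(protocol: String): URLStreamHandler =
    new URLStreamHandler {
      // Never opens a real connection; this stub only exists so that a handler is returned.
      override protected def openConnection(u: URL): URLConnection =
        throw new IOException("stub handler for " + u.getProtocol)
    }
}
```

With this shape, `new HttpBypassingFactory(StubFactory).createURLStreamHandler("http")` returns null while other protocols still reach the delegate. Unlike the option above, this variant checks the protocol before consulting the delegate; the result for http and https is the same either way.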

 

 

  was:
URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() 
to return an FsUrlConnection object, which is not compatible with 
HttpURLConnection. This causes an exception when using some third-party HTTP 
libraries (e.g. scalaj.http).

The following code in Spark 2.3.0 introduced the issue:

sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
{quote}object SharedState extends Logging {
  ...
  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
  ...
}
{quote}
 

Example exception when using scalaj.http in Spark:
StackTrace: scala.MatchError: org.apache.hadoop.fs.FsUrlConnection:http://wwww.example.com (of class org.apache.hadoop.fs.FsUrlConnection)
    at scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343)
    at scalaj.http.HttpRequest.exec(Http.scala:335)
    at scalaj.http.HttpRequest.asString(Http.scala:455)
 

 

 


> URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection
> ---------------------------------------------------------------------
>
>                 Key: SPARK-25694
>                 URL: https://issues.apache.org/jira/browse/SPARK-25694
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2
>            Reporter: Bo Yang
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
