[ 
https://issues.apache.org/jira/browse/SPARK-25694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Yang updated SPARK-25694:
----------------------------
    Description: 
URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() 
to return an FsUrlConnection object, which is not compatible with HttpURLConnection. 
This causes exceptions when using some third-party HTTP libraries (e.g. 
scalaj.http).
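
A minimal sketch of the incompatibility (hypothetical snippet, not taken from the Spark code base; whether http/https is affected depends on which FileSystem schemes the Hadoop version on the classpath registers):
{code}
import java.net.{HttpURLConnection, URL}
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

// What SharedState effectively does, once per JVM.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())

// With Hadoop versions that register an http/https FileSystem, this returns
// org.apache.hadoop.fs.FsUrlConnection instead of an HttpURLConnection
// subclass, so the cast below throws ClassCastException.
val conn = new URL("http://www.example.com").openConnection()
val http = conn.asInstanceOf[HttpURLConnection]
{code}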

The following code in Spark 2.3.0 introduced the issue 
(sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala):
{code}
object SharedState extends Logging {
  ...
  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
  ...
}
{code}

Here is an example exception when using scalaj.http in Spark:
{code}
 StackTrace: scala.MatchError: org.apache.hadoop.fs.FsUrlConnection:http://wwww.example.com (of class org.apache.hadoop.fs.FsUrlConnection)
 at scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343)
 at scalaj.http.HttpRequest.exec(Http.scala:335)
 at scalaj.http.HttpRequest.asString(Http.scala:455)
{code}
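
For reference, a driver-side call of roughly this shape triggers the stack trace above (hypothetical snippet; any scalaj.http request issued after SparkSession initialization is affected):
{code}
import org.apache.spark.sql.SparkSession
import scalaj.http.Http

// Creating the session runs SharedState, which registers FsUrlStreamHandlerFactory.
val spark = SparkSession.builder().appName("repro").master("local[*]").getOrCreate()

// scalaj.http matches on the connection type and fails with scala.MatchError
// because URL.openConnection() now returns an FsUrlConnection.
val response = Http("http://www.example.com").asString
println(response.code)
{code}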
  
One option to fix the issue is to return null from 
URLStreamHandlerFactory.createURLStreamHandler() when the protocol is http/https, 
so the JVM falls back to its default handler and stays compatible with scalaj.http. 
Following is a code example:

{code}
import java.net.{URLStreamHandler, URLStreamHandlerFactory}

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

import org.apache.spark.internal.Logging

class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory with Logging {

  private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory()

  override def createURLStreamHandler(protocol: String): URLStreamHandler = {
    val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol)
    if (handler == null) {
      // Hadoop does not handle this protocol; fall back to the JVM default handler.
      return null
    }

    if (protocol != null &&
        (protocol.equalsIgnoreCase("http") || protocol.equalsIgnoreCase("https"))) {
      // Return null so the system default URLStreamHandler (and therefore
      // HttpURLConnection) is used for http/https.
      null
    } else {
      handler
    }
  }
}
{code}
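
SharedState could then register this factory instead of FsUrlStreamHandlerFactory directly; a rough sketch of the wiring (not an actual patch):
{code}
object SharedState extends Logging {
  ...
  // Hypothetical replacement for the current registration call:
  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
  ...
}
{code}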

I would like to get some discussion here before submitting a pull request.


> URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-25694
>                 URL: https://issues.apache.org/jira/browse/SPARK-25694
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2
>            Reporter: Bo Yang
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
