[jira] [Commented] (SPARK-18910) Can't use UDF that jar file in hdfs

2017-03-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931350#comment-15931350
 ] 

Apache Spark commented on SPARK-18910:
--

User 'weiqingy' has created a pull request for this issue:
https://github.com/apache/spark/pull/17342

> Can't use UDF that jar file in hdfs
> ---
>
> Key: SPARK-18910
> URL: https://issues.apache.org/jira/browse/SPARK-18910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hong Shen
>
> When I create a UDF whose jar file is in HDFS, I can't use the UDF.
> {code}
> spark-sql> create function trans_array as 'com.test.udf.TransArray' using jar 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
> spark-sql> describe function trans_array;
> Function: test_db.trans_array
> Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
> Usage: N/A.
> Time taken: 0.127 seconds, Fetched 3 row(s)
> spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from test_spark limit 10;
> Error in query: Undefined function: 'trans_array'. This function is neither a registered temporary function nor a permanent function registered in the database 'test_db'.; line 1 pos 7
> {code}
> The reason is that when org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource is invoked, uri.toURL fails with "unknown protocol: hdfs".
> {code}
>   def addJar(path: String): Unit = {
>     sparkSession.sparkContext.addJar(path)
>     val uri = new Path(path).toUri
>     val jarURL = if (uri.getScheme == null) {
>       // `path` is a local file path without a URL scheme
>       new File(path).toURI.toURL
>     } else {
>       // `path` is a URL with a scheme
>       uri.toURL  // <- fails here: java.net.URL has no registered handler for the hdfs scheme
>     }
>     jarClassLoader.addURL(jarURL)
>     Thread.currentThread().setContextClassLoader(jarClassLoader)
>   }
> {code}
> I think we should call URL.setURLStreamHandlerFactory with an instance of FsUrlStreamHandlerFactory, for example:
> {code}
> static {
>   URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
> }
> {code}
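As a minimal, self-contained sketch of the failure and of the proposed FsUrlStreamHandlerFactory registration (the object name is illustrative; this is not the actual change from the linked pull requests):

{code}
import java.net.URL
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

object HdfsUrlSketch {
  def main(args: Array[String]): Unit = {
    val jarPath =
      "hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar"

    // Out of the box, java.net.URL only understands the JDK's built-in protocols
    // (file, http, https, jar, ...), so parsing an hdfs:// location fails with
    // java.net.MalformedURLException: unknown protocol: hdfs
    try {
      new URL(jarPath)
    } catch {
      case e: java.net.MalformedURLException => println(s"before registration: $e")
    }

    // Registering Hadoop's FsUrlStreamHandlerFactory teaches java.net.URL about
    // hdfs:// (and the other Hadoop filesystems).
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory(new Configuration()))

    // The same path now parses, so a URLClassLoader could load classes from it.
    println(s"after registration: ${new URL(jarPath)}")
  }
}
{code}

Note that URL.setURLStreamHandlerFactory can be called at most once per JVM, so a real fix would have to guard the registration (e.g. in a static initializer, as in the snippet quoted above) and cope with another library having already installed a factory.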



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18910) Can't use UDF that jar file in hdfs

2016-12-20 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764350#comment-15764350
 ] 

Yuming Wang commented on SPARK-18910:
-

This should be a duplicate of SPARK-12868.




[jira] [Commented] (SPARK-18910) Can't use UDF that jar file in hdfs

2016-12-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756433#comment-15756433
 ] 

Apache Spark commented on SPARK-18910:
--

User 'shenh062326' has created a pull request for this issue:
https://github.com/apache/spark/pull/16324




[jira] [Commented] (SPARK-18910) Can't use UDF that jar file in hdfs

2016-12-16 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756385#comment-15756385
 ] 

Hong Shen commented on SPARK-18910:
---

Should I add a pull request to resolve this problem?
