[jira] [Commented] (SPARK-18910) Can't use UDF that jar file in hdfs
[ https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931350#comment-15931350 ]

Apache Spark commented on SPARK-18910:
--------------------------------------

User 'weiqingy' has created a pull request for this issue:
https://github.com/apache/spark/pull/17342

> Can't use UDF that jar file in hdfs
> -----------------------------------
>
>                 Key: SPARK-18910
>                 URL: https://issues.apache.org/jira/browse/SPARK-18910
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>            Reporter: Hong Shen
>
> When I create a UDF whose jar file is in HDFS, I can't use the UDF.
> {code}
> spark-sql> create function trans_array as 'com.test.udf.TransArray' using jar 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
> spark-sql> describe function trans_array;
> Function: test_db.trans_array
> Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
> Usage: N/A.
> Time taken: 0.127 seconds, Fetched 3 row(s)
> spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from test_spark limit 10;
> Error in query: Undefined function: 'trans_array'. This function is neither a registered temporary function nor a permanent function registered in the database 'test_db'.; line 1 pos 7
> {code}
> The reason is that when org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource runs, the call to uri.toURL throws an exception with "failed unknown protocol: hdfs":
> {code}
> def addJar(path: String): Unit = {
>   sparkSession.sparkContext.addJar(path)
>   val uri = new Path(path).toUri
>   val jarURL = if (uri.getScheme == null) {
>     // `path` is a local file path without a URL scheme
>     new File(path).toURI.toURL
>   } else {
>     // `path` is a URL with a scheme
>     {color:red}uri.toURL{color}
>   }
>   jarClassLoader.addURL(jarURL)
>   Thread.currentThread().setContextClassLoader(jarClassLoader)
> }
> {code}
> I think we should call the setURLStreamHandlerFactory method on URL with an instance of FsUrlStreamHandlerFactory, like this:
> {code}
> static {
>   URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
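[Editor's note] The root cause described above can be reproduced with plain java.net.URL, independent of Spark or Hadoop. The sketch below is a self-contained illustration, not Spark's actual code: the stub URLStreamHandler stands in for Hadoop's FsUrlStreamHandlerFactory, which is what the reporter proposes registering. Note that URL.setURLStreamHandlerFactory may be called at most once per JVM.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class HdfsUrlDemo {
    public static void main(String[] args) throws Exception {
        // Before any factory is registered, java.net.URL rejects the scheme.
        try {
            new URL("hdfs://host1:9000/libs/udf.jar");
        } catch (MalformedURLException e) {
            System.out.println("before: " + e.getMessage()); // unknown protocol: hdfs
        }

        // Registering a URLStreamHandlerFactory teaches URL about the scheme.
        // In Spark this would be Hadoop's FsUrlStreamHandlerFactory; here a
        // stub handler stands in so the example needs only the JDK.
        URL.setURLStreamHandlerFactory(protocol ->
            "hdfs".equals(protocol)
                ? new URLStreamHandler() {
                      @Override
                      protected URLConnection openConnection(URL u) {
                          // A real handler would open an HDFS stream here.
                          throw new UnsupportedOperationException("stub handler");
                      }
                  }
                : null); // fall back to the JDK's built-in handlers

        // The same URL now parses, so a URLClassLoader could accept it.
        URL ok = new URL("hdfs://host1:9000/libs/udf.jar");
        System.out.println("after: " + ok.getHost()); // after: host1
    }
}
```

With the factory in place, the `uri.toURL` call highlighted in red above would succeed, because URL construction no longer fails on the `hdfs` scheme.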
[jira] [Commented] (SPARK-18910) Can't use UDF that jar file in hdfs
[ https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764350#comment-15764350 ]

Yuming Wang commented on SPARK-18910:
-------------------------------------

This should be a duplicate of SPARK-12868.
[jira] [Commented] (SPARK-18910) Can't use UDF that jar file in hdfs
[ https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756433#comment-15756433 ]

Apache Spark commented on SPARK-18910:
--------------------------------------

User 'shenh062326' has created a pull request for this issue:
https://github.com/apache/spark/pull/16324
[jira] [Commented] (SPARK-18910) Can't use UDF that jar file in hdfs
[ https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756385#comment-15756385 ]

Hong Shen commented on SPARK-18910:
-----------------------------------

Should I add a pull request to resolve this problem?