[ https://issues.apache.org/jira/browse/SPARK-16055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346708#comment-15346708 ]
Shivaram Venkataraman commented on SPARK-16055:
-----------------------------------------------

Thanks [~KrishnaKalyan3] -- Feel free to pick this up. Once you create a PR it will automatically be assigned to you. You can see more details at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges

> sparkR.init() can not load sparkPackages when executing an R file
> -----------------------------------------------------------------
>
>                 Key: SPARK-16055
>                 URL: https://issues.apache.org/jira/browse/SPARK-16055
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: SparkR
>    Affects Versions: 1.6.1
>            Reporter: Sun Rui
>            Priority: Minor
>
> This is an issue reported on the Spark user mailing list; refer to
> http://comments.gmane.org/gmane.comp.lang.scala.spark.user/35742
> The issue does not occur in an interactive SparkR session, but it does
> occur when executing an R file.
> The following example code can be put into an R file to reproduce the issue:
> {code}
> .libPaths(c("/home/user/spark-1.6.1-bin-hadoop2.6/R/lib", .libPaths()))
> Sys.setenv(SPARK_HOME = "/home/user/spark-1.6.1-bin-hadoop2.6")
> library("SparkR")
> sc <- sparkR.init(sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")
> sqlContext <- sparkRSQL.init(sc)
> df <- read.df(sqlContext,
>   "file:///home/user/spark-1.6.1-bin-hadoop2.6/data/mllib/sample_tree_data.csv", "csv")
> showDF(df)
> {code}
> The error message is as follows:
> {panel}
> 16/06/19 15:48:56 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
>   java.lang.ClassNotFoundException: Failed to find data source: csv. 
> Please find packages at http://spark-packages.org
>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>     at org.apache.spark.sql.api.r.SQLUtils$.loadDF(SQLUtils.scala:160)
>     at org.apache.spark.sql.api.r.SQLUtils.loadDF(SQLUtils.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>     at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala
> Calls: read.df -> callJStatic -> invokeJava
> Execution halted
> {panel}
> The reason is that when an R file is executed, the R backend launches before
> the R interpreter, so there is no opportunity for packages specified with
> 'sparkPackages' to be processed.
> This JIRA issue is to track this problem. An appropriate solution is to be
> discussed; one option is to document the limitation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
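Until a fix or documentation lands, a workaround that is commonly suggested for this class of problem is to resolve the package when the backend JVM is launched, rather than via `sparkPackages` in `sparkR.init()`. A sketch, reusing the Spark layout from the reproduction above; the script name `load_csv.R` is a hypothetical stand-in for the reproduction file:

```shell
# Workaround sketch: pass --packages to spark-submit so the dependency is
# resolved before the R backend starts. "load_csv.R" is a hypothetical name
# for the reproduction script shown in the issue description.
SPARK_HOME=/home/user/spark-1.6.1-bin-hadoop2.6
"$SPARK_HOME"/bin/spark-submit \
  --packages com.databricks:spark-csv_2.11:1.4.0 \
  /home/user/load_csv.R
```

With the package supplied on the launcher command line, the `sparkPackages` argument to `sparkR.init()` can be dropped from the script.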