Re: sparkR.init() cannot load sparkPackages.

2016-06-19 Thread Sun Rui
Hi, Joseph,

This is a known issue but not a bug.

This issue does not occur in an interactive SparkR session, but it does occur 
when you execute an R file.

The reason is that when you execute an R file, the R backend launches before 
the R interpreter, so there is no opportunity for the packages specified via 
‘sparkPackages’ to be processed.

For now, if you want to execute an R file with additional Spark packages, 
please use the “--packages” command-line option.
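
Alternatively, the packages can be supplied through the SPARKR_SUBMIT_ARGS 
environment variable before SparkR is loaded. The following is a minimal, 
untested sketch for Spark 1.x; it assumes SPARKR_SUBMIT_ARGS is read when the 
backend JVM is launched and that its value must end with the "sparkr-shell" 
token:

# Hedged sketch: export --packages via SPARKR_SUBMIT_ARGS before loading
# SparkR, so the backend JVM starts with the package on its classpath.
Sys.setenv(SPARKR_SUBMIT_ARGS =
  "--packages com.databricks:spark-csv_2.11:1.4.0 sparkr-shell")

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# sparkPackages is no longer needed here; the jars are already on the backend.
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

# "path/to/data.csv" is a placeholder path, not a file from the original post.
people <- read.df(sqlContext, "path/to/data.csv", "csv")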

> On Jun 17, 2016, at 10:46, Joseph wrote:
> 
> Hi all,
> 
> I have found an issue in SparkR; maybe it's a bug:
> 
> When I read a CSV file, it works fine this way:
> ${SPARK_HOME}/bin/spark-submit --packages com.databricks:spark-csv_2.11:1.4.0 example.R
> 
> But the following way gives an error:
> sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.4.0")
> 
> 16/06/17 09:54:12 ERROR RBackendHandler: loadDF on 
> org.apache.spark.sql.api.r.SQLUtils failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
>   java.lang.ClassNotFoundException: Failed to find data source: csv. Please 
> find packages at http://spark-packages.org 
>   at 
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
> 
> It is obvious that sparkR.init() does not load the specified package!
> -
> 
> Appendix:
> The complete code for example.R:
> 
> if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
>   Sys.setenv(SPARK_HOME = "/home/hadoop/spark-1.6.1-bin-hadoop2.6")
> }
> 
> library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
> 
> sc <- sparkR.init(master = "local[2]",
>   sparkEnvir = list(spark.driver.memory = "1g"),
>   sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")
> 
> sqlContext <- sparkRSQL.init(sc)
> people <- read.df(sqlContext,
>   "file:/home/hadoop/spark-1.6.1-bin-hadoop2.6/data/mllib/sample_tree_data.csv",
>   "csv")
> registerTempTable(people, "people")
> teenagers <- sql(sqlContext, "SELECT * FROM people")
> head(teenagers)
> 
> Joseph



sparkR.init() cannot load sparkPackages.

2016-06-16 Thread Joseph
Hi all,

I have found an issue in SparkR; maybe it's a bug:

When I read a CSV file, it works fine this way:
${SPARK_HOME}/bin/spark-submit --packages com.databricks:spark-csv_2.11:1.4.0 example.R

But the following way gives an error:
sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.4.0")

16/06/17 09:54:12 ERROR RBackendHandler: loadDF on 
org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
  java.lang.ClassNotFoundException: Failed to find data source: csv. Please 
find packages at http://spark-packages.org
at 
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)

It is obvious that sparkR.init() does not load the specified package!
-

Appendix:
The complete code for example.R:

if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/hadoop/spark-1.6.1-bin-hadoop2.6")
}

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

sc <- sparkR.init(master = "local[2]",
  sparkEnvir = list(spark.driver.memory = "1g"),
  sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")

sqlContext <- sparkRSQL.init(sc)
people <- read.df(sqlContext,
  "file:/home/hadoop/spark-1.6.1-bin-hadoop2.6/data/mllib/sample_tree_data.csv",
  "csv")
registerTempTable(people, "people")
teenagers <- sql(sqlContext, "SELECT * FROM people")
head(teenagers)

Joseph