Running this gave:

16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
Error in writeJobj(con, object) : invalid jobj 3

How does it know which hive schema to connect to?

--
Architect
Infoworks.io
http://Infoworks.io
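For reference, the HiveContext resolves the metastore (and therefore the schema) from hive-site.xml in $SPARK_HOME/conf, typically via hive.metastore.uris; without that file, Spark falls back to a local Derby metastore in the working directory. A minimal sketch against the Spark 1.x SparkR API, with "mydb" as a placeholder database name:

# The HiveContext picks up the metastore from hive-site.xml in $SPARK_HOME/conf;
# without it, Spark creates a local Derby metastore instead.
hivecontext <- sparkRHive.init(sc)

# Select the schema/database explicitly ("mydb" is a placeholder):
sql(hivecontext, "USE mydb")
showDF(sql(hivecontext, "SHOW TABLES"))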
On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

It looks like you have overwritten sc. Could you try this:

Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")


On Tue, 12 Jan 2016 at 14:28 (+0530), Sandeep Khurana <sand...@infoworks.io> wrote (Re: sparkR ORC support):

The code is very simple; it is pasted below. hive-site.xml is already in the Spark conf directory. I still see this error after running the script below:

Error in writeJobj(con, object) : invalid jobj 3

script
=======
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <<- sparkR.init()
sc <<- sparkRHive.init()
hivecontext <<- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
#View(df)


On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

Yes, as Yanbo suggested, it looks like there is something wrong with the sqlContext.

Could you forward us your code, please?


On Wed, Jan 6, 2016 at 5:52 AM, Yanbo Liang <yblia...@gmail.com> wrote:

You should ensure your sqlContext is a HiveContext:

sc <- sparkR.init()
sqlContext <- sparkRHive.init(sc)


On 2016-01-06 at 20:35 (GMT+08:00), Sandeep Khurana <sand...@infoworks.io> wrote:

Felix

I tried the option you suggested. It gave the error below. I am going to try the option suggested by Prem.

Error in writeJobj(con, object) : invalid jobj 1
8: stop("invalid jobj ", value$id)
7: writeJobj(con, object)
6: writeObject(con, a)
5: writeArgs(rc, args)
4: invokeJava(isStatic = TRUE, className, methodName, ...)
3: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
2: read.df(sqlContext, filepath, "orc") at spark_api.R#108


On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

Firstly, I don't have ORC data to verify, but this should work:

df <- loadDF(sqlContext, "data/path", "orc")

Secondly, could you check whether sparkR.stop() was called? sparkRHive.init() should be called after sparkR.init() - please check if there is any error message there.


On Tuesday, January 5, 2016 at 8:12 AM, Prem Sure <premsure...@gmail.com> wrote (Re: sparkR ORC support):

Yes Sandeep, and also copy hive-site.xml to the Spark conf directory.


On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sand...@infoworks.io> wrote:

Also, do I need to set up Hive in Spark as per http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ?

We might need to copy the hdfs-site.xml file to the Spark conf directory?
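As an aside on the two config files mentioned above: both can be copied into Spark's conf directory from R itself. A hedged sketch; the /etc/hive/conf and /etc/hadoop/conf source paths are assumptions (typical HDP locations), so adjust them to your cluster layout:

# Copy the Hive and HDFS client configs into Spark's conf directory.
# The source paths below are assumptions (typical on HDP); adjust as needed.
spark_conf <- file.path(Sys.getenv("SPARK_HOME"), "conf")
file.copy("/etc/hive/conf/hive-site.xml", spark_conf, overwrite = TRUE)
file.copy("/etc/hadoop/conf/hdfs-site.xml", spark_conf, overwrite = TRUE)

The Spark context then has to be recreated for the new config to be picked up.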
On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Deepak

Tried this. Getting this error now:

Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") : unused argument ("")


On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:

Hi Sandeep

Can you try this?

results <- sql(hivecontext, "FROM test SELECT id", "")

Thanks
Deepak


On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Thanks Deepak.

I tried this as well. I created a hivecontext with "hivecontext <<- sparkRHive.init(sc)".

When I tried to read a hive table from it with

results <- sql(hivecontext, "FROM test SELECT id")

I got the error below:

Error in callJMethod(sqlContext, "sql", sqlQuery) : Invalid jobj 2. If SparkR was restarted, Spark operations need to be re-executed.

Not sure what is causing this. Any leads or ideas? I am using RStudio.


On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:

Hi Sandeep

I am not sure if ORC can be read directly in R, but there is a workaround: first create a hive table on top of the ORC files, and then access that hive table in R.

Thanks
Deepak


On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Hello

I need to read ORC files in HDFS in R using Spark. I am not able to find a package to do that.

Can anyone help with documentation or an example for this purpose?
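Pulling the thread together: the repeated "invalid jobj" errors are what SparkR reports when an R-side handle to a Java object has gone stale, e.g. after sparkR.stop() or, as Felix suggested above, after sc was overwritten by a second init call. A minimal end-to-end sketch, assuming the Spark 1.x SparkR API, hive-site.xml already in $SPARK_HOME/conf, and the ORC path from the thread:

Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init()                 # create the SparkContext exactly once
hivecontext <- sparkRHive.init(sc)  # derive the HiveContext from it; do not reassign sc

# loadDF(sqlContext, path, source) reads the ORC files directly:
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
head(df)

If the R session is restarted (or sparkR.stop() is called), every context and DataFrame handle must be recreated before reuse.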