Re: Spark R - Loading Third Party R Library in YARN Executors

2016-08-17 Thread Shivaram Venkataraman
I think you can also pass in a zip file using the --files option
(http://spark.apache.org/docs/latest/running-on-yarn.html has some
examples). The files should then be present in the current working
directory of the driver R process.
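For example, something like this (the archive and script names here are
placeholders, and the unzip step is my assumption about how you'd unpack it):

  # Ship a zip of the installed package tree to each YARN container:
  #   spark-submit --master yarn --files BreakoutDetection.zip your_script.R
  # The zip is localized into the container's working directory, so in the
  # R script you can unpack it there and load it from a local library path:
  unzip("BreakoutDetection.zip", exdir = "rlibs")
  library("BreakoutDetection", lib.loc = "rlibs")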

Thanks
Shivaram

On Wed, Aug 17, 2016 at 4:16 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> When you call library(), that is the package-loading function in native R.
> As of now it does not support HDFS paths, but there are several packages out
> there that might help.
>
> Another approach is to have a prefetch/installation step that calls an HDFS
> command to download the R package from HDFS onto the worker node first.
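>
> A rough sketch of that approach, assuming the hdfs CLI is on each worker's
> PATH and that hdfs://xx/BreakoutDetection is the installed package
> directory (both assumptions, reusing the placeholder path from below):
>
>   # Pull the package tree from HDFS onto the worker's local disk first;
>   # library() itself only understands local filesystem paths.
>   local_lib <- file.path(tempdir(), "rlibs")
>   dir.create(local_lib, showWarnings = FALSE)
>   system(paste("hdfs dfs -get hdfs://xx/BreakoutDetection", local_lib))
>   library("BreakoutDetection", lib.loc = local_lib)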
>
>
> _
> From: Senthil Kumar <senthilec...@gmail.com>
> Sent: Wednesday, August 17, 2016 2:23 AM
> Subject: Spark R - Loading Third Party R Library in YARN Executors
> To: Senthil kumar <senthilec...@gmail.com>, <du...@ebay.com>,
> <jiaj...@ebay.com>, <dev@spark.apache.org>
>
>
>
> Hi All, we are using SparkR on Spark 1.6. Below is our code, which loads a
> third-party library.
>
>
> library("BreakoutDetection", lib.loc = "hdfs://xx/BreakoutDetection/") :
> library("BreakoutDetection", lib.loc = "//xx/BreakoutDetection/") :
>
>
> When I execute the code in local mode, the SparkR code works fine without
> any issue. If I submit the job to the cluster, it fails with the following
> error:
>
> error in evaluating the argument 'X' in selecting a method for function
> 'lapply': Error in library("BreakoutDetection", lib.loc =
> "hdfs://xxx/BreakoutDetection/") :
>   no library trees found in 'lib.loc'
> Calls: f ... lapply -> FUN -> mainProcess -> angleValid -> library
>
>
> Can't we load libraries in R directly from HDFS, as below?
> library("BreakoutDetection", lib.loc = "hdfs://xx/BreakoutDetection/")
>
> If not, what is another way to solve this problem?
>
> Since our cluster has close to 2,500 nodes, we can't copy the third-party
> libraries to every node, and copying them to all DataNodes is not good
> practice anyway.
>
> Can someone help me with how to load R libraries from HDFS, or suggest
> another way?
>
>
> --Senthil



