Re: Creating a SparkR standalone job
Thanks for attaching the code. If I understand your use case correctly, you want to call the sentiment analysis code from Spark Streaming, right? For that I think you can just use jvmr, if it works, and I don't think you need SparkR. SparkR is mainly intended as an API for large-scale jobs that are written in R. For this use case, where the job is written in Scala (or Java), you can create your SparkContext in Scala and then just call jvmr from, say, within a map function.

The only other thing might be to figure out what the thread-safety model is for jvmr -- AFAIK R is single-threaded, but we run tasks in multiple threads in Spark.

Thanks
Shivaram

On Sat, Apr 12, 2014 at 12:16 PM, pawan kumar wrote:
> Hi Shivaram,
>
> I was able to get R integrated into Spark using jvmr. Now I call R from
> Scala and pass values to the R function using Scala (Spark Streaming
> values). I have attached the application. I can also call SparkR, but I am
> not sure where to pass the Spark context with regard to the attached
> application. Any help would be greatly appreciated.
> You can run the application from the Scala IDE. Let me know if you have any
> difficulties running this application.
>
> Dependencies:
> R installed on the machine.
>
> Thanks,
> Pawan Kumar Venugopal
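To make the "call jvmr from within a map function" idea concrete, here is a rough Scala sketch. It is not from the thread, and the jvmr class and method names (RInScala, toDouble) are assumptions based on the paper linked further down rather than a verified API; the input path and the nchar() call standing in for the real sentiment function are placeholders.

    import org.apache.spark.SparkContext
    import org.ddahl.jvmr.RInScala   // assumed package/class name for jvmr's Scala interface

    object SentimentSketch {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[2]", "jvmr-sentiment-sketch")
        val tweets = sc.textFile("hdfs:///tmp/tweets")   // hypothetical input path

        // Use mapPartitions so each task gets its own R interpreter. R is
        // single-threaded, so sharing one interpreter across Spark's task
        // threads is probably not safe.
        val scores = tweets.mapPartitions { iter =>
          val R = RInScala()                             // one interpreter per partition
          iter.map { tweet =>
            // Naive quoting, for illustration only.
            val safe = tweet.replace("\\", "\\\\").replace("\"", "\\\"")
            // nchar() is a placeholder for the actual sentiment analysis R code;
            // toDouble evaluating an R expression is an assumed jvmr method.
            val score = R.toDouble(s"""nchar("$safe")""")
            (tweet, score)
          }
        }
        scores.take(10).foreach(println)
        sc.stop()
      }
    }

One interpreter per partition keeps the R side strictly single-threaded within each task, at the cost of starting an R interpreter for every partition.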
Re: Creating a SparkR standalone job
Thanks Shivaram! Will give it a try and let you know.

Regards,
Pawan Venugopal
Re: Creating a SparkR standalone job
You can create standalone jobs in SparkR as just R files that are run using the sparkR script. These commands will be sent to a Spark cluster, and the examples in the SparkR repository (https://github.com/amplab-extras/SparkR-pkg#examples-unit-tests) are in fact standalone jobs.

However, I don't think that will completely solve your use case of combining Streaming + R. We don't yet have a way to call R functions from Spark's Java or Scala API. So right now, one thing you can try is to save data from Spark Streaming to HDFS and then run a SparkR job which reads in the file.

Regarding the other idea of calling R from Scala -- it might be possible to do that in your code if the classpath etc. is set up correctly. I haven't tried it out, but do let us know if you get it to work.

Thanks
Shivaram
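As an illustration of the HDFS hand-off Shivaram describes, here is a rough sketch of the streaming side in Scala. The socket source (standing in for the Twitter receiver), the output prefix, and the two-second batch interval are all made up for this example; a SparkR script launched separately via the sparkR runner would then read the written directories.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SaveTweetsForR {
      def main(args: Array[String]) {
        val conf = new SparkConf().setMaster("local[2]").setAppName("save-tweets-sketch")
        val ssc = new StreamingContext(conf, Seconds(2))

        // Stand-in source; the real job would use spark-streaming-twitter's
        // TwitterUtils.createStream to receive tweets.
        val lines = ssc.socketTextStream("localhost", 9999)

        // Each batch is written as a directory of text files under this prefix,
        // which the SparkR job can later read.
        lines.saveAsTextFiles("hdfs:///tmp/tweets/batch")

        ssc.start()
        ssc.awaitTermination()
      }
    }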
Creating a SparkR standalone job
Hi,

Is it possible to create a standalone job in Scala using SparkR? If possible, can you provide me with information on the setup process (like the dependencies in SBT and where to include the JAR files)?

This is my use case:

1. I have a Spark Streaming standalone job running on my local machine which streams Twitter data.
2. I have an R script which performs sentiment analysis.

I am looking for an optimal way to combine these two operations into a single job and run it with the "sbt run" command.

I came across this document, which talks about embedding R in Scala (http://dahl.byu.edu/software/jvmr/dahl-payne-uppalapati-2013.pdf), but was not sure whether that would work well within the Spark context.

Thanks,
Pawan Venugopal
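For reference, here is a rough sketch of what the sbt setup for the Spark Streaming half of this job might look like. The version numbers are only a guess at what was current at the time and should be checked against the cluster being used.

    // build.sbt -- sketch of the dependencies for the streaming half of the job.
    name := "twitter-sentiment"

    scalaVersion := "2.10.3"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"              % "0.9.1",
      "org.apache.spark" %% "spark-streaming"         % "0.9.1",
      "org.apache.spark" %% "spark-streaming-twitter" % "0.9.1"
    )

With spark-streaming-twitter on the classpath, the Twitter stream itself comes from TwitterUtils.createStream(ssc, None), with the Twitter4J OAuth credentials supplied through system properties.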