Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-12 Thread Jayant Shekhar
xample. Thanks. On Fri, Jul 6, 2018 at 2:56 AM, Jayant Shekhar wrote: > Hello Chetan, > We have currently done it with .pipe(.py) as Prem suggested. > That passes the RDD as CSV strings to the Python script. The Python script can

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-05 Thread Jayant Shekhar
Hello Chetan, We have currently done it with .pipe(.py), as Prem suggested. That passes the RDD as CSV strings to the Python script. The Python script can either process it line by line and return the result, or create something like a Pandas DataFrame for processing and finally write
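
A minimal sketch of the .pipe() approach described above (the script name process.py and the CSV layout are assumptions for illustration; the script must be available on every worker):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("PipeExample"))

    // Each element is written as one line to the script's stdin;
    // each line the script writes to stdout becomes one output element.
    val input = sc.parallelize(Seq("1,alice", "2,bob"))
    val result = input.pipe("python process.py")
    result.collect().foreach(println)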

Re: UI for spark machine learning.

2017-07-10 Thread Jayant Shekhar
Hello Mahesh, We have built one. You can download it from https://www.sparkflows.io/download. Feel free to ping me with any questions. Best Regards, Jayant On Sun, Jul 9, 2017 at 9:35 PM, Mahesh Sawaiker <mahesh_sawai...@persistent.com> wrote: > Hi, > 1) Is anyone aware of any

Re: Shall I use Apache Zeppelin for data analytics & visualization?

2017-04-17 Thread Jayant Shekhar
Hello Gaurav, Pre-calculating the results would be a great idea - loading them into a serving store from which you serve them out to your customers, as suggested by Jorn - and running the job every hour/day, depending on your requirements. Zeppelin (as mentioned by Ayan) would not be a

Re: Any NLP library for sentiment analysis in Spark?

2017-04-12 Thread Jayant Shekhar
Hello Gaurav, Yes, Stanford CoreNLP is of course great to use too! You can find sample code here and pull the UDFs into your project: https://github.com/sparkflows/sparkflows-stanfordcorenlp Thanks, Jayant On Tue, Apr 11, 2017 at 8:44 PM, Gaurav Pandya wrote:
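
For reference, a hedged sketch of wrapping CoreNLP sentiment in a Spark UDF, in the spirit of the repo linked above (not its actual code; building the pipeline inside the UDF is a simplification and would be cached per executor in practice):

    import java.util.Properties
    import scala.collection.JavaConverters._
    import edu.stanford.nlp.ling.CoreAnnotations
    import edu.stanford.nlp.pipeline.StanfordCoreNLP
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
    import org.apache.spark.sql.functions.udf

    val sentiment = udf { text: String =>
      val props = new Properties()
      props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
      val pipeline = new StanfordCoreNLP(props)  // cache per executor in real code
      val annotation = pipeline.process(text)
      // One label per sentence (e.g. "Positive"), joined for the row.
      annotation.get(classOf[CoreAnnotations.SentencesAnnotation]).asScala
        .map(_.get(classOf[SentimentCoreAnnotations.SentimentClass]))
        .mkString(", ")
    }
    // Usage: df.withColumn("sentiment", sentiment($"text"))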

Re: How to use scala.tools.nsc.interpreter.IMain in Spark, just like calling eval in Perl.

2016-06-30 Thread Jayant Shekhar
Hi Fanchao, This happens because it is unable to find the generated anonymous classes. Adding the code below worked for me. I found the details here: https://github.com/cloudera/livy/blob/master/repl/src/main/scala/com/cloudera/livy/repl/SparkInterpreter.scala // Spark 1.6 does not have
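
For context, a sketch of the kind of fix the Livy link describes - adapted, not copied; spark.repl.class.outputDir is the Spark 2.x mechanism, and Spark 1.6 relayed REPL classes differently - so that executors can find the interpreter-generated anonymous classes:

    import java.nio.file.Files
    import scala.tools.nsc.Settings
    import org.apache.spark.SparkConf

    val outputDir = Files.createTempDirectory("spark-repl-classes").toFile

    val settings = new Settings()
    settings.usejavacp.value = true
    // Write generated classes to a real directory instead of a virtual one.
    settings.outputDirs.setSingleOutput(outputDir.getAbsolutePath)

    val conf = new SparkConf()
      // Executors load REPL-generated classes served from this directory.
      .set("spark.repl.class.outputDir", outputDir.getAbsolutePath)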

Re: Running into issue using SparkIMain

2016-06-29 Thread Jayant Shekhar
On Mon, Jun 27, 2016 at 5:53 PM, Jayant Shekhar <jayantbaya...@gmail.com> wrote: > I tried setting the classpath explicitly in the settings. The classpath gets printed properly; it has the Scala jars in it, like scala-compiler-2.10.4.jar and scala-library-2.10.4.jar. It did

Re: Running into issue using SparkIMain

2016-06-27 Thread Jayant Shekhar
classpath=" + classpath);
settings.classpath.value = classpath.distinct.mkString(java.io.File.pathSeparator)
settings.embeddedDefaults(cl)
-Jayant On Mon, Jun 27, 2016 at 3:19 PM, Jayant Shekhar <jayantbaya...@gmail.com> wrote: > Hello, > I'm trying to run Scala code
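
Pieced together, a minimal sketch of the approach discussed in this thread (how the classpath list is assembled and the classloader cl are assumptions):

    import java.net.URLClassLoader
    import scala.tools.nsc.Settings

    // Collect the running web app's classpath from its classloader.
    val cl = Thread.currentThread.getContextClassLoader
    val classpath: List[String] = cl match {
      case urlCl: URLClassLoader => urlCl.getURLs.toList.map(_.getFile)
      case _ => Nil
    }

    val settings = new Settings()
    settings.usejavacp.value = true
    settings.classpath.value = classpath.distinct.mkString(java.io.File.pathSeparator)
    // Compile against the web app's classloader rather than the default.
    settings.embeddedDefaults(cl)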

Running into issue using SparkIMain

2016-06-27 Thread Jayant Shekhar
Hello, I'm trying to run Scala code in a web application. It runs great when I run it in IntelliJ, but I run into an error when I run it from the command line. Command used to run: java -Dscala.usejavacp=true -jar target/XYZ.war
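
As a side note (an assumption, not from this thread): when you construct the interpreter yourself, the broadly equivalent programmatic setting is

    val settings = new scala.tools.nsc.Settings()
    settings.usejavacp.value = true  // counterpart of -Dscala.usejavacp=true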

Re: Running JavaBased Implementation of StreamingKmeans Spark

2016-06-24 Thread Jayant Shekhar
Hi Biplob, Can you try adding new files to the training/test directories after you have started your streaming application? Especially the test directory, as you are printing your predictions. On Fri, Jun 24, 2016 at 2:32 PM, Biplob Biswas wrote: > Hi, > I
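
For orientation, a sketch of the setup this implies, close to the StreamingKMeans example in the Spark docs (paths, k, and dimensionality are placeholders); textFileStream only emits files dropped into the directories after the context starts, which is why adding new files matters:

    import org.apache.spark.mllib.clustering.StreamingKMeans
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))
    val trainingData = ssc.textFileStream("hdfs:///training").map(Vectors.parse)
    val testData = ssc.textFileStream("hdfs:///test").map(LabeledPoint.parse)

    val model = new StreamingKMeans()
      .setK(3)
      .setDecayFactor(1.0)
      .setRandomCenters(2, 0.0)  // 2-dimensional data, zero initial weight

    model.trainOn(trainingData)
    // Predictions only appear for batches in which new test files arrived.
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()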

Re: Spark ml and PMML export

2016-06-23 Thread Jayant Shekhar
Thanks Philippe! Looking forward to trying it out. I am on >= 1.6. Jayant On Thu, Jun 23, 2016 at 1:24 AM, philippe v wrote: > Hi, > You can try this lib: https://github.com/jpmml/jpmml-sparkml > I'll try it soon... you need to be on >= 1.6 > Philippe
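
A hedged sketch of using that library (the entry point has changed across versions; PMMLBuilder is from later releases, and trainingDf and pipelineModel are assumed to exist):

    import java.io.File
    import org.jpmml.sparkml.PMMLBuilder

    // trainingDf.schema: schema of the training DataFrame;
    // pipelineModel: a fitted org.apache.spark.ml.PipelineModel.
    new PMMLBuilder(trainingDf.schema, pipelineModel)
      .buildFile(new File("model.pmml"))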

Re: Spark ml and PMML export

2016-06-23 Thread Jayant Shekhar
Thanks a lot Nick! It's very helpful. On Wed, Jun 22, 2016 at 11:47 PM, Nick Pentreath wrote: > Currently there is no way within Spark itself. You may want to check out this issue (https://issues.apache.org/jira/browse/SPARK-11171), and here is an external project

Getting a DataFrame back as result from SparkIMain

2016-06-21 Thread Jayant Shekhar
Hi, I have written a program using SparkIMain which creates an RDD, and I am looking for a way to access that RDD in my normal Spark/Scala code for further processing. The code below binds the SparkContext: sparkIMain.bind("sc", "org.apache.spark.SparkContext", sparkContext,
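
A hedged sketch of one way to get the object back out - valueOfTerm looks a named term up in the interpreter's scope; the @transient modifier and the RDD contents here are assumptions:

    sparkIMain.bind("sc", "org.apache.spark.SparkContext", sparkContext, List("@transient"))
    sparkIMain.interpret("val rdd = sc.parallelize(1 to 10)")

    // Pull the interpreter-defined RDD back into the host program.
    val rdd = sparkIMain.valueOfTerm("rdd")
      .getOrElse(sys.error("rdd not defined in the interpreter"))
      .asInstanceOf[org.apache.spark.rdd.RDD[Int]]
    println(rdd.count())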

Re: MLLib: LinearRegressionWithSGD performance

2014-11-21 Thread Jayant Shekhar
Hi Sameer, You can try increasing the number of executor-cores. -Jayant On Fri, Nov 21, 2014 at 11:18 AM, Sameer Tilak ssti...@live.com wrote: Hi All, I have been using MLlib's linear regression and I have some questions regarding the performance. We have a cluster of 10 nodes -- each

Re: MLLib: LinearRegressionWithSGD performance

2014-11-21 Thread Jayant Shekhar
Hi Sameer, You can also use repartition to create a higher number of tasks. -Jayant On Fri, Nov 21, 2014 at 12:02 PM, Jayant Shekhar jay...@cloudera.com wrote: Hi Sameer, You can try increasing the number of executor-cores. -Jayant On Fri, Nov 21, 2014 at 11:18 AM, Sameer Tilak
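
Sketching both suggestions from this thread together (the numbers are illustrative, and trainingData is assumed to be an RDD[LabeledPoint]):

    // Submit-time knob: spark-submit --executor-cores 4 --num-executors 10 ...
    import org.apache.spark.mllib.regression.LinearRegressionWithSGD

    // More partitions -> more tasks that can run in parallel.
    val repartitioned = trainingData.repartition(80)
    val model = LinearRegressionWithSGD.train(repartitioned, 100)  // 100 iterations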

Re: Is Spark streaming suitable for our architecture?

2014-10-23 Thread Jayant Shekhar
Hi Albert, I have a couple of questions: - You mentioned near real-time; what exactly is your SLA for processing each document? - Which crawler are you using, and are you looking to bring Hadoop into your overall workflow? You might want to read up on how network traffic is

Re: How to access objects declared and initialized outside the call() method of JavaRDD

2014-10-23 Thread Jayant Shekhar
+1 to Sean. Is it possible to rewrite your code to not use the SparkContext in the RDD? Or, why does javaFunctions() need the SparkContext? On Thu, Oct 23, 2014 at 10:53 AM, Localhost shell universal.localh...@gmail.com wrote: Bang on, Sean. Before sending the issue mail, I was able to remove the

Re: Oryx + Spark mllib

2014-10-19 Thread Jayant Shekhar
Hi Deb, Do check out https://github.com/OryxProject/oryx. It does integrate with Spark. Sean has put quite a bit of neat detail on the page about the architecture. It has all the things you are thinking about :) Thanks, Jayant On Sat, Oct 18, 2014 at 8:49 AM, Debasish Das

Re: How does the Spark Accumulator work under the covers?

2014-10-10 Thread Jayant Shekhar
Hi Areg, Check out http://spark.apache.org/docs/latest/programming-guide.html#accumulators val sum = sc.accumulator(0) // accumulator created from an initial value in the driver The accumulator variable is created in the driver. Tasks running on the cluster can then add to it. However, they
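
Expanded into a runnable sketch (Spark 1.x accumulator API, matching the guide linked above):

    val sum = sc.accumulator(0)                      // created on the driver
    sc.parallelize(1 to 100).foreach(x => sum += x)  // tasks can only add to it
    println(sum.value)                               // readable only on the driver: 5050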

Re: window every n elements instead of time based

2014-10-08 Thread Jayant Shekhar
Hi Michael, I think you mean the batch interval rather than windowing. It can be helpful when you do not want to process very small batches. The HDFS sink in Flume has the concept of rolling files based on time, number of events, or size.
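
To make the distinction concrete, a small sketch (durations are illustrative; the window and slide must be multiples of the batch interval):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))        // batch interval, fixed at creation
    val lines = ssc.socketTextStream("localhost", 9999)
    val windowed = lines.window(Seconds(60), Seconds(20))  // window length, slide interval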

Re: Using GraphX with Spark Streaming?

2014-10-06 Thread Jayant Shekhar
Arko, It would be useful to know more details about the use case you are trying to solve. As Tobias wrote, Spark Streaming works on DStreams, and a DStream is a continuous series of RDDs. Do check out performance tuning:
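
One hedged pattern for combining the two - an illustration, not from this thread; edgeStream as a DStream of (srcId, dstId) pairs is an assumption - is to build a GraphX graph per micro-batch inside foreachRDD:

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.streaming.dstream.DStream

    def perBatchGraphs(edgeStream: DStream[(Long, Long)]): Unit =
      edgeStream.foreachRDD { rdd =>
        // Build a graph from this batch's edges; vertex attributes default to 0.
        val edges = rdd.map { case (src, dst) => Edge(src, dst, 1) }
        val graph = Graph.fromEdges(edges, defaultValue = 0)
        println(s"vertices in this batch: ${graph.vertices.count()}")
      }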

Re: Debugging cluster stability, configuration issues

2014-08-21 Thread Jayant Shekhar
Hi Shay, You can try setting spark.storage.blockManagerSlaveTimeoutMs to a higher value. Cheers, Jayant On Thu, Aug 21, 2014 at 1:33 PM, Shay Seng s...@urbanengines.com wrote: Unfortunately it doesn't look like my executors are OOMing. On the slave machines I checked both the logs in
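
For reference, a sketch of bumping that property (the value is illustrative; it can equally be passed to spark-submit with --conf):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // e.g. 120 seconds before a slave's block manager is considered dead
      .set("spark.storage.blockManagerSlaveTimeoutMs", "120000")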