>
> Thanks.
>
> On Fri, Jul 6, 2018 at 2:56 AM, Jayant Shekhar
> wrote:
>
>> Hello Chetan,
>>
>> We have currently done it with .pipe(.py) as Prem suggested.
>>
>> That passes the RDD as CSV strings to the python script. The python
>> script can
Hello Chetan,
We have currently done it with .pipe(.py) as Prem suggested.
That passes the RDD as CSV strings to the python script. The python script
can either process it line by line, create the result and return it back, or
build something like a Pandas DataFrame for processing and finally write the
results back out.
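Roughly, the Scala side of that approach looks like the sketch below (the RDD
name, the record fields and the script name process.py are made up for
illustration):

// build CSV lines from the RDD and stream them through the external script
val csvLines = dataRdd.map(r => Seq(r.id, r.name, r.value).mkString(","))
// pipe() sends each partition's lines to the script's stdin and returns
// the script's stdout lines as a new RDD[String]
val resultRdd = csvLines.pipe("process.py")
resultRdd.take(10).foreach(println)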
Hello Mahesh,
We have built one. You can download it from here :
https://www.sparkflows.io/download
Feel free to ping me for any questions, etc.
Best Regards,
Jayant
On Sun, Jul 9, 2017 at 9:35 PM, Mahesh Sawaiker <
mahesh_sawai...@persistent.com> wrote:
> Hi,
>
>
> 1) Is anyone aware of any
Hello Gaurav,
Pre-calculating the results and loading them into a serving store, from where
you serve them out to your customers - as suggested by Jorn - would be a great
idea. You can run the job every hour/day, depending on your requirements.
Zeppelin (as mentioned by Ayan) would not be a
Hello Gaurav,
Yes, Stanford CoreNLP is of course great to use too!
You can find sample code here and pull the UDFs into your project :
https://github.com/sparkflows/sparkflows-stanfordcorenlp
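As a rough illustration (not the exact code from that repo), a Spark UDF
wrapping CoreNLP's simple API could look something like the sketch below; the
DataFrame df and its "text" column are assumptions:

import org.apache.spark.sql.functions.udf
import edu.stanford.nlp.simple.Sentence
import scala.collection.JavaConverters._

// lemmatize a text column; needs the CoreNLP models jar on the classpath
val lemmatize = udf((text: String) => new Sentence(text).lemmas().asScala.mkString(" "))
val withLemmas = df.withColumn("lemmas", lemmatize(df("text")))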
Thanks,
Jayant
On Tue, Apr 11, 2017 at 8:44 PM, Gaurav Pandya
wrote:
>
Hi Fanchao,
This is because it is unable to find the generated anonymous classes. Adding
the below code worked for me. I found the details here :
https://github.com/cloudera/livy/blob/master/repl/src/main/scala/com/cloudera/livy/repl/SparkInterpreter.scala
// Spark 1.6 does not have
On Mon, Jun 27, 2016 at 5:53 PM, Jayant Shekhar <jayantbaya...@gmail.com>
wrote:
> I tried setting the classpath explicitly in the settings. Classpath gets
> printed properly, it has the scala jars in it like
> scala-compiler-2.10.4.jar, scala-library-2.10.4.jar.
>
> It did
// print the classpath for debugging
println("classpath=" + classpath)
settings.classpath.value =
classpath.distinct.mkString(java.io.File.pathSeparator)
settings.embeddedDefaults(cl)
-Jayant
On Mon, Jun 27, 2016 at 3:19 PM, Jayant Shekhar <jayantbaya...@gmail.com>
wrote:
> Hello,
>
> I'm trying to run scala code
Hello,
I'm trying to run scala code in a Web Application.
It runs great when I run it in IntelliJ, but I run into an error when I run
it from the command line.
Command used to run
--
java -Dscala.usejavacp=true -jar target/XYZ.war
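One programmatic equivalent of that flag, when building the interpreter
settings yourself, is roughly the following (a sketch; the settings object and
the classloader are whatever the embedded interpreter is constructed with):

import scala.tools.nsc.Settings

val settings = new Settings()
// same effect as passing -Dscala.usejavacp=true on the command line
settings.usejavacp.value = true
// fall back to this classloader's classpath when running from a jar/war
settings.embeddedDefaults(getClass.getClassLoader)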
Hi Biplop,
Can you try adding new files to the training/test directories after you
have started your streaming application? Especially the test directory, as
you are printing your predictions.
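For reference, a minimal sketch of that kind of streaming train/test setup
(the directory paths, batch interval and feature count are illustrative, and
sc is assumed to be an existing SparkContext):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))
// files dropped into these directories AFTER the app starts get picked up
val trainingData = ssc.textFileStream("/data/train").map(LabeledPoint.parse)
val testData = ssc.textFileStream("/data/test").map(LabeledPoint.parse)

val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(3))
model.trainOn(trainingData)
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

ssc.start()
ssc.awaitTermination()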
On Fri, Jun 24, 2016 at 2:32 PM, Biplob Biswas
wrote:
>
> Hi,
>
> I
Thanks Philippe! Looking forward to trying it out. I am on >= 1.6
Jayant
On Thu, Jun 23, 2016 at 1:24 AM, philippe v wrote:
> Hi,
>
> You can try this lib : https://github.com/jpmml/jpmml-sparkml
>
> I'll try it soon... you need to be in >=1.6
>
> Philippe
>
>
>
> --
>
Thanks a lot Nick! It's very helpful.
On Wed, Jun 22, 2016 at 11:47 PM, Nick Pentreath
wrote:
> Currently there is no way within Spark itself. You may want to check out
> this issue (https://issues.apache.org/jira/browse/SPARK-11171) and here
> is an external project
Hi,
I have written a program using SparkIMain which creates an RDD, and I am
looking for a way to access that RDD in my normal Spark/Scala code for
further processing.
The code below binds the SparkContext:
sparkIMain.bind("sc", "org.apache.spark.SparkContext", sparkContext,
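The overall pattern looks roughly like the sketch below; the @transient
modifier, the RDD name myRdd and the use of IMain's valueOfTerm to pull the
value back out are illustrative assumptions:

// bind the existing SparkContext into the interpreter
sparkIMain.bind("sc", "org.apache.spark.SparkContext", sparkContext, List("@transient"))

// create an RDD inside the interpreter
sparkIMain.interpret("val myRdd = sc.parallelize(1 to 100)")

// one possible way to pull the interpreted value back into normal Scala code
val maybeRdd = sparkIMain.valueOfTerm("myRdd")
  .map(_.asInstanceOf[org.apache.spark.rdd.RDD[Int]])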
Hi Sameer,
You can try increasing the number of executor-cores.
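For example, when submitting on YARN (the class name, jar and values are
illustrative):

spark-submit --num-executors 10 --executor-cores 4 --executor-memory 8g \
  --class com.example.LinearRegressionJob target/myjob.jar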
-Jayant
On Fri, Nov 21, 2014 at 11:18 AM, Sameer Tilak ssti...@live.com wrote:
Hi All,
I have been using MLlib's linear regression and I have some questions
regarding the performance. We have a cluster of 10 nodes -- each
Hi Sameer,
You can also use repartition to create a higher number of tasks.
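A quick sketch (the partition count is illustrative; something like 2-3x the
total number of cores is a common starting point, and trainingData and
numIterations are assumed from your job):

import org.apache.spark.mllib.regression.LinearRegressionWithSGD

// spread the data over more partitions so more tasks run in parallel
val repartitioned = trainingData.repartition(80)
val model = LinearRegressionWithSGD.train(repartitioned, numIterations)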
-Jayant
On Fri, Nov 21, 2014 at 12:02 PM, Jayant Shekhar jay...@cloudera.com
wrote:
Hi Sameer,
You can try increasing the number of executor-cores.
-Jayant
On Fri, Nov 21, 2014 at 11:18 AM, Sameer Tilak
Hi Albert,
Have a couple of questions:
- You mentioned near real-time. What exactly is your SLA for processing
each document?
- Which crawler are you using, and are you looking to bring Hadoop into
your overall workflow? You might want to read up on how network traffic is
+1 to Sean.
Is it possible to rewrite your code to not use the SparkContext inside the
RDD operations? Or why does javaFunctions() need the SparkContext?
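The general pattern is that the SparkContext lives only in the driver and is
not serializable, so it cannot be referenced inside functions that run on the
executors. A sketch of the usual fix (rdd and someLookupMap are illustrative):

// Anti-pattern: referencing sc inside a transformation forces Spark to
// serialize the SparkContext with the closure, which fails.
//   rdd.map(x => sc.textFile(x))   // will not work on executors

// Fix: keep driver-only objects like sc outside the closure and only
// capture serializable values, e.g. a broadcast variable.
val lookup = sc.broadcast(someLookupMap)              // driver side
val enriched = rdd.map(x => (x, lookup.value.get(x))) // executors read the value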
On Thu, Oct 23, 2014 at 10:53 AM, Localhost shell
universal.localh...@gmail.com wrote:
Bang On Sean
Before sending the issue mail, I was able to remove the
Hi Deb,
Do check out https://github.com/OryxProject/oryx.
It does integrate with Spark. Sean has put in quite a bit of neat detail on
the page about the architecture. It has all the things you are thinking
about :)
Thanks,
Jayant
On Sat, Oct 18, 2014 at 8:49 AM, Debasish Das
Hi Areg,
Check out
http://spark.apache.org/docs/latest/programming-guide.html#accumulators
val sum = sc.accumulator(0) // accumulator created from an initial value in the driver
The accumulator variable is created in the driver. Tasks running on the
cluster can then add to it. However, they cannot read its value; only the
driver program can.
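A slightly fuller sketch of the pattern (the RDD and the ERROR check are just
for illustration):

val errorCount = sc.accumulator(0)             // created in the driver

rdd.foreach { line =>
  if (line.contains("ERROR")) errorCount += 1  // tasks only add to it
}

// only the driver can read the accumulated value
println("errors: " + errorCount.value)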
Hi Michael,
I think you mean batch interval instead of windowing. It can be
helpful for cases when you do not want to process very small batch sizes.
HDFS sink in Flume has the concept of rolling files based on time, number
of events or size.
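For example, the batch interval is set when the StreamingContext is created
(the 30-second value is illustrative):

import org.apache.spark.streaming.{Seconds, StreamingContext}

// each micro-batch will cover 30 seconds of data instead of a very small batch
val ssc = new StreamingContext(sparkConf, Seconds(30))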
Arko,
It would be useful to know more details on the use case you are trying to
solve. As Tobias wrote, Spark Streaming works on DStream, which is a
continuous series of RDDs.
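For example, each micro-batch RDD in the DStream can be handled with
foreachRDD (the stream and its contents are illustrative):

dstream.foreachRDD { rdd =>
  // 'rdd' holds one batch interval's worth of data; use normal RDD operations
  val counts = rdd.map(event => (event, 1)).reduceByKey(_ + _)
  counts.take(10).foreach(println)
}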
Do check out performance tuning :
Hi Shay,
You can try setting spark.storage.blockManagerSlaveTimeoutMs to a higher
value.
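For example (the value is illustrative; it can also be passed with --conf on
spark-submit):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.storage.blockManagerSlaveTimeoutMs", "300000") // 5 minutes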
Cheers,
Jayant
On Thu, Aug 21, 2014 at 1:33 PM, Shay Seng s...@urbanengines.com wrote:
Unfortunately it doesn't look like my executors are OOM. On the slave
machines I checked both the logs in