You can use Breeze, which is part of the Spark distribution:
https://github.com/scalanlp/breeze/wiki/Breeze-Linear-Algebra
Check out the modules under import breeze._
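To give a flavour of what the Breeze modules offer, here is a minimal sketch, assuming the breeze artifact is on the classpath (it ships as a dependency of Spark MLlib; exact version varies with your Spark build):

```scala
import breeze.linalg.{DenseMatrix, DenseVector}

object BreezeSketch {
  def main(args: Array[String]): Unit = {
    // Dense vector and matrix construction
    val v = DenseVector(1.0, 2.0, 3.0)
    val m = DenseMatrix(
      (1.0, 0.0, 0.0),
      (0.0, 2.0, 0.0),
      (0.0, 0.0, 3.0))

    // Matrix-vector product: (1.0, 4.0, 9.0)
    val mv = m * v
    // Dot product: 14.0
    val d = v dot v

    println(mv)
    println(d)
  }
}
```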
On 23 May 2018 at 07:04, umargeek wrote:
> Hi Folks,
>
> I am planning to rewrite one of my Python
Hi Johan,
DataFrames are built on top of RDDs; I am not sure whether the ordering
issues are any different there. Maybe you could create a minimal but
large-enough simulated dataset and an example series of transformations
to experiment on.
Best,
-m
Mehmet Süzen, MSc, PhD
On 14 September 2017 at 10:42, wrote:
> val noTs = myData.map(dropTimestamp)
>
> val scaled = scaler.transform(noTs)
>
> val projected = (new RowMatrix(scaled)).multiply(principalComponents).rows
>
> val clusters = myModel.predict(projected)
>
> val result =
of partitions in mapPartition?
On 13 Sep 2017 19:54, "Ankit Maloo" <ankitmaloo1...@gmail.com> wrote:
>
> RDDs are fault tolerant, as they can be recomputed using the DAG without storing
> the intermediate RDDs.
>
> On 13-Sep-2017 11:16 PM, "Suzen, Mehmet" <
y a map operation can change sequence across a
> partition as partition is local and computation happens one record at a
> time.
>
> On 13-Sep-2017 9:54 PM, "Suzen, Mehmet" <su...@acm.org> wrote:
>
> I think the order has no meaning in RDDs; see this post, especially the zip methods:
I think the order has no meaning in RDDs; see this post, especially the zip methods:
https://stackoverflow.com/questions/29268210/mind-blown-rdd-zip-method
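To see why zip-style alignment is fragile when ordering is not guaranteed, here is a plain-Scala analogue (not the Spark API): if partition results arrive in a different order, a positional zip against the original data pairs the wrong elements.

```scala
object OrderingSketch {
  // Simulate partitions of the data being processed in a swapped order,
  // then positionally zip the flattened result against the original data.
  def misalignedPairs(data: Seq[Int], partitionSize: Int): Seq[(Int, Int)] = {
    val partitions = data.grouped(partitionSize).toSeq
    val reordered  = partitions.reverse.flatten // results arrive partition-by-partition, order swapped
    data.zip(reordered)                         // positional zip now pairs wrong elements
  }

  def main(args: Array[String]): Unit = {
    val pairs = misalignedPairs(1 to 6, 3)
    // First pair is (1, 4), not (1, 1): the alignment silently broke
    println(pairs)
  }
}
```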
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
> On Wed, Aug 23, 2017 at 2:59 PM, Suzen, Mehmet <su...@acm.org> wrote:
>
>> It depends on what model you would like to train but models requiring
>> optimisation could use SGD with mini batches. See:
>> https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd
It depends on which model you would like to train, but models requiring
optimisation could use SGD with mini-batches. See:
https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd
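The mini-batch idea behind that link can be sketched in plain Scala (a local analogue, not the MLlib API): fit y = w * x by updating the weight on small, randomly shuffled batches rather than the full dataset.

```scala
import scala.util.Random

object MiniBatchSGD {
  // Fit y = w * x with mini-batch SGD on squared error; returns the learned weight.
  def fit(data: Seq[(Double, Double)], batchSize: Int, lr: Double,
          epochs: Int, seed: Long = 42L): Double = {
    val rng = new Random(seed)
    var w = 0.0
    for (_ <- 1 to epochs) {
      // Shuffle each epoch, then update once per mini-batch
      for (batch <- rng.shuffle(data).grouped(batchSize)) {
        // Average gradient of (w*x - y)^2 over the batch
        val grad = batch.map { case (x, y) => 2.0 * (w * x - y) * x }.sum / batch.size
        w -= lr * grad
      }
    }
    w
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 20).map(i => (i.toDouble, 3.0 * i)) // true weight = 3.0
    val w = fit(data, batchSize = 5, lr = 0.001, epochs = 200)
    println(w) // converges close to 3.0
  }
}
```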
On 23 August 2017 at 14:27, Sea aj wrote:
> Hi,
>
> I am
On 3 August 2017 at 03:00, Vadim Semenov wrote:
> `saveAsObjectFile` doesn't save the DAG, it acts as a typical action, so it
> just saves data to some destination.
Yes, that's what I thought, so the statement "..otherwise saving it on
a file will require
On 3 August 2017 at 01:05, jeff saremi wrote:
> Vadim:
>
> This is from the Mastering Spark book:
>
> "It is strongly recommended that a checkpointed RDD is persisted in memory,
> otherwise saving it on a file will require recomputation."
Is this really true? I had the
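The persist-then-checkpoint pattern under discussion can be sketched as follows, assuming a live SparkContext `sc` (untested here, and the directory path is illustrative). Checkpointing truncates the lineage by writing the RDD to reliable storage; persisting first means the checkpoint job reads from the cache instead of recomputing the whole DAG.

```scala
import org.apache.spark.storage.StorageLevel

// sc is an existing SparkContext; checkpoint directory is illustrative
sc.setCheckpointDir("/tmp/spark-checkpoints")

val rdd = sc.parallelize(1 to 1000000).map(x => x.toLong * x)
rdd.persist(StorageLevel.MEMORY_ONLY) // cache before checkpointing
rdd.checkpoint()                      // mark for checkpointing (lazy)
rdd.count()                           // first action: computes once, checkpoint written from cache
```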
I suggest the RandomRDDs API. It provides nice tools; writing
wrappers around it might be good.
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs$
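A short sketch of the RandomRDDs API linked above, assuming a live SparkContext `sc` (untested here; the shift/scale wrapper is the kind of thin helper meant by "wrappers around that"):

```scala
import org.apache.spark.mllib.random.RandomRDDs

// One million standard-normal doubles in 10 partitions (sc is an existing SparkContext)
val ngauss = RandomRDDs.normalRDD(sc, 1000000L, 10)

// Wrap the unit normal into N(mu, sigma^2) by shifting and scaling
val mu = 2.0
val sigma = 0.5
val shifted = ngauss.map(x => mu + sigma * x)
```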
There is a BigDL project:
https://github.com/intel-analytics/BigDL
On 20 June 2017 at 16:17, Jules Damji wrote:
> And we will be having a webinar on July 27, going into some more details. Stay
> tuned.
>
> Cheers
> Jules
>
> Sent from my iPhone
> Pardon the dumb thumb typos :)
Hello List,
I was wondering what the design principle is by which the number of
partitions of an RDD is inherited from its parent. See one simple example
below [*]. 'ngauss_rdd2' has significantly less data; intuitively, in such
cases shouldn't Spark invoke coalesce automatically for performance?
What would
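For reference, the manual workaround for the situation being asked about is an explicit coalesce after the data-reducing transformation. A sketch, assuming a live SparkContext and reusing the 'ngauss_rdd' names from the message (the filter predicate and partition counts are illustrative):

```scala
// ngauss_rdd: a large RDD of doubles spread over many partitions
val ngauss_rdd = sc.parallelize(Seq.fill(1000000)(scala.util.Random.nextGaussian()), 100)

// Filtering keeps only a tiny tail, but the child inherits all 100 partitions
val ngauss_rdd2 = ngauss_rdd.filter(_ > 3.0)

// Spark does not shrink the partition count automatically; do it explicitly
val compacted = ngauss_rdd2.coalesce(4)
```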