Hi all,
I am also interested in specifying a common framework.
I am trying to implement hierarchical k-means and a single-link-style
hierarchical clustering method using LSH.
https://issues.apache.org/jira/browse/SPARK-2966
If you have designed the standardized clustering algorithms API,
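For what it's worth, here is a minimal sketch of one possible top-down (bisecting)
hierarchical k-means built on MLlib's existing KMeans. The function name, parameters,
and the greedy largest-cluster split strategy are illustrative only, not the design
proposed in SPARK-2966:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Illustrative sketch only: repeatedly bisect the largest cluster with
// 2-means until the requested number of leaf clusters is reached.
def bisectingKMeans(data: RDD[Vector], numClusters: Int, maxIter: Int): Seq[RDD[Vector]] = {
  var clusters: Seq[RDD[Vector]] = Seq(data.cache())
  while (clusters.size < numClusters) {
    // Greedy strategy: split whichever cluster is currently largest.
    val largest = clusters.maxBy(_.count())
    val model = KMeans.train(largest, 2, maxIter)
    val halves = (0 until 2).map(i => largest.filter(v => model.predict(v) == i))
    clusters = clusters.filterNot(_ eq largest) ++ halves
  }
  clusters
}

The single-link / LSH side would need a different structure (bucketing points by hash,
then merging buckets), so this only covers the divisive half of the idea.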
Has anyone had a chance to look at this paper (with title in subject)?
http://www.cs.rice.edu/~lp6/comparison.pdf
Interesting that they chose to use Python alone. Do we know how much faster
Scala is vs. Python in general, if at all?
As with any and all benchmarks, I'm sure there are caveats, but
They only compared their own implementations of a couple of algorithms on
different platforms rather than comparing the different platforms
themselves (in the case of Spark -- PySpark). I can write two variants of
an algorithm on Spark and make them perform drastically differently.
I have no doubt if
Our experience matches Reynold's comments: pure-Python implementations of
anything are generally sub-optimal compared to pure Scala implementations,
or to Scala versions exposed to Python (which are faster than pure Python,
but still slower than pure Scala). It also seems at first glance that some of the
On a related note, I recently heard about Distributed R
(https://github.com/vertica/DistributedR), which is coming out of
HP/Vertica and seems to be their offering for machine learning at scale.
It would be interesting to see some kind of comparison between that and
MLlib (and perhaps also
Actually I believe the same person started both projects.
The Distributed R project from HP was started by Shivaram Venkataraman when
he was there. He has since moved to the Berkeley AMPLab to pursue a PhD, and
SparkR is his latest project.
On Wed, Aug 13, 2014 at 1:04 PM, Nicholas Chammas
BTW you can find the original Presto (rebranded as Distributed R) paper
here:
http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Venkataraman.pdf
On Wed, Aug 13, 2014 at 2:16 PM, Reynold Xin r...@databricks.com wrote:
Actually I believe the same person started both projects.
The
Yeah I worked on DistributedR while I was an intern at HP Labs, but it has
evolved a lot since then. I don't think it's a direct comparison, as
DistributedR is a pure R implementation in a distributed setting while
SparkR is a wrapper around the Scala / Java implementations in Spark.
That said, it
I've created a new pull request, which can be found at
https://github.com/apache/spark/pull/1929. Since Spark is using Scala
2.10.3 and there is a known issue with Scala 2.10.x not supporting the :cp
command (https://issues.scala-lang.org/browse/SI-6502), the Spark shell
does not have the
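As an aside, two common workarounds (not the API added by that PR) are to pass
extra jars when launching the shell, or to ship a jar to the executors from a
running shell; note that the latter does not put the jar on the driver REPL's
own classpath, which is exactly the gap :cp would normally fill:

// Workaround sketches, not the PR's API:
// 1) Add the jar when launching the shell:
//    ./bin/spark-shell --jars /path/to/extra.jar
// 2) From a running shell, ship a jar to the executors
//    (does not affect the driver REPL's classpath):
sc.addJar("/path/to/extra.jar")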
I haven't read the code yet, but if it is what I think it is, this is
SUPER, UBER, HUGELY useful.
On a related note, I asked about this on the Scala dev list but never got a
satisfactory answer
https://groups.google.com/forum/#!msg/scala-internals/_cZ1pK7q6cU/xyBQA0DdcYwJ
On Wed, Aug 13,
On Wed, Aug 13, 2014 at 2:16 PM, Ignacio Zendejas
ignacio.zendejas...@gmail.com wrote:
Yep, I thought it was a bogus comparison.
I should rephrase my question as it was poorly phrased: on average, how
much faster is Spark v. PySpark (I didn't really mean Scala v. Python)?
I've only used Spark
I now have a complete pull request for this issue that I'd like to get
reviewed and committed. The PR is available here:
https://github.com/apache/spark/pull/1890 and includes a test case for the
issue I described. I've also submitted a related PR (
https://github.com/apache/spark/pull/1827) that
On Wed, Aug 13, 2014 at 2:31 PM, Davies Liu dav...@databricks.com wrote:
On Wed, Aug 13, 2014 at 2:16 PM, Ignacio Zendejas
ignacio.zendejas...@gmail.com wrote:
Yep, I thought it was a bogus comparison.
I should rephrase my question as it was poorly phrased: on average, how
much faster is
Hi,
I am new to Spark and want to explore Spark's master-worker / cluster
manager communication architecture.
Any documents or code pointers would be helpful to start with.
Thanks!
Hi Aniket,
Perhaps this video will help:
https://www.youtube.com/watch?v=HG2Yd-3r4-M&list=PLTPXxbhUt-YWGNTaDj6HSjnHMxiTD1HCR&index=1
You can see other up-to-date videos and slides here:
http://spark-summit.org/2014/training
Best regards,
Rajiv
2014-08-13 19:36 GMT-04:00 aniketadnaik
Dear all:
Can Spark acquire resources from, and give resources back to, YARN
dynamically?
--
Regards,
Zhaojie
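Whether this is supported depends on the release; in Spark versions that ship
dynamic executor allocation for YARN, it is enabled through configuration roughly
like the sketch below (the spark.dynamicAllocation.* keys and values here are only
an example; check the docs for your version, and note the external shuffle service
must also be running on the NodeManagers):

import org.apache.spark.SparkConf

// Sketch: enable dynamic executor allocation on YARN (version permitting).
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.maxExecutors", "20")
  // Executors can only be released safely if shuffle data outlives them.
  .set("spark.shuffle.service.enabled", "true")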
Hi devs,
I posted a design doc proposing an interface for pluggable block transfer
(used in shuffle, broadcast, block replication, etc.). This is expected to
be done in the 1.2 time frame.
It should make our code base cleaner, and enable us to provide alternative
implementations of block transfers
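The design doc is the authoritative source for the actual interface; purely to make
the idea concrete, a pluggable block-transfer layer could look roughly like the
following trait, where every name is hypothetical rather than taken from the doc:

import java.nio.ByteBuffer

// Hypothetical sketch of a pluggable block transfer service; the real
// interface is whatever the design doc / eventual PR defines.
trait BlockTransfer {
  // Start the service, e.g. bind a port for serving local blocks.
  def init(): Unit

  // Asynchronously fetch remote blocks; invoke the callback once per block.
  def fetchBlocks(host: String, port: Int, blockIds: Seq[String],
                  onBlockFetched: (String, ByteBuffer) => Unit): Unit

  // Push one block to a remote node (e.g. for replication).
  def uploadBlock(host: String, port: Int, blockId: String,
                  data: ByteBuffer): Unit

  // Release sockets, threads, and other resources.
  def stop(): Unit
}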