These numbers are from GPUs and Intel MKL (a closed-source math library for Intel processors), which for CPU-bound algorithms will be faster than MLlib's JBLAS. In theory, though, nothing prevents using these in MLlib (e.g. if you have a faster BLAS locally;
I am comparing the total time spent finishing the job. What I am comparing, to be precise, is on a single 48-core machine: the performance of local[48] vs. standalone mode with 8 nodes of 6 cores each (48 cores in total) on localhost. In this comparison, the standalone mode
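For reference, a comparison like this could be launched roughly as follows (the application jar and class name are placeholders of my own, not from this thread; the worker settings would go in conf/spark-env.sh before starting the standalone workers):

```shell
# Single JVM with 48 local worker threads
spark-submit --master "local[48]" --class com.example.MyJob myjob.jar

# Standalone cluster on localhost: 8 workers of 6 cores each, started
# beforehand with SPARK_WORKER_INSTANCES=8 and SPARK_WORKER_CORES=6
# set in conf/spark-env.sh.
spark-submit --master spark://localhost:7077 \
  --total-executor-cores 48 \
  --class com.example.MyJob myjob.jar
```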
I see. There should not be a significant algorithmic difference between those two cases, as far as I can tell, but there is a good bit of local-mode-only logic in Spark.
One typical problem we see on large-heap, many-core JVMs, though, is much
more time spent in garbage collection. I'm not sure
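As a sketch of one common mitigation (the flags shown are standard JVM options and Spark configuration properties, not something prescribed in this thread), you can pass GC settings through to the JVMs and watch the GC logs:

```shell
# Enable GC logging and a collector that tends to behave better on large
# heaps. spark.driver.extraJavaOptions matters in local mode, where
# everything runs in a single JVM.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails" \
  --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails" \
  --class com.example.MyJob myjob.jar
```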
Hi,
The recently added NNLS implementation in MLlib returns wrong solutions.
This is not data-specific: try any data in R's nnls and then the same data in MLlib's NNLS. The results are very different.
Also, the chosen algorithm, Polyak (1969), is not the best one available. The most popular one
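One quick way to sanity-check an NNLS result outside of R is SciPy's reference implementation (this example is my own illustration, not code from MLlib or from this thread):

```python
import numpy as np
from scipy.optimize import nnls

# Tiny test problem with a known nonnegative least-squares solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# Minimizes ||Ax - b||_2 subject to x >= 0. Here the unconstrained
# least-squares solution [1, 2] is already nonnegative, so NNLS must
# return it exactly with zero residual.
x, residual_norm = nnls(A, b)
```

Any correct NNLS implementation should agree with this on the same input, which makes it a handy cross-check against MLlib's output.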
Ah, I understand now. That sounds pretty useful and is something we would
currently plan very inefficiently.
On Sun, Jul 27, 2014 at 1:07 AM, Christos Kozanitis kozani...@berkeley.edu
wrote:
Thanks Michael for the recommendations. Actually, the region-join (or I could call it range-join or
I think this is nice to have. Feel free to create a JIRA for it and it
would be great if you can send a PR. Thanks! -Xiangrui
On Thu, Jul 24, 2014 at 12:39 PM, SK skrishna...@gmail.com wrote:
Hi,
The mllib.clustering.kmeans implementation supports a random or parallel
initialization mode to
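As a rough sketch of what the two modes do (pure NumPy, my own illustration; Spark's parallel mode, k-means||, is a distributed variant of the greedy k-means++-style seeding shown here, not literally this code):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_random(X, k):
    # "random" mode: pick k distinct input points uniformly at random.
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx]

def init_kmeanspp(X, k):
    # Serial analog of the "parallel" mode: seed greedily, weighting each
    # candidate by its squared distance to the nearest center chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

The distance-weighted seeding tends to spread initial centers across the data, which usually gives faster convergence and better clusterings than purely random picks.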
Based on some discussions with my application users, I have been trying to come
up with a standard way to deploy applications built on Spark
1. Bundle the version of Spark with your application and ask users to store it in HDFS before referencing it in YARN to boot your application
2. Provide ways
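Option 1 might look like the following in practice (paths, versions, and the application jar here are illustrative guesses on my part; spark.yarn.jar was the relevant configuration property in Spark 1.x):

```shell
# Upload the Spark assembly once to a shared HDFS location...
hdfs dfs -mkdir -p /apps/spark
hdfs dfs -put spark-assembly-1.0.0-hadoop2.2.0.jar /apps/spark/

# ...then point YARN applications at it so each job submission
# does not have to re-upload the assembly.
spark-submit --master yarn-cluster \
  --conf spark.yarn.jar=hdfs:///apps/spark/spark-assembly-1.0.0-hadoop2.2.0.jar \
  --class com.example.MyApp myapp.jar
```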
Thank you. It works.
(I've applied the changed source code to my local 1.0.0 source)
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Friday, July 25, 2014 11:47 PM
To: user@spark.apache.org
Subject: Re: Strange exception on coalesce()
I'm pretty sure this was
Mayur,
I don't know if I exactly understand the context of what you are asking,
but let me just mention issues I had with deploying.
* As my application is a streaming application, it doesn't read any files from disk, so I have no Hadoop/HDFS in place and there is no need for it,
Hi Team,
Could you please help me with the query below.
I'm using JavaStreamingContext to read streaming files from an HDFS shared directory. When I start the Spark streaming job, it reads files from the HDFS shared directory and does some processing. When I stop and restart the job, it is again reading old
Hi Andrew,
Thanks for the reply. I figured out the cause of the issue: some resource files were missing from the JARs. A class initialization depends on those resource files, so it threw that exception.
I appended the resource files explicitly to the --jars option and it worked fine.
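For anyone hitting the same thing, the fix looks roughly like this (jar names and the master setting are placeholders, not from the original message):

```shell
# Ship the jars containing the resource files alongside the application
# jar so they end up on the executor classpath.
spark-submit --master yarn-cluster \
  --jars resources.jar,other-dep.jar \
  --class com.example.MyApp myapp.jar
```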
The Caused by... messages