For large datasets, you need hashing in order to compute k-nearest
neighbors locally. You can start by searching Google Scholar for
"LSH k nearest": http://scholar.google.com/scholar?q=lsh+k+nearest
-Xiangrui
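To make the hashing suggestion concrete, here is a minimal sketch of random-hyperplane LSH in plain Scala (no Spark). It is only an illustration under assumptions: the object name LshSketch and all of its helpers are hypothetical, and none of this is an MLlib API. Each point is hashed to a bit signature, points are grouped into buckets by signature, and the exact k-nearest search runs only inside the query's bucket.

```scala
import scala.util.Random

// Hypothetical sketch of random-hyperplane LSH; not part of Spark or MLlib.
object LshSketch {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  def sqDist(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Draw numBits random hyperplanes; each one contributes one signature bit.
  def randomPlanes(numBits: Int, dim: Int, seed: Long): Array[Vec] = {
    val rng = new Random(seed)
    Array.fill(numBits)(Array.fill(dim)(rng.nextGaussian()))
  }

  // One bit per hyperplane: which side of the plane the vector falls on.
  def signature(v: Vec, planes: Array[Vec]): String =
    planes.map(p => if (dot(v, p) >= 0) '1' else '0').mkString

  // Group points into buckets keyed by their signature.
  def buildIndex(points: Seq[Vec], planes: Array[Vec]): Map[String, Seq[Vec]] =
    points.groupBy(p => signature(p, planes))

  // Approximate kNN: exact search restricted to the query's own bucket.
  def approxKnn(query: Vec, index: Map[String, Seq[Vec]],
                planes: Array[Vec], k: Int): Seq[Vec] =
    index.getOrElse(signature(query, planes), Seq.empty[Vec])
      .sortBy(p => sqDist(query, p))
      .take(k)

  def main(args: Array[String]): Unit = {
    val planes = randomPlanes(numBits = 8, dim = 2, seed = 42L)
    val points = Seq(Array(1.0, 0.0), Array(0.9, 0.1), Array(-1.0, 0.0))
    val index = buildIndex(points, planes)
    approxKnn(Array(1.0, 0.0), index, planes, k = 2)
      .foreach(p => println(p.mkString(", ")))
  }
}
```

In a Spark job, the signature could serve as the key for grouping points so that each bucket is searched locally. A single hash table can miss true neighbors that land in another bucket, so practical schemes usually combine several tables.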
On Tue, Jan 20, 2015 at 9:55 PM, DEVAN M.S. msdeva...@gmail.com wrote:
Hi all,
Please help me
Hi Sandy, thanks for the reply.
I tried to run this code without the cache and it worked.
It also works if I cache before the repartition, so the problem seems to
be related to the combination of repartition and caching.
My train is a SchemaRDD, and if I make all my columns as StringType, the
error
Hi Dirceu,
Does the issue not show up if you run map(f =>
f(1).asInstanceOf[Int]).sum on the train RDD? It appears that f(1) is
a String, not an Int. If you're looking to parse and convert it, toInt
should be used instead of asInstanceOf.
-Sandy
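The difference between casting and parsing can be sketched in plain Scala, without Spark. This is an illustration under assumptions: the object CastVsParse and its helpers are hypothetical, and a Seq[Any] stands in for a row whose columns were all loaded as strings. asInstanceOf only reinterprets the runtime type, so it fails on a String, while toInt actually parses the text.

```scala
import scala.util.Try

// Hypothetical illustration; Seq[Any] stands in for a row of string columns.
object CastVsParse {
  // Casting: succeeds only if the runtime value already is an Int.
  def cast(value: Any): Int = value.asInstanceOf[Int]

  // Same cast, but with any ClassCastException captured in a Try.
  def castInt(value: Any): Try[Int] = Try(cast(value))

  // Parsing: converts numeric text to an Int, None if it is not a number.
  def parseInt(value: Any): Option[Int] = value match {
    case i: Int    => Some(i)
    case s: String => Try(s.trim.toInt).toOption
    case _         => None
  }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Seq[Any]] = Seq(Seq("a", "1"), Seq("b", "2"), Seq("c", "3"))
    // The cast fails because column 1 holds a String at runtime.
    println(castInt(rows.head(1)))
    // Parsing each value lets the column be summed.
    println(rows.flatMap(r => parseInt(r(1))).sum)
  }
}
```

Returning an Option from the parser is one possible design choice; it keeps a malformed value from aborting the whole job, at the cost of silently skipping it.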
On Wed, Jan 21, 2015 at 8:43 AM, Dirceu
Yep,
I think it's only useful (and likely to be maintained) if we actually
use this on Jenkins. So that was my proposal. Basically give people a
docker file so they can understand exactly what versions of everything
we use for our reference build. And if they don't want to use docker
directly,
Hi guys, has anyone seen something like this?
I have a training set, and when I repartition it and then call cache, it
throws a ClassCastException when I try to execute anything that accesses it:
val rep120 = train.repartition(120)
val cached120 = rep120.cache
cached120.map(f =
Hi All
Any thoughts, comments or questions regarding the proposal outlined at
https://issues.apache.org/jira/browse/SPARK-5267?
Cheers
Steve
Hi,
The tests in the KMeans class in clustering.py have not been updated to
take the seed value, and hence they are failing.
Shall I make the changes and submit them along with my PR (Python API for
Gaussian Mixture Model), or create a JIRA?
Regards,
Meethu
Hi,
Sorry it was my mistake. My code was not properly built.
Regards,
Meethu
On Thursday 22 January 2015 10:39 AM, Meethu Mathew wrote:
Hi,
The tests in the KMeans class in clustering.py have not been updated to
take the seed value and
If the goal is a reproducible test environment then I think that is what
Jenkins is. Granted you can only ask it for a test. But presumably you get
the same result if you start from the same VM image as Jenkins and run the
same steps.
But the issue is when users can't reproduce Jenkins