date:20140709

Contribution to MLlib

2014-07-09 Thread MEETHU MATHEW

Hi, I am interested in contributing a clustering algorithm towards MLlib of Spark. I am focusing on Gaussian Mixture Model. But I saw a JIRA @ https://spark-project.atlassian.net/browse/SPARK-952 regarding the same. I would like to know whether Gaussian Mixture Model is already implemented or

Re: Contribution to MLlib

2014-07-09 Thread RJ Nowling

Hi Meethu, There is no code for a Gaussian Mixture Model clustering algorithm in the repository, but I don't know if anyone is working on it. RJ On Wednesday, July 9, 2014, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I am interested in contributing a clustering algorithm towards MLlib

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-09 Thread RJ Nowling

Thanks everyone for the input. So it seems what people want is: * Implement MiniBatch KMeans and Hierarchical KMeans (Divide and conquer approach, look at DecisionTree implementation as a reference) * Restructure 3 KMeans clustering algorithm implementations to prevent code duplication and

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-09 Thread Nick Pentreath

Cool, seems like a good initiative. Adding a couple of extra high-quality clustering implementations will be great. I'd say it would make most sense to submit a PR for the Standardised API first, agree that with everyone and then build on it for the specific implementations.

Unresponsive to PR/jira changes

2014-07-09 Thread Mridul Muralidharan

Hi, I noticed today that Gmail has been marking most of the mails I was receiving from Spark GitHub/JIRA as spam, and I had assumed it was a lull in activity due to Spark Summit for the past few weeks! In case I have commented on specific PR/JIRA issues and not followed up, apologies for

Re: Contribution to MLlib

2014-07-09 Thread Xiangrui Meng

I don't know if anyone is working on it either. If that JIRA is not moved to Apache JIRA, feel free to create a new one and make a note that you are working on it. Thanks! -Xiangrui On Wed, Jul 9, 2014 at 4:56 AM, RJ Nowling rnowl...@gmail.com wrote: Hi Meethu, There is no code for a Gaussian

15 new MLlib algorithms

2014-07-09 Thread Michael Malak

At Spark Summit, Patrick Wendell indicated the number of MLlib algorithms would roughly double in 1.1 from the current approx. 15. http://spark-summit.org/wp-content/uploads/2014/07/Future-of-Spark-Patrick-Wendell.pdf What are the planned additional algorithms? In Jira, I only see two when

CPU/Disk/network performance instrumentation

2014-07-09 Thread Kay Ousterhout

Hi all, I've been doing a bunch of performance measurement of Spark and, as part of doing this, added metrics that record the average CPU utilization, disk throughput and utilization for each block device, and network throughput while each task is running. These metrics are collected by reading

Re: CPU/Disk/network performance instrumentation

2014-07-09 Thread Reynold Xin

Maybe it's time to create an advanced mode in the ui. On Wed, Jul 9, 2014 at 12:23 PM, Kay Ousterhout k...@eecs.berkeley.edu wrote: Hi all, I've been doing a bunch of performance measurement of Spark and, as part of doing this, added metrics that record the average CPU utilization, disk

Re: 15 new MLlib algorithms

2014-07-09 Thread Burak Yavuz

Hi, The roadmap for the 1.1 release and MLlib includes algorithms such as: Non-negative matrix factorization, Sparse SVD, Multiclass decision tree, Random Forests (?); optimizers such as: ADMM, Accelerated gradient methods; and also a statistical toolbox that includes: descriptive statistics,

Re: CPU/Disk/network performance instrumentation

2014-07-09 Thread Shivaram Venkataraman

I think it would be very useful to have this. We could put the UI display either behind a flag or a URL parameter. Shivaram On Wed, Jul 9, 2014 at 12:25 PM, Reynold Xin r...@databricks.com wrote: Maybe it's time to create an advanced mode in the ui. On Wed, Jul 9, 2014 at 12:23 PM, Kay

Re: ExecutorState.LOADING?

2014-07-09 Thread Kay Ousterhout

Git history to the rescue! It seems to have been added by Matei way back in July 2012: https://github.com/apache/spark/commit/5d1a887bed8423bd6c25660910d18d91880e01fe and then was removed a few months later (replaced by RUNNING) by the same Mr. Zaharia:

Re: ExecutorState.LOADING?

2014-07-09 Thread Mark Hamstra

Actually, I'm thinking about re-purposing it. There's a nasty behavior that I'll open a JIRA for soon, and that I'm thinking about addressing by introducing/using another ExecutorState transition. The basic problem is that Master can be overly aggressive in calling removeApplication on

Re: ExecutorState.LOADING?

2014-07-09 Thread Aaron Davidson

Agreed that the behavior of the Master killing off an Application when Executors from the same set of nodes repeatedly die is silly. This can also strike if a single node enters a state where any Executor created on it quickly dies (e.g., a block device becomes faulty). This prevents the

Testing period for better jenkins integration

2014-07-09 Thread Patrick Wendell

Just a heads up - I've added some better Jenkins integration that posts more useful messages on pull requests. We'll run this side-by-side with the current Jenkins messages for a while to make sure it's working well. Things may be a bit chatty while we are testing this - we can migrate over as

libgfortran Dependency

2014-07-09 Thread Taka Shinagawa

Hi, After testing Spark 1.0.1-RC2 on EC2 instances from the standard Ubuntu and Amazon Linux AMIs, I've noticed MLlib's dependency on the gfortran library (libgfortran.so.3). sbt assembly succeeds without this library installed, but sbt test fails as follows. I'm wondering if documenting this

Re: on shark, is tachyon less efficient than memory_only cache strategy ?

2014-07-09 Thread qingyang li

Could I set some cache policy to let Spark load data from Tachyon only once for all SQL queries? For example, by using CacheAllPolicy, FIFOCachePolicy, or LRUCachePolicy. But I have tried those three policies, and they did not help. I think that if Spark always loads data for each SQL query, it will impact

Re: libgfortran Dependency

2014-07-09 Thread Xiangrui Meng

It is documented in the official doc: http://spark.apache.org/docs/latest/mllib-guide.html On Wed, Jul 9, 2014 at 7:35 PM, Taka Shinagawa taka.epsi...@gmail.com wrote: Hi, After testing Spark 1.0.1-RC2 on EC2 instances from the standard Ubuntu and Amazon Linux AMIs, I've noticed the MLlib's

Re: libgfortran Dependency

2014-07-09 Thread Taka Shinagawa

Thanks for pointing me to the MLlib guide. I was looking only at the README and Spark docs. I also found it's already filed in JIRA https://spark-project.atlassian.net/browse/SPARK-797 On Wed, Jul 9, 2014 at 7:45 PM, Xiangrui Meng men...@gmail.com wrote: It is documented in the official doc:

19 matches
