Hello,
I finally figured out my schedule for this summer, and the conclusion is that I
would be able to dedicate only about 20 hours per week to the GSoC project. As
far as I understand, this is about half of what is expected from a GSoC student,
so unfortunately I think I should not apply this year. I want to contribute to
the Commons Math library nonetheless.
Best regards,
Alina
From: Thomas Neidhart <[email protected]>
To: Commons Developers List <[email protected]>
Sent: Tuesday, February 3, 2015 1:17 AM
Subject: Re: [Math] Contributions to the clustering module (maybe GSoC)
On 02/02/2015 10:36 PM, Alina Ciobanu wrote:
> Hello Thomas,
>
>
> Thank you for the answer. I hope I will be able to clarify my schedule for
> the summer in about a week from now, and then I will decide whether I should
> apply to GSoC this year or not. I will let you know as soon as I can. Until
> then, I will briefly describe my first ideas below:
>
>
> 1. Spectral clustering [1] - It basically maps the data into a
> lower-dimensional space (relying on the eigenvectors of the similarity
> matrix) and performs (k-means) clustering there. This method can handle a
> wide variety of problems, regardless of the shape of the clusters. It could be
> implemented efficiently using the Commons Math linear algebra module.
>
>
> 2. Mean shift algorithm [2] - I haven't grasped all the details of the
> algorithm yet, but I find it very interesting. As far as I understand, it has
> been primarily used in pattern recognition and computer vision. I discovered
> it while searching for an algorithm that does not require the number of
> clusters as an input parameter. From this point of view, I think it would be
> a good addition to Commons Math alongside DBSCAN.
>
>
> 3. Clustering evaluation methods
>
> 3.1. The Silhouette Coefficient [3] - accounts for the intra-cluster and
> inter-cluster distance to assign a score in [-1, 1] to a clustering.
>
> 3.2. External clustering evaluation [4] - when a gold standard is available
> for the clustered data, it can be used to assess the performance of a
> clustering algorithm.
>
>
> Suggestions are more than welcome. If you have requests from users for
> specific clustering algorithms, please let me know.
Your proposals sound good. As a pointer to already existing feature
requests, you can take a look at:
* OPTICS algorithm - https://issues.apache.org/jira/browse/MATH-1190
* HAC algorithm - https://issues.apache.org/jira/browse/MATH-959
Cluster evaluation would also be very interesting. I already wanted to
do something in this direction but could not find the time.
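
To make the evaluation idea from item 3.1 concrete, here is a rough sketch of
the silhouette coefficient in plain Java. It does not use or mirror any
existing Commons Math API: the class name SilhouetteSketch, the method names
and the int[] label encoding are made up for illustration. It assumes
Euclidean distance, cluster labels in 0..k-1, and at least two non-empty
clusters. A point in a singleton cluster gets a score of 0, which is the usual
convention.

```java
class SilhouetteSketch {

    /** Euclidean distance between two points of equal dimension. */
    public static double distance(double[] p, double[] q) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Mean silhouette over all points, a value in [-1, 1].
     * labels[i] is the (hypothetical) cluster index in 0..k-1 of points[i].
     * Assumes at least two clusters are non-empty.
     */
    public static double score(double[][] points, int[] labels, int k) {
        int n = points.length;
        int[] sizes = new int[k];
        for (int label : labels) {
            sizes[label]++;
        }

        double total = 0.0;
        for (int i = 0; i < n; i++) {
            int own = labels[i];
            if (sizes[own] == 1) {
                continue; // singleton cluster: contributes a score of 0
            }
            // Sum of distances from point i to the members of each cluster.
            double[] sumTo = new double[k];
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    sumTo[labels[j]] += distance(points[i], points[j]);
                }
            }
            // a: mean distance to the other members of the point's own cluster.
            double a = sumTo[own] / (sizes[own] - 1);
            // b: smallest mean distance to the members of any other cluster.
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                if (c != own && sizes[c] > 0) {
                    b = Math.min(b, sumTo[c] / sizes[c]);
                }
            }
            double max = Math.max(a, b);
            if (max > 0) {
                total += (b - a) / max;
            }
        }
        return total / n;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 0}, {10, 1}};
        int[] labels = {0, 0, 1, 1};
        System.out.println(score(points, labels, 2));
    }
}
```

This is quadratic in the number of points, which is fine for a sketch. A real
implementation would probably take the distance measure as a parameter instead
of hard-coding it.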
Btw, by coincidence we received a reminder about this year's GSoC just
today: the deadline to submit a project proposal with project ideas is
13-02-2015.
Thomas