Vipul,
Thanks for your feedback. As far as I understand, you mean RDD[(Double,
Double)] (note the parentheses), where each of these Double values is
supposed to contain one coordinate of a point. That limits us to
2-dimensional space, which is not suitable for many tasks. I want the
algorithm to be able to work in multidimensional space. In fact, there is
a class org.alitouka.spark.dbscan.spatial.Point in my code which
represents a point with an arbitrary number of coordinates.
IOHelper.readDataset is just a convenience method which reads a CSV file
and returns an RDD of Points (more precisely, it returns a value of type
RawDataset, which is just an alias for RDD[Point]). If your data is stored
in a format other than CSV, you will have to write your own code to convert
it to a RawDataset.
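For example, such a conversion could be sketched roughly like this. This is only an illustration: it assumes a Point constructor that accepts a collection of coordinate values and that RawDataset is importable from the top-level package, so please check the actual signatures in the repository before relying on it.

```scala
import org.apache.spark.rdd.RDD
import org.alitouka.spark.dbscan.RawDataset
import org.alitouka.spark.dbscan.spatial.Point

object CustomFormatReader {

  // Hypothetical example: convert an RDD of tab-separated lines
  // into a RawDataset (an alias for RDD[Point]).
  // Assumes Point can be built from a sequence of Doubles;
  // the real constructor may differ.
  def toRawDataset(lines: RDD[String]): RawDataset = {
    lines.map { line =>
      val coordinates = line.split("\t").map(_.toDouble)
      new Point(coordinates)
    }
  }
}
```

Something along these lines would let you plug in any source format that Spark can read as text, while keeping the rest of the pipeline unchanged.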
I can add support for other data formats in future versions.
As for other distance measures - that is a high-priority item on my list ;)
On Thu, Jun 12, 2014 at 6:02 PM, Vipul Pandey wrote:
> Great! I was going to implement one of my own - but I may not need to do
> that any more :)
> I haven't had a chance to look deep into your code but I would recommend
> accepting an RDD[Double,Double] as well, instead of just a file.
>
> val data = IOHelper.readDataset(sc, "/path/to/my/data.csv")
>
> And other distance measures of course.
>
> Thanks,
> Vipul
>
>
>
>
> On Jun 12, 2014, at 2:31 PM, Aliaksei Litouka
> wrote:
>
> Hi.
> I'm not sure if messages like this are appropriate in this list; I just
> want to share with you an application I am working on. This is my personal
> project which I started to learn more about Spark and Scala, and, if it
> succeeds, to contribute it to the Spark community.
>
> Maybe someone will find it useful. Or maybe someone will want to join
> development.
>
> The application is available at https://github.com/alitouka/spark_dbscan
>
> Any questions, comments, suggestions, as well as criticism are welcome :)
>
> Best regards,
> Aliaksei Litouka