Vipul,
Thanks for your feedback. As far as I understand, you mean RDD[(Double,
Double)] (note the parentheses), where each of these Double values is
supposed to contain one coordinate of a point. That limits us to
2-dimensional space, which is not suitable for many tasks. I want the
algorithm to be able to work in multidimensional space. In fact, there is
a class org.alitouka.spark.dbscan.spatial.Point in my code which
represents a point with an arbitrary number of coordinates.
IOHelper.readDataset is just a convenience method which reads a CSV file
and returns an RDD of Points (more precisely, it returns a value of type
RawDataset, which is just an alias for RDD[Point]). If your data is stored
in a format other than CSV, you will have to write your own code to convert
it to a RawDataset.
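For example, such a conversion could be sketched roughly like this. This is only an illustration: it assumes a Point constructor that accepts a collection of coordinate values and that RawDataset is importable from the top-level package, so please check the actual signatures in the repository before relying on it.

```scala
import org.apache.spark.rdd.RDD
import org.alitouka.spark.dbscan.RawDataset
import org.alitouka.spark.dbscan.spatial.Point

object CustomFormatReader {

  // Hypothetical example: convert an RDD of tab-separated lines
  // into a RawDataset (an alias for RDD[Point]).
  // Assumes Point can be built from a sequence of Doubles;
  // the real constructor may differ.
  def toRawDataset(lines: RDD[String]): RawDataset = {
    lines.map { line =>
      val coordinates = line.split("\t").map(_.toDouble)
      new Point(coordinates)
    }
  }
}
```

Something along these lines would let you plug in any source format that Spark can read as text, while keeping the rest of the pipeline unchanged.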
I can add support for other data formats in future versions.
As for other distance measures - that is a high-priority item on my list ;)
On Thu, Jun 12, 2014 at 6:02 PM, Vipul Pandey wrote:
> Great! I was going to implement one of my own - but I may not need to do
> that any more :)
> I haven't had a chance to look deep into your code but I would recommend
> accepting an RDD[Double,Double] as well, instead of just a file.
>
> val data = IOHelper.readDataset(sc, "/path/to/my/data.csv")
>
> And other distance measures of course.
>
> Thanks,
> Vipul
>
>
>
>
> On Jun 12, 2014, at 2:31 PM, Aliaksei Litouka
> wrote:
>
> Hi.
> I'm not sure if messages like this are appropriate in this list; I just
> want to share with you an application I am working on. This is my personal
> project which I started to learn more about Spark and Scala, and, if it
> succeeds, to contribute it to the Spark community.
>
> Maybe someone will find it useful. Or maybe someone will want to join
> development.
>
> The application is available at https://github.com/alitouka/spark_dbscan
>
> Any questions, comments, suggestions, as well as criticism are welcome :)
>
> Best regards,
> Aliaksei Litouka