Re: Python to Java object conversion of numpy array

2015-01-13 Thread Davies Liu
On Mon, Jan 12, 2015 at 8:14 PM, Meethu Mathew meethu.mat...@flytxt.com wrote: Hi, This is the function defined in PythonMLLibAPI.scala def findPredict( data: JavaRDD[Vector], wt: Object, mu: Array[Object], si: Array[Object]): RDD[Array[Double]] = { } So the

Re: Use of MapConverter, ListConverter in python to java object conversion

2015-01-13 Thread Davies Liu
It's not necessary; I will create a PR to remove them. For larger dicts/lists/tuples, the pickle approach may need fewer RPC calls and give better performance. Davies On Tue, Jan 13, 2015 at 4:53 AM, Meethu Mathew meethu.mat...@flytxt.com wrote: Hi all, In the python object to java conversion done in
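Davies' point about pickling a whole collection in one shot can be illustrated without Spark. The sketch below uses the standard pickle module as a stand-in for PySpark's PickleSerializer; it is an illustration of the idea, not PySpark's actual _py2java code.

```python
import pickle

# Sketch: serialize an entire dict as one pickle payload, the way
# bytearray(PickleSerializer().dumps(obj)) ships it across in a single
# transfer, instead of converting entries one by one (which costs one
# Py4J call per entry with MapConverter/ListConverter).
data = {"key%d" % i: float(i) for i in range(1000)}

payload = bytearray(pickle.dumps(data))   # one buffer, one round trip
restored = pickle.loads(bytes(payload))

assert restored == data  # round-trips losslessly
```

The per-entry converter approach would instead make on the order of len(data) separate conversions, which is where the RPC overhead comes from.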

Re: create a SchemaRDD from a custom datasource

2015-01-13 Thread Reynold Xin
If it is a small collection of them on the driver, you can just use sc.parallelize to create an RDD. On Tue, Jan 13, 2015 at 7:56 AM, Malith Dhanushka mmali...@gmail.com wrote: Hi Reynold, Thanks for the response. I am just wondering, let's say we have a set of Row objects. Isn't there a

Re: create a SchemaRDD from a custom datasource

2015-01-13 Thread Reynold Xin
Depends on what the other side is doing. You can create your own RDD implementation by subclassing RDD, or it might work if you use sc.parallelize(1 to n, n).mapPartitionsWithIndex( /* code to read the data and return an iterator */ ) where n is the number of partitions. On Tue, Jan 13, 2015 at
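The mapPartitionsWithIndex pattern Reynold describes can be sketched without a Spark cluster. Below is a plain-Python stand-in, where read_partition is a hypothetical reader for one slice of the external datasource; the names and the partition-of-3-records shape are ours, for illustration only.

```python
# Plain-Python stand-in for sc.parallelize(1 to n, n).mapPartitionsWithIndex(...):
# n dummy partitions, each mapped through a function that receives the
# partition index and returns an iterator over that partition's records.

def read_partition(index):
    # hypothetical datasource reader: pretend each partition holds 3 records
    return iter((index, j) for j in range(3))

def map_partitions_with_index(n, f):
    # local simulation of an n-partition RDD: call f(index, iterator)
    # once per partition and concatenate the resulting iterators
    results = []
    for i in range(n):
        results.extend(f(i, iter([i])))
    return results

rows = map_partitions_with_index(4, lambda idx, it: read_partition(idx))
# 4 partitions x 3 records = 12 rows
```

In real Spark the same shape works because each task sees only its own index, so the reader inside mapPartitionsWithIndex can fetch just that partition's slice of the datasource.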

create a SchemaRDD from a custom datasource

2015-01-13 Thread Niranda Perera
Hi, We have a custom datasources API, which connects to various data sources and exposes them out as a common API. We are now trying to implement the Spark datasources API released in 1.2.0 to connect Spark for analytics. Looking at the sources API, we figured out that we should extend a scan

Use of MapConverter, ListConverter in python to java object conversion

2015-01-13 Thread Meethu Mathew
Hi all, In the python object to java conversion done in the method _py2java in spark/python/pyspark/mllib/common.py, why are we doing individual conversion using MapConverter, ListConverter? The same can be achieved using bytearray(PickleSerializer().dumps(obj)) obj =

DBSCAN for MLlib

2015-01-13 Thread Muhammad Ali A'råby
Dear all, I think MLlib needs more clustering algorithms and DBSCAN is my first candidate. I am starting to implement it. Any advice? Muhammad-Ali
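For reference, the core of DBSCAN is compact. This is a minimal pure-Python sketch with a brute-force O(n^2) neighbor search over 2-D points, not the proposed MLlib implementation; eps and min_pts are the usual textbook parameter names, not an MLlib API.

```python
# Minimal DBSCAN sketch: label each point with a cluster id, or -1 for noise.
def dbscan(points, eps, min_pts):
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # brute-force range query: all points within eps of points[i]
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps * eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster            # i is a core point; grow its cluster
        seeds, k = list(nbrs), 0
        while k < len(seeds):
            j = seeds[k]; k += 1
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point: border
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:     # j is also core: expand the frontier
                seeds.extend(jn)
        cluster += 1
    return labels

labels = dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)],
                eps=1.5, min_pts=2)
```

A production MLlib version would need a distributed neighbor search (e.g. spatial partitioning of the point set) rather than the all-pairs scan used here.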

Re: DBSCAN for MLlib

2015-01-13 Thread Muhammad Ali A'råby
I have to say, I have created a Jira task for it: [SPARK-5226] Add DBSCAN Clustering Algorithm to MLlib - ASF JIRA. MLlib is all k-means now, and I think we should add some new clustering algorithms to