On Mon, Jan 12, 2015 at 8:14 PM, Meethu Mathew meethu.mat...@flytxt.com wrote:
Hi,
This is the function defined in PythonMLLibAPI.scala
def findPredict(
    data: JavaRDD[Vector],
    wt: Object,
    mu: Array[Object],
    si: Array[Object]): RDD[Array[Double]] = {
}
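For context, here is a rough single-point Python sketch of what a predict function with that shape of arguments (weights, means, sigmas) might compute: posterior membership probabilities under a Gaussian mixture. The diagonal-covariance form and the name predict_soft are assumptions for illustration, not the actual MLlib code.

```python
import math

def predict_soft(x, weights, means, sigmas):
    """Hypothetical per-point analogue of findPredict: returns the
    posterior membership probability of x under each mixture component.
    Assumes diagonal covariances (sigmas[k][d] is a variance)."""
    scores = []
    for w, mu, var in zip(weights, means, sigmas):
        # log weight + log N(x | mu, diag(var)), summed over dimensions
        log_p = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        scores.append(log_p)
    # normalize in log space for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

A point at 0.0 between components centered at 0.0 and 10.0 would get nearly all of its membership mass on the first component.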
So the
It's not necessary, I will create a PR to remove them.
For larger dicts/lists/tuples, the pickle approach may need fewer RPC
calls and give better performance.
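To illustrate the trade-off with plain pickle (a stand-in for PySpark's serializer; the call counts here are a simplified model of the gateway traffic, not a measurement):

```python
import pickle

# Per-element conversion touches each item once -- one converter call
# (and, over a Java gateway, potentially one round trip) per element.
data = list(range(1000))
per_element_calls = len(data)   # 1000 individual conversions

# The pickle approach serializes the whole collection in a single call,
# producing one byte payload to ship across the gateway.
payload = pickle.dumps(data)
pickle_calls = 1

# The payload round-trips to an identical Python object.
assert pickle.loads(payload) == data
```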
Davies
On Tue, Jan 13, 2015 at 4:53 AM, Meethu Mathew meethu.mat...@flytxt.com wrote:
Hi all,
In the python object to java conversion done in
If it is a small collection of them on the driver, you can just use
sc.parallelize to create an RDD.
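As a plain-Python sketch of the slicing that parallelize performs on a small driver-side collection (the helper name is hypothetical and no Spark is required; Spark uses a similar index-based slicing internally):

```python
def parallelize_sketch(data, num_slices):
    # Split a driver-side collection into num_slices roughly equal
    # partitions, the way sc.parallelize distributes a local list.
    n = len(data)
    return [data[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]

parts = parallelize_sketch(list(range(10)), 3)
# parts == [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```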
On Tue, Jan 13, 2015 at 7:56 AM, Malith Dhanushka mmali...@gmail.com
wrote:
Hi Reynold,
Thanks for the response. I am just wondering, let's say we have a set of Row
objects. Isn't there a
Depends on what the other side is doing. You can create your own RDD
implementation by subclassing RDD, or it might work if you use
sc.parallelize(1 to n, n).mapPartitionsWithIndex( /* code to read the data
and return an iterator */ ) where n is the number of partitions.
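The pattern Reynold describes can be sketched in plain Python without Spark: create n "partitions" identified only by their index, and have each one read its own slice of the external source and return an iterator (the reader function and source layout here are hypothetical).

```python
def read_partition(index, source):
    # Hypothetical reader: in a real job this would open a connection
    # to the external datasource and fetch the rows for shard `index`.
    return iter(source[index])

source = {0: ["a", "b"], 1: ["c"], 2: ["d", "e", "f"]}
n = len(source)

# Rough equivalent of sc.parallelize(1 to n, n).mapPartitionsWithIndex(...):
# each of the n tasks sees only its partition index and reads independently.
rows = [row for i in range(n) for row in read_partition(i, source)]
# rows == ["a", "b", "c", "d", "e", "f"]
```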
On Tue, Jan 13, 2015 at
Hi,
We have a custom datasources API, which connects to various data sources
and exposes them out as a common API. We are now trying to implement the
Spark datasources API released in 1.2.0 to connect Spark for analytics.
Looking at the sources API, we figured out that we should extend a scan
Hi all,
In the python object to java conversion done in the method _py2java in
spark/python/pyspark/mllib/common.py, why are we doing individual
conversion using MapConverter and ListConverter? The same can be achieved
using
bytearray(PickleSerializer().dumps(obj))
obj =
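The quoted approach can be demonstrated with the standard pickle module (as I understand it, PySpark's PickleSerializer is essentially a thin wrapper around it):

```python
import pickle

# Plain-pickle stand-in for bytearray(PickleSerializer().dumps(obj)):
# one byte buffer carries the whole nested structure, with no per-key
# MapConverter or per-element ListConverter needed.
obj = {"a": [1, 2, 3], "b": (4.0, 5.0)}
ba = bytearray(pickle.dumps(obj))

# The receiving side restores the full object in one call.
assert pickle.loads(bytes(ba)) == obj
```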
Dear all,
I think MLlib needs more clustering algorithms and DBSCAN is my first
candidate. I am starting to implement it. Any advice?
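For readers unfamiliar with the algorithm, here is a minimal single-machine Python sketch of DBSCAN. This is only an illustration of the idea (naive O(n^2) neighbor search, all names hypothetical), not the distributed design an MLlib implementation would need.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.
    A point is a core point if it has >= min_pts neighbors within eps
    (counting itself); clusters grow by expanding core points."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # noise (may become a border point later)
            continue
        cluster += 1                  # start a new cluster from this core point
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # reclassify noise as a border point
            if labels[j] is not None:
                continue              # already claimed; do not expand borders
            labels[j] = cluster
            js = neighbors(j)
            if len(js) >= min_pts:    # only core points expand the cluster
                queue.extend(js)
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (100, 100)]
labels = dbscan(points, eps=2.0, min_pts=2)
# labels == [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not fixed up front and outliers are explicitly labeled as noise, which is the usual motivation for adding it alongside k-means.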
Muhammad-Ali
I have to say, I have created a Jira task for it:
[SPARK-5226] Add DBSCAN Clustering Algorithm to MLlib - ASF JIRA
MLlib is all k-means now, and I think we should add some new clustering algorithms to