I'm using:

org.apache.spark.mllib.clustering.KMeans.train(data.rdd(), 3, 20); // k = 3, maxIterations = 20

CPU cores: 8 (using the default Spark conf, though)

As for partitions, I'm not sure how to find that out.
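For context on what the slow step computes: the Spark MLlib documentation describes `KMeansModel.computeCost` as the sum of squared distances of each point to its nearest center, and the stack trace below shows that total being produced by `DoubleRDDFunctions.sum`. The following is a minimal single-machine Java sketch of that quantity, not Spark's implementation. The class and method names are hypothetical, points are assumed to be dense `double[]` vectors, and the "partitions" are plain lists standing in for RDD partitions. With k = 3 and 785 features, each point needs about 3 × 785 = 2,355 squared differences.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the K-means cost (sum of squared distances
// from each point to its nearest center). Not Spark's actual implementation.
public class KMeansCostSketch {

    // Squared Euclidean distance between two dense vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Cost contribution of one point: squared distance to the closest center.
    static double pointCost(double[] point, double[][] centers) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] center : centers) {
            best = Math.min(best, squaredDistance(point, center));
        }
        return best;
    }

    // Each list stands in for one partition: it is summed independently,
    // then the partial sums are added, as a distributed sum would combine them.
    static double computeCost(List<List<double[]>> partitions, double[][] centers) {
        double total = 0.0;
        for (List<double[]> partition : partitions) {
            double partial = 0.0;
            for (double[] point : partition) {
                partial += pointCost(point, centers);
            }
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] centers = { {0.0, 0.0}, {10.0, 10.0} };
        List<List<double[]>> partitions = Arrays.asList(
            Arrays.asList(new double[] {1.0, 0.0}, new double[] {0.0, 2.0}),
            Arrays.asList(new double[] {10.0, 13.0}));
        System.out.println(computeCost(partitions, centers));
    }
}
```

Running `main` prints `14.0`: the three points contribute 1, 4 and 9 against their nearest centers.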

On Mon, Jul 13, 2015 at 11:30 PM, Burak Yavuz <brk...@gmail.com> wrote:

> What are the other parameters? Are you just setting k=3? What about # of
> runs? How many partitions do you have? How many cores does your machine
> have?
>
> Thanks,
> Burak
>
> On Mon, Jul 13, 2015 at 10:57 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>
>> Hi Burak,
>>
>> k = 3
>> dimension = 785 features
>> Spark 1.4
>>
>> On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz <brk...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> How are you running K-Means? What is your k? What is the dimension of
>>> your dataset (columns)? Which Spark version are you using?
>>>
>>> Thanks,
>>> Burak
>>>
>>> On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> For a fairly large dataset (30MB), KMeansModel.computeCost takes a lot of
>>>> time (16+ minutes).
>>>>
>>>> It spends a lot of time in this task:
>>>>
>>>> org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
>>>> org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)
>>>>
>>>> Can this be improved?
>>>>
>>>> --
>>>>
>>>> Thanks & regards,
>>>> Nirmal
>>>>
>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>> Mobile: +94715779733
>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Thanks & regards,
>> Nirmal
>>
>


-- 

Thanks & regards,
Nirmal
