Re: Apply Kmeans in partitions

2019-01-30 Thread Apostolos N. Papadopoulos

Hi Dimitri,

What is the error you are getting? Please specify.

Apostolos


On 30/1/19 16:30, dimitris plakas wrote:

Hello everyone,

I have a dataframe with 5040 rows, and these rows are split into
5 groups. A column called "Group_Id" marks every row with a value
from 0 to 4, depending on which group the row belongs to. I am
trying to split my dataframe into 5 partitions and apply Kmeans to
every partition. I have tried


rdd=mydataframe.rdd.mapPartitions(function, True)
test = Kmeans.train(rdd, num_of_centers, "random")

but I get an error.

How can I apply Kmeans to every partition?

Thank you in advance,


--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papad...@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol





Apply Kmeans in partitions

2019-01-30 Thread dimitris plakas
Hello everyone,

I have a dataframe with 5040 rows, and these rows are split into 5
groups. A column called "Group_Id" marks every row with a value from
0 to 4, depending on which group the row belongs to. I am trying to
split my dataframe into 5 partitions and apply Kmeans to every
partition. I have tried

rdd=mydataframe.rdd.mapPartitions(function, True)
test = Kmeans.train(rdd, num_of_centers, "random")

but I get an error.

How can I apply Kmeans to every partition?

Thank you in advance,
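
For reference, one possible way to approach the question above. The
MLlib trainer takes a whole RDD as input and runs as a distributed job,
so as far as I know it cannot be called from inside the function given
to mapPartitions. A minimal sketch of an alternative is to run a small
local k-means inside that function. Everything below is an assumption
for illustration, not the poster's code: the names local_kmeans and
kmeans_per_group are hypothetical, each row is assumed to arrive as a
(group_id, features) pair with features a plain sequence of numbers,
and each group is assumed to hold at least k rows.

```python
import random
from collections import defaultdict


def local_kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.

    Hypothetical helper; requires len(points) >= k.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign the point to the center with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


def kmeans_per_group(rows, k=2):
    """Partition function: rows is an iterator of (group_id, features).

    Rows are regrouped by group_id inside the partition, so the result
    stays correct even if two groups land in the same partition.
    """
    groups = defaultdict(list)
    for group_id, features in rows:
        groups[group_id].append(tuple(features))
    for group_id, points in groups.items():
        yield group_id, local_kmeans(points, k)


# Hypothetical use with Spark (not executed here), assuming the
# dataframe has a "features" column holding a list of numbers:
#
#   pairs = mydataframe.rdd.map(lambda r: (r["Group_Id"], r["features"]))
#   centers = pairs.partitionBy(5).mapPartitions(kmeans_per_group).collect()
```

With only 5040 rows, each group fits comfortably in memory on one
worker, which is why a local k-means per group seems reasonable here.
Another option would be to filter the dataframe once per Group_Id on
the driver and train on each filtered subset in turn.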