[R] help formatting data for clustering

Raphael Bauduin Tue, 13 Nov 2012 02:48:58 -0800

Hi,

I'm an R beginner. I have data of this form:

user_id, brand_id1, brand_id2, ...

For example:
1 , 45 , 32, 45, 23
2 , 34
4, 11, 43, 45
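Because the rows have different lengths, one way to make them easier to work with is to turn them into a "long" table with one row per (user, brand) occurrence. The sketch below assumes the file is comma-separated with the user id in the first field; the variable names `lines`, `fields` and `purchases` are hypothetical and the input is just the three example rows above.

```r
# Hypothetical input: the example rows from above, as raw text lines
lines <- c("1 , 45 , 32, 45, 23",
           "2 , 34",
           "4, 11, 43, 45")

# Split each line on commas and strip surrounding whitespace
fields <- lapply(strsplit(lines, ","), trimws)

# First field is the user id; the remaining fields are brand ids
user_id <- sapply(fields, `[`, 1)
brands  <- lapply(fields, `[`, -1)

# Long format: one row per (user, brand) occurrence,
# repeating each user id once per brand on that user's line
purchases <- data.frame(
  user_id  = rep(user_id, lengths(brands)),
  brand_id = unlist(brands),
  stringsAsFactors = FALSE
)
print(purchases)
```

A brand that appears twice on a line (45 for user 1) simply produces two rows, which is what the counting step needs later.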

I'm looking for the right procedure to cluster users. I am
especially interested in knowing which functions to use at each step.

I am currently able to load the data into a data frame and use the user id
as each row's name:

# extract user brands, i.e. all columns except the first
user_brands <- userclustering[,-1]

# extract user ids, i.e. the first column
user_ids <- userclustering[,1]

# set user ids as row names
row.names(user_brands) <- user_ids
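For ragged rows like these, a possible way to produce `userclustering` is `read.table()` with `fill = TRUE`. One caveat: `read.table()` guesses the number of columns from the first five lines, so a longer row further down would be wrapped onto a new row. Counting the fields with `count.fields()` and passing matching `col.names` avoids that. The temporary file and the `brand1`, `brand2`, ... column names are assumptions for illustration only.

```r
# Hypothetical file contents, written to a temporary file for the demo
tmp <- tempfile(fileext = ".csv")
writeLines(c("1 , 45 , 32, 45, 23",
             "2 , 34",
             "4, 11, 43, 45"), tmp)

# Rows have different lengths, so find the longest one first;
# otherwise read.table() infers the column count from the first 5 lines only
n_cols <- max(count.fields(tmp, sep = ","))

# fill = TRUE pads short rows with NA; strip.white removes the stray spaces
userclustering <- read.table(tmp, sep = ",", fill = TRUE, strip.white = TRUE,
                             col.names = c("user_id",
                                           paste0("brand", seq_len(n_cols - 1))))

# Same steps as above: brands in the remaining columns, ids as row names
user_brands <- userclustering[, -1]
row.names(user_brands) <- userclustering[, 1]
print(user_brands)
```

The result is a wide data frame with one row per user and `NA` wherever a user's line was shorter than the longest one.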

But now I'm stuck replacing the brand ids with a count of how many times the
user ordered each brand, the count for every other brand being implicitly 0
for that user.
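One common approach is to reshape the wide data frame into (user, brand) pairs, drop the `NA` padding and cross-tabulate with `table()`. The cross-tabulation gives a user x brand matrix of counts, and brands a user never ordered automatically get 0. The small `user_brands` data frame below is a hypothetical stand-in shaped like the one built above.

```r
# Hypothetical wide data frame shaped like user_brands above:
# one row per user (row names = user ids), NA where a row is shorter
user_brands <- data.frame(
  b1 = c(45, 34, 11),
  b2 = c(32, NA, 43),
  b3 = c(45, NA, 45),
  b4 = c(23, NA, NA),
  row.names = c("1", "2", "4")
)

# Reshape to long format: unlist() walks the columns one after another,
# so repeat the user ids once per column to keep each pair aligned
long <- data.frame(
  user_id  = rep(row.names(user_brands), times = ncol(user_brands)),
  brand_id = unlist(user_brands, use.names = FALSE)
)

# Drop the NA padding introduced by the shorter rows
long <- long[!is.na(long$brand_id), ]

# Cross-tabulate: rows = users, columns = brands, cells = purchase counts;
# brands a user never ordered get 0
counts <- table(long$user_id, long$brand_id)
print(counts)
```

`xtabs(~ user_id + brand_id, data = long)` produces the same counts with named dimensions, if that reads better.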

Then I'll need to make sure I can use it for clustering (normalisation,
correct handling of brands absent from a user's list, etc.).
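As one possible (not the only) pipeline, each user's counts can be turned into proportions with `prop.table()` so that heavy buyers are not far from light buyers purely because of volume. The resulting profiles can then be clustered with `dist()` and `hclust()`. The count matrix, the choice of Euclidean distance with average linkage and `k = 2` are all assumptions for illustration.

```r
# Hypothetical user x brand count matrix (rows = users, columns = brands),
# with 0 for brands a user never ordered
counts <- matrix(c(0, 1, 1, 0, 0, 2,
                   0, 0, 0, 1, 0, 0,
                   1, 0, 0, 0, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(user  = c("1", "2", "4"),
                                 brand = c("11", "23", "32", "34", "43", "45")))

# Normalise each row to proportions (each row sums to 1);
# note that a user with no purchases at all would produce NaN here
profiles <- prop.table(counts, margin = 1)

# Euclidean distances between user profiles, then hierarchical clustering
d  <- dist(profiles)
hc <- hclust(d, method = "average")

# Cut the tree into k groups (k = 2 is an arbitrary choice here)
groups <- cutree(hc, k = 2)
print(groups)
```

If absent brands should not count as similarity between two users, `dist(counts, method = "binary")` is an alternative. It treats non-zero cells as "on" and ignores brands that neither user ordered (a Jaccard-type distance), at the cost of discarding the counts themselves.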

Thanks in advance for your help!

Raph


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
