[R] Error evaluating the partitioning around medoids clustering method with the R clValid package

2014-07-09 Thread Scott Davis
I have a data.frame with 300 observations of 36 variables that are
numerical or categorical and contain NA values. I am trying to evaluate the
partitioning around medoids (PAM) clustering algorithm for a marketing
segmentation study. My original dataset has over 130,000 observations, but
I took a sample so the problem is easy to reproduce.
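A sample like this is only reproducible if R's random seed is fixed before the rows are drawn, so that anyone rerunning the script gets the same 300 observations. A minimal sketch, assuming a hypothetical data frame `full` in place of the original 130,000-row table (the seed values are arbitrary):

```r
# Hypothetical stand-in for the full dataset (the real one has ~130,000 rows)
set.seed(123)
full <- data.frame(
  user_id = 1:1000,
  Gender  = factor(sample(c("Male", "Female", NA), 1000, replace = TRUE))
)

# Fixing the seed makes the same 300 rows come out on every run
set.seed(42)
idx   <- sample(nrow(full), 300)
small <- full[idx, ]

# dput(small) prints R code that recreates 'small' exactly, which can be
# pasted into a post so others can reproduce the problem
```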


My machine: Mac OS X 10.9.3

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

Problem: I get an error when running internal and stability validation
with the clValid package from CRAN.


Code:

#Convert csv to data.frame
> frame <- as.data.frame(Smallstore1)

#Load cluster package
> library(cluster)

#Create dissimilarity matrix
#Gower coefficient for finding distance between mixed variables
> daisy1 <- daisy(frame, metric = "gower", type = list(ordratio = 1:36))

#k-medoid algorithm with 3 clusters
> kanswers <- pam(daisy1, 3, diss = TRUE)

#Evaluate k-medoid clustering algorithm with 2 to 6 clusters
#Import clValid package
> library(clValid)

#Internal validation
> internval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "internal")
#Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <- Biobase::exprs(obj),  :
#  EXPR must be a length 1 vector

> summary(internval1)
#Error in summary(internval1) :
#  error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'internval1' not found

#Stability validation
> stabval1 <- clValid(daisy1, 2:6, clMethods = "pam", validation = "stability")
#Error in switch(class(obj), matrix = mat <- obj, ExpressionSet = mat <- Biobase::exprs(obj),  :
#  EXPR must be a length 1 vector
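The error message points at a `switch(class(obj), ...)` call. `daisy()` returns an object whose class vector has two elements, `"dissimilarity"` and `"dist"`, and `switch()` only accepts a length-one EXPR, which would explain the message. clValid's documentation appears to describe `obj` as a numeric matrix, an all-numeric data frame, or an ExpressionSet, so a dissimilarity object does not seem to be a supported input. The sketch below reproduces the class check on a toy data frame with hypothetical values and, as one possible substitute for clValid's internal validation, compares the average silhouette width of `pam()` for k = 2 to 6 directly on the Gower dissimilarities. This swaps in the silhouette criterion only; it does not reproduce clValid's connectivity, Dunn, or stability measures.

```r
library(cluster)

# Toy mixed-type data (hypothetical values, not the poster's data)
set.seed(1)
n <- 60
toy <- data.frame(
  Age    = round(runif(n, 18, 70)),
  Gender = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  Income = factor(sample(c("50k-75k", "75k-100k", "100k-125k", NA), n, replace = TRUE))
)

d <- daisy(toy, metric = "gower")

# The class vector has two elements, so switch() cannot dispatch on it
print(class(d))
res <- tryCatch(switch(class(d), matrix = "matrix", "other"),
                error = function(e) conditionMessage(e))
print(res)

# Substitute internal check: average silhouette width for k = 2..6,
# computed by pam() directly from the dissimilarity object
ks <- 2:6
avg_sil <- sapply(ks, function(k) pam(d, k, diss = TRUE)$silinfo$avg.width)
names(avg_sil) <- ks
print(avg_sil)

# Larger average silhouette width suggests better-separated clusters
best_k <- ks[which.max(avg_sil)]
```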


Data:


I converted the data.frame into a dissimilarity matrix with the daisy
function and then ran partitioning around medoids with 3 clusters. The
daisy and pam functions come from the cluster package on CRAN. Because the
data.frame has mixed variable types, I used the Gower distance coefficient.
Here is the head of the first 7 variables; I removed the user names from
the email addresses for privacy reasons.


> head(frame)
  user_id          email   Age Gender Household.Income Marital.Status Presence.of.children
1   12945 @bellycard.com         Male
2   12947 @bellycard.com         Male
3   12990     @gmail.com
4   13160     @gmail.com 25-34   Male        100k-125k         Single                   No
5   13195     @gmail.com         Male         75k-100k         Single                   No
6   13286     @gmail.com
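For reference, the Gower dissimilarity that `daisy(..., metric = "gower")` computes between two rows, with default variable types, averages per-variable distances over the variables observed in both rows: a nominal variable contributes 0 if the levels match and 1 otherwise, a numeric variable contributes |x - y| / range, and a variable that is NA in either row is left out of both the numerator and the denominator. The toy check below uses three hypothetical rows loosely modeled on the columns above (a numeric age stands in for the age bands) and compares daisy's output with the hand calculation.

```r
library(cluster)

# Three hypothetical customers; Age is numeric here for illustration
toy <- data.frame(
  Age    = c(25, 35, 45),
  Gender = factor(c("Male", "Male", "Female")),
  Income = factor(c(NA, "100k-125k", "75k-100k"))
)

m <- as.matrix(daisy(toy, metric = "gower"))

# Hand calculation (Age range = 45 - 25 = 20):
# rows 1,2: Age 10/20 = 0.5, Gender 0, Income skipped (NA) -> 0.5 / 2 = 0.25
# rows 1,3: Age 20/20 = 1,   Gender 1, Income skipped (NA) -> 2 / 2   = 1
# rows 2,3: Age 10/20 = 0.5, Gender 1, Income 1            -> 2.5 / 3 ~ 0.833
print(round(m, 3))
```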


Please let me know if I can provide more information.
-- 
Scott Davis
Cell: (408)826-9561
Skype ID: Scdavis61
San Jose, CA.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error creating daisy matrix in R cluster package - Cannot allocate vector of size 66.0 Gb

2014-06-21 Thread Scott Davis
My goal is to create a dissimilarity matrix with the daisy function from
the cluster package in R before applying k-medoid clustering for customer
segmentation. The dataset is a data.frame with 133,153 observations of 35
variables containing numerical and categorical values, blank cells, and
missing values. Missing values refer to NA, while a blank cell means
nothing is present in that cell of the data.frame.
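The distinction between blank cells and NA matters when the CSV is read in. With `read.csv()`'s default `na.strings = "NA"`, an empty field in a text column is read as the empty string `""`, which daisy would then treat as just another category, whereas empty fields in numeric columns become NA. Passing `na.strings = c("", "NA")` makes both kinds of cell missing. Whether blanks should count as missing is a modeling decision for this data; the sketch below, on a hypothetical three-row CSV, only shows the two behaviours.

```r
# Hypothetical three-row extract with blank cells, mirroring the columns above
csv <- "user_id,Age,Gender
12945,,Male
13160,25-34,Male
13286,,"

# Default: blanks in text columns are kept as "" rather than NA
default_read <- read.csv(text = csv, stringsAsFactors = FALSE)
print(default_read$Age)

# Treat blank cells as missing as well
na_read <- read.csv(text = csv, na.strings = c("", "NA"), stringsAsFactors = FALSE)
print(is.na(na_read$Age))
```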

Here’s my OS:

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

I have 35 variables, but here is a description of the first 5:

> head(df)
  user_id   Age Gender Household.Income Marital.Status
1   12945         Male
2   12947         Male
3   12990
4   13160 25-34   Male        100k-125k         Single
5   13195         Male         75k-100k         Single
6   13286

Since the Windows computer has only 3 GB of RAM, I increased the virtual
memory to 100 GB, hoping that would be enough to create the matrix; it
didn't work. I've looked into other R packages for solving the memory
problem, but none of them fit my data. I cannot use the `bigmemory` package
with the `biganalytics` package because it only accepts numeric matrices.
The `clara` function and the `ff` package also accept only numeric
matrices. Here's the daisy script:

#Load csv file
> Store1 <- read.csv("/Users/name/Client1.csv", header = TRUE)

#Convert csv to data.frame
> df <- as.data.frame(Store1)

#Increase memory allocation in R to 70 GB using the command:
> memory.limit(size = 7)
[1] 7

#Load cluster package
> library(cluster)

#Create daisy dissimilarity matrix
#Use Gower distance coefficient for mixed variables
#Set type as ratio-scaled variables treated as ordinal (ordratio)
> daisy1 <- daisy(df, metric = "gower", type = list(ordratio = 1:35))
#Error: cannot allocate vector of size 66.0 Gb
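The 66.0 Gb figure follows from the size of the object daisy has to build: a dissimilarity object stores the lower triangle of the n x n distance matrix, that is n(n - 1)/2 doubles of 8 bytes each. The short check below reproduces the number for n = 133,153 (R reports this size in units of 1024^3 bytes) and shows that the requirement grows quadratically with n, so a sample of a few thousand rows is far smaller.

```r
# Memory needed to store the n(n-1)/2 pairwise dissimilarities as doubles
dissim_gb <- function(n) {
  pairs <- n * (n - 1) / 2
  pairs * 8 / 1024^3   # 8 bytes per double, expressed in units of 1024^3 bytes
}

print(dissim_gb(133153))  # full dataset: about 66.05
print(dissim_gb(300))     # the 300-row sample
print(dissim_gb(5000))    # a larger sample
```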


How can I fix the error?
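One commonly used way around the quadratic memory cost, sketched here as an assumption rather than a tested recipe for this dataset, is the idea behind `clara()` applied by hand to mixed data: run `daisy()` and `pam()` on a random sample, then assign every row to its nearest medoid using Gower distances to the k medoids only, which needs n x k values instead of n(n - 1)/2. (`clara()` itself also repeats this over several samples and keeps the best; the sketch uses one.) The helper `gower_to_medoids()` is hypothetical, not part of the cluster package; it uses the full data's numeric ranges, mirrors daisy's default Gower treatment of nominal and numeric columns, and does not handle `ordratio`, `logratio`, or binary type settings.

```r
library(cluster)

# Hypothetical helper: Gower distance from every row of 'x' to each row of
# 'medoids', using fixed numeric ranges 'num_ranges' (one per numeric column).
# Nominal (factor/character) columns contribute 0/1; NAs are skipped pairwise.
gower_to_medoids <- function(x, medoids, num_ranges) {
  out <- matrix(NA_real_, nrow(x), nrow(medoids))
  for (j in seq_len(nrow(medoids))) {
    num <- numeric(nrow(x))
    den <- numeric(nrow(x))
    for (v in names(x)) {
      a <- x[[v]]
      b <- medoids[[v]][j]
      ok <- !is.na(a) & !is.na(b)
      d <- if (is.numeric(a)) {
        abs(a - b) / num_ranges[[v]]
      } else {
        as.numeric(as.character(a) != as.character(b))
      }
      d[!ok] <- 0          # variables missing in either row are skipped
      num <- num + d
      den <- den + ok
    }
    out[, j] <- num / den
  }
  out
}

# Toy data standing in for the full table (hypothetical values)
set.seed(2)
n <- 2000
full <- data.frame(
  Age    = round(runif(n, 18, 70)),
  Gender = factor(sample(c("Male", "Female", NA), n, replace = TRUE)),
  Income = factor(sample(c("50k-75k", "75k-100k", "100k-125k", NA), n, replace = TRUE))
)

# Ranges of the numeric columns over the full data
is_num <- vapply(full, is.numeric, logical(1))
num_ranges <- vapply(full[is_num], function(v) diff(range(v, na.rm = TRUE)), numeric(1))

# 1. Cluster a random sample with daisy + pam
idx <- sample(n, 300)
fit <- pam(daisy(full[idx, ], metric = "gower"), k = 3, diss = TRUE)
medoids <- full[idx[fit$id.med], ]

# 2. Assign every row to its nearest medoid (n x k distances only)
dist_to_med <- gower_to_medoids(full, medoids, num_ranges)
cluster <- max.col(-dist_to_med, ties.method = "first")
print(table(cluster))
```

Because the medoids are chosen on the sample's own Gower ranges while the assignment step uses the full data's ranges, the two steps can differ slightly; that is one of the approximations this sketch accepts.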

