[R] Formatted Data File Question for Clustering -Quickie Project

2007-06-13 Thread ngottlieb
I am trying to learn how to format ASCII data files so that R can scan or
read them.

For a quick project, I found some code (at the end of this email) that
does exactly what I need: cluster the data and plot a dendrogram using
package stats.

I am stuck on how to format a text file so that I can run the script.
I looked at the dataset USArrests (which would be replaced by my data
and labels) using UltraEdit. That data appears to be in a binary format,
and I would simply like a readable ASCII text file.
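USArrests is shipped with R as a data frame stored in R's own binary format, which is probably why it looks unreadable in an editor. One way to see a plain-text layout to imitate is to write the data frame out as tab-delimited ASCII and read it back. A minimal sketch; the file name USArrests.txt is illustrative and the file goes to a temporary directory.

```r
# Write the built-in USArrests data frame to a tab-delimited ASCII file
# (temporary location; the file name is only illustrative).
path <- file.path(tempdir(), "USArrests.txt")
write.table(USArrests, file = path, sep = "\t", quote = FALSE)

# Show the first lines: a header with one name per variable,
# then one line per state, with the state name as the first field.
cat(readLines(path, n = 3), sep = "\n")

# Read it back. The header has one field fewer than the data lines,
# so read.table() uses the first column as row names.
arrests <- read.table(path, header = TRUE, sep = "\t")
str(arrests)
```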

How can I:
A) format this data as a file for the script below?
B) use squared Euclidean distance? Can hclust support this?

Thanks,
Neil Gottlieb

Here is a subset of my data set (return series to cluster: 13
cases by 36 observations):
Month   Convertible Arbitrage   Dedicated Short Bias   Emerging Markets
1/31/1994   0.004   -0.016  0.105   
2/28/1994   0.002   0.020   -0.011  
3/31/1994   -0.010  0.072   -0.046  
4/30/1994   -0.025  0.013   -0.084  
5/31/1994   -0.010  0.023   -0.007  
6/30/1994   0.002   0.064   0.005   
7/31/1994   0.001   -0.012  0.058   
8/31/1994   0.000   -0.057  0.164   
9/30/1994   -0.012  0.016   0.052   
10/31/1994  -0.014  -0.004  -0.035  
11/30/1994  -0.002  0.030   -0.014  
12/31/1994  -0.019  -0.002  -0.042  
1/31/1995   -0.006  0.013   -0.100  
2/28/1995   0.012   -0.022  -0.079  
3/31/1995   0.013   0.004   -0.055  
4/30/1995   0.023   -0.004  0.073   
5/31/1995   0.017   -0.013  0.013   
6/30/1995   0.019   -0.069  0.008   
7/31/1995   0.009   -0.059  0.022   
8/31/1995   0.008   0.008   0.010   
9/30/1995   0.011   -0.029  0.019   
10/31/1995  0.013   0.064   -0.057  
11/30/1995  0.023   -0.010  -0.031  
12/31/1995  0.014   0.049   0.007   
1/31/1996   0.021   0.006   0.079   
2/29/1996   0.012   -0.056  -0.006  
3/31/1996   0.015   -0.009  -0.009  
4/30/1996   0.013   -0.066  0.051   
5/31/1996   0.016   0.000   0.045   
6/30/1996   0.015   0.051   0.054   
7/31/1996   0.014   0.098   -0.027  
8/31/1996   0.013   -0.034  0.036   
9/30/1996   0.011   -0.059  0.016   
10/31/1996  0.014   0.043   0.017   
11/30/1996  0.014   -0.029  0.026   
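dist() computes distances between the rows of its argument, so with months in rows and series in columns as above, the data must be transposed if the goal is to cluster the series themselves. A minimal sketch, assuming the data are already in a numeric matrix; only the first six months of the three series shown above are typed in, and the object name returns is illustrative.

```r
# First six months of the three series shown above (illustrative subset).
returns <- matrix(
  c( 0.004, -0.016,  0.105,
     0.002,  0.020, -0.011,
    -0.010,  0.072, -0.046,
    -0.025,  0.013, -0.084,
    -0.010,  0.023, -0.007,
     0.002,  0.064,  0.005),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("1994-01-31", "1994-02-28", "1994-03-31",
                    "1994-04-30", "1994-05-31", "1994-06-30"),
                  c("Convertible Arbitrage", "Dedicated Short Bias",
                    "Emerging Markets")))

# dist() works on rows: transpose so that each series is one row and
# the distances are between series rather than between months.
d <- dist(t(returns))
hc <- hclust(d, "ave")

# Leaf order of the series in the dendrogram; plot(hc) would draw it.
print(hc$labels[hc$order])
```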

Code example from the help files:
hc <- hclust(dist(USArrests), "ave")
(dend1 <- as.dendrogram(hc)) # "print()" method
str(dend1)  # "str()" method
str(dend1, max = 2) # only the first two sub-levels

op <- par(mfrow= c(2,2), mar = c(5,2,1,4))
plot(dend1)
## "triangle" type and show inner nodes:
plot(dend1, nodePar=list(pch = c(1,NA), cex=0.8, lab.cex = 0.8),
  type = "t", center=TRUE)
plot(dend1, edgePar=list(col = 1:2, lty = 2:3), dLeaf=1, edge.root = TRUE)
plot(dend1, nodePar=list(pch = 2:1,cex=.4*2:1, col = 2:3), horiz=TRUE)

dend2 <- cut(dend1, h=70)
plot(dend2$upper)
## leafs are wrong horizontally:
plot(dend2$upper, nodePar=list(pch = c(1,7), col = 2:1))
##  dend2$lower is *NOT* a dendrogram, but a list of .. :
plot(dend2$lower[[3]], nodePar=list(col=4), horiz = TRUE, type = "tr")
## "inner" and "leaf" edges in different type & color :
plot(dend2$lower[[2]], nodePar=list(col=1),# non empty list
 edgePar = list(lty=1:2, col=2:1), edge.root=TRUE)
par(op)
str(d3 <- dend2$lower[[2]][[2]][[1]])

nP <- list(col=3:2, cex=c(2.0, 0.75), pch= 21:22, bg= c("light blue", "pink"),
           lab.cex = 0.75, lab.col = "tomato")
plot(d3, nodePar= nP, edgePar = list(col="gray", lwd=2), horiz = TRUE)
addE <- function(n) {
  if(!is.leaf(n)) {
    attr(n, "edgePar") <- list(p.col="plum")
    attr(n, "edgetext") <- paste(attr(n,"members"),"members")
  }
  n
}
d3e <- dendrapply(d3, addE)
plot(d3e, nodePar= nP)
plot(d3e, nodePar= nP, leaflab = "textlike")
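On question B: hclust() clusters whatever dissimilarity object it is given, so squaring the Euclidean distances first is sufficient; the examples in ?hclust themselves use dist(USArrests)^2 (there with centroid linkage). A minimal sketch with average linkage, as in the script above; k = 4 in cutree() is an arbitrary illustrative choice.

```r
# Squared Euclidean distances: square the "dist" object.
# Arithmetic keeps the "dist" class, so hclust() accepts the result.
d2 <- dist(USArrests)^2
hc2 <- hclust(d2, "ave")

# Cut the tree into a fixed number of groups (k = 4 is arbitrary here)
# and count how many observations fall into each cluster.
groups <- cutree(hc2, k = 4)
print(table(groups))

# The dendrogram can then be passed to the plotting code above.
dend <- as.dendrogram(hc2)
```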





__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Formatted Data File Question for Clustering -Quickie Project

2007-06-13 Thread AA
If you look at
the USArrests data by doing
> data(USArrests)
> USArrests
you will see that it is a data.frame,
so by analogy you could do the following.
You probably have this data in Excel (I guess so from the format in your mail).
Lay out the data in a sheet as:

          converts  shortBias
19940131  0.004     -0.016
19940228  ...

Save this sheet as a tab-delimited text file,
then use
mydata <- read.table("yourdata.txt")
Now you can use the data.frame mydata in the cluster analysis.
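A minimal sketch of this layout, using read.table()'s text argument so that no file is needed (with a real file, pass its name instead). Because the header line has one field fewer than the data lines, read.table() treats the first line as a header and uses the first column (the dates) as row names, which is the same shape as USArrests. The column names are illustrative.

```r
# Header has one field fewer than the data lines, as in the layout above.
txt <- "converts\tshortBias
19940131\t0.004\t-0.016
19940228\t0.002\t0.020
19940331\t-0.010\t0.072"

# No header/row.names arguments needed: both are inferred from the
# one-field-short header line.
mydata <- read.table(text = txt)

# The dates became row names; only the numeric columns remain.
str(mydata)
print(rownames(mydata))
```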
I would suggest reading "An Introduction to R".
You can also use read.csv(); see
?read.table
for more info, and read the "R Data Import/Export" manual at
http://finzi.psych.upenn.edu/R/doc/manual/R-data.html
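If the header keeps the original column titles, which contain spaces, the file has to be read as tab-delimited: whitespace-separated reading would split "Convertible Arbitrage" into two fields. A sketch assuming a tab-delimited export with the Month column first. read.delim() is read.table() with header = TRUE and sep = "\t" as defaults, and read.csv() does the same job for a comma-separated export; by default the titles are turned into syntactic names such as Convertible.Arbitrage.

```r
# Tab-delimited text with the original column titles (spaces included).
txt <- "Month\tConvertible Arbitrage\tDedicated Short Bias\tEmerging Markets
1/31/1994\t0.004\t-0.016\t0.105
2/28/1994\t0.002\t0.020\t-0.011
3/31/1994\t-0.010\t0.072\t-0.046"

# row.names = 1 takes the Month column as row names.
returns <- read.delim(text = txt, row.names = 1)
print(names(returns))

# The same data exported as CSV would be read with read.csv().
csv <- gsub("\t", ",", txt)
returns_csv <- read.csv(text = csv, row.names = 1)
```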

good luck.
AA.
- Original Message - 
From: <[EMAIL PROTECTED]>
To: 
Sent: Wednesday, June 13, 2007 11:46 AM
Subject: [R] Formatted Data File Question for Clustering -Quickie Project



Re: [R] Formatted Data File Question for Clustering -Quickie Project

2007-06-14 Thread Vladimir Eremeev

The "R Data Import/Export" guide has already been mentioned; it contains
everything you need to know about data exchange between R and other software.

If it says nothing about dates, try as.Date() and strftime().
For the dates in your example,
  as.Date("1/31/1994", format="%m/%d/%Y")
works.
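A sketch of converting both date layouts seen in this thread (m/d/Y in the original data, YYYYMMDD in the spreadsheet layout suggested earlier) into Date objects, for example to sort the months or to use them as row names. It uses format(), the Date method for turning dates back into text; strftime() takes the same format codes.

```r
# Original month-end labels, in month/day/year form.
d1 <- as.Date(c("1/31/1994", "2/28/1994"), format = "%m/%d/%Y")

# Compact yyyymmdd labels.
d2 <- as.Date(c("19940131", "19940228"), format = "%Y%m%d")

# Back to text, here as year-month labels.
labels <- format(d1, "%Y-%m")
print(labels)
```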


