Re: [R] reshape data frame

2013-11-18 Thread tsippel
Thanks, that seems to work.
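
For readers without the attached .RData file, the approach quoted below can be sketched on made-up data. The data frame here is an assumption (three invented bins and two rows), and the output names "bin" and "count" are hypothetical, so this only illustrates the wide-to-long step, not the exact target object.

```r
# Made-up wide data: six descriptor columns plus histogram-bin columns X20..X40
dat1 <- data.frame(Yr = c(2001, 2002), Seas = 1, Flt.Svy = 1, Gender = 0,
                   Part = 0, NSAMP = c(25, 30),
                   X20 = c(3, 0), X30 = c(5, 7), X40 = c(1, 2))

# Rename X20 -> X_20 etc. so reshape() can split the names on "_"
bin.cols <- grep("^X\\d+$", names(dat1))
names(dat1)[bin.cols] <- sub("^X", "X_", names(dat1)[bin.cols])

# Wide to long: one row per (original row, bin)
long <- reshape(dat1, direction = "long", varying = bin.cols, sep = "_")

# Tidy up: sort, drop the helper id column, rename the new columns
long <- long[order(long$Yr, long$time), names(long) != "id"]
rownames(long) <- NULL
names(long)[names(long) %in% c("time", "X")] <- c("bin", "count")
long
```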


On Fri, Nov 15, 2013 at 10:26 PM, arun kirshna [via R] <
ml-node+s789695n4680556...@n4.nabble.com> wrote:

> Hi,
>
> Try:
> var1 <- load("reshape_data.frame.RData")
> ##It is better not to name the objects with function names.
> dat1 <- data
>  reshape1 <- reshape
> names(dat1)[grep("X\\d+",names(dat1))] <-
> gsub("[[:alpha:]]","X_",names(dat1)[grep("X\\d+",names(dat1))])
> res1 <- reshape(dat1,direction="long",varying=7:ncol(dat1),sep="_")
> res2 <- res1[with(res1,order(Yr,Seas,Flt.Svy,Gender,Part,NSAMP)),-9]
> row.names(res2) <- 1:nrow(res2)
> colnames(res2) <- colnames(reshape1)
> all.equal(res2,reshape1)
> #[1] TRUE
>
> A.K.
>
>
> Some advice on transforming my data would be appreciated.  Attached is
> an .Rdata image with my examples (reshape_data.frame.Rdata). Within the
> image are 2 objects:
>
> 1) The "data" object contains an example of my original data
> format. The columns "X20" : "X50" are histogram bins, and the rows
> beneath are the number of observations (ie. counts) within those bins.
> The other columns, "Yr", "Seas", "Flt.Svy", "Gender", "Part", and "NSAMP",
> are observations associated with each bin count.
>
> 2) The "reshape" object is the format I need to transform the "data"
> object into.
>
> I think the reshape() function is designed for this, and my
> "data" object is in wide format, and my "reshape" object would be in
> long format.  If indeed reshape() is the best way to do this, I'm
> looking for help in calling the function to transform my data.
>
> Many thanks,
>
> Tim
>




--
View this message in context: 
http://r.789695.n4.nabble.com/reshape-data-frame-tp4680539p4680707.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Distribution of cluster medoids

2012-03-30 Thread tsippel
When looking at the histogram distribution of medoids from a cluster
analysis, clara{cluster}, they are close to normally distributed around
zero.  The cluster plot, clusplot{cluster}, does not suggest distinct
partitions.  How should the histogram distribution of medoids be
interpreted?  Can one say that there is a relationship between lack of
obvious partitioning, and the distribution of the medoids?  If clustering
was apparent, would it then be expected that medoids would not be
distributed normally?
Thanks



[R] Extracting data from trellis object

2012-01-25 Thread tsippel
Simple question (I hope?).  How does one extract the data from a trellis
object? I need to get the data from a call to histogram().

A simple example below...

library(lattice)  # histogram() comes from the lattice package
dat<-data.frame(year=as.factor(round(runif(500, 1990, 2010))),
  x=rnorm(500, 75, 35), y=rep(seq(1,50,by=1), times=10))
histogram(data=dat, y~x | year, type='count')

How do I get the actual numbers plotted in the histograms from this call to
histogram()?
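
One way to get the plotted numbers without digging into the trellis object is to recompute the bin counts per panel with base R. The sketch below does that on made-up data; the break points are an assumption and will not necessarily match the ones lattice chooses. As far as I know the raw per-panel values are also kept in the `panel.args` component of the object returned by histogram(), with the counts computed only at draw time, but treat that as an assumption to check against the lattice documentation.

```r
set.seed(1)
# Made-up data with three years only, to keep the output small
dat <- data.frame(year = factor(sample(1990:1992, 300, replace = TRUE)),
                  x = rnorm(300, 75, 35))

# Use the same breaks for every year so the bins are comparable
breaks <- pretty(range(dat$x), n = 10)

# One column of counts per year, one row per bin
counts <- sapply(split(dat$x, dat$year),
                 function(v) hist(v, breaks = breaks, plot = FALSE)$counts)
rownames(counts) <- paste(head(breaks, -1), tail(breaks, -1), sep = " to ")
counts
```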

Thanks,

Tim



[R] Reshaping and plotting tabular data

2011-04-14 Thread tsippel
Hi-
Tabular data have been provided to me within .csv files. I need to transform
the data from tabular format into a dataframe with three
columns. The columns need to be the table row id, table column id, and the
tabulated variable. An example dataset can be downloaded here:

https://docs.google.com/leaf?id=0B0d3zfSSPFQsOGFkYThhYTYtOTE2Zi00NTNkLThmMzYtNDEyMTY5MjRiN2Qy&sort=name&layout=list&num=50

So I need that to be like:

Season    Area  Variable
2009-10   38    5
2008-09   38    NA
...etc
2009-10   39    1
2008-09   39    3
...etc

From there I need to show 'Variable' as a series of bubbles scaled to
summary categories of 'Variable'.
If x is my object storing the data frame above, something like:

x$var.cat[x$Variable %in% 1:5] <- 1
x$var.cat[x$Variable %in% 6:10] <- 2

plot(x=x$Area, y=x$Season, type='n')
points(x=x$Area, y=x$Season, pch=1, cex=x$var.cat)

Transforming tabular data has turned out to be trickier than I expected and
I would appreciate advice. Although my bubble plot
scheme should work, it would be nice to hear of better ways on that too...
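
A minimal sketch of the table-to-three-columns step in base R, assuming the .csv has one row per season and one column per area. The small table here is made up to match the example rows above, and the object names (tab, area.cols, season) are hypothetical; the real file layout may differ.

```r
# Made-up stand-in for the .csv: rows are seasons, columns are areas
tab <- data.frame(Season = c("2009-10", "2008-09"),
                  `38` = c(5, NA), `39` = c(1, 3), check.names = FALSE)

# Stack the area columns into Season / Area / Variable
area.cols <- setdiff(names(tab), "Season")
x <- data.frame(Season = rep(tab$Season, times = length(area.cols)),
                Area = rep(as.numeric(area.cols), each = nrow(tab)),
                Variable = unlist(tab[area.cols], use.names = FALSE))

# Summary categories: 1-5 -> 1, 6-10 -> 2 (NA stays NA)
x$var.cat <- cut(x$Variable, breaks = c(0, 5, 10), labels = FALSE)

# Bubble plot: seasons are labels, so plot their integer codes on the y axis
season <- factor(x$Season)
ok <- !is.na(x$var.cat)
plot(x$Area, as.integer(season), type = "n", yaxt = "n",
     xlab = "Area", ylab = "Season")
axis(2, at = seq_along(levels(season)), labels = levels(season))
points(x$Area[ok], as.integer(season)[ok], pch = 1, cex = x$var.cat[ok])
```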

Thanks,
Tim



[R] Kolmogorov-smirnov test

2011-02-18 Thread tsippel
Is the Kolmogorov-Smirnov test valid on both continuous and discrete data?
I don't think so, and the example below helped me understand why.

A suggestion on testing the discrete data would be appreciated.

Thanks,

a <- rnorm(1000, 10, 1);a # normal distribution a
b <- rnorm(1000, 12, 1.5);b # normal distribution b
c <- rnorm(1000, 8, 1);c # normal distribution c
d <- rnorm(1000, 12, 2.5);d # normal distribution d

par(mfrow=c(2,2), las=1)
ahist<-hist(a, breaks=1:25, prob=T, ylim=c(0,0.4));box() # histograms of a
bhist<-hist(b, breaks=1:25, prob=T, ylim=c(0,0.4));box() # histograms of b
chist<-hist(c, breaks=1:25, prob=T, ylim=c(0,0.4));box() # histograms of c
dhist<-hist(d, breaks=1:25, prob=T, ylim=c(0,0.4));box() # histograms of d

ks.test(c(a,b), c(c,d), alternative="two.sided") # kolmogorov-smirnov on continuous data
ks.test(c(ahist$density, bhist$density), c(chist$density, dhist$density),
  alternative="two.sided") # kolmogorov-smirnov on discrete data
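
One commonly used option for binned data is a chi-squared test of homogeneity on the bin counts, sketched below on simulated samples like the ones above. This is a swapped-in technique under stated assumptions (the bin edges and the Monte Carlo p-value are my choices), not a claim that it is the right test for the real data. Note also that the second ks.test() call above compares the sets of bar heights rather than the underlying observations, which is a different question from comparing the two samples.

```r
set.seed(42)
# Two pooled samples, as in the example above
s1 <- c(rnorm(1000, 10, 1), rnorm(1000, 12, 1.5))
s2 <- c(rnorm(1000, 8, 1), rnorm(1000, 12, 2.5))

# Bin both samples with the same edges; open-ended end bins catch outliers
breaks <- c(-Inf, 1:25, Inf)
counts <- rbind(s1 = table(cut(s1, breaks)),
                s2 = table(cut(s2, breaks)))

# Drop bins that are empty in both samples (they carry no information)
counts <- counts[, colSums(counts) > 0, drop = FALSE]

# Chi-squared test on the 2 x k table; the simulated p-value avoids
# relying on the large-sample approximation when some bins are sparse
res <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)
res$p.value
```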



[R] Help with nested loops

2009-04-14 Thread tsippel

Hello-
I need to loop through a directory of files to extract data corresponding to
dates in a dataframe.  Within a function, I've written nested loops to index
the dataframe dates, and the directory files.  My function successfully
extracts the data corresponding to my first data frame date, but stops there
and doesn't continue through my entire list of data frame dates and
directory files (I need it to go through ~2600 lines in my data frame, and
~50 data files in my directory).

Currently, I'm using an 'if' statement to see if my data frame dates and
directory file names meet the criteria for data extraction, which is I think
why it stops after the first successful iteration. However, I'm stumped on
what to do now.  I have tried using 'while' statements which seem to hang up
in an endless loop.  

Some advice on getting the function to finish looping through all of my data
frame lines and directory files would be much appreciated.  I've also read
that apply, tapply, etc. can be more efficient at looping tasks, but my
experimenting with them has not been much help and I'm not sure my task is
best suited to those functions. 

Below is an example. I'm using R 2.7.2 in Vista.

 Example
extract.asc.data.for.tracks<-function(id, d.frame.date, centroid.x,
  centroid.y, ci.x, ci.y, asc.dir, asc.date.start,
  asc.date.end, asc.duration, env.variable) {
  id<-id
  d.frame.date<-as.POSIXct(d.frame.date, "GMT", format="%Y-%m-%d")
  require(RSAGA)
  # set environment for RSAGA
  env<-rsaga.env(path="C:\\programs\\saga_vc", cmd="saga_cmd.exe")
  # format directory file names for input into the data extraction
  # function below called 'pick.from.ascii.grid'
  asc.files<-substr(basename(dir(asc.dir, pattern='.asc$')), 1,
    nchar(basename(dir(asc.dir, pattern='.asc$'))) - 4)
  # index and loop through datafiles in directory
  for (j in 1 : length(asc.files)){
    # index and loop through dates dataframe
    for (i in 1 : length(d.frame.date)){
      # find dates within the named directory of files that are within 7
      # days of any given dataframe date
      if ((as.POSIXct(substr(asc.files[j],
            start = nchar(asc.files[j]) - asc.date.start,
            stop = nchar(asc.files[j]) - asc.date.end), 'GMT') -
          d.frame.date[i]) %in% 1:asc.duration)
        # when date criteria are met, extract data using coordinates (x and
        # y) plus a buffer (ci.x and ci.y)
        dat<-pick.from.ascii.grid(data=as.data.frame(cbind(x=centroid.x[i] +
          ci.x, y=centroid.y[i] + ci.y)), env=env, cbind=T,
          path=asc.dir, file=paste(asc.files[j],'asc', sep="."), at.once=F,
          method="nearest.neighbour", nodata.values=-999)
      # calculate weighted mean of extracted data
      w.mean<-weighted.mean(x=dat[,3], w=den.xt$y*den.yt$y, na.rm=T)
      # return new data frame containing original id, date, x/y coords, and
      # newly calculated weighted mean
      out.df<-data.frame(id[i], d.frame.date[i], centroid.x[i], centroid.y[i],
        w.mean[i])
      # give names to columns in new data frame
      names(out.df)<-c('id', 'date', 'x', 'y', paste(env.variable, 'w.mean',
        sep="_"))
      return(out.df)
    }}}
  
# subset of my data frame
> stm.data
        id                date        x         y
10 STM05_2 2005-03-01 12:00:00 178.2606 -34.82035
11 STM05_2 2005-03-02 00:00:00 178.2281 -34.38141
12 STM05_2 2005-03-02 12:00:00 178.2145 -33.95625
13 STM05_2 2005-03-03 00:00:00 178.2123 -33.55642
14 STM05_2 2005-03-03 12:00:00 178.2056 -33.18816
15 STM05_2 2005-03-04 00:00:00 178.1920 -32.85041
16 STM05_2 2005-03-04 12:00:00 178.1722 -32.54057
17 STM05_2 2005-03-05 00:00:00 178.1465 -32.25634
18 STM05_2 2005-03-05 12:00:00 178.1150 -31.99568
19 STM05_2 2005-03-06 00:00:00 178.0777 -31.75682

# some of the data files in directory which I need to extract data from
> asc.dir
[1] "tmp1_290E0N2005-03-02"  "tmp1_290E0N2006-01-05" 
"tmp1_290E0N2006-07-08"  "tmp1_290E0N2007-02-22"  "tmp10_290E0N2005-05-13"
"tmp10_290E0N2006-03-18"

 Arguments to run the function
extract.asc.data.for.tracks(id=stm.data$id, d.frame.date=stm.data$date,
centroid.x=stm.data$x, centroid.y=stm.data$y, 
  ci.x=seq(-1,1,0.1), ci.y=seq(-1,1,0.1), asc.date.start=9, asc.date.end=0,
asc.duration=7, env.variable="SST",
  asc.dir="C:\\...\\data.files")

# the result of the function successfully extracting data from a single row
in my data frame, but I need to do this for ~ 2600 dataframe lines
       id       date        x         y SST_w.mean
1 STM05_2 2005-03-01 178.2606 -34.82035   21.97327  
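
Two things in the function above look like likely causes, offered as a reading of the code rather than a tested diagnosis: return() sits inside the inner loop, so the function exits on its first pass, and the 'if' has no braces, so it only governs the single statement that follows it. A minimal sketch of the usual pattern (collect one result per match in a list, bind once after both loops) is below. It uses made-up dates and no RSAGA calls; track, file.dates and max.lag are hypothetical stand-ins.

```r
# Made-up track dates and dates parsed from file names
track <- data.frame(id = "STM05_2",
                    date = as.Date(c("2005-02-27", "2005-03-01", "2005-03-06")))
file.dates <- as.Date(c("2005-03-02", "2005-03-10"))
max.lag <- 7  # a file matches if it is 1..max.lag days after the track date

results <- list()
for (j in seq_along(file.dates)) {
  for (i in seq_len(nrow(track))) {
    lag <- as.numeric(file.dates[j] - track$date[i])
    # Braces keep everything that depends on the match inside the 'if'
    if (lag %in% 1:max.lag) {
      results[[length(results) + 1]] <-
        data.frame(id = track$id[i], date = track$date[i],
                   file.date = file.dates[j], lag = lag)
    }
  }
}

# Bind all matches once, after both loops have finished
out.df <- do.call(rbind, results)
out.df
```

Inside a function, the same idea applies: build the list in the loops and call return(out.df) only after the outer loop closes.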


Many thanks,

Tim
-- 
View this message in context: 
http://www.nabble.com/Help-with-nested-loops-tp23049250p23049250.html


[R] Error in segmented() output from segmented package

2009-01-26 Thread tsippel

Hi-
I'm getting the following error message when trying to use the segmented
function to look for breakpoints in my data.  

Error in segmented.glm(glm, seg.Z = ~segmentdist, psi = 2, control =
seg.control(display = F),  : 
  (Some) estimated psi out of its range

Here are some real data and the model calls which give the error
above.  

> segmentdist
 [1]  0.00  8.547576 12.700485 13.291767 15.701552 17.567891 18.936836
19.846242 20.325434 20.397607 20.066126 17.976218 16.772871 16.513030
16.434075
[16] 16.508426 16.717404 17.049235 17.501350 18.077070

> dal
 [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5
9.0 9.5

lm<-lm(data=df, segmentdist~dal)

lm(formula = segmentdist ~ dal, data = df)

Coefficients:
(Intercept)  dal  
   13.77564 -0.06682 

seg<-segmented(lm, seg.Z=~segmentdist, psi=2,
control=seg.control(display=F), model.frame=T)

The range of the data I'm looking for breaks in is min=0, max=44.5, so I
don't understand how my psi=2 could be out of range.  
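
One thing worth checking, stated as an assumption rather than a diagnosis: seg.Z is documented as naming the explanatory variable that carries the breakpoint, and psi is a starting value on that variable's scale. The model above is segmentdist ~ dal, so seg.Z=~dal may be what was intended. As an independent cross-check that does not use the segmented package at all, the sketch below finds a single breakpoint by a plain grid search over candidate values of dal. This is a swapped-in technique, not the algorithm segmented() uses.

```r
segmentdist <- c(0.00, 8.547576, 12.700485, 13.291767, 15.701552, 17.567891,
                 18.936836, 19.846242, 20.325434, 20.397607, 20.066126,
                 17.976218, 16.772871, 16.513030, 16.434075, 16.508426,
                 16.717404, 17.049235, 17.501350, 18.077070)
dal <- seq(0, 9.5, by = 0.5)

# Candidate breakpoints: interior values of dal only
candidates <- dal[3:(length(dal) - 2)]

# Residual sum of squares of a two-segment linear fit for each candidate
rss <- sapply(candidates, function(psi) {
  fit <- lm(segmentdist ~ dal + pmax(dal - psi, 0))
  sum(resid(fit)^2)
})

# Keep the candidate with the smallest residual sum of squares
best.psi <- candidates[which.min(rss)]
best.fit <- lm(segmentdist ~ dal + pmax(dal - best.psi, 0))
best.psi
```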

Thanks for your help,

Tim
-- 
View this message in context: 
http://www.nabble.com/Error-in-segmented%28%29-output-from-segmented-package-tp21674240p21674240.html


[R] subset exact values

2009-01-22 Thread tsippel

Hi-
I need to subset the following data by the column 'dal' for values that
equal the regular interval seq(0, 150, by=0.5) exactly

excluding rows with irregular 'dal' values such as  c(2.888958,
2.891620), etc.

data<-data.frame(id=id, dal=dal, date=date, mu.x=mu.x)

$dal
 [1] 0.00 0.50 1.00 1.50 2.00 2.50 2.888958 2.891620
3.00 3.245405 3.50 3.688333 4.00 4.50 4.738831 4.855949
[17] 4.993437 5.00 5.251875 5.252037 5.50

$id
 [1] STM05.3 STM05.3 STM05.3 STM05.3 STM05.3 STM05.3 STM05.3 STM05.3 STM05.3
STM05.3 STM05.3 STM05.3 STM05.3 STM05.3 STM05.3 STM05.3 STM05.3 STM05.3
[19] STM05.3 STM05.3 STM08.2

$date
 [1] "2005-02-26 00:00:00" "2005-02-26 12:00:00" "2005-02-27 00:00:00"
"2005-02-27 12:00:00" "2005-02-28 00:00:00" "2005-02-28 12:00:00"
 [7] "2005-02-28 21:20:06" "2005-02-28 21:23:56" "2005-03-01 00:00:00"
"2005-03-01 05:53:23" "2005-03-01 12:00:00" "2005-03-01 16:31:12"
[13] "2005-03-02 00:00:00" "2005-03-02 12:00:00" "2005-03-02 17:43:55"
"2005-03-02 20:32:34" "2005-03-02 23:50:33" "2005-03-03 00:00:00"
[19] "2005-03-03 06:02:42" "2005-03-03 06:02:56" "2005-03-03 12:00:00"

$mu.x
 [1] 176.2730 176.3550 176.4996 176.6969 176.9209 177.1511 177.3197 177.3197
177.3109 177.2958 177.5862 177.7929 177.7612 177.6563 177.6030 177.6421
[17] 177.6932 177.6949 177.7377 177.7377 177.7468

Why does the following subset code give me rows of data with irregular 'dal'
values which do not exactly match values in 'seq' below?  

seq<-seq(0, 150, 0.5)
reg<-subset(data, dal=seq, select=c(id, dal, date, mu.x))
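
A likely explanation: dal=seq is a named argument, not a logical condition. It does not match the 'subset' parameter of subset(), so it falls through to '...' and no rows are filtered. A minimal sketch of two alternatives follows, using only the 'dal' values shown above (the 'row' column and the tolerance 1e-8 are assumptions added for illustration).

```r
dal <- c(0.00, 0.50, 1.00, 1.50, 2.00, 2.50, 2.888958, 2.891620,
         3.00, 3.245405, 3.50, 3.688333, 4.00, 4.50, 4.738831, 4.855949,
         4.993437, 5.00, 5.251875, 5.252037, 5.50)
data <- data.frame(row = seq_along(dal), dal = dal)

# Option 1: exact membership test against the regular grid
reg.vals <- seq(0, 150, by = 0.5)
reg <- subset(data, dal %in% reg.vals)

# Option 2: tolerance-based test, safer if 'dal' was computed and may carry
# floating-point error (keeps values within 1e-8 of a multiple of 0.5)
reg.tol <- subset(data, abs(dal * 2 - round(dal * 2)) < 1e-8)

reg
```

Multiples of 0.5 are represented exactly in floating point, so the exact test works here; for a step such as 0.1 the tolerance version is the more robust choice.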

Thanks,

Tim

-- 
View this message in context: 
http://www.nabble.com/subset-exact-values-tp21615855p21615855.html


[R] Alignment of image plot overlay

2009-01-08 Thread tsippel

I'm having trouble with alignment of a trend line overlaid onto an image
plot.  The two should be plotted on the same x-axis (time-series).  However,
the trend line begins about an inch into the image plot x-axis and ends
about an inch off the end of the image plot.  Once I have the alignment
sorted, I need to put a secondary y-axis on the image plot which is scaled
for the trend line.  An example plot is attached.  My code follows.

tad.image(ptt.tad, dbins, interp=T, loess.interp=F, ylim=c(300,1),
main="STM07.4", zlim=c(0,1))
axis(4, at=c(1,2,3,4), labels=c(1,2,3,4), tick=T, las=1)
par(new=T)
plot(x=stm$dal, y=stm$model, ann=F, axes=F, type="l", col="black", lwd=2)  

Ideas involving the use of par(usr=c(,,,)) haven't solved the issue, and
attempting to convert the x-axis coordinates using the function grconvertX()
hasn't worked either.  Using the axis function as shown here hasn't helped
either.  

The function above called tad.image() was written by someone else, but it
calls the image() function to make the plot shown.  
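
One possible cause, offered as an assumption since tad.image() is not shown: image() draws with xaxs="i" by default (no padding of the axis range), while plot() defaults to xaxs="r" and computes its own x range from the data it is given. A minimal sketch of forcing the overlay onto the image's x coordinates is below, using made-up data in place of ptt.tad and stm.

```r
# Made-up image data
x <- seq(0, 10, by = 0.5)
y <- 1:20
z <- outer(x, y, function(a, b) sin(a) * b)

par(mar = c(5, 4, 4, 4) + 0.1)  # leave room on the right for a second axis
image(x, y, z)
usr.img <- par("usr")           # user coordinates actually used by image()

# Overlay the line on the same x range, with no extra padding
par(new = TRUE)
plot(x, cos(x), type = "l", lwd = 2, ann = FALSE, axes = FALSE,
     xlim = usr.img[1:2], xaxs = "i")
usr.line <- par("usr")

# Secondary y-axis, scaled for the line
axis(4, las = 1)
```

After the second plot() call the current user coordinates belong to the line, so axis(4) is labelled in the line's units.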



http://www.nabble.com/file/p21362251/test.plot.jpeg test.plot.jpeg 
-- 
View this message in context: 
http://www.nabble.com/Alignment-of-image-plot-overlay-tp21362251p21362251.html


Re: [R] Conditional operation on multiple columns from two data frames

2008-12-28 Thread tsippel

The suggestion below was made.

df1$Date <- as.Date(df1$Date)
df2$Date <- as.Date(df2$Date)
 
ifelse(df1$ID==df2$ID & df1$Date-df2$Date<0.5,df1$y-df2$y, NA)

However, because my dataframe rows do not align, I need the conditionals to
be tested on every combination of cells.  I'm starting to think I need to
use tapply?  
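
One way to test every combination of rows is to merge the two data frames on ID, which pairs each df1 row with every df2 row sharing that ID, and then filter the pairs. The sketch below uses made-up data; the absolute half-day window and the names combos, near, lag.days and y.diff are assumptions for illustration. It keeps the dates as POSIXct, since as.Date() drops the time of day and differences between Dates then come out in whole days.

```r
# Made-up data frames whose rows do not line up
df1 <- data.frame(ID = c("A", "A", "B"),
                  Date = as.POSIXct(c("2008-01-01 00:00", "2008-01-02 00:00",
                                      "2008-01-01 00:00"), tz = "GMT"),
                  y = c(10, 20, 30))
df2 <- data.frame(ID = c("A", "B", "B"),
                  Date = as.POSIXct(c("2008-01-01 06:00", "2008-01-01 18:00",
                                      "2008-01-03 00:00"), tz = "GMT"),
                  y = c(1, 2, 3))

# All within-ID combinations of rows from df1 and df2
combos <- merge(df1, df2, by = "ID", suffixes = c(".1", ".2"))

# Keep pairs whose dates are less than half a day apart, then difference y
combos$lag.days <- as.numeric(difftime(combos$Date.1, combos$Date.2,
                                       units = "days"))
near <- combos[abs(combos$lag.days) < 0.5, ]
near$y.diff <- near$y.1 - near$y.2
near
```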

Tim  
-- 
View this message in context: 
http://www.nabble.com/Conditional-operation-on-multiple-columns-from-two-data-frames-tp21189891p21197566.html