Re: [R] Intersecting two matrices

2013-07-31 Thread Jeff Newmiller
I would appreciate it if you would follow the Posting Guide and give a 
reproducible example and post all messages using plain text.

Try

m1 <- matrix(sample(0:999, 2*1057837, TRUE), ncol=2)
m2 <- matrix(sample(0:999, 2*951980, TRUE), ncol=2)
df1 <- as.data.frame(m1)
df2 <- as.data.frame(m2)
library(sqldf)
system.time(df3 <- sqldf("SELECT DISTINCT df1.V1, df1.V2 FROM df1 INNER JOIN df2 ON df1.V1=df2.V1 AND df1.V2=df2.V2"))

The speed seems heavily dependent on how many rows are duplicated within the 
input data frames... so if the range of values is small then the query runs 
slower. Note also that moving the data from R to the database and back takes 
time... you may be able to import the data directly from your source data to 
the database and save some time. Read ?sqldf and ?read.csv.sql examples for 
more info.
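
For comparison, the same inner join can be sketched in base R without moving data to a database. This is only an illustration on small made-up data and makes no claim about speed; intersect_rows is a hypothetical helper name, not part of sqldf.

```r
# Sketch: rows common to two 2-column data frames, using base R only.
# intersect_rows is a hypothetical helper, not part of sqldf.
intersect_rows <- function(df1, df2) {
  # merge() with no 'by' joins on all shared column names (an inner join);
  # unique() plays the role of SELECT DISTINCT.
  unique(merge(df1, df2))
}

set.seed(1)
df1 <- as.data.frame(matrix(sample(0:9, 2 * 50, TRUE), ncol = 2))
df2 <- as.data.frame(matrix(sample(0:9, 2 * 40, TRUE), ncol = 2))
df3 <- intersect_rows(df1, df2)
```

On data with billions of rows the intermediate join may not fit in memory, so this is mainly useful for checking a database result on a small subset.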
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

c char charlie.hsia...@gmail.com wrote:
I am not familiar with R's sort and SQL libraries. I would appreciate it if
you could post a code snippet when you have time. Thanks a lot!


On Tue, Jul 30, 2013 at 10:36 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

 In that case, you should be looking at a relational inner join, perhaps
 with SQLite (see package sqldf).


 c char charlie.hsia...@gmail.com wrote:
 Thanks a lot.
 I am still looking for a super fast and memory-efficient solution, as the
 matrix I have in the real world has billions of rows.
 
 
 On Mon, Jul 29, 2013 at 6:24 PM, William Dunlap wdun...@tibco.com wrote:
 
  I haven't looked at the size-time relationship, but im2 (below) is faster
  than your function on at least one example:
 
  intersectMat <- function(mat1, mat2)
  {
      # mat1 and mat2 are both deduplicated
      nr1 <- nrow(mat1)
      nr2 <- nrow(mat2)
      mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], , drop=FALSE]
  }
 
  im2 <- function(mat1, mat2)
  {
      stopifnot(ncol(mat1)==2, ncol(mat1)==ncol(mat2))
      # build one character key per row from the two columns
      toChar <- function(twoColMat) paste(sep="\1", twoColMat[,1], twoColMat[,2])
      mat1[match(toChar(mat2), toChar(mat1), nomatch=0), , drop=FALSE]
  }
 
  > m1 <- cbind(1:1e7, rep(1:10, len=1e7))
  > m2 <- cbind(1:1e7, rep(1:20, len=1e7))
  > system.time(r1 <- intersectMat(m1,m2))
     user  system elapsed
   430.37    1.96  433.98
  > system.time(r2 <- im2(m1,m2))
     user  system elapsed
    27.89    0.20   28.13
  > identical(r1, r2)
  [1] TRUE
  > dim(r1)
  [1] 5000000       2
 
  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com
 
 
   -----Original Message-----
   From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of c char
   Sent: Monday, July 29, 2013 4:04 PM
   To: r-help@r-project.org
   Subject: [R] Intersecting two matrices
  
   Dear all,

   I would like to know whether there is a faster R package for
   intersecting two integer matrices with ncol=2. Currently I am using my
   homemade code adapted from a previous thread:

   intersectMat <- function(mat1, mat2){
     # mat1 and mat2 are both deduplicated
     nr1 <- nrow(mat1)
     nr2 <- nrow(mat2)
     mat2[duplicated(rbind(mat1, mat2))[(nr1 + 1):(nr1 + nr2)], ]
   }

   which handles:
   size A= 10578373
   size B= 9519807
   expected intersecting time= 251.2272
   intersecting for crossing MPRs took 409.602 seconds.

   It scales a little worse than linearly, but the atomic operation is not
   good. I wonder if a super fast C/C++ extension exists for this task. Your
   ideas are appreciated.

   Thanks!
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plot a series of plots without using a loop

2013-07-31 Thread Rui Barradas

Hello,

There's a bug in the line

for (i in 1:length(dim(somdata.xyf$codes$X)[2]))

length(dim(...)[2]) is always 1, so the loop runs only once. You can simply use 1:dim(...)[2] or, even simpler,

for(i in 1:ncol(somdata.xyf$codes$X))

As for a way without a loop, you could use ?sapply:

sapply(1:ncol(somdata.xyf$codes$X), function(i) plot(...))

But I believe the loop is far more readable, and preferable.
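
To see why the original loop ran only once, here is a small sketch. The matrix X is a made-up stand-in for somdata.xyf$codes$X; any matrix with six columns behaves the same way.

```r
# Stand-in for somdata.xyf$codes$X: any matrix with 6 columns will do.
X <- matrix(0, nrow = 4, ncol = 6)

dim(X)[2]          # the number of columns
length(dim(X)[2])  # the length of that single number, so always 1

# Number of iterations each loop header would produce.
n_buggy <- length(1:length(dim(X)[2]))
n_fixed <- length(seq_len(ncol(X)))
```

seq_len(ncol(X)) is used instead of 1:ncol(X) so that a matrix with zero columns gives zero iterations rather than the sequence 1, 0.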

Rui Barradas

On 31-07-2013 00:25, Ben Harrison wrote:

On 30 July 2013 21:35, Rui Barradas ruipbarra...@sapo.pt wrote:

Hello,

Maybe the following does it.

op <- par(mfrow=c(2, 3))

for(i in 1:6){
  plot(somdata.xyf,
       type="property",
       property=somdata.xyf$codes$X[, i],
       main=colnames(somdata.xyf$codes$X)[i])
}

par(op)


Hope this helps,

Rui Barradas


Thanks Rui,
that does it for sure. I had come to that solution, but just realised
by looking at it again, I could change
for (i in 1:6)
with
for (i in 1:length(dim(somdata.xyf$codes$X)[2]))

I was also wondering if there was a way to do it without a for loop,
but in this case it's a very small number of iterations, so probably
not worth it.

Ben





[R] Using If loop in R how to extract even and odd ids

2013-07-31 Thread ravi.raghava1
I have 500 IDs; I want to extract the even and odd IDs separately and store
them in separate data files.
How can this be done in R using *if and a for loop*?




--
View this message in context: 
http://r.789695.n4.nabble.com/Using-If-loop-in-R-how-to-extract-even-and-odd-ids-tp4672707.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] List of lists

2013-07-31 Thread mohan . radhakrishnan
Hi Jim,

close(filedescriptors$cpufiledescriptors[[1]])
close(filedescriptors$cpufiledescriptors[[2]])
close(filedescriptors$cpufiledescriptors[[3]])

  I might be doing something wrong. The error is

   Error in UseMethod("close") :
  no applicable method for 'close' applied to an object of class
  c('integer', 'numeric')


Thanks,
Mohan





Re: [R] List of lists
From: Jim Lemon
To: mohan.radhakrishnan, R-help@r-project.org
Date: 31-07-2013 03:05 AM

On 07/30/2013 10:05 PM, mohan.radhakrish...@polarisft.com wrote:

 Hi,
 I am creating a list of 2 lists, one containing filenames and the other
 file descriptors. When I retrieve them I am unable to close the file
 descriptor.

 I am getting this error when I try to call
 close(filedescriptors[[2]][[1]]).

 Error in UseMethod("close") :
   no applicable method for 'close' applied to an object of class
   c('integer', 'numeric')

 print(filedescriptors[[2]][[1]]) seems to be printing individual elements.

 Thanks,
 Mohan

 filelist.array <- function(n) {
   cpufile <- list()
   cpufiledescriptors <- list()
   length(cpufile) <- n
   for (i in 1:n) {
     cpufile[[i]] <- paste("output", i, ".txt", sep = "")
     cpufiledescriptors[[i]] <- file(cpufile[[i]], "a")
   }
   listoffiles <- list(cpufile = cpufile,
                       cpufiledescriptors = cpufiledescriptors)
   return(listoffiles)
 }

 # Test function

 test.filelist.array <- function() {
   filedescriptors <- filelist.array(3)
   print(filedescriptors[[2]][[1]])
   print(filedescriptors[[2]][[2]])
   print(filedescriptors[[2]][[3]])
 }


Hi Mohan,
When you have opened connections as above, you need to pass the
connection, not just one element, to close:

close(listoffiles$cpufiledescriptors[[1]])

Jim
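
As a minimal sketch of the pattern discussed above, the following keeps file names and open connections side by side and closes each connection object taken out of the list with [[ ]]. The helper name open_logs and the use of tempdir() are assumptions for the example only. It does not reproduce the error reported in this thread, which may depend on the R version in use.

```r
# Sketch: keep file names and open connections side by side in one list.
# open_logs is a hypothetical name; files go to tempdir() for the example.
open_logs <- function(n, dir = tempdir()) {
  paths <- file.path(dir, paste0("output", seq_len(n), ".txt"))
  conns <- lapply(paths, file, open = "a")  # each element is a connection
  list(cpufile = paths, cpufiledescriptors = conns)
}

fds <- open_logs(3)
writeLines("hello", fds$cpufiledescriptors[[1]])

# [[ ]] hands back the connection object itself, which close() accepts.
for (i in seq_along(fds$cpufiledescriptors)) {
  close(fds$cpufiledescriptors[[i]])
}
```

If close() still complains about class c('integer', 'numeric'), checking class(fds$cpufiledescriptors[[1]]) shows whether the connection class was lost when the object was stored.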







[R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread Dark
Hi all,

I think this should be an easy question for the gurus out here.

I have a large data frame (2.500.000 rows, 15 columns) and I want to add
a column named SEGMENT to it.
The first 5% of the rows (the first 125.000 rows) should have the value
"Top 5%" in the SEGMENT column.
Then the rows from 5% to 20% should have the value "5 to 20".
Then 20-50% should have the value "20 to 50".
And the last 50% of the rows should have the value "Bottom 50".

What is the easiest way of doing this? I was thinking of using quantile, but
then I would need some row-number column.

Regards Derk



--
View this message in context: 
http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711.html
Sent from the R help mailing list archive at Nabble.com.



[R] comparing real set vs sampled sets

2013-07-31 Thread PQuery
Dear R helper,

I have a statistics question.
I have a vector of 500 values for which I need to assess the statistical
significance of occurrence.

real.dist <- realValues

For that, I sampled 1000 other vectors of 500 values each from my large
data pool.

I then ran ks.test with my real vector vs. each of the sampled vectors.

ks.res <- unlist(lapply(l.sampled, function(x){
  ks <- ks.test(real.dist, x$dist)
  as.numeric(ks[["statistic"]])
}))

I now have 1000 D values with their corresponding p-values. How can I get
a general p-value saying that my real data differs from the sampled data,
and is thus significant?

Any suggestion ?
Many thanks,




--
View this message in context: 
http://r.789695.n4.nabble.com/comparing-real-set-vs-sampled-sets-tp4672709.html
Sent from the R help mailing list archive at Nabble.com.



[R] parfm frailty model and post hoc testing

2013-07-31 Thread Raoul Van Oosten
Dear all,

I'm running a model with one fixed factor with four groups, called
species, and a clustering factor called nest. My dependent variable
(timeto) is ttm (time to moult), which is the number of days per
individual, and the status variable is called moulted_final.
The code and its results are as follows.

> library(parfm)
> Moult=read.table(file="HSBS R moult2.txt",header=T)
> modelMoult=parfm(Surv(ttm,moulted_final)~species,cluster="nest",data=Moult,dist="weibull",frailty="possta")

Execution time: 12.72 second(s)
> anova(modelMoult)
Analysis of Deviance Table
Parametric frailty model: response is Surv(ttm, moulted_final)
Terms added sequentially (first to last)

         loglik  Chisq Df Pr(>|Chi|)
NULL    -346.61
species -341.35 10.514  1   0.001184 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As you can see, there are significant differences among species, and I would
like to know how to find out which species differ. I'm used to linear models,
in which post hoc testing gives you pairwise p-values, but I'm not sure if
that is how parfm works.

On a side note, all my samples have moulted so moulted_final has the same
state (1) for all samples.


Thanks in advance,
Raoul



[R] detect multivariate outliers with aq.plot {mvoutliers} high dimensions

2013-07-31 Thread monaR
Hi,
I have a species abundance data set CommData, with n (samples)=40 and p
(species)=107.

Sample    Species A  Species B  Species C  Species D  ...
411_2010         40         20          0          0
412_2010         30         20          0          0
413_2010          0          0          0          0
414_2010          0         10          0          0
415_2010         20          0          0          0
418_2010          0          0          0          0
419_2010          0          0          0          0
421_2010        160         40          0         10
...

I am trying to find outliers based on the Mahalanobis distance with the
package mvoutlier. I get an error using aq.plot(CommData): Error in covMcd(x,
alpha = quan) : n <= p -- you can't be serious!
So I tried pcout(CommData), which is supposed to work for high dimensions, but
I get the error "More than 50% equal values in one or more variables!"

Can this be fixed? Any idea how I can find outliers in my multidimensional
data?
Thanks a lot for any help!



--
View this message in context: 
http://r.789695.n4.nabble.com/detect-multivariate-outliers-with-aq-plot-mvoutliers-high-dimensions-tp4672714.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Using If loop in R how to extract even and odd ids

2013-07-31 Thread Rui Barradas

Hello,

Who told you you need a loop or an if?


even <- function(x) x %% 2 == 0

x <- 1:50
idx <- even(x)
x[idx]     # even ids
x[!idx]    # odd ids


Hope this helps,

Rui Barradas



Re: [R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread Rui Barradas

Hello,

Combine quantile() with findInterval(). Something like the following.


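# sample data
x <- rnorm(100)

val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%")
# "Top 5%" lies above the 95th percentile and "5 to 20" above the 80th
qq <- quantile(x, probs = c(0, 0.50, 0.80, 0.95, 1))

# rightmost.closed = TRUE keeps max(x) in the last interval
idx <- findInterval(x, qq, rightmost.closed = TRUE)
val[idx]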


Hope this helps,

Rui Barradas



Re: [R] Using If loop in R how to extract even and odd ids

2013-07-31 Thread arun
Hi,
Maybe this helps:

set.seed(24)
dat1 <- data.frame(ID=1:500, value=rnorm(500))
res <- split(dat1, dat1$ID %% 2)
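
The question also asks for the two groups to be stored in separate files. A minimal sketch of that last step follows; the file names and the use of tempdir() are assumptions made only for the example.

```r
set.seed(24)
dat1 <- data.frame(ID = 1:500, value = rnorm(500))

# ID %% 2 is 0 for even IDs and 1 for odd IDs; give the groups readable names.
parity <- ifelse(dat1$ID %% 2 == 0, "even", "odd")
res <- split(dat1, parity)

# Write each group to its own CSV file (tempdir() is only for the example).
out_files <- file.path(tempdir(), paste0("ids_", names(res), ".csv"))
for (i in seq_along(res)) {
  write.csv(res[[i]], out_files[i], row.names = FALSE)
}
```

The only loop here runs over the two groups, not over the 500 IDs.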


A.K.



Re: [R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread arun
Hi,
Maybe this helps:
set.seed(24)
dat1 <- data.frame(ID=1:500, value=rnorm(500))
indx <- round(quantile(as.numeric(row.names(dat1)), probs=c(0.05,0.20,0.50,1)))
indx1 <- findInterval(row.names(dat1), indx, rightmost.closed=TRUE)
dat1$SEGMENT <- as.character(factor(indx1, labels=c("Top 5%","5 to 20","20 to 50","Bottom 50")))
head(dat1)
#  ID      value SEGMENT
#1  1 -0.7859574  Top 5%
#2  2  1.0117428  Top 5%
#3  3 -2.1558035  Top 5%
#4  4  1.7803880  Top 5%
#5  5  0.4192816  Top 5%
#6  6 -1.0142512  Top 5%
tail(dat1)
#     ID      value   SEGMENT
#495 495  0.3571848 Bottom 50
#496 496 -1.1971854 Bottom 50
#497 497  0.3544896 Bottom 50
#498 498 -0.1562356 Bottom 50
#499 499 -0.2994321 Bottom 50
#500 500 -0.4170319 Bottom 50


A.K.





[R] heatmap scale parameter question

2013-07-31 Thread Witold E Wolski
Would any of the more experienced R users explain to me the
behaviour of the scale parameter in the heatmap function?

Different options for scale (R 3.0.1) change only the colors but do
not affect the dendrograms. Please see for yourself by executing the
following code:

d <- matrix(rnorm(100), nrow=20)
stats::heatmap(d)
X11()
heatmap(d, scale="column")
X11()
heatmap(d, scale="row")
X11()
heatmap(d, scale="none")

In all four cases above the dendrograms look exactly the same.
However, scaling clearly affects clustering. See:

d <- scale(d)
heatmap(d, scale="none")
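
The observation can also be checked numerically instead of by eye. heatmap() invisibly returns the row and column orderings (rowInd, colInd), so a sketch like the following compares them across scale settings. The seed and the null graphics device are assumptions added so that the example is reproducible and runs without opening a window.

```r
set.seed(1)
d <- matrix(rnorm(100), nrow = 20)

pdf(NULL)  # null device so the sketch runs without opening a window
h_none <- heatmap(d, scale = "none")
h_row  <- heatmap(d, scale = "row")
h_pre  <- heatmap(scale(d), scale = "none")  # scaled before clustering
invisible(dev.off())

# TRUE if the scale argument left the row and column ordering untouched.
same_rows <- identical(h_none$rowInd, h_row$rowInd)
same_cols <- identical(h_none$colInd, h_row$colInd)
```

Identical orderings would be consistent with the dendrograms being computed from the unscaled matrix, with scale applied only to the values that are coloured. Comparing h_pre$rowInd with h_none$rowInd shows whether scaling beforehand changes the clustering for a given data set.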


best regards

R version 3.0.1 (2013-05-16) -- "Good Sport"
ciao

--
Witold Eryk Wolski




[R] Please take me out of the mailing list

2013-07-31 Thread Mirjam Appel




Re: [R] Please take me out of the mailing list

2013-07-31 Thread S Ellison
 Subject: [R] Please take me out of the mailing list
Please follow the instructions on the mailing list page. The link is given at 
the bottom of every mail from the list.





Re: [R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread arun
Hi,

set.seed(24)
dat1 <- data.frame(ID=1:500, value=rnorm(500))
dat1 <- dat1[order(-dat1$value),]
row.names(dat1) <- 1:nrow(dat1)   # renumber the rows after sorting

indx <- round(quantile(as.numeric(row.names(dat1)), probs=c(0.05,0.20,0.50,1)))
indx1 <- findInterval(row.names(dat1), indx, rightmost.closed=TRUE)
dat1$SEGMENT <- as.character(factor(indx1, labels=c("Top 5%","5 to 20","20 to 50","Bottom 50")))
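
An alternative sketch that does not depend on row names at all: label rows by their position after sorting, using cut() on the row index. The helper name add_segment is hypothetical, and the sketch assumes the data frame has enough rows for the four break points to be distinct.

```r
# Sketch: label rows by position after sorting, independent of row names.
# add_segment is a hypothetical helper name.
add_segment <- function(df) {
  n <- nrow(df)
  # Upper row index of each band: 5%, 20%, 50% and 100% of the rows.
  breaks <- c(0, round(n * c(0.05, 0.20, 0.50)), n)
  df$SEGMENT <- as.character(cut(seq_len(n), breaks = breaks,
    labels = c("Top 5%", "5 to 20", "20 to 50", "Bottom 50")))
  df
}

set.seed(24)
dat1 <- data.frame(ID = 1:500, value = rnorm(500))
dat1 <- add_segment(dat1[order(-dat1$value), ])
seg_counts <- table(dat1$SEGMENT)
```

With 500 rows the bands hold 25, 75, 150 and 250 rows, and the original row names from before sorting are left untouched.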

A.K.

Hi Arun Kirshna,

I have tested your method and it will work for me.
I only run into one problem. Before doing this operation I sort my data
frame, so my row numbers are not consecutive.

You can see this if you first order your example data frame like:
dat1 <- dat1[order(-dat1$value),]

head(dat1)
     ID    value   SEGMENT
237 237 3.538552  20 to 50
21   21 3.376149    Top 5%
421 421 3.015634 Bottom 50
339 339 2.855991 Bottom 50
119 119 2.589574  20 to 50
12   12 2.512276    Top 5%

Do you have a solution for this?





[R] merge matrix row data

2013-07-31 Thread Elaine Kuo
Dear list,

I have a matrix showing species presence-absence on a map.
Its rows are map locations, represented by GridCellID, such as GID 1 and GID 5.
Its columns are species IDs, such as D0989, D9820, and D5629.
The matrix is shown below.

Now I want to merge the GridCellIDs according to the map location of each
island.
For instance, Island A consists of GID 1 and 5. Island B consists of GID 2,
4, and 7.
In GID 1 and 5, species D0989 is 1 in both.
Then I want to merge GID 1 and 5 into Island A, with species D0989 as 1.

The original matrix and the resulting matrix are listed below.
Please kindly advise how to code the calculation in R.
Please do not hesitate to ask if anything is unclear.
Thank you in advance.

Elaine

Original matrix

        D0989  D9820  D5629  D4327  D2134
GID 1       1      0      0      1      0
GID 2       0      1      1      0      0
GID 4       0      0      1      0      0
GID 5       1      1      0      0      0
GID 7       0      1      0      0      1

Resulting matrix

          D0989  D9820  D5629  D4327  D2134
Island A      1      1      0      1      0
Island B      0      1      1      0      1



Re: [R] merge matrix row data

2013-07-31 Thread arun
Hi,

Please use dput() to post example data (see ?dput).
mat1 <- as.matrix(read.table(text="
D0989  D9820  D5629  D4327  D2134
GID_1    1    0    0  1  0
GID_2    0    1    1  0  0
GID_4    0    0    1  0  0
GID_5    1    1    0  0  0
GID_7    0    1    0  0  1
", sep="", header=TRUE))
row.names(mat1) <- gsub("[_]", " ", row.names(mat1))
IslandA <- c("GID 1", "GID 5")
IslandB <- c("GID 2", "GID 4", "GID 7")
# for each island, pick its rows and flag species present in any of them
res <- t(sapply(c("IslandA", "IslandB"), function(x) {
  x1 <- mat1[match(get(x), row.names(mat1)), ]
  (!!colSums(x1)) * 1
}))

res
#        D0989 D9820 D5629 D4327 D2134
#IslandA     1     1     0     1     0
#IslandB     0     1     1     0     1
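
A variant of the same idea, sketched with base R's rowsum() and a lookup vector instead of get(). The island lookup below is typed in from the question; the variable name island is an assumption for the example.

```r
mat1 <- matrix(c(1, 0, 0, 1, 0,
                 0, 1, 1, 0, 0,
                 0, 0, 1, 0, 0,
                 1, 1, 0, 0, 0,
                 0, 1, 0, 0, 1), nrow = 5, byrow = TRUE,
               dimnames = list(c("GID 1", "GID 2", "GID 4", "GID 5", "GID 7"),
                               c("D0989", "D9820", "D5629", "D4327", "D2134")))

# Lookup from grid cell to island (taken from the question).
island <- c("GID 1" = "Island A", "GID 5" = "Island A",
            "GID 2" = "Island B", "GID 4" = "Island B", "GID 7" = "Island B")

# rowsum() adds up the rows within each group; > 0 turns the counts back
# into presence/absence, and * 1 converts TRUE/FALSE to 1/0.
res <- (rowsum(mat1, unname(island[rownames(mat1)])) > 0) * 1
```

With many islands, the lookup vector can be built from a two-column table of grid cell and island instead of being typed by hand.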
A.K.






[R] Highlight selected bar in barplot

2013-07-31 Thread Jurgens de Bruin
Hi All,

I am new to R, so any help would be appreciated.

Below is my current R script:

initial.dir <- getwd()
setwd('/Users/jurgens/VirtualEnv/venv/Projects/QTLS/Resaved_Results')
dataset <- read.table("LWxANNA_FinalReport_resaved_spwc.csv", header=TRUE,
sep="\t")
n <- length(dataset$X..No.Call)
x <- sort(dataset$X..No.Call, partial = n)[n]

outlier <- dataset[dataset$X..No.Call > quantile(dataset$X..No.Call, 0.25) +
(IQR(dataset$X..No.Call) * 1.5), ]

par(las=2, cex.axis=0.5, cex.lab=1, cex.main=2, cex.sub=1)
barplot(dataset$X..No.Call, names.arg = dataset$Individual.Sample,
cex.names=0.5, space=0.5, ylim=c(0, x*1.5))
setwd(initial.dir)

I would like to highlight the samples in outlier on the barplot that is
created. Would this be possible?
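
One possible approach, sketched on made-up data: barplot() accepts one colour per bar through its col argument, so a logical outlier flag can be turned into a colour vector. The data frame below only imitates the column names of the CSV file, and the cutoff is assumed to be the first quartile plus 1.5 times the IQR, as the script appears to intend.

```r
# Made-up data standing in for the CSV file in the question.
dataset <- data.frame(Individual.Sample = paste0("S", 1:10),
                      X..No.Call = c(1, 2, 2, 3, 1, 2, 15, 2, 3, 1))

# Assumed cutoff rule: first quartile plus 1.5 times the IQR.
cutoff <- quantile(dataset$X..No.Call, 0.25) + 1.5 * IQR(dataset$X..No.Call)
is_outlier <- dataset$X..No.Call > cutoff

pdf(NULL)  # null device so the sketch runs without opening a window
# One colour per bar: red for outliers, grey for everything else.
barplot(dataset$X..No.Call, names.arg = dataset$Individual.Sample,
        col = ifelse(is_outlier, "red", "grey"), las = 2)
invisible(dev.off())
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped so the plot appears on screen.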


Thanks
-- 
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет

Jurgens de Bruin



[R] R number format with Hmisc and knitr

2013-07-31 Thread Simon Zehnder
Dear R-Users and R-Devels,

I have a problem when using knitr in combination with Hmisc. I generate a 
data.frame which contains a mix of scientific and non-scientific numbers. In my 
LaTeX table I want only the non-scientific format, so I call

latex(myDataFrame,
file = '',
cdec = c(0, rep(4, NROW(myDataFrame) - 1)),
)

Usually this works, but in this case it doesn't. I do not know why but suspect 
the mixed data format to be the culprit. What could I do?

Using format(, scientific = FALSE) before or options(scipen = 4) before has no 
influence. 


Best

Simon



Re: [R] R number format with Hmisc and knitr

2013-07-31 Thread Simon Zehnder
Erratum:

It should read:

latex(myDataFrame,
file = '',
cdec = c(0, rep(4, NCOL(myDataFrame) - 1))
)

But this does not work either. Scientific notation is very robust :)
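
One workaround that might be worth trying is to convert the numeric columns to fixed-point text before the table is built, so that no number formatting is left to the table function. The sketch below uses base R's formatC() on a made-up data frame; whether latex() from Hmisc then prints the character columns unchanged is an assumption that has not been checked here.

```r
# Made-up data frame mixing very small and ordinary numbers.
myDataFrame <- data.frame(n = c(10, 200), est = c(1.5e-07, 12.34567))

# Convert every column except the first to fixed-point text with 4 decimals.
fixed <- myDataFrame
fixed[-1] <- lapply(fixed[-1], formatC, format = "f", digits = 4)
```

The result in fixed could then be passed to the table function in place of myDataFrame.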

Apologies,

Simon

On Jul 31, 2013, at 5:05 PM, Simon Zehnder szehn...@uni-bonn.de wrote:

 Dear R-Users and R-Devels,
 
 I have a problem when using knitr in combination with Hmisc. I generate a 
 data.frame which has mixed scientific and non-scientific numbers inside. In 
 my Latex Table I just want to have non-scientific format, so I call
 
 latex(myDataFrame,
 file = '',
 cdec = c(0, rep(4, NROW(myDataFrame) - 1)),
 )
 
 Usually this works, but in this case it doesn't. I do not know why but 
 suspect the mixed data format to be the culprit. What could I do?
 
 Using format(, scientific = FALSE) before or options(scipen = 4) before has 
 no influence. 
 
 
 Best
 
 Simon
 


Re: [R] Correlation Loops in time series

2013-07-31 Thread arun


Hi,
Maybe this helps:

set.seed(25)
mt1 <- matrix(sample(c(NA,1:40),20*200,replace=TRUE),ncol=200)

set.seed(487)
mt2 <- matrix(sample(c(NA,1:80),20*200,replace=TRUE),ncol=200)
res <- sapply(seq_len(ncol(mt1)),function(i)
cor(mt1[,i],mt2[,i],use="complete.obs",method="pearson"))


A.K.
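
As a hedged cross-check of the column-by-column loop, the sketch below computes the same correlations with vapply() and compares them with the diagonal of the full cross-correlation matrix. The matrix names follow the question; the small 20 x 5 matrices are made-up stand-ins. For a single pair of columns, "pairwise.complete.obs" drops the same rows as "complete.obs", which is the assumption behind the comparison.

```r
set.seed(25)
mts1 <- matrix(sample(c(NA, 1:40), 20 * 5, replace = TRUE), ncol = 5)
set.seed(487)
mts2 <- matrix(sample(c(NA, 1:80), 20 * 5, replace = TRUE), ncol = 5)

# Correlation of column i of mts1 with column i of mts2
res <- vapply(seq_len(ncol(mts1)), function(i) {
  cor(mts1[, i], mts2[, i], use = "complete.obs", method = "pearson")
}, numeric(1))

# Alternative: all pairwise correlations at once, then keep the diagonal
alt <- diag(cor(mts1, mts2, use = "pairwise.complete.obs"))

all.equal(res, alt)
```

The diagonal approach computes many correlations that are thrown away, so for 200 columns the explicit loop is probably the more economical choice.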




Hello, I've got the following problem. 
I have two matrices, each containing 200 time series. 
Now I want to calculate the correlation of the first time series of each of the 
matrices. 
I use the following command: 
cor(mts1[,1],mts2[,1], use="complete.obs", method=c("pearson")) 
cor(mts1[,2],mts2[,2], use="complete.obs", method=c("pearson")) 
cor(mts1[,3],mts2[,3], use="complete.obs", method=c("pearson")) 
and so on.. 
I would like to repeat this for each of the 200 time series. As it 
is quite painful to change the command 200 times I wanted to ask if 
there's a loop function that can cover these series in a fast way? 
Thanks in advance for your help 
Best 
Tom



Re: [R] heatmap scale parameter question

2013-07-31 Thread David Carlson
In your example all of the values are drawn from the same
distribution so there will not be substantial differences (row
means/variances and column means/variances will be
approximately the same). 

set.seed(42)
d <- matrix(rnorm(100),nrow=20)
# Start with your example and modify the row/col means
# (sample() rather than sample.int(), which expects a single number n)
rows <- sample(15:25, 20, replace=TRUE)
cols <- sample(5:15, 5, replace=TRUE)
d2 <- sweep(d, 2, cols, "+")
d2 <- sweep(d2, 1, rows, "+")
heatmap(d2, scale="none")
heatmap(d2, scale="row")
heatmap(d2, scale="column")
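
A note on the behaviour discussed in this thread, offered as my reading of stats::heatmap rather than as a statement from the posters: the scale argument appears to be applied only after the row and column dendrograms have been computed, so it changes the colours but not the clustering. If the clustering should see scaled values, one option is to scale first and then plot with scale="none". The column multipliers below are made up for illustration.

```r
set.seed(42)
d <- matrix(rnorm(100), nrow = 20)
# Give the columns very different spreads (illustrative values only)
d2 <- sweep(d, 2, c(1, 5, 10, 50, 100), "*")

# Centre and scale the columns before clustering
d2_scaled <- scale(d2)

# Row orderings from clustering the raw and the scaled values
ord_raw    <- hclust(dist(d2))$order
ord_scaled <- hclust(dist(d2_scaled))$order

# The dendrograms are now built from the scaled matrix
heatmap(d2_scaled, scale = "none")
```

Comparing ord_raw with ord_scaled shows whether scaling changed the row ordering for a given data set.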

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352




-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Witold E
Wolski
Sent: Wednesday, July 31, 2013 7:04 AM
To: r-help@r-project.org
Subject: [R] heatmap scale parameter question

Would anyone of the more experienced r-users explain to me the
behaviour of the scale parameter in the heatmap function.

different options for scale (R 3.0.1) do change only the
colors but do
not affect the dendrograms. Please see for yourself executing
the
following code:

d <- matrix(rnorm(100),nrow=20)
stats::heatmap(d)
X11()
heatmap(d,scale="column")
X11()
heatmap(d,scale="row")
X11()
heatmap(d,scale="none")

In all four cases above the dendrograms look exactly the same.
However, scaling clearly affects clustering. See:

d <- scale(d)
heatmap(d,scale="none")


best regards

R version 3.0.1 (2013-05-16) -- Good Sport
ciao

--
Witold Eryk Wolski


-- 
Witold Eryk Wolski



Re: [R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread Rui Barradas

Hello,

Sorry, that should be 0.80, not 0.70.

qq <- quantile(x, probs = c(0, 0.50, 0.80, 0.95, 1))

Rui Barradas


Em 31-07-2013 12:22, Rui Barradas escreveu:

Hello,

Combine quantile() with findInterval(). Something like the following.


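# sample data
x <- rnorm(100)

val <- c("Bottom 50", "20 to 50", "5 to 20", "Top 5%")
qq <- quantile(x, probs = c(0, 0.50, 0.70, 0.95, 1))

# rightmost.closed = TRUE keeps max(x) in the last interval instead of index 5
idx <- findInterval(x, qq, rightmost.closed = TRUE)
val[idx]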


Hope this helps,

Rui Barradas

Em 31-07-2013 10:37, Dark escreveu:

Hi all,

I think this should be an easy question for the gurus out here.

I have this large data frame (2.500.000 rows, 15 columns) and I want to add
a column named SEGMENT to it.
The first 5% of the rows (the first 125.000 rows) should have the value
"Top 5%" in the SEGMENT column.
Then the rows from 5% to 20% should have the value "5 to 20".
Then 20-50% should have the value "20 to 50".
And the last 50% of the rows should have the value "Bottom 50".

What is the easiest way of doing this? I was thinking of using quantile, but
then I would need some row-number column.

Regards Derk



--
View this message in context:
http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711.html

Sent from the R help mailing list archive at Nabble.com.



Re: [R] Greek symbols in study labels and custom summary lines in forest plot (meta)

2013-07-31 Thread David Winsemius

On Jul 29, 2013, at 11:52 AM, Rapsomaniki, Eleni wrote:

 Dear R helpers,
 
 Is there a way to display mathematical notations (e.g. greek characters, 
 subscripts) properly in study (studlab) and group (byvar) labels in a forest 
 plot created using the meta package?
 
 #Example:
 library(meta)
 logHR <- log(runif(10,0.5,2))
 selogHR <- log(runif(10,0.05,0.2))
 study=c(0.1,.2,.3,.4,.5,0.1,.2,.3,.4,.5)
 group=c(rep('alpha',5),rep('beta',5))
 meta1=metagen(logHR, selogHR, 
 sm="HR",studlab=paste("Fixed",expression(beta[w]),study),byvar=group)
 forest(meta1, print.byvar=F)

I tried a variety of plotmath and substitute strategies but the arguments to 
studlab get first processed with 'as.character' and then put into a data.frame 
before printing. Dataframes do not accept language objects, so R expressions 
could not be processed.

> dftest <- data.frame(a = expression(a,b,c))
Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class "expression" to a data.frame


The best I could do was:

   ... ,studlab=paste('Fixed ß[w]=',study), 

 
 Question 2
 Is there a way to add a line to this plot at my preferred location? For 
 example, I want to add a within-group combined estimate line (the default 
 here is just an overall group line by random or fixed effects). 
 I know I need to use grid.lines, e.g.
 
 grid.lines(x = 3, y = c(0.5,1),gp = gpar(col = 5))
 
 But for the life of me I can't work out the co-ordinate system in grid 
 graphics!

Unfortunately all of the printing to the device is handled inside the 'forest' 
function and no list representation is returned as a value to be augmented and 
later printed. So the input data would need to be entered in a manner that gets 
processed as text or you would need to modify the code. I don't have the 
knowledge of the meta package that can get there.

-- 
David Winsemius
Alameda, CA, USA



Re: [R] Highlight selected bar in barplot

2013-07-31 Thread John Kane
It's a bit difficult to know what you are doing without any data. Would you 
supply some data, please?

See ?dput for the easiest way to supply it. Also have a look at 
https://github.com/hadley/devtools/wiki/Reproducibility and/or 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
for some suggestions on asking questions and code formatting.

John Kane
Kingston ON Canada
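
While waiting for real data, here is a minimal sketch of one common approach: pass a vector of colours to barplot() via its col argument, one colour per bar. The data frame is a made-up stand-in that reuses the column names from the question, and the comparison is assumed to be ">" (the operator is missing from the archived script).

```r
# Hypothetical stand-in data; column names follow the original script
dataset <- data.frame(
  Individual.Sample = paste0("S", 1:8),
  X..No.Call = c(12, 15, 11, 14, 60, 13, 16, 12)
)

# Same threshold as in the question: first quartile + 1.5 * IQR
cutoff <- quantile(dataset$X..No.Call, 0.25) + 1.5 * IQR(dataset$X..No.Call)
is_outlier <- dataset$X..No.Call > cutoff

# One colour per bar: outliers in red, everything else in grey
bar_cols <- ifelse(is_outlier, "red", "grey")
barplot(dataset$X..No.Call, names.arg = dataset$Individual.Sample,
        col = bar_cols, las = 2, cex.names = 0.5, space = 0.5)
```

The same is_outlier vector could also drive other per-bar arguments such as border or density.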


 -Original Message-
 From: debrui...@gmail.com
 Sent: Wed, 31 Jul 2013 16:57:55 +0200
 To: r-help@r-project.org
 Subject: [R] Highlight selected bar in barplot
 
 Hi All,
 
 I am new at R so any help would be appreciate.
 
 Below my current R-code/script:
 
 initial.dir <- getwd()
 setwd('/Users/jurgens/VirtualEnv/venv/Projects/QTLS/Resaved_Results')
 dataset <- read.table("LWxANNA_FinalReport_resaved_spwc.csv",
 header=TRUE,
 sep="\t" )
 n <- length(dataset$X..No.Call)
 x <- sort(dataset$X..No.Call,partial = n )[n]
 
 outlier <- dataset[ dataset$X..No.Call >
 quantile(dataset$X..No.Call,0.25)
 + (IQR(dataset$X..No.Call) *1.5),]
 
 par( las=2,  cex.axis=0.5, cex.lab=1, cex.main=2, cex.sub=1)
 barplot(dataset$X..No.Call, names.arg = dataset$Individual.Sample,
 cex.names=0.5 ,space=0.5, ylim=c(0,x*1.5) )
 setwd(initial.dir)
 
 I would like to highlight the samples in outlier on the barplot that is
 created; would this be possible?
 
 
 Thanks
 --
 Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
 distinti saluti/siong/duì yú/привет
 
 Jurgens de Bruin
 
 


Re: [R] xmlToDataFrame very slow

2013-07-31 Thread Duncan Temple Lang
Hi Stavros

 xmlToDataFrame() is very generic and so doesn't know anything
about the particulars of the XML it is processing. If you know
something about the structure of the XML, you should be able to leverage that
for performance.

xmlToDataFrame is also not optimized as it is just a convenience routine for 
people who want to work with
XML without much effort.

If you send me the file and the code you are using to read the file, I'll take a
look at it.

 D.

On 7/30/13 11:10 AM, Stavros Macrakis wrote:
 I have a modest-size XML file (52MB) in a format suited to xmlToDataFrame 
 (package XML).
 
 I have successfully read it into R by splitting the file 10 ways then running 
 xmlToDataFrame on each part, then
 rbind.fill (package plyr) on the result. This takes about 530 s total, and 
 results in a data.frame with 71k rows and
 object.size of 21MB.
 
 But trying to run xmlToDataFrame on the whole file takes forever ( 1 s 
 so far). xmlParse of this file takes only 0.8 s.
 
 I tried running xmlToDataFrame on the first 10% of the file, then the first 
 10% repeated twice, then three times (with
 the outer tags adjusted of course). Timings:
 
 1 copy: 111 s = 111 per copy
 2 copy: 311 s = 155
 3 copy: 626 s = 209
 
 The runtime is superlinear.  What is going on here? Is there a better 
 approach?
 
 Thanks,
 
   -s




[R] Does a general latex table-making function exist?

2013-07-31 Thread Frank Harrell
Our Hmisc package summary.formula function and its latex methods can 
make some fairly advanced tables.  But the tables have to be regular. 
For example, all rows of the tables are based on the same data frame. 
I'm thinking that what is needed is a ggplot2-like set of functions for 
building a table row-by-row or row-by-block of rows.  Different row 
blocks could have different denominators, e.g., the first part of the 
table might be on everyone and a latter block of rows be for females, 
with different summary statistics computed.


Has anyone already written functions creating LaTeX markup with such 
functionality?


Thanks
Frank

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University



Re: [R] Does a general latex table-making function exist?

2013-07-31 Thread Duncan Murdoch

On 13-07-31 4:03 PM, Frank Harrell wrote:

Our Hmisc package summary.formula function and its latex methods can
make some fairly advanced tables.  But the tables have to be regular.
For example, all rows of the tables are based on the same data frame.
I'm thinking that what is needed is a ggplot2-like set of functions for
building a table row-by-row or row-by-block of rows.  Different row
blocks could have different denominators, e.g., the first part of the
table might be on everyone and a latter block of rows be for females,
with different summary statistics computed.

Has anyone already written functions creating LaTeX markup with such
functionality?


My tables package does some of what you are asking for; I'm not sure if 
it does everything.


Duncan Murdoch



[R] qgraph: how to create legend (scale) for edge thickness?

2013-07-31 Thread María Antonieta Sánchez Farrán
Hello R community,

I am creating some network representations using the qgraph package (big
thanks to Sacha Epskamp for developing it!). The package is very well
documented, but I am unable to find how to create a legend (scale) for edge
thickness.  In one of his qgraph examples, Sacha shows such type of scale
(fifth graph in http://sachaepskamp.com/qgraph/examples - scale for edge
thickness relative to p-values). I have searched the documentation and it
seems that a legend relates to the definition of node groups, so I am
uncertain on which option/command I need to use for achieving what I need.
I would also like to be able to select the values for which the scale is
created too.

If it is unclear, what I am looking for is to display this next to the
network graph:

probability edge thickness
1.0   display line with thickness for 1.0
0.8   display line with thickness for 0.8
0.6   display line with thickness for 0.6
0.4   display line with thickness for 0.4

My line of code for generating the network is the following:
qgraph(Edges2, esize=7, nsize=12, gray=TRUE, layout="circular",
       filetype="pdf", width=5, height=5, vsize=11, label.prop=1.2,
       arrows=FALSE,
       border.color=c("red","red","blue","green","purple"),
       border.width=4, maximum=1.4, cut=0.0001)

I would appreciate if somebody can help me out.

Thanks,
Maria Antonieta



[R] geocoding using the Google API with a key

2013-07-31 Thread Vergari, Fabiano
Hello,
I am trying to geocode an address using the Google API and R. So far, I have 
used the following code:

# fromJSON() comes from a JSON package such as rjson or RJSONIO
# (the original post does not say which one was loaded)
location <- c('120 Avenue de la Republique, 92120 Montrouge, France')
location <- gsub(' ', '+', location)
sensor <- c('FALSE')
sensor4url <- paste('sensor=', tolower(as.character(sensor)), sep = '')
# address and sensor parameters are joined with '&'
posturl <- paste(location, sensor4url, sep = '&')
url_string <- 
paste('http://maps.googleapis.com/maps/api/geocode/json?address=', posturl, sep 
= '')
url_string <- URLencode(url_string)
gc <- fromJSON(paste(readLines(url(url_string)), collapse = ''))
gc

The above code has worked just fine for up to 2500 Google queries per day 
(which are free). My question: how can I modify the above code to insert a 
Google client ID and/or crypto key so as to run more than 2500 queries? Adding 
a 'client=...' and/or 'key=...' in the posturl line above does not seem to do 
the trick.

Thanks in advance,
fv


[R] resampling

2013-07-31 Thread Rita Gamito
Could anyone tell me how, from a pool of 1002 observations (one variable), I 
can draw 1000 samples of 20 observations each?
And then calculate the mean and standard deviation between 2, 3, 4, ..., 1000 
samples and plot them?
Thank you!

_

Rita Gamito
Centro de Oceanografia
Faculdade de Ciências, Universidade de Lisboa
Campo Grande, 1749-016 Lisboa, Portugal
e-mail: rgam...@fc.ul.pt
Tel: + 351 21 750 00 00 - ext. 22575
Fax: + 351 21 750 02 07
www.co.fc.ul.pt


[R] R help

2013-07-31 Thread Mª Teresa Martinez Soriano
Hi

First of all, thanks for this service; it has been very useful for me. I am new 
to R, so I have a lot of questions.

I have to do imputation on a data set. This is a sample of my data set:


   NUMERO      Data1      Data2 IE.2003 IE.2004 IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
20    133 30/09/2002 18/06/2013     153     279     289     370     412     262     115      75
21    138 11/07/2002 13/05/2009    5460    7863    8365   12009   16763      NA      NA      NA
22    146 16/10/2009 18/06/2013      NA      NA      NA      NA      NA      NA      NA      35
23    152 27/05/1999 18/06/2013      NA      80      77      60      89     137     144     146
24    154 21/12/2004 18/06/2013      NA      NA     148     186     302     233     194     204
25    166  8/02/2008 18/06/2013      NA      NA      NA      NA      NA      NA      98     160
26    177 20/02/1996 18/06/2013      16       4      NA       3       3      NA       5       5
 


The problem is that some cells have to be empty, depending on Data1 and Data2.

For instance, in the third row Data1 is 16/10/2009, so there cannot be any 
information before 2009. Therefore IE.2003, IE.2004, IE.2005, IE.2006, IE.2007 
and IE.2008 have to be completely empty. This does not mean that they are 
missing values; in fact they are not, and I do not want any imputation in 
these cells.

IE.2009 and IE.2010 have to be filled and they are not, so these cells are 
missing values and I want to get imputed values for them. (I would delete this 
row, because it is impossible to get any information about it, but it is fine 
for this example.)

On the other hand, in the last row NA is a real missing value.

How can I specify that these cells are empty so that they do not get imputed 
values?

I have tried to put NaN in them, but then I have problems with some functions 
that I need to run before the imputation.


Thanks a  lot

Best regards, Teresa
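
One way to keep "structurally empty" cells apart from genuinely missing ones, sketched here as an illustration and not as a tested imputation workflow, is to leave both as NA in the data and carry a separate logical mask derived from Data1 and Data2. The two-row data frame is a hypothetical excerpt in the layout of the question; active and to_impute are names introduced for this sketch.

```r
# Hypothetical two-row excerpt in the layout of the question
dat <- data.frame(
  NUMERO  = c(146, 177),
  Data1   = as.Date(c("16/10/2009", "20/02/1996"), format = "%d/%m/%Y"),
  Data2   = as.Date(c("18/06/2013", "18/06/2013"), format = "%d/%m/%Y"),
  IE.2008 = c(NA, NA),
  IE.2009 = c(NA, 5),
  IE.2010 = c(35, 5)
)

ie_cols    <- grep("^IE\\.", names(dat), value = TRUE)
years      <- as.integer(sub("^IE\\.", "", ie_cols))
start_year <- as.integer(format(dat$Data1, "%Y"))
end_year   <- as.integer(format(dat$Data2, "%Y"))

# TRUE where the year lies between Data1 and Data2, i.e. a value is expected
active <- outer(start_year, years, "<=") & outer(end_year, years, ">=")
dimnames(active) <- list(as.character(dat$NUMERO), ie_cols)

# Cells to impute: expected but NA. Structural empties are NA and not active.
to_impute <- active & is.na(as.matrix(dat[ie_cols]))
to_impute
```

After running any imputation routine on the IE columns, the cells where active is FALSE can be reset to NA, so that only the cells flagged in to_impute keep imputed values.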


[R] Split in blocks

2013-07-31 Thread Dominic Roye
Hello,

I am a little bit lost in my search for a solution or an idea. I would like
to split my time series into blocks of nights. V1 indicates whether it is
night or not.

How can I split this kind of data?


Best regards,


str(ou[,c(1,3,8)])
'data.frame': 863 obs. of  3 variables:
 $ Fecha: POSIXct, format: 2013-07-04 00:10:00 ...
 $ Ta   : num  22.6 22.2 22.2 22.2 22.2 ...
 $ V1   : num  1 1 1 1 1 1 1 1 1 1 ...
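
A minimal sketch of one way to do this, assuming that V1 == 1 marks night and that a "block" is a run of consecutive night rows: label the runs with rle() and split on that label. The small data frame is made up in the shape of the str() output above; run_id and nights are names introduced here.

```r
# Hypothetical stand-in for ou[, c(1, 3, 8)]: 10-minute steps, two nights
ou <- data.frame(
  Fecha = as.POSIXct("2013-07-04 00:10:00", tz = "UTC") + 600 * (0:11),
  Ta    = c(22.6, 22.2, 22.2, 22.2, 21.9, 23.0, 24.1, 25.0, 23.5, 22.0, 21.5, 21.0),
  V1    = c(1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1)
)

# Number each run of identical consecutive V1 values
runs <- rle(ou$V1)
ou$run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Keep only the night rows and split them by run: one data frame per night
is_night <- ou$V1 == 1
nights <- split(ou[is_night, ], ou$run_id[is_night])

length(nights)
sapply(nights, nrow)
```

Each element of nights can then be summarised separately, for example with sapply(nights, function(b) mean(b$Ta)).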



 dput(ou[,c(1,3,8)])
structure(list(Fecha = structure(c(1372889400, 1372890000, 1372890600,
1372891200, 1372891800, 1372892400, 1372893000, 1372893600, 1372894200,
1372894800, 1372895400, 1372896000, 1372896600, 1372897200, 1372897800,
1372898400, 1372899000, 1372899600, 1372900200, 1372900800, 1372901400,
1372902000, 1372902600, 1372903200, 1372903800, 1372904400, 1372905000,
1372905600, 1372906200, 1372906800, 1372907400, 1372908000, 1372908600,
1372909200, 1372909800, 1372910400, 1372911000, 1372911600, 1372912200,
1372912800, 1372913400, 1372914000, 1372914600, 1372915200, 1372915800,
1372916400, 1372917000, 1372917600, 1372918200, 1372918800, 1372919400,
1372920000, 1372920600, 1372921200, 1372921800, 1372922400, 1372923000,
1372923600, 1372924200, 1372924800, 1372925400, 1372926000, 1372926600,
1372927200, 1372927800, 1372928400, 1372929000, 1372929600, 1372930200,
1372930800, 1372931400, 1372932000, 1372932600, 1372933200, 1372933800,
1372934400, 1372935000, 1372935600, 1372936200, 1372936800, 1372937400,
1372938000, 1372938600, 1372939200, 1372939800, 1372940400, 1372941000,
1372941600, 1372942200, 1372942800, 1372943400, 1372944000, 1372944600,
1372945200, 1372945800, 1372946400, 1372947000, 1372947600, 1372948200,
1372948800, 1372949400, 1372950000, 1372950600, 1372951200, 1372951800,
1372952400, 1372953000, 1372953600, 1372954200, 1372954800, 1372955400,
1372956000, 1372956600, 1372957200, 1372957800, 1372958400, 1372959000,
1372959600, 1372960200, 1372960800, 1372961400, 1372962000, 1372962600,
1372963200, 1372963800, 1372964400, 1372965000, 1372965600, 1372966200,
1372966800, 1372967400, 1372968000, 1372968600, 1372969200, 1372969800,
1372970400, 1372971000, 1372971600, 1372972200, 1372972800, 1372973400,
1372974000, 1372974600, 1372975200, 1372975800, 1372976400, 1372977000,
1372977600, 1372978200, 1372978800, 1372979400, 1372980000, 1372980600,
1372981200, 1372981800, 1372982400, 1372983000, 1372983600, 1372984200,
1372984800, 1372985400, 1372986000, 1372986600, 1372987200, 1372987800,
1372988400, 1372989000, 1372989600, 1372990200, 1372990800, 1372991400,
1372992000, 1372992600, 1372993200, 1372993800, 1372994400, 1372995000,
1372995600, 1372996200, 1372996800, 1372997400, 1372998000, 1372998600,
1372999200, 1372999800, 1373000400, 1373001000, 1373001600, 1373002200,
1373002800, 1373003400, 1373004000, 1373004600, 1373005200, 1373005800,
1373006400, 1373007000, 1373007600, 1373008200, 1373008800, 1373009400,
1373010000, 1373010600, 1373011200, 1373011800, 1373012400, 1373013000,
1373013600, 1373014200, 1373014800, 1373015400, 1373016000, 1373016600,
1373017200, 1373017800, 1373018400, 1373019000, 1373019600, 1373020200,
1373020800, 1373021400, 1373022000, 1373022600, 1373023200, 1373023800,
1373024400, 1373025000, 1373025600, 1373026200, 1373026800, 1373027400,
1373028000, 1373028600, 1373029200, 1373029800, 1373030400, 1373031000,
1373031600, 1373032200, 1373032800, 1373033400, 1373034000, 1373034600,
1373035200, 1373035800, 1373036400, 1373037000, 1373037600, 1373038200,
1373038800, 1373039400, 1373040000, 1373040600, 1373041200, 1373041800,
1373042400, 1373043000, 1373043600, 1373044200, 1373044800, 1373045400,
1373046000, 1373046600, 1373047200, 1373047800, 1373048400, 1373049000,
1373049600, 1373050200, 1373050800, 1373051400, 1373052000, 1373052600,
1373053200, 1373053800, 1373054400, 1373055000, 1373055600, 1373056200,
1373056800, 1373057400, 1373058000, 1373058600, 1373059200, 1373059800,
1373060400, 1373061000, 1373061600, 1373062200, 1373062800, 1373063400,
1373064000, 1373064600, 1373065200, 1373065800, 1373066400, 1373067000,
1373067600, 1373068200, 1373068800, 1373069400, 1373070000, 1373070600,
1373071200, 1373071800, 1373072400, 1373073000, 1373073600, 1373074200,
1373074800, 1373075400, 1373076000, 1373076600, 1373077200, 1373077800,
1373078400, 1373079000, 1373079600, 1373080200, 1373080800, 1373081400,
1373082000, 1373082600, 1373083200, 1373083800, 1373084400, 1373085000,
1373085600, 1373086200, 1373086800, 1373087400, 1373088000, 1373088600,
1373089200, 1373089800, 1373090400, 1373091000, 1373091600, 1373092200,
1373092800, 1373093400, 1373094000, 1373094600, 1373095200, 1373095800,
1373096400, 1373097000, 1373097600, 1373098200, 1373098800, 1373099400,
1373100000, 1373100600, 1373101200, 1373101800, 1373102400, 1373103000,
1373103600, 1373104200, 1373104800, 1373105400, 1373106000, 1373106600,
1373107200, 1373107800, 1373108400, 1373109000, 1373109600, 1373110200,
1373110800, 1373111400, 1373112000, 1373112600, 1373113200, 1373113800,
1373114400, 1373115000, 

Re: [R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread Dark
Works like a charm, thanks a lot!



--
View this message in context: 
http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711p4672728.html
Sent from the R help mailing list archive at Nabble.com.



[R] problem about mean function in ffbase package

2013-07-31 Thread Chaos Chen
Hi all,

I got inconsistent results using the mean function in the ffbase package 
and cannot figure out what's wrong.

I have a simulated ff vector with 100000 numbers inside and want to
calculate its mean. But the results are quite different.

With the mean( ) function in the ffbase package, the mean is 152.6858.
But with R's mean( ), or by adding up the sums of the chunks directly, I get 
667.5595.

Any idea? Thank you in advance!

Bayes Chen

# F1 is an ffdf, F1$X1 is an ff vector
> length(F1$X1)
[1] 100000

# Use the mean() function in the ffbase package
> mean(F1$X1)
[1] 152.6858

> X2 = F1$X1[]    # X2 is now a non-ff vector
> length(X2)
[1] 100000
> mean(X2)        # R's original mean function for ordinary vectors
[1] 667.5595

# calculate the sum, and then the mean, by chunks
> chunks = chunk(F1$X1, by=500)
> sumx = 0
> for (i in chunks) {
+     sumx = sumx + sum(F1$X1[i])
+ }
> sumx/length(F1$X1)
[1] 667.5595

--- below are some other trials
> X2 = F1$X1[1:100]
> mean(X2)
[1] 59.43149
> mean(as.ff(X2))
[1] 59.43149

> X2 = F1$X1[1:10000]
> mean(X2)
[1] 59.41978
> mean(as.ff(X2))
[1] 59.42128

> X2 = F1$X1[1:50000]
> mean(X2)
[1] 60.53615
> mean(as.ff(X2))
[1] 57.72168

> X2 = F1$X1[1:75000]
> mean(X2)
[1] 59.37562
> mean(as.ff(X2))
[1] 57.81179

> X2 = F1$X1[1:90000]
> mean(X2)
[1] 57.0867
> mean(as.ff(X2))
[1] 57.44862

> X3 = F1$X1[90000:100000]
> mean(X3)
[1] 6161.814
> mean(as.ff(X3))
[1] 6161.797


[R] Correlation Loops in time series

2013-07-31 Thread TMiller
Hello, I've got the following problem.
I have two matrices, each containing 200 time series.
Now I want to calculate the correlation of the first time series of each of
the matrices.
I use the following command:
cor(mts1[,1],mts2[,1], use="complete.obs", method=c("pearson"))
cor(mts1[,2],mts2[,2], use="complete.obs", method=c("pearson"))
cor(mts1[,3],mts2[,3], use="complete.obs", method=c("pearson"))
and so on..
I would like to repeat this for each of the 200 time series. As it is quite
painful to change the command 200 times I wanted to ask if there's a loop
function that can cover these series in a fast way?
Thanks in advance for your help
Best
Tom



--
View this message in context: 
http://r.789695.n4.nabble.com/Correlation-Loops-in-time-series-tp4672732.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Add a column to a data frame with value based on the percentile of the row

2013-07-31 Thread Dark
Hi Arun Kirshna,

I have tested your method and it will work for me.
I only ran into one problem: before I do this operation I have sorted my
data frame, so my row numbers are no longer consecutive.

You can see this if you first order your example data frame like:
dat1 <- dat1[order(-dat1$value),]

head(dat1)
     ID    value   SEGMENT
237 237 3.538552  20 to 50
21   21 3.376149    Top 5%
421 421 3.015634 Bottom 50
339 339 2.855991 Bottom 50
119 119 2.589574  20 to 50
12   12 2.512276    Top 5%

Do you have a solution for this?
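
A hedged sketch of an approach that does not depend on row names or on the current row order: rank the values, turn the ranks into a cumulative share of rows, and cut that share at 5%, 20% and 50%. The data frame is a made-up stand-in for dat1, and the assumption is that the segments are defined by position in descending order of value.

```r
set.seed(42)
dat1 <- data.frame(ID = 1:500, value = rnorm(500))
dat1 <- dat1[order(-dat1$value), ]   # sorted, so row names are not 1, 2, 3, ...

# Share of rows at or above each row when sorted by descending value
pct <- rank(-dat1$value, ties.method = "first") / nrow(dat1)

dat1$SEGMENT <- cut(pct,
                    breaks = c(0, 0.05, 0.20, 0.50, 1),
                    labels = c("Top 5%", "5 to 20", "20 to 50", "Bottom 50"))

table(dat1$SEGMENT)
head(dat1)
```

Because pct is computed from the values themselves, the result is the same whether or not the data frame has been sorted beforehand.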





--
View this message in context: 
http://r.789695.n4.nabble.com/Add-a-column-to-a-data-frame-with-value-based-on-the-percentile-of-the-row-tp4672711p4672726.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] resampling

2013-07-31 Thread andrija djurovic
Hi.
See ?sample, ?replicate, ?colMeans, ?plot.

Here is a simple example:

sample(1:1000,20)
replicate(5, sample(1:1000,20))
colMeans(replicate(5, sample(1:1000,20)))
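
Putting those pieces together for the numbers in the question gives the sketch below. It rests on one reading of the request, namely that "mean and standard deviation between 2, 3, ..., 1000 samples" means the mean and SD of the first k sample means for k = 2, ..., 1000; pool is a simulated stand-in for the 1002 observations.

```r
set.seed(1)
pool <- rnorm(1002)                            # stand-in for the 1002 observations

# 1000 samples of 20 observations each: a 20 x 1000 matrix, one sample per column
samples <- replicate(1000, sample(pool, 20))
sample_means <- colMeans(samples)

# Mean and SD of the first k sample means, for k = 2, ..., 1000
k <- 2:1000
running_mean <- sapply(k, function(i) mean(sample_means[1:i]))
running_sd   <- sapply(k, function(i) sd(sample_means[1:i]))

op <- par(mfrow = c(1, 2))
plot(k, running_mean, type = "l",
     xlab = "number of samples", ylab = "mean of sample means")
plot(k, running_sd, type = "l",
     xlab = "number of samples", ylab = "SD of sample means")
par(op)
```

sample() draws without replacement by default; add replace = TRUE inside sample() if bootstrap-style resampling is wanted.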

Andrija


On Wed, Jul 31, 2013 at 1:23 PM, Rita Gamito rslo...@fc.ul.pt wrote:

 Could anyone tell me how,from a pool of 1002 observations (one variable),
  can I resample 1000 samples of 20 observations?
 And then calculate the mean and standard deviation between 2, 3, 4, ...,
 1000 samples and plot them?
 Thank you!

 _

 Rita Gamito
 Centro de Oceanografia
 Faculdade de Ciências, Universidade de Lisboa
 Campo Grande, 1749-016 Lisboa, Portugal
 e-mail: rgam...@fc.ul.pt
 Tel: + 351 21 750 00 00 - ext. 22575
 Fax: + 351 21 750 02 07
 www.co.fc.ul.pt


[R] Canadian common CV: how to cite R packages?

2013-07-31 Thread Michael Friendly
A Q for Canadians who have filled out the new Canadian common CV for 
grant applications: is there any way to cite research contributions of 
software such as R packages, aside from published journal articles? If 
so, where/how in the online application can they be entered?

For example, under Publications, they list "Reports" and "Manuals", but 
the required fields there seem to apply only to things like printed 
technical reports and printed manuals.

If the answer is: these cannot be listed, OK, but the online app is 
extremely Byzantine and maybe there was something I missed.

TIA
-Michael

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.  Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



Re: [R] Correlation Loops in time series

2013-07-31 Thread David Carlson
sapply(1:200, function(x) cor(mts1[,x], mts2[,x],
use="complete.obs", method=c("pearson")))
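
As a rough sketch of how this behaves, the example below runs the same call on simulated data. The matrices are made-up stand-ins (50 time points by 200 series), not the poster's data, and the explicit loop is included only to show that sapply() does the same job.

```r
# Simulated stand-ins for the two matrices: 50 time points x 200 series each.
set.seed(42)
mts1 <- matrix(rnorm(50 * 200), nrow = 50)
mts2 <- matrix(rnorm(50 * 200), nrow = 50)
mts1[c(3, 17), 1] <- NA   # a few missing values to exercise use = "complete.obs"

# One correlation per column pair, as in the sapply() call above.
r <- sapply(seq_len(ncol(mts1)), function(j)
  cor(mts1[, j], mts2[, j], use = "complete.obs", method = "pearson"))

# The same computation as an explicit loop, for comparison.
r_loop <- numeric(ncol(mts1))
for (j in seq_len(ncol(mts1))) {
  r_loop[j] <- cor(mts1[, j], mts2[, j], use = "complete.obs", method = "pearson")
}

length(r)              # one value per series
all.equal(r, r_loop)   # checks that both approaches agree
```

Using seq_len(ncol(mts1)) instead of a hard-coded 1:200 keeps the call correct if the number of series changes.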

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of TMiller
Sent: Wednesday, July 31, 2013 8:16 AM
To: r-help@r-project.org
Subject: [R] Correlation Loops in time series

Hello, I've got the following problem.
I have two matrices, each containing 200 time series.
Now I want to calculate the correlation of the first time series of each of
the matrices.
I use the following command:
cor(mts1[,1],mts2[,1], use="complete.obs", method=c("pearson"))
cor(mts1[,2],mts2[,2], use="complete.obs", method=c("pearson"))
cor(mts1[,3],mts2[,3], use="complete.obs", method=c("pearson"))
and so on.
I would like to repeat this for each of the 200 time series. As it is quite
painful to change the command 200 times, I wanted to ask if there's a loop
function that can cover these series in a fast way?
Thanks in advance for your help
Best
Tom





Re: [R] merge matrix row data

2013-07-31 Thread Elaine Kuo
Dear Arun

Thank you for the very useful help.
However, please kindly explain the code below.
row.names(mat1) <- gsub("[_]", " ", row.names(mat1))

1. What does "[_]" mean?
2. What does " " mean?
3. What does row.names(mat1) mean?

I checked ?gsub but still did not get the idea.

Thank you again

Elaine


On Wed, Jul 31, 2013 at 9:35 PM, arun smartpink...@yahoo.com wrote:

 HI,

 Please use ?dput()
 mat1 <- as.matrix(read.table(text="
 D0989  D9820  D5629  D4327  D2134
 GID_1  1  0  0  1  0
 GID_2  0  1  1  0  0
 GID_4  0  0  1  0  0
 GID_5  1  1  0  0  0
 GID_7  0  1  0  0  1
 ",sep="",header=TRUE))
 row.names(mat1) <- gsub("[_]", " ", row.names(mat1))
 IslandA <- c("GID 1", "GID 5")
 IslandB <- c("GID 2", "GID 4", "GID 7")
 res <- t(sapply(c("IslandA","IslandB"),function(x)
 {x1 <- mat1[match(get(x),row.names(mat1)),];(!!colSums(x1))*1} ))

  res
 #D0989 D9820 D5629 D4327 D2134
 #IslandA 1 1 0 1 0
 #IslandB 0 1 1 0 1
 A.K.




 - Original Message -
 From: Elaine Kuo elaine.kuo...@gmail.com
 To: r-h...@stat.math.ethz.ch r-h...@stat.math.ethz.ch
 Cc:
 Sent: Wednesday, July 31, 2013 9:03 AM
 Subject: [R] merge matrix row data

 Dear list,



 I have a matrix showing the species presence-absence on a map.

 Its rows are map locations, represented by GridCellID, such as GID1 and GID
 5.

 Its columns are species ID, such as D0989, D9820, and D5629.

 The matrix is as followed.



 Now I want to merge the GridCellID according to the map location of each
 island.

 For instance, Island A consist of GID 1 and 5. Island B consist of GID 2,
 4, and 7.

 In GID 1 and 5, species D0989 are both 1.

 Then I want to merge GID 1 and 5 into Island A, with species D0989 as 1.

 The original matrix and the resulting matrix are listed below.

 Please kindly advise how to code the calculation in R.

 Please do not hesitate to ask if anything is unclear.

 Thank you in advance.



 Elaine



 Original matrix

        D0989  D9820  D5629  D4327  D2134
 GID 1      1      0      0      1      0
 GID 2      0      1      1      0      0
 GID 4      0      0      1      0      0
 GID 5      1      1      0      0      0
 GID 7      0      1      0      0      1

 Resulting matrix

           D0989  D9820  D5629  D4327  D2134
 Island A      1      1      0      1      0
 Island B      0      1      1      0      1



Re: [R] Split in blocks

2013-07-31 Thread Bert Gunter
Enter
?help
at the prompt to learn how to use R's (extensive) Help system to
answer questions like this.
For this question:
?split  ## what else?

Also ?tapply, ?ave, ?aggregate, ?by

may be relevant.

Also, read "An Introduction to R" if you haven't already done so to
start learning about R's many data manipulation and analysis features.

Cheers,
Bert

On Wed, Jul 31, 2013 at 7:39 AM, Dominic Roye dominic.r...@gmail.com wrote:
 Hello,

 I am a little bit lost in my search for a solution or an idea. I would like
 to split my time series into blocks of nights. V1 indicates whether it is
 night or not.

 How can I split cases like this?


 Best regards,


 str(ou[,c(1,3,8)])
 'data.frame': 863 obs. of  3 variables:
  $ Fecha: POSIXct, format: 2013-07-04 00:10:00 ...
  $ Ta   : num  22.6 22.2 22.2 22.2 22.2 ...
  $ V1   : num  1 1 1 1 1 1 1 1 1 1 ...

 

  dput(ou[,c(1,3,8)])
 structure(list(Fecha = structure(c(1372889400, 1372890000, 1372890600,
 1372891200, 1372891800, 1372892400, 1372893000, 1372893600, 1372894200,
 1372894800, 1372895400, 1372896000, 1372896600, 1372897200, 1372897800,
 1372898400, 1372899000, 1372899600, 1372900200, 1372900800, 1372901400,
 1372902000, 1372902600, 1372903200, 1372903800, 1372904400, 1372905000,
 1372905600, 1372906200, 1372906800, 1372907400, 1372908000, 1372908600,
 1372909200, 1372909800, 1372910400, 1372911000, 1372911600, 1372912200,
 1372912800, 1372913400, 1372914000, 1372914600, 1372915200, 1372915800,
 1372916400, 1372917000, 1372917600, 1372918200, 1372918800, 1372919400,
 1372920000, 1372920600, 1372921200, 1372921800, 1372922400, 1372923000,
 1372923600, 1372924200, 1372924800, 1372925400, 1372926000, 1372926600,
 1372927200, 1372927800, 1372928400, 1372929000, 1372929600, 1372930200,
 1372930800, 1372931400, 1372932000, 1372932600, 1372933200, 1372933800,
 1372934400, 1372935000, 1372935600, 1372936200, 1372936800, 1372937400,
 1372938000, 1372938600, 1372939200, 1372939800, 1372940400, 1372941000,
 1372941600, 1372942200, 1372942800, 1372943400, 1372944000, 1372944600,
 1372945200, 1372945800, 1372946400, 1372947000, 1372947600, 1372948200,
 1372948800, 1372949400, 1372950000, 1372950600, 1372951200, 1372951800,
 1372952400, 1372953000, 1372953600, 1372954200, 1372954800, 1372955400,
 1372956000, 1372956600, 1372957200, 1372957800, 1372958400, 1372959000,
 1372959600, 1372960200, 1372960800, 1372961400, 1372962000, 1372962600,
 1372963200, 1372963800, 1372964400, 1372965000, 1372965600, 1372966200,
 1372966800, 1372967400, 1372968000, 1372968600, 1372969200, 1372969800,
 1372970400, 1372971000, 1372971600, 1372972200, 1372972800, 1372973400,
 1372974000, 1372974600, 1372975200, 1372975800, 1372976400, 1372977000,
 1372977600, 1372978200, 1372978800, 1372979400, 1372980000, 1372980600,
 1372981200, 1372981800, 1372982400, 1372983000, 1372983600, 1372984200,
 1372984800, 1372985400, 1372986000, 1372986600, 1372987200, 1372987800,
 1372988400, 1372989000, 1372989600, 1372990200, 1372990800, 1372991400,
 1372992000, 1372992600, 1372993200, 1372993800, 1372994400, 1372995000,
 1372995600, 1372996200, 1372996800, 1372997400, 1372998000, 1372998600,
 1372999200, 1372999800, 1373000400, 1373001000, 1373001600, 1373002200,
 1373002800, 1373003400, 1373004000, 1373004600, 1373005200, 1373005800,
 1373006400, 1373007000, 1373007600, 1373008200, 1373008800, 1373009400,
 1373010000, 1373010600, 1373011200, 1373011800, 1373012400, 1373013000,
 1373013600, 1373014200, 1373014800, 1373015400, 1373016000, 1373016600,
 1373017200, 1373017800, 1373018400, 1373019000, 1373019600, 1373020200,
 1373020800, 1373021400, 1373022000, 1373022600, 1373023200, 1373023800,
 1373024400, 1373025000, 1373025600, 1373026200, 1373026800, 1373027400,
 1373028000, 1373028600, 1373029200, 1373029800, 1373030400, 1373031000,
 1373031600, 1373032200, 1373032800, 1373033400, 1373034000, 1373034600,
 1373035200, 1373035800, 1373036400, 1373037000, 1373037600, 1373038200,
 1373038800, 1373039400, 1373040000, 1373040600, 1373041200, 1373041800,
 1373042400, 1373043000, 1373043600, 1373044200, 1373044800, 1373045400,
 1373046000, 1373046600, 1373047200, 1373047800, 1373048400, 1373049000,
 1373049600, 1373050200, 1373050800, 1373051400, 1373052000, 1373052600,
 1373053200, 1373053800, 1373054400, 1373055000, 1373055600, 1373056200,
 1373056800, 1373057400, 1373058000, 1373058600, 1373059200, 1373059800,
 1373060400, 1373061000, 1373061600, 1373062200, 1373062800, 1373063400,
 1373064000, 1373064600, 1373065200, 1373065800, 1373066400, 1373067000,
 1373067600, 1373068200, 1373068800, 1373069400, 1373070000, 1373070600,
 1373071200, 1373071800, 1373072400, 1373073000, 1373073600, 1373074200,
 1373074800, 1373075400, 1373076000, 1373076600, 1373077200, 1373077800,
 1373078400, 1373079000, 1373079600, 1373080200, 1373080800, 1373081400,
 1373082000, 1373082600, 1373083200, 1373083800, 1373084400, 1373085000,
 1373085600, 1373086200, 1373086800, 1373087400, 1373088000, 1373088600,
 1373089200, 1373089800, 

Re: [R] merge matrix row data

2013-07-31 Thread Bert Gunter
Time to do some homework, Elaine:

?regexp

There are also numerous online tutorials on regular expressions that
you can use to educate yourself.

Cheers,
Bert


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



[R] double matrix?

2013-07-31 Thread bruce087

Hi-

I have a 37 x 473971 character matrix that I am trying to convert into a
numeric matrix. When I use the code:

class(matrix) = "numeric"

I end up with something called a "double" matrix whose dimensions are still
37 x 473971.

I have also tried

new = apply(matrix, 2, as.numeric)

and got the same thing.

The analysis code I am ultimately attempting to run on this data requires
that it be in a numerical matrix, and it is really not okay with a "double"
matrix.


Does anyone know how to fix this?

Thanks.

--
Jessica R.B. Musselman, MS
T32 Trainee/Doctoral Candidate
University of Minnesota
Department of Pediatrics
Division of Epidemiology/Clinical Research
Mayo Mail Code 715
Room 1-195 Moos Tower
420 Delaware St. SE
Minneapolis MN 55455
Phone: (612)626-3281
email: bruce...@umn.edu



Re: [R] Does a general latex table-making function exist?

2013-07-31 Thread Frank Harrell

Duncan,

I had read your excellent tables package vignette at 
http://cran.r-project.org/web/packages/tables/vignettes/tables.pdf when 
it first came out.  It is extremely impressive.  I'm glad to be reminded 
to give it another look.


Is there a way to make the special symbols n and 1 refer to the number 
of non-missing observations rather than the length of a vector?


Do you feel like taking on this challenge?  An example of an irregular
table I'm thinking of is the following:

                        Females                    Males
              Q1  Med  Q3     (n)      Q1  Med  Q3     (n)
Age           25   49  63  (1016)      26   50  64  (1767)

Canadians
 Weight (kg)  57   63  74  ( 243)      67   73  90  ( 401)
Canadians could mean country=='Canada'.

Thanks!
Frank

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University



Re: [R] double matrix?

2013-07-31 Thread Don McKenzie
What are the entries in your matrix?  If they are something that won't coerce 
to numeric, you need to backtrack. Note how R distinguishes types of characters.

> as.numeric("a")
[1] NA
Warning message:
NAs introduced by coercion
> as.character(2)
[1] "2"
> as.numeric("2")
[1] 2



Don McKenzie, Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

phone: 206-732-7824
d...@uw.edu



Re: [R] double matrix?

2013-07-31 Thread Richard M. Heiberger
In R, double is a synonym for numeric.
Please see
?numeric

The details section of ?numeric begins with
Details:

 'numeric' is identical to 'double' (and 'real').  It creates a
 double-precision vector of the specified length with each element
 equal to '0'.

Rich




Re: [R] double matrix?

2013-07-31 Thread Rui Barradas

Hello,

double and numeric are the same. From the help page for ?double, 
section Note on names


It is a historical anomaly that R has two names for its floating-point 
vectors, double and numeric (and formerly had real).


Apparently you are successfully converting characters to double 
precision floating-point numbers.


Hope this helps,

Rui Barradas




Re: [R] double matrix?

2013-07-31 Thread David Carlson
It is hard to understand that your R code will not work with a
double matrix since double is just short for double precision
floating point matrix. Your only alternative would be integer.

From ?numeric

It is a historical anomaly that R has two names for its
floating-point vectors, double and numeric (and formerly had
real).

double is the name of the type. numeric is the name of the
mode and also of the implicit class.

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352



[R] Convert rbind of lists to data.frame

2013-07-31 Thread Shaun Jackman
I'm trying to build a data.frame row-by-row like so:

df <- data.frame(rbind(list('a',1), list('b', 2), list('c', 3)))

I was surprised to see that the columns of the resulting data.frame
are stored in lists rather than vectors.

str(df)
'data.frame': 3 obs. of  2 variables:
 $ X1:List of 3
  ..$ : chr "a"
  ..$ : chr "b"
  ..$ : chr "c"
 $ X2:List of 3
  ..$ : num 1
  ..$ : num 2
  ..$ : num 3

The desired result is:

str(df)
'data.frame': 3 obs. of  2 variables:
 $ X1: chr  "a" "b" "c"
 $ X2: num  1 2 3

The following works, but is rather ugly:

df <- data.frame(lapply(data.frame(rbind(list('a',1), list('b', 2),
list('c', 3))), unlist))

Thanks,
Shaun



Re: [R] double matrix?

2013-07-31 Thread William Dunlap
In R "double" and "numeric" mean essentially the same thing.  I think
you are fine. (What called the result a "double matrix"?)

  > z <- cbind(c("11", "12"), c("3.14", "2.718"))
  > str(z)
   chr [1:2, 1:2] "11" "12" "3.14" "2.718"
  > class(z)
  [1] "matrix"
  >
  > class(z) <- "numeric"
  > str(z)
   num [1:2, 1:2] 11 12 3.14 2.72
  > class(z)
  [1] "matrix"
  > z
       [,1]  [,2]
  [1,]   11 3.140
  [2,]   12 2.718
  > log(z)
           [,1]      [,2]
  [1,] 2.397895 1.1442228
  [2,] 2.484907 0.9998963

R numeric vectors consist of C double or Fortran double precision
or real*8 values - 8 byte double precision floating point numbers with
52 binary digits of precision. 

S supported 4-byte single precision vectors which it also considered
numeric.
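
A quick way to check what the converted object really is can be sketched as follows. The 2 x 2 matrix is a made-up stand-in for the 37 x 473971 one, and storage.mode<- is just one of several ways to do the conversion while keeping the dimensions.

```r
# A small character matrix standing in for the real one.
m <- matrix(c("1.5", "2", "3.25", "4"), nrow = 2)

# Convert a copy in place; storage.mode<- keeps the dim attribute.
num <- m
storage.mode(num) <- "double"

typeof(num)       # the storage type
mode(num)         # the mode
is.numeric(num)   # the test that analysis code would typically use
is.matrix(num)    # still a matrix
dim(num)          # dimensions unchanged
```

If is.numeric() and is.matrix() both return TRUE, the object is a numeric matrix in the usual R sense, whatever label a viewer or message attaches to it.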

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




Re: [R] resampling

2013-07-31 Thread Rui Barradas

Hello,

The best way seems to be ?replicate.


set.seed(3997)    # make it reproducible
x <- rnorm(1002)  # make up some data

sim <- replicate(1000, sample(x, 20))

colSds <- function(x, na.rm = FALSE) apply(x, 2, sd, na.rm = na.rm)

mu <- colMeans(sim)
sigma <- colSds(sim)
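
One possible reading of "between 2, 3, 4, ..., 1000 samples" is the mean and standard deviation of the sample means over the first k samples, for k = 2, ..., 1000. That reading is an assumption about the question, and the names run_mean and run_sd are made up here. Under that assumption, a sketch could look like this:

```r
set.seed(3997)
x   <- rnorm(1002)                      # made-up data, as above
sim <- replicate(1000, sample(x, 20))   # 20 x 1000 matrix, one sample per column
mu  <- colMeans(sim)                    # mean of each sample

k        <- 2:1000
run_mean <- cumsum(mu)[k] / k                    # mean of the first k sample means
run_sd   <- sapply(k, function(i) sd(mu[1:i]))   # sd of the first k sample means

# Plot both running statistics side by side.
op <- par(mfrow = c(1, 2))
plot(k, run_mean, type = "l", xlab = "number of samples", ylab = "mean of sample means")
plot(k, run_sd,   type = "l", xlab = "number of samples", ylab = "sd of sample means")
par(op)
```

Note that sample() draws without replacement by default; add replace = TRUE if a bootstrap-style resample is wanted.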


Hope this helps,

Rui Barradas



Re: [R] Convert rbind of lists to data.frame

2013-07-31 Thread arun
Maybe this helps:

l1 <- list('a',1)
l2 <- list('b',2)
l3 <- list('c',3)
df1 <- data.frame(mapply(`c`,l1,l2,l3,SIMPLIFY=FALSE),stringsAsFactors=FALSE)
colnames(df1) <- paste0("X",1:2)
str(df1)
#'data.frame':    3 obs. of  2 variables:
# $ X1: chr  "a" "b" "c"
# $ X2: num  1 2 3
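
Another option, sketched here under the assumption that the rows are collected in a list first (the names rows, X1 and X2 are chosen for illustration), is to turn each row into a one-row data frame and bind them:

```r
# Each row is a list of (character, number), as in the original question.
rows <- list(list("a", 1), list("b", 2), list("c", 3))

# Build a one-row data frame per element, then stack them.
df <- do.call(rbind, lapply(rows, function(r)
  data.frame(X1 = r[[1]], X2 = r[[2]], stringsAsFactors = FALSE)))

str(df)
```

This keeps each column as an atomic vector. For many rows it is slower than building the columns directly, because every rbind step copies the data.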

A.K.





Re: [R] merge matrix row data

2013-07-31 Thread arun
Hi Elaine, 

In that case:
Do you have "GID" in IslandA and IslandB?

IslandA <- c("GID 1", "GID 5")
IslandB <- c("GID 2", "GID 4", "GID 7")

If there is no change in the two Islands, then using the same dataset:

mat1 <- as.matrix(read.table(text="
D0989  D9820  D5629  D4327  D2134
GID_1    1    0    0  1  0
GID_2    0    1    1  0  0
GID_4    0    0    1  0  0
GID_5    1    1    0  0  0
GID_7    0    1    0  0  1
",sep="",header=TRUE))

row.names(mat1) <- gsub(".*\\_","",row.names(mat1)) # to remove the "GID_" from the row.names()
mat1
#  D0989 D9820 D5629 D4327 D2134
#1     1     0     0     1     0
#2     0     1     1     0     0
#4     0     0     1     0     0
#5     1     1     0     0     0
#7     0     1     0     0     1
IslandA <- c("GID 1", "GID 5")
IslandB <- c("GID 2", "GID 4", "GID 7")
res <- t(sapply(c("IslandA","IslandB"),function(x) {x1 <-
mat1[match(gsub(".*\\s+","",get(x)),row.names(mat1)),];(!!colSums(x1))*1}))
res
#        D0989 D9820 D5629 D4327 D2134
#IslandA     1     1     0     1     0
#IslandB     0     1     1     0     1


Regarding the use of !!colSums()
You can check these:

t(sapply(c("IslandA","IslandB"),function(x) {x1 <-
mat1[match(gsub(".*\\s+","",get(x)),row.names(mat1)),];!colSums(x1)}))
#        D0989 D9820 D5629 D4327 D2134
#IslandA FALSE FALSE  TRUE FALSE  TRUE
#IslandB  TRUE FALSE FALSE  TRUE FALSE

t(sapply(c("IslandA","IslandB"),function(x) {x1 <-
mat1[match(gsub(".*\\s+","",get(x)),row.names(mat1)),];!!colSums(x1)}))
#        D0989 D9820 D5629 D4327 D2134
#IslandA  TRUE  TRUE FALSE  TRUE FALSE
#IslandB FALSE  TRUE  TRUE FALSE  TRUE

# *1 will replace TRUE with 1 and FALSE with 0.
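
The same merge can also be sketched without get(), assuming the islands are kept in a named list. The list name islands and the use of > 0 are choices made here for illustration and are not part of the code above. For 0/1 data, colSums(x) > 0 gives the same TRUE/FALSE pattern as !!colSums(x).

```r
# Presence-absence matrix with the original "GID n" row names.
mat1 <- matrix(c(1, 0, 0, 1, 0,
                 0, 1, 1, 0, 0,
                 0, 0, 1, 0, 0,
                 1, 1, 0, 0, 0,
                 0, 1, 0, 0, 1),
               nrow = 5, byrow = TRUE,
               dimnames = list(c("GID 1", "GID 2", "GID 4", "GID 5", "GID 7"),
                               c("D0989", "D9820", "D5629", "D4327", "D2134")))

# Which grid cells belong to which island.
islands <- list(IslandA = c("GID 1", "GID 5"),
                IslandB = c("GID 2", "GID 4", "GID 7"))

# A species is present on an island if it is present in any of its grid cells.
# drop = FALSE keeps the subset a matrix even for an island with one cell.
res <- t(sapply(islands, function(ids)
  (colSums(mat1[ids, , drop = FALSE]) > 0) * 1))

res
```

If the row names were plain numbers such as "1" and "5", only the entries of islands would need to change to match them.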

A.K.







From: Elaine Kuo elaine.kuo...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Wednesday, July 31, 2013 6:58 PM
Subject: Re: [R] merge matrix row data



Dear Arun, 

Thank you for the clear explanation.
The row.names question was a typing mistake, as I did not get enough sleep
last night.

Two more questions:

1. If the row names are 1, 2, and 4 etc. (numbers) instead of GID 1, GID 2, and
   GID 3, does the code need any modification?

2. Please kindly explain the code
    (!!colSums(x1))*1}

   It is the critical part for merging the row data.

Thanks again.

Elaine



On Thu, Aug 1, 2013 at 6:45 AM, arun smartpink...@yahoo.com wrote:

Dear Elaine,

I used that line only because you didn't provide the data using dput(). So I
needed either to use a delimiter such as "," or to keep whitespace as the
separator after first joining "GID" and the numbers with "_". I chose the
latter, as I didn't have much time to spend putting "," between the entries.
After that, I removed the "_" using gsub(). As Bert pointed out, there are
many online resources for understanding regular expressions.

In this particular case, the first pair of quotes singles out the "_", and
the second pair of quotes gives its replacement, a space " ". Therefore,
"GID_1" becomes "GID 1", which is what your original dataset looks like.

If you type row.names(mat1) at the R console and press Enter, you will see
the output.
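
To make this concrete, here is a small sketch on a made-up vector of row names (ids is a hypothetical stand-in for row.names(mat1)):

```r
ids <- c("GID_1", "GID_2", "GID_4")

# "[_]" is a character class that matches a single underscore;
# each match is replaced by the second argument, a space.
spaced <- gsub("[_]", " ", ids)

# With fixed = TRUE the pattern is taken literally, so no regular
# expression syntax is involved at all.
spaced_fixed <- gsub("_", " ", ids, fixed = TRUE)

# ".*_" matches everything up to and including the underscore,
# so replacing it with "" leaves only the number.
numbers <- gsub(".*_", "", ids)

spaced
numbers
```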

Hope it helps.
Arun









From: Elaine Kuo elaine.kuo...@gmail.com
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Wednesday, July 31, 2013 5:07 PM
Subject: Re: [R] merge matrix row data




Dear Arun

Thank you for the very useful help.
However, please kindly explain the code below.
row.names(mat1)- gsub([_], ,row.names(mat1))


1. what does [_] mean?
2. what does    mean?
3. what does row.names(mat1) mean?

I checked ?gsub but still did not get the idea.

Thank you again

Elaine



On Wed, Jul 31, 2013 at 9:35 PM, arun smartpink...@yahoo.com wrote:

HI,

Please use ?dput()
mat1 <- as.matrix(read.table(text="
D0989  D9820  D5629  D4327  D2134
GID_1    1    0    0  1  0
GID_2    0    1    1  0  0
GID_4    0    0    1  0  0
GID_5    1    1    0  0  0
GID_7    0    1    0  0  1
",sep="",header=TRUE))
row.names(mat1) <- gsub("[_]"," ",row.names(mat1))
IslandA <- c("GID 1", "GID 5")
IslandB <- c("GID 2", "GID 4", "GID 7")
res <- t(sapply(c("IslandA","IslandB"),function(x) 
{x1 <- mat1[match(get(x),row.names(mat1)),];(!!colSums(x1))*1} ))

 res
#    D0989 D9820 D5629 D4327 D2134
#IslandA 1 1 0 1 0
#IslandB 0 1 1 0 1
A.K.
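
For readers puzzled by the (!!colSums(x1))*1 step, the sketch below unpacks it on a tiny made-up matrix. The values and the names `counts`, `present`, `merged` and `merged2` are illustrative assumptions, not part of the original post.

```r
# Hypothetical presence/absence rows for the two grid cells of one island
x1 <- rbind("GID 1" = c(D0989 = 1, D9820 = 0, D5629 = 0),
            "GID 5" = c(D0989 = 1, D9820 = 1, D5629 = 0))

counts  <- colSums(x1)   # number of grid cells in which each species occurs
present <- !!counts      # first ! is TRUE where the count is 0; second ! flips it
merged  <- present * 1   # multiplying by 1 turns TRUE/FALSE into 1/0

# An equivalent form for non-negative counts, arguably easier to read
merged2 <- as.numeric(counts > 0)

print(merged)
```

The double negation is just a compact way of asking "is the count non-zero?", so a species counts as present on the island if it occurs in at least one of its grid cells.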





- Original Message -
From: Elaine Kuo elaine.kuo...@gmail.com
To: r-h...@stat.math.ethz.ch r-h...@stat.math.ethz.ch
Cc:
Sent: Wednesday, July 31, 2013 9:03 AM
Subject: [R] merge matrix row data

Dear list,



I have a matrix showing the species presence-absence on a map.

Its rows are map locations, represented by GridCellID, such as GID1 and GID
5.

Its columns are species ID, such as D0989, D9820, and D5629.

The matrix is as follows.



Now I want to merge the GridCellID according to the map location of each
island.

For instance, Island A consist of GID 1 and 

Re: [R] Split in blocks

2013-07-31 Thread arun




Hi,
Not clear about your desired output.

source("ou.txt")
split(ou,ou$V1) #split based on values of V1 (1 and 0) 
#or
#maybe you wanted 1 followed by 0 in one block, again 1 followed by 0 in the 
#second block, etc. 
#In that case:
lst1 <- split(ou,cumsum(c(TRUE,diff(ou$V1)==1)))
A.K.

On Wed, Jul 31, 2013 at 7:39 AM, Dominic Roye dominic.r...@gmail.com wrote:
 Hello,

 I am a little bit lost in my search for a solution or an idea. I would like
 to split my time series into blocks, one per night. V1 indicates whether it
 is night or not.

 How can I split this kind of case?


 Best regards,


 str(ou[,c(1,3,8)])
 'data.frame': 863 obs. of  3 variables:
  $ Fecha: POSIXct, format: "2013-07-04 00:10:00" ...
  $ Ta   : num  22.6 22.2 22.2 22.2 22.2 ...
  $ V1   : num  1 1 1 1 1 1 1 1 1 1 ...

 

  dput(ou[,c(1,3,8)])
 structure(list(Fecha = structure(c(1372889400, 1372890000, 1372890600,
 1372891200, 1372891800, 1372892400, 1372893000, 1372893600, 1372894200,
 1372894800, 1372895400, 1372896000, 1372896600, 1372897200, 1372897800,
 1372898400, 1372899000, 1372899600, 1372900200, 1372900800, 1372901400,
 1372902000, 1372902600, 1372903200, 1372903800, 1372904400, 1372905000,
 1372905600, 1372906200, 1372906800, 1372907400, 1372908000, 1372908600,
 1372909200, 1372909800, 1372910400, 1372911000, 1372911600, 1372912200,
 1372912800, 1372913400, 1372914000, 1372914600, 1372915200, 1372915800,
 1372916400, 1372917000, 1372917600, 1372918200, 1372918800, 1372919400,
 1372920000, 1372920600, 1372921200, 1372921800, 1372922400, 1372923000,
 1372923600, 1372924200, 1372924800, 1372925400, 1372926000, 1372926600,
 1372927200, 1372927800, 1372928400, 1372929000, 1372929600, 1372930200,
 1372930800, 1372931400, 1372932000, 1372932600, 1372933200, 1372933800,
 1372934400, 1372935000, 1372935600, 1372936200, 1372936800, 1372937400,
 1372938000, 1372938600, 1372939200, 1372939800, 1372940400, 1372941000,
 1372941600, 1372942200, 1372942800, 1372943400, 1372944000, 1372944600,
 1372945200, 1372945800, 1372946400, 1372947000, 1372947600, 1372948200,
 1372948800, 1372949400, 1372950000, 1372950600, 1372951200, 1372951800,
 1372952400, 1372953000, 1372953600, 1372954200, 1372954800, 1372955400,
 1372956000, 1372956600, 1372957200, 1372957800, 1372958400, 1372959000,
 1372959600, 1372960200, 1372960800, 1372961400, 1372962000, 1372962600,
 1372963200, 1372963800, 1372964400, 1372965000, 1372965600, 1372966200,
 1372966800, 1372967400, 1372968000, 1372968600, 1372969200, 1372969800,
 1372970400, 1372971000, 1372971600, 1372972200, 1372972800, 1372973400,
 1372974000, 1372974600, 1372975200, 1372975800, 1372976400, 1372977000,
 1372977600, 1372978200, 1372978800, 1372979400, 1372980000, 1372980600,
 1372981200, 1372981800, 1372982400, 1372983000, 1372983600, 1372984200,
 1372984800, 1372985400, 1372986000, 1372986600, 1372987200, 1372987800,
 1372988400, 1372989000, 1372989600, 1372990200, 1372990800, 1372991400,
 1372992000, 1372992600, 1372993200, 1372993800, 1372994400, 1372995000,
 1372995600, 1372996200, 1372996800, 1372997400, 1372998000, 1372998600,
 1372999200, 1372999800, 1373000400, 1373001000, 1373001600, 1373002200,
 1373002800, 1373003400, 1373004000, 1373004600, 1373005200, 1373005800,
 1373006400, 1373007000, 1373007600, 1373008200, 1373008800, 1373009400,
 1373010000, 1373010600, 1373011200, 1373011800, 1373012400, 1373013000,
 1373013600, 1373014200, 1373014800, 1373015400, 1373016000, 1373016600,
 1373017200, 1373017800, 1373018400, 1373019000, 1373019600, 1373020200,
 1373020800, 1373021400, 1373022000, 1373022600, 1373023200, 1373023800,
 1373024400, 1373025000, 1373025600, 1373026200, 1373026800, 1373027400,
 1373028000, 1373028600, 1373029200, 1373029800, 1373030400, 1373031000,
 1373031600, 1373032200, 1373032800, 1373033400, 1373034000, 1373034600,
 1373035200, 1373035800, 1373036400, 1373037000, 1373037600, 1373038200,
 1373038800, 1373039400, 1373040000, 1373040600, 1373041200, 1373041800,
 1373042400, 1373043000, 1373043600, 1373044200, 1373044800, 1373045400,
 1373046000, 1373046600, 1373047200, 1373047800, 1373048400, 1373049000,
 1373049600, 1373050200, 1373050800, 1373051400, 1373052000, 1373052600,
 1373053200, 1373053800, 1373054400, 1373055000, 1373055600, 1373056200,
 1373056800, 1373057400, 1373058000, 1373058600, 1373059200, 1373059800,
 1373060400, 1373061000, 1373061600, 1373062200, 1373062800, 1373063400,
 1373064000, 1373064600, 1373065200, 1373065800, 1373066400, 1373067000,
 1373067600, 1373068200, 1373068800, 1373069400, 1373070000, 1373070600,
 1373071200, 1373071800, 1373072400, 1373073000, 1373073600, 1373074200,
 1373074800, 1373075400, 1373076000, 1373076600, 1373077200, 1373077800,
 1373078400, 1373079000, 1373079600, 1373080200, 1373080800, 1373081400,
 1373082000, 1373082600, 1373083200, 1373083800, 1373084400, 1373085000,
 1373085600, 1373086200, 1373086800, 1373087400, 1373088000, 1373088600,
 1373089200, 1373089800, 1373090400, 1373091000, 1373091600, 1373092200,
 1373092800, 1373093400, 

Re: [R] Split in blocks

2013-07-31 Thread arun
Hi,
In that case:
lst1 <- split(ou,cumsum(c(TRUE,diff(ou$V1)==1)))
lst2 <- lapply(lst1,function(x) x[x$V1==1,])


A.K.
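
To show what the cumsum(diff(...)) idiom above does, here is a small sketch on invented data. The data frame `toy` and its column `t` are hypothetical stand-ins for `ou`; filtering before splitting is a variation on the two-step code above, not a correction of it.

```r
# Hypothetical stand-in for `ou`: V1 is 1 at night and 0 during the day
toy <- data.frame(t = 1:10, V1 = c(1, 1, 0, 0, 1, 1, 1, 0, 1, 1))

# diff(V1) == 1 marks each 0 -> 1 transition, i.e. the first row of a new night;
# cumsum() turns those marks into a running block number
block <- cumsum(c(TRUE, diff(toy$V1) == 1))

# Drop the daytime rows first, then split what is left by block number
night  <- toy$V1 == 1
nights <- split(toy[night, ], block[night])

print(lapply(nights, function(x) x$t))
```

Filtering first means that a series beginning with daytime rows does not leave an empty first element in the list, which may or may not matter for the real data.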







From: Dominic Roye dominic.r...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Wednesday, July 31, 2013 7:17 PM
Subject: Re: [R] Split in blocks



Hi, 


The thing is that, because the day changes at 00:00, I cannot split by 
date. 

I created V1 to separate the daytime from the night time (sunset till 
sunrise). But now I need to separate each night from the daytime and from the 
other nights.

The aim is to obtain something like the following, in the form of a list or an 
index indicating each night in the data.frame:

I hope you understand my explanation now.  Thank you. 

night 1:

278 2013-07-05 22:20:00      2.42 27.61 61   0 05.07.2013 22:20  1
279 2013-07-05 22:30:00      2.35 27.39 62   0 05.07.2013 22:30  1
280 2013-07-05 22:40:00      2.18 27.07 63   0 05.07.2013 22:40  1
281 2013-07-05 22:50:00      2.21 26.80 64   0 05.07.2013 22:50  1
282 2013-07-05 23:00:00      2.30 26.42 65   0 05.07.2013 23:00  1
283 2013-07-05 23:10:00      1.91 26.03 66   0 05.07.2013 23:10  1
284 2013-07-05 23:20:00      2.54 25.61 67   0 05.07.2013 23:20  1
285 2013-07-05 23:30:00      2.79 25.15 68   0 05.07.2013 23:30  1
286 2013-07-05 23:40:00      2.66 24.83 70   0 05.07.2013 23:40  1
287 2013-07-05 23:50:00      2.35 24.55 70   0 05.07.2013 23:50  1
288 2013-07-06 00:00:00      2.05 24.34 71   0 06.07.2013 00:00  1

289 2013-07-06 00:10:00      1.88 24.12 71   0 06.07.2013 00:10  1
290 2013-07-06 00:20:00      2.25 23.87 72   0 06.07.2013 00:20  1
291 2013-07-06 00:30:00      1.82 23.57 73   0 06.07.2013 00:30  1
292 2013-07-06 00:40:00      2.06 23.30 74   0 06.07.2013 00:40  1
293 2013-07-06 00:50:00      2.21 23.08 74   0 06.07.2013 00:50  1
294 2013-07-06 01:00:00      2.78 22.78 74   0 06.07.2013 01:00  1
295 2013-07-06 01:10:00      2.70 22.66 75   0 06.07.2013 01:10  1
296 2013-07-06 01:20:00      2.42 22.36 77   0 06.07.2013 01:20  1
297 2013-07-06 01:30:00      2.48 22.18 76   0 06.07.2013 01:30  1
298 2013-07-06 01:40:00      2.88 22.11 77   0 06.07.2013 01:40  1
299 2013-07-06 01:50:00      1.39 22.01 78   0 06.07.2013 01:50  1
300 2013-07-06 02:00:00      1.05 21.61 80   0 06.07.2013 02:00  1
301 2013-07-06 02:10:00      1.07 21.79 78   0 06.07.2013 02:10  1
302 2013-07-06 02:20:00      1.89 21.50 79   0 06.07.2013 02:20  1
303 2013-07-06 02:30:00      1.83 21.15 81   0 06.07.2013 02:30  1
304 2013-07-06 02:40:00      2.34 20.83 81   0 06.07.2013 02:40  1
305 2013-07-06 02:50:00      2.28 20.60 81   0 06.07.2013 02:50  1
306 2013-07-06 03:00:00      1.85 20.58 82   0 06.07.2013 03:00  1
307 2013-07-06 03:10:00      1.39 20.51 82   0 06.07.2013 03:10  1
308 2013-07-06 03:20:00      1.30 20.19 84   0 06.07.2013 03:20  1
309 2013-07-06 03:30:00      1.87 20.16 83   0 06.07.2013 03:30  1
310 2013-07-06 03:40:00      2.28 20.07 83   0 06.07.2013 03:40  1
311 2013-07-06 03:50:00      2.12 20.09 83   0 06.07.2013 03:50  1
312 2013-07-06 04:00:00      1.72 19.97 84   0 06.07.2013 04:00  1
313 2013-07-06 04:10:00      1.37 19.67 85   0 06.07.2013 04:10  1
314 2013-07-06 04:20:00      0.84 19.37 87   0 06.07.2013 04:20  1
315 2013-07-06 04:30:00      0.30 19.36 87   0 06.07.2013 04:30  1
316 2013-07-06 04:40:00      1.76 19.39 86   0 06.07.2013 04:40  1
317 2013-07-06 04:50:00      2.00 19.09 87   0 06.07.2013 04:50  1
318 2013-07-06 05:00:00      1.00 18.82 89   0 06.07.2013 05:00  1
319 2013-07-06 05:10:00      1.60 19.00 87   4 06.07.2013 05:10  1
320 2013-07-06 05:20:00      1.85 19.06 87   9 06.07.2013 05:20  1
321 2013-07-06 05:30:00      1.44 19.06 86  14 06.07.2013 05:30  1
322 2013-07-06 05:40:00      1.38 18.83 87  26 06.07.2013 05:40  1
323 2013-07-06 05:50:00      1.87 18.74 88  57 06.07.2013 05:50  1
324 2013-07-06 06:00:00      1.91 19.42 84  78 06.07.2013 06:00  1
325 2013-07-06 06:10:00      0.85 19.78 83 100 06.07.2013 06:10  1
326 2013-07-06 06:20:00      0.80 20.22 81 124 06.07.2013 06:20  1
327 2013-07-06 06:30:00      0.67 20.86 79 150 06.07.2013 06:30  1
328 2013-07-06 06:40:00      1.03 20.86 79 179 06.07.2013 06:40  1
329 2013-07-06 06:50:00      1.20 20.63 80 209 06.07.2013 06:50  1
330 2013-07-06 07:00:00      1.03 20.97 79 238 06.07.2013 07:00  1

night 2

421 2013-07-06 22:10:00      2.63 28.16 60   0 06.07.2013 22:10  1
422 2013-07-06 22:20:00      3.19 27.88 61   0 06.07.2013 22:20  1
423 2013-07-06 22:30:00      3.77 27.55 62   0 06.07.2013 22:30  1
424 2013-07-06 22:40:00      3.37 27.21 64   0 06.07.2013 22:40  1
425 2013-07-06 22:50:00      2.32 26.88 65   0 06.07.2013 22:50  1
426 2013-07-06 23:00:00      2.43 26.56 66   0 06.07.2013 23:00  1
427 2013-07-06 23:10:00      2.96 26.31 66   0 06.07.2013 23:10  1
428 2013-07-06 23:20:00      3.23 26.08 67   0 06.07.2013 23:20  1
429 2013-07-06 23:30:00      4.00 25.79 68   0 06.07.2013 23:30  1
430 2013-07-06 23:40:00      3.55 25.47 69   0 06.07.2013 23:40  1
431 2013-07-06 

[R] R and S+ Courses: Brisbane, Melbourne, Sydney in Aug and Sep.

2013-07-31 Thread Kris Angelovski
R and S+ Courses (http://www.solutionmetrics.com.au/)

Brisbane, Melbourne & Sydney



Hi,

Apologies for cross-posting

SolutionMetrics is presenting R and S+ courses in Brisbane, Melbourne & Sydney 
- August & September, 2013

To book, please email 
enquir...@solutionmetrics.com.au or 
call +61 2 9233 6888

Getting Started with R (2 Day)

Day 1: Introduction to R, Data objects & Classes, Data Import/Export, Data 
Manipulation, Graphics, Basic Statistical models, avoiding repetitive 
typing/clicking & file management

Day 2: Writing your own simple functions, efficient programming, Advanced 
Visualisations, Data Mining - Logistic Regression/Tree models & Working with 
Time-Series objects. More info: http://bit.ly/11qFxpO

Date: 12-13 Aug, 2013 - Sydney (Mon-Tue)
  19-20 Aug, 2013 - Melbourne (Mon-Tue)
  26-27 Aug, 2013 - Brisbane (Mon-Tue)

Getting Started with S+ (2 Day)

Day 1: Course provides users with the knowledge to perform all day-to-day data 
analysis & graphics tasks with just a click of a mouse (No Programming 
Required).

Day 2: Introduction to the S Language, Data objects & Classes, Data 
Import/Export, Data Manipulation, Graphics, Basic Statistical models - 
Regression, avoiding repetitive typing/clicking & file management. Course 
outline: http://bit.ly/16fKTFY

Date:   9-10 Sep, 2013 - Sydney (Mon-Tue)
   19-20 Sep, 2013 - Melbourne (Thu-Fri)

Intermediate R (1 Day)

Efficient use of R language functions & objects, Big Data, Advanced Graphics, 
Data Mining - Logistic Regression/Tree models & Working with Time-Series 
objects.
More info: http://bit.ly/YBsT5b

Date:   29 Aug, 2013 - Sydney (Thu)
         2 Sep, 2013 - Sydney (Mon)

For more information, please email 
enquir...@solutionmetrics.com.au or
call +61 2 9233 6888


Cheers
Kris Angelovski | Director| SolutionMetrics
T +61 2 9233 6888 | F +61 2 9233 4099
Suite 44, Level 9, 88 Pitt Street, Sydney NSW 2000
http://www.solutionmetrics.com.au/



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R help

2013-07-31 Thread Jim Lemon

On 07/31/2013 10:03 PM, Mª Teresa Martinez Soriano wrote:

Hi

First of all, thanks for this service; it is proving very useful to me. I am new 
to R, so I have a lot of questions.

I have to do imputation on a data set. This is a sample of my data set:


   NUMERO      Data1      Data2 IE.2003 IE.2004 IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
20    133 30/09/2002 18/06/2013     153     279     289     370     412     262     115      75
21    138 11/07/2002 13/05/2009    5460    7863    8365   12009   16763      NA      NA      NA
22    146 16/10/2009 18/06/2013      NA      NA      NA      NA      NA      NA      NA      35
23    152 27/05/1999 18/06/2013      NA      80      77      60      89     137     144     146
24    154 21/12/2004 18/06/2013      NA      NA     148     186     302     233     194     204
25    166  8/02/2008 18/06/2013      NA      NA      NA      NA      NA      NA      98     160
26    177 20/02/1996 18/06/2013      16       4      NA       3       3      NA       5       5





The problem is that I have cells which have to be empty; this depends on Data1 
and Data2.

For instance, in the third row you can see that Data1 is equal to 16/10/2009, 
so I should not have any information before the year 2009. Therefore 
IE.2003, IE.2004, IE.2005, IE.2006, IE.2007 and IE.2008 have to be totally 
empty, but this doesn't mean that they are missing values; in fact they are 
not. I don't want to get any imputation in these cells.

IE.2009 and IE.2010 have to be full and they are not, so these cells are 
missing values and I want to get imputed values for them. (I would delete this 
row, because it is impossible to get any information about it, but it is OK for 
this example.)

On the other hand, in the last row NA is a real missing value.

How can I specify that these cells are empty, so that they don't get imputed 
values?

I have tried to put NaN, but I have problems in some functions that I need to 
run before the imputation.


Hi Teresa,
I didn't see an answer to this, so I'll offer a couple of suggestions. 
First, NA is probably the best thing to have in your empty cells. If 
you change the NA cells to "", the columns will become factors, and if 
you then change the values back to numeric, the blanks will become NAs 
again.


I would get a set of index vectors that indicate which 
cells you _don't_ want to impute (say your data frame is tmsdf):


dontimpute2003 <- which(
 as.numeric(unlist(sapply(strsplit(tmsdf$Data1,"/"),"[",3))) > 2003 &
 is.na(tmsdf$IE.2003))
dontimpute2004 <- which(
 as.numeric(unlist(sapply(strsplit(tmsdf$Data1,"/"),"[",3))) > 2004 &
 is.na(tmsdf$IE.2004))
...

then do your imputation on the entire data frame and reset the ones you 
don't want imputed to NA:


tmsdf$IE.2003[dontimpute2003] <- NA
...

Jim
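
The mask, impute and reset sequence above can be sketched end to end on a tiny made-up data frame. The three rows, the choice of IE.2005, and the column-mean "imputation" are all assumptions for illustration only; the real imputation method would take the place of the mean.

```r
# Hypothetical miniature of the data frame; column names follow the post
tmsdf <- data.frame(
  Data1   = c("30/09/2002", "16/10/2009", "20/02/1996"),
  IE.2003 = c(153, NA, 16),
  IE.2005 = c(289, NA, NA),
  stringsAsFactors = FALSE
)

# Year in which each record starts (third piece of dd/mm/yyyy)
start_year <- as.numeric(sapply(strsplit(tmsdf$Data1, "/"), "[", 3))

# Cells that are structurally empty: the record starts after the column's year
dontimpute2005 <- which(start_year > 2005 & is.na(tmsdf$IE.2005))

# Placeholder imputation (column mean), standing in for the real method
imputed <- tmsdf
imputed$IE.2005[is.na(imputed$IE.2005)] <- mean(tmsdf$IE.2005, na.rm = TRUE)

# Reset the structurally empty cells to NA
imputed$IE.2005[dontimpute2005] <- NA
print(imputed)
```

Note that strsplit() needs a character vector, so if Data1 was read in as a factor it would have to be wrapped in as.character() first.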



[R] ggplot2: color histograms by quintile

2013-07-31 Thread David Chertudi
Hello,
 
I have a basic panel of histograms as follows, whose current colors don't 
matter:
 
 
binsize <- diff(range(thing$Rate))/64
ggplot(thing, aes(x=Rate, fill=Series)) + geom_histogram(binwidth=binsize) + 
  facet_grid(Series~.,scales="free") +
  labs(fill="Index") +
  xlab("Growth Rate (%)") + 
  theme(axis.title.y=element_blank(),legend.position=c(1,.64), 
        legend.justification=c(1,1),strip.text.y=element_blank()) + 
  scale_x_continuous(breaks=c(-10,-5,-2:10,15,20)) +
  geom_vline(xintercept=0, linetype="dotted")
rm(binsize)
 
 
 
What I would like to do is color each of the four histograms by its own 
deciles.  Essentially 
quantile(trim.index$Rate,c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1)) would give me the 
overall deciles, but I would like them broken down by the elements of the 
Series variable, and then applied to the histograms as shading or coloring.  
Does this make sense?  I've dumped the first 100 rows of data below.
 
Thanks in advance for any help you're able to provide.
 
 
David
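
One possible way to get deciles per Series, sketched in base R on simulated data. The data frame `toy`, the helper `decile_within` and the new column `Decile` are hypothetical names introduced for illustration; this is not a tested answer for the real `thing` data.

```r
# Hypothetical stand-in for `thing`: two series with different spreads
set.seed(1)
toy <- data.frame(
  Series = rep(c("A", "B"), each = 50),
  Rate   = c(rnorm(50, mean = 2, sd = 1), rnorm(50, mean = 0, sd = 5))
)

# Decile number (1 to 10) of each value, using the breaks of the vector it is given
decile_within <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, by = 0.1))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# ave() applies the function separately within each Series
toy$Decile <- ave(toy$Rate, toy$Series, FUN = decile_within)

print(table(toy$Series, toy$Decile))
```

With a column like this in place, something along the lines of aes(fill = factor(Decile)) in the ggplot() call might give the per-panel shading, though cut() will fail if a series has tied decile breaks.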
 
 
 
 structure(list(Trials = 1:100, Year = c(2005L, 2008L, 2006L, 
2007L, 2006L, 2004L, 2004L, 2003L, 2007L, 2005L, 2008L, 2006L, 
2011L, 2005L, 2004L, 2003L, 2010L, 2002L, 2008L, 2005L, 2005L, 
2004L, 2006L, 2011L, 2011L, 2008L, 2006L, 2004L, 2002L, 2003L, 
2009L, 2004L, 2003L, 2011L, 2006L, 2002L, 2007L, 2010L, 2005L, 
2008L, 2011L, 2008L, 2010L, 2005L, 2004L, 2009L, 2002L, 2008L, 
2002L, 2006L, 2003L, 2007L, 2006L, 2006L, 2002L, 2002L, 2010L, 
2008L, 2008L, 2003L, 2003L, 2009L, 2007L, 2009L, 2004L, 2005L, 
2011L, 2010L, 2005L, 2008L, 2008L, 2008L, 2007L, 2008L, 2008L, 
2007L, 2002L, 2009L, 2011L, 2002L, 2002L, 2006L, 2007L, 2007L, 
2002L, 2009L, 2007L, 2003L, 2010L, 2010L, 2009L, 2003L, 2010L, 
2003L, 2007L, 2006L, 2010L, 2005L, 2004L, 2010L), Month = c(12L, 
4L, 5L, 3L, 9L, 12L, 4L, 3L, 6L, 10L, 6L, 11L, 1L, 6L, 9L, 10L, 
5L, 3L, 11L, 10L, 2L, 8L, 9L, 7L, 8L, 8L, 7L, 1L, 9L, 1L, 11L, 
3L, 12L, 1L, 6L, 7L, 6L, 8L, 12L, 8L, 11L, 11L, 5L, 7L, 2L, 6L, 
9L, 9L, 11L, 6L, 11L, 5L, 5L, 3L, 10L, 6L, 7L, 8L, 9L, 2L, 3L, 
11L, 8L, 4L, 12L, 6L, 10L, 10L, 12L, 9L, 4L, 12L, 12L, 12L, 6L, 
6L, 11L, 1L, 5L, 6L, 2L, 4L, 7L, 10L, 12L, 4L, 5L, 8L, 7L, 2L, 
6L, 10L, 10L, 10L, 10L, 2L, 6L, 6L, 9L, 9L), Core.CPI.Weighting = c(2L, 
3L, 2L, 5L, 1L, 5L, 4L, 5L, 4L, 1L, 3L, 4L, 5L, 2L, 4L, 5L, 1L, 
5L, 1L, 3L, 2L, 1L, 5L, 2L, 1L, 2L, 5L, 5L, 4L, 2L, 4L, 4L, 5L, 
5L, 2L, 5L, 3L, 4L, 5L, 1L, 2L, 2L, 5L, 3L, 2L, 2L, 5L, 3L, 2L, 
4L, 2L, 4L, 1L, 3L, 1L, 4L, 1L, 3L, 1L, 1L, 4L, 2L, 3L, 2L, 2L, 
5L, 4L, 4L, 3L, 4L, 2L, 5L, 2L, 5L, 1L, 2L, 5L, 5L, 5L, 2L, 5L, 
3L, 3L, 1L, 5L, 2L, 2L, 2L, 1L, 3L, 5L, 3L, 4L, 3L, 3L, 1L, 1L, 
2L, 2L, 3L), CPI.Food = c(0.023474768, 0.043433814, 0.029315923, 
0.042208873, 0.035479323, 0.024429485, 0.028537661, 0.027623773, 
0.045546671, 0.023973579, 0.045546671, 0.038421672, 0.037161108, 
0.023102181, 0.032765694, 0.032962625, 0.008051879, 0.028741685, 
0.053639179, 0.025192645, 0.025077433, 0.032806764, 0.023605006, 
0.025644434, 0.029584922, 0.031756778, 0.032450724, 0.026035343, 
0.020656969, 0.026035343, 0.010684754, 0.029551194, 0.02442531, 
0.012348667, 0.030959528, 0.023781539, 0.045546671, 0.008345359, 
0.024429485, 0.031756778, 0.034731773, 0.053639179, 0.008051879, 
0.023005118, 0.030315091, 0.04149634, 0.019373857, 0.051078725, 
0.022406708, 0.030959528, 0.022406708, 0.044396055, 0.023304811, 
0.025196539, 0.020831987, 0.016599861, 0.008044572, 0.049349997, 
0.026691689, 0.01612059, 0.015903088, 0.010684754, 0.033979886, 
0.048496522, 0.024429485, 0.023102181, 0.033084021, 0.033084021, 
0.024429485, 0.026691689, 0.048496522, 0.054246603, 0.039956669, 
0.054246603, 0.045546671, 0.045546671, 0.018027105, 0.053917666, 
0.034171337, 0.025416646, 0.029331567, 0.02412957, 0.032450724, 
0.052641267, 0.017531941, 0.048496522, 0.044396055, 0.018367312, 
0.008044572, 0.013896245, 0.04149634, 0.032962625, 0.009905593, 
0.032962625, 0.052641267, 0.025077433, 0.023022986, 0.023102181, 
0.032765694, 0.00916168), PPI.Farm = c(0.009730106, 0.204892729, 
0.138453455, 0.210017271, 0.178801715, -0.017104315, 0.168632738, 
0.157512456, 0.208907609, -0.007879949, 0.208907609, 0.187585976, 
0.171910952, -0.0471555, 0.144318736, 0.111713247, -0.000515726, 
3.019e-05, 0.120566043, -0.027737238, -0.021152479, 0.168890071, 
-0.020784628, 0.231482252, 0.050380553, -0.141247793, 0.16832412, 
0.140014634, -0.04669922, 0.140014634, 0.095775695, 0.02684028, 
0.142349938, 0.125929795, 0.154898078, -0.043965946, 0.208907609, 
0.033632438, -0.017104315, -0.141247793, 0.215496099, 0.120566043, 
-0.000515726, -0.041130752, 0.041959105, -0.105460885, 0.070882919, 
0.189171764, 0.122047713, 0.154898078, 0.122047713, 0.202255379, 
-0.048781928, -0.033167528, 0.099171658, 0.041330169, 0.015714896, 
0.204083122, -0.154114835, -0.008444152, -0.007718043, 0.095775695, 
0.170931281, -0.067437568, -0.017104315, -0.0471555, 0.228225921, 
0.228225921, -0.017104315, -0.154114835, -0.067437568, 0.072354298, 
0.198873427, 0.072354298, 0.208907609, 0.208907609,