Re: [R] "Best" way to merge 300+ .5MB dataframes?

2014-08-11 Thread Prof Brian Ripley

On 12/08/2014 07:07, David Winsemius wrote:


On Aug 11, 2014, at 8:01 PM, John McKown wrote:


On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams  wrote:

Grant,

Assuming all your filenames are something like file1.txt,
file2.txt,file3.txt... And using the Mac OSX terminal app (after you cd to
the directory where your files are located...

This will strip off the 1st lines, that is, your header lines:

for file in *.txt;do
sed -i '1d'${file};
done

Then, do this:

cat *.txt > newfilename.txt

Doing both should only take a few seconds, depending on your file sizes.

Cheers!
Tom



Using sed hadn't occurred to me. I guess I'm just "awk-ward" .
A slightly different way would be:

for file in *.txt;do
  sed '1d' ${file}
done >newfilename.txt

that way the original files are not modified.  But it strips out the
header on the 1st file as well. Not a big deal, but the read.table
will need to be changed to accommodate that. Also, it creates an
otherwise unnecessary intermediate file "newfilename.txt". To get the
1st file's header, the script could:

head -1 >newfilename.txt
for file in *.txt;do
   sed '1d' ${file}
done >>newfilename.txt

I really like having multiple answers to a given problem. Especially
since I have a poorly implemented version of "awk" on one of my
systems. It is the vendor's "awk" and conforms exactly to the POSIX
definition with no additions. So I don't have the FNR built-in
variable. Your implementation would work well on that system. Well, if
there were a version of R for it. It is a branded UNIX system which
was designed to be totally __and only__ POSIX compliant, with few
(maybe no) extensions at all. IOW, it stinks. No, it can't be
replaced. It is the z/OS system from IBM which is EBCDIC based and
runs on the "big iron" mainframe, system z.

--


On the Mac the awk equivalent is gawk. Within R you would use `system()` 
possibly using paste0() to construct a string to send.


For historical reasons this is actually part of R's configuration: see 
the AWK entry in R_HOME/etc/Makeconf.  (There is an SED entry too: not 
all sed's in current OSes are POSIX-compliant.)


Using system2() rather than system() is recommended for new code.
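
A minimal, untested sketch of combining those two points from within R (it assumes a Unix-alike shell, whitespace-separated .txt files in the working directory, and an output file name invented purely for illustration):

makeconf <- readLines(file.path(R.home("etc"), "Makeconf"))
awk <- sub("^AWK *= *", "", grep("^AWK *=", makeconf, value = TRUE))  # the awk R was configured with
system2(awk, args = c("'NR==1 || FNR>1'", "*.txt"),  # keep the first header, drop the rest
        stdout = "combined.txt")
dat <- read.table("combined.txt", header = TRUE)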

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "Best" way to merge 300+ .5MB dataframes?

2014-08-11 Thread David Winsemius

On Aug 11, 2014, at 8:01 PM, John McKown wrote:

> On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams  wrote:
>> Grant,
>> 
>> Assuming all your filenames are something like file1.txt,
>> file2.txt,file3.txt... And using the Mac OSX terminal app (after you cd to
>> the directory where your files are located...
>> 
>> This will strip off the 1st lines, that is, your header lines:
>> 
>> for file in *.txt;do
>> sed -i '1d'${file};
>> done
>> 
>> Then, do this:
>> 
>> cat *.txt > newfilename.txt
>> 
>> Doing both should only take a few seconds, depending on your file sizes.
>> 
>> Cheers!
>> Tom
>> 
> 
> Using sed hadn't occurred to me. I guess I'm just "awk-ward" .
> A slightly different way would be:
> 
> for file in *.txt;do
>  sed '1d' ${file}
> done >newfilename.txt
> 
> that way the original files are not modified.  But it strips out the
> header on the 1st file as well. Not a big deal, but the read.table
> will need to be changed to accommodate that. Also, it creates an
> otherwise unnecessary intermediate file "newfilename.txt". To get the
> 1st file's header, the script could:
> 
> head -1 >newfilename.txt
> for file in *.txt;do
>   sed '1d' ${file}
> done >>newfilename.txt
> 
> I really like having multiple answers to a given problem. Especially
> since I have a poorly implemented version of "awk" on one of my
> systems. It is the vendor's "awk" and conforms exactly to the POSIX
> definition with no additions. So I don't have the FNR built-in
> variable. Your implementation would work well on that system. Well, if
> there were a version of R for it. It is a branded UNIX system which
> was designed to be totally __and only__ POSIX compliant, with few
> (maybe no) extensions at all. IOW, it stinks. No, it can't be
> replaced. It is the z/OS system from IBM which is EBCDIC based and
> runs on the "big iron" mainframe, system z.
> 
> -- 

On the Mac the awk equivalent is gawk. Within R you would use `system()` 
possibly using paste0() to construct a string to send.
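
For instance (an untested sketch; the awk one-liner keeps the header of the first file only, and the file names are placeholders):

cmd <- paste0("awk 'NR==1 || FNR>1' ", "*.txt", " > ", "combined.txt")
system(cmd)
merged <- read.table("combined.txt", header = TRUE)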

-- 



David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "Best" way to merge 300+ .5MB dataframes?

2014-08-11 Thread John McKown
On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams  wrote:
> Grant,
>
> Assuming all your filenames are something like file1.txt,
> file2.txt,file3.txt... And using the Mac OSX terminal app (after you cd to
> the directory where your files are located...
>
> This will strip off the 1st lines, that is, your header lines:
>
> for file in *.txt;do
> sed -i '1d'${file};
> done
>
> Then, do this:
>
> cat *.txt > newfilename.txt
>
> Doing both should only take a few seconds, depending on your file sizes.
>
> Cheers!
> Tom
>

Using sed hadn't occurred to me. I guess I'm just "awk-ward".
A slightly different way would be:

for file in *.txt;do
  sed '1d' ${file}
done >newfilename.txt

that way the original files are not modified.  But it strips out the
header on the 1st file as well. Not a big deal, but the read.table
will need to be changed to accommodate that. Also, it creates an
otherwise unnecessary intermediate file "newfilename.txt". To get the
1st file's header, the script could:

head -1 "$(ls *.txt | head -1)" >newfilename.txt   # header from the first file
for file in *.txt;do
   sed '1d' "${file}"
done >>newfilename.txt

I really like having multiple answers to a given problem. Especially
since I have a poorly implemented version of "awk" on one of my
systems. It is the vendor's "awk" and conforms exactly to the POSIX
definition with no additions. So I don't have the FNR built-in
variable. Your implementation would work well on that system. Well, if
there were a version of R for it. It is a branded UNIX system which
was designed to be totally __and only__ POSIX compliant, with few
(maybe no) extensions at all. IOW, it stinks. No, it can't be
replaced. It is the z/OS system from IBM which is EBCDIC based and
runs on the "big iron" mainframe, system z.

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "Best" way to merge 300+ .5MB dataframes?

2014-08-11 Thread Thomas Adams
Grant,

Assuming all your filenames are something like file1.txt,
file2.txt, file3.txt..., and using the Mac OS X Terminal app (after you cd to
the directory where your files are located)...

This will strip off the 1st lines, that is, your header lines:

for file in *.txt;do
sed -i '' '1d' "${file}"   # BSD/OS X sed needs a (possibly empty) backup suffix after -i
done

Then, do this:

cat *.txt > newfilename.txt

Doing both should only take a few seconds, depending on your file sizes.

Cheers!
Tom



On Mon, Aug 11, 2014 at 12:01 PM, Grant Rettke 
wrote:

> On Sun, Aug 10, 2014 at 6:50 PM, John McKown
>  wrote:
>
> > OK, I assume this results in a vector of file names in a variable,
> > like you'd get from list.files();
>
> Yes.
>
> > Why? Do you need them in separate data frames?
>
> I do not.
>
> > The meat of the question. If you don't need the files in separate data
> > frames, and the files do _NOT_ have headers, then I would just load
> > them all into a single frame. I used Linux and so my solution may not
> > work on Windows. Something like:
>
> Excellent point. All of the files do have the same header. I'm on OSX
> so there must be a nice
> one liner to concatenate all of the individual files, dropping the
> first line for all but the first.  Danke!
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "Best" way to merge 300+ .5MB dataframes?

2014-08-11 Thread Grant Rettke
On Sun, Aug 10, 2014 at 6:50 PM, John McKown
 wrote:

> OK, I assume this results in a vector of file names in a variable,
> like you'd get from list.files();

Yes.

> Why? Do you need them in separate data frames?

I do not.

> The meat of the question. If you don't need the files in separate data
> frames, and the files do _NOT_ have headers, then I would just load
> them all into a single frame. I used Linux and so my solution may not
> work on Windows. Something like:

Excellent point. All of the files do have the same header. I'm on OSX
so there must be a nice
one liner to concatenate all of the individual files, dropping the
first line for all but the first.  Danke!
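
(A pure-R alternative that avoids the shell entirely, sketched on the assumption that read.csv's defaults match the files' format and that every file carries the same header:

files  <- list.files(pattern = "\\.txt$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read.csv))   # headers handled per file, nothing to strip

For 300-odd files of ~0.5 MB each this should fit comfortably in memory.)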

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading chunks of data from a file more efficiently

2014-08-11 Thread peter salzman
Scott,

there is a package called ff that '... provides data structures that are
stored on disk but behave (almost) as if they were in RAM ...'

i hope it helps
peter
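
A possible refinement of the function quoted below, sketched and untested: since the file is already read once with readLines(), the values can be parsed from those lines instead of re-scanning the file for every variable. It assumes each variable's values occupy the ceiling(num.nodes/4) lines that follow its header line, four values per line as in the excerpt:

READ.PLOT.OUTPUT7 <- function(plt.file, num.nodes, var.names) {
  lines <- readLines(plt.file)             # read the file once
  lines.per.var <- ceiling(num.nodes / 4)  # 4 values per line in the excerpt
  tmp <- vector("list", length(var.names))
  names(tmp) <- var.names
  for (i in seq_along(var.names)) {
    ind <- grep(var.names[i], lines, fixed = TRUE, useBytes = TRUE)
    if (length(ind) != 1) stop("Not one matching line for ", var.names[i])
    block <- lines[(ind + 1):(ind + lines.per.var)]
    tmp[[i]] <- scan(text = block, quiet = TRUE)[seq_len(num.nodes)]
  }
  tmp
}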


On Sat, Aug 9, 2014 at 6:31 PM, Waichler, Scott R 
wrote:

> Hi,
>
> I have some very large (~1.1 GB) output files from a groundwater model
> called STOMP that I want to read as efficiently as possible.  For each
> variable there are over 1 million values to read.  Variables are not
> organized in columns; instead they are written out in sections in the file,
> like this:
>
> X-Direction Node Positions, m
>  5.93145E+05  5.93155E+05  5.93165E+05  5.93175E+05
>  5.93245E+05  5.93255E+05  5.93265E+05  5.93275E+05
> . . .
>  5.94695E+05  5.94705E+05  5.94715E+05  5.94725E+05
>  5.94795E+05  5.94805E+05  5.94815E+05  5.94825E+05
>
> Y-Direction Node Positions, m
>  1.14805E+05  1.14805E+05  1.14805E+05  1.14805E+05
>  1.14805E+05  1.14805E+05  1.14805E+05  1.14805E+05
> . . .
>  1.17195E+05  1.17195E+05  1.17195E+05  1.17195E+05
>  1.17195E+05  1.17195E+05  1.17195E+05  1.17195E+05
>
> Z-Direction Node Positions, m
>  9.55000E+01  9.55000E+01  9.55000E+01  9.55000E+01
>  9.55000E+01  9.55000E+01  9.55000E+01  9.55000E+01
> . . .
>
> I want to read and use only a subset of the variables.  I wrote the
> function below to find the line where each target variable begins and then
> scan the values, but it still seems rather slow, perhaps because I am
> opening and closing the file for each variable.  Can anyone suggest a
> faster way?
>
> # Reads original STOMP plot file (plot.*) directly.  Should be useful when
> the plot files are
> # very large with lots of variables, and you just want to retrieve a few
> of them.
> # Arguments:  1) plot filename, 2) number of nodes,
> # 3) character vector of names of target variables you want to return.
> # Returns a list with the selected plot output.
> READ.PLOT.OUTPUT6 <- function(plt.file, num.nodes, var.names) {
>   lines <- readLines(plt.file)
>   num.vars <- length(var.names)
>   tmp <- list()
>   for(i in 1:num.vars) {
> ind <- grep(var.names[i], lines, fixed=T, useBytes=T)
> if(length(ind) != 1) stop("Not one line in the plot file with matching
> variable name.\n")
> tmp[[i]] <- scan(plt.file, skip=ind, nmax=num.nodes, quiet=T)
>   }
>   return(tmp)
> }  # end READ.PLOT.OUTPUT6()
>
> Regards,
> Scott Waichler
> Pacific Northwest National Laboratory
> Richland, WA, USA
> scott.waich...@pnnl.gov
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Peter Salzman, PhD
Department of Biostatistics and Computational Biology
University of Rochester

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Superimposing graphs

2014-08-11 Thread Duncan Mackay
Hi

If you want a one-package, one-function approach, try this:

xyplot(conc ~ time | factor(subject, levels = c(2,1,3)), data = data.d,
par.settings = list(strip.background = list(col = "transparent")),
layout = c(3,1),
aspect = 1,
type   = c("b","g"),
scales = list(alternating = FALSE),
panel = function(x,y,...){

  panel.xyplot(x,y,...)

  # f1<-function(x,v,cl,t)
  # (x/v)*exp(-(cl/v)*t) f1(0.5,0.5,0.06,t),
  panel.curve((0.5/0.5)*exp(-(0.06/0.5)*x),0,30)

}
 )

# par.settings ... if you are publishing show text better
# with factor if you want 1:3 omit the levels
# has advantage of doing more things than in groupedData as Doug Bates has said

Regards

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Naser Jamil
Sent: Monday, 11 August 2014 19:06
To: R help
Subject: [R] Superimposing graphs

Dear R-user,
May I seek your help to sort out a little problem. I have the following
codes
to draw two graphs. I want to superimpose the second one on each of the
first one.



library(nlme)
subject<-c(1,1,1,2,2,2,3,3,3)
time<-c(0.0,5.4,21.0,0.0,5.4,21.0,0.0,5.4,21.0)
con.cohort<-c(1.10971703,0.54535512,0.07176724,0.75912539,0.47825282,
0.10593292,1.20808375,0.47638394,0.02808967)

data.d=data.frame(subject=subject,time=time,conc=con.cohort)
grouped.data<-groupedData(formula=conc~time | subject, data =data.d)

plot(grouped.data)

##

f1<-function(x,v,cl,t) {
(x/v)*exp(-(cl/v)*t)
  }
t<-seq(0,30, .01)
plot(t,f1(0.5,0.5,0.06,t),type="l",pch=18, ylim=c(), xlab="time",
ylab="conc")


###

Any suggestion will really be helpful.


Regards,

Jamil.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] efficient way to replace a range of numeric with an integer in a matrix

2014-08-11 Thread David Winsemius

On Aug 11, 2014, at 3:27 PM, Jinsong Zhao wrote:

> On 2014/8/11 14:50, William Dunlap wrote:
>> You can use
>> m[m > 0 & m <= 1.0] <- 1
>> m[m > 1 ] <- 2
>> or, if you have lots of intervals, something based on findInterval().  E.g.,
>> m[] <- findInterval(m, c(-Inf, 0, 1, Inf)) - 1
>> 

OR, if you have irregularly spaced intervals or particular values to match to 
the intervals,  you can use findInterval to define categories and select with 
"[":

> set.seed(42); m <- matrix( rnorm(100, 10, 5), 10)
> round( m, 2)
   [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,] 16.85 16.52  8.47 12.28 11.03 11.61  8.16  4.78 17.56 16.96
 [2,]  7.18 21.43  1.09 13.52  8.19  6.08 10.93  9.55 11.29  7.62
 [3,] 11.82  3.06  9.14 15.18 13.79 17.88 12.91 13.12 10.44 13.25
 [4,] 13.16  8.61 16.07  6.96  6.37 13.21 17.00  5.23  9.40 16.96
 [5,] 12.02  9.33 19.48 12.52  3.16 10.45  6.36  7.29  4.03  4.45
 [6,]  9.47 13.18  7.85  1.41 12.16 11.38 16.51 12.90 13.06  5.70
 [7,] 17.56  8.58  8.71  6.08  5.94 13.40 11.68 13.84  8.91  4.34
 [8,]  9.53 -3.28  1.18  5.75 17.22 10.45 15.19 12.32  9.09  2.70
 [9,] 20.09 -2.20 12.30 -2.07  7.84 -4.97 14.60  5.57 14.67 10.40
[10,]  9.69 16.60  6.80 10.18 13.28 11.42 13.60  4.50 14.11 13.27

> m[] <- c(1,2,4,8,16, 32) [ findInterval(m, c(-Inf, 2, 5, 10, 15, 18, Inf) ) ]
> m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   16   16    4    8    8    8    4    2   16    16
 [2,]    4   32    1    8    4    4    8    4    8     4
 [3,]    8    2    4   16    8   16    8    8    8     8
 [4,]    8    4   16    4    4    8   16    4    4    16
 [5,]    8    4   32    8    2    8    4    4    2     2
 [6,]    4    8    4    1    8    8   16    8    8     4
 [7,]   16    4    4    4    4    8    8    8    4     2
 [8,]    4    1    1    4   16    8   16    8    4     2
 [9,]   32    1    8    1    4    1    8    4    8     8
[10,]    4   16    4    8    8    8    8    2    8     8

-- 
David.


>> (What do you want to do with non-positive numbers?)
>> 
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
> 
> Thank you very much.
> 
> I think findInterval() is what I want.
> 
> Regards,
> Jinsong
> 
>> 
>> 
>> On Mon, Aug 11, 2014 at 2:40 PM, Jinsong Zhao  wrote:
>>> Hi there,
>>> 
>>> I hope to replace a range of numeric in a matrix with a integer. For
>>> example, in the following matrix, I want to use 1 to replace the elements
>>> range from 0.0 to 1.0, and all larger than 1. with 2.
>>> 
 (m <- matrix(runif(16, 0, 2), nrow = 4))
>>>   [,1]   [,2]  [,3] [,4]
>>> [1,] 0.7115088 0.55370418 0.1586146 1.882931
>>> [2,] 0.9068198 0.38081423 0.9172629 1.713592
>>> [3,] 1.5210150 0.93900649 1.2609942 1.744456
>>> [4,] 0.3779058 0.03130103 0.1893477 1.601181
>>> 
>>> so I want to get something like:
>>> 
>>>      [,1] [,2] [,3] [,4]
>>> [1,]    1    1    1    2
>>> [2,]    1    1    1    2
>>> [3,]    2    1    2    2
>>> [4,]    1    1    1    2
>>> 
>>> I wrote a function to do such thing:
>>> 
>>> fun <- function(x) {
>>> if (is.na(x)) {
>>> NA
>>> } else if (x > 0.0 && x <= 1.0) {
>>> 1
>>> } else if (x > 1.0) {
>>> 2
>>> } else {
>>> x
>>> }
>>> }
>>> 
>>> Then run it as:
>>> 
 apply(m,2,function(i) sapply(i, fun))
>>> 
>>> However, it seems that this method is not efficient when the dimension is
>>> large, e.g., 5000x5000 matrix.
>>> 
>>> Any suggestions? Thanks in advance!
>>> 
>>> Best regards,
>>> Jinsong

> 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] efficient way to replace a range of numeric with an integer in a matrix

2014-08-11 Thread Jinsong Zhao

On 2014/8/11 14:50, William Dunlap wrote:

You can use
 m[m > 0 & m <= 1.0] <- 1
 m[m > 1 ] <- 2
or, if you have lots of intervals, something based on findInterval().  E.g.,
 m[] <- findInterval(m, c(-Inf, 0, 1, Inf)) - 1

(What do you want to do with non-positive numbers?)

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Thank you very much.

I think findInterval() is what I want.

Regards,
Jinsong




On Mon, Aug 11, 2014 at 2:40 PM, Jinsong Zhao  wrote:

Hi there,

I hope to replace a range of numeric in a matrix with a integer. For
example, in the following matrix, I want to use 1 to replace the elements
range from 0.0 to 1.0, and all larger than 1. with 2.


(m <- matrix(runif(16, 0, 2), nrow = 4))

   [,1]   [,2]  [,3] [,4]
[1,] 0.7115088 0.55370418 0.1586146 1.882931
[2,] 0.9068198 0.38081423 0.9172629 1.713592
[3,] 1.5210150 0.93900649 1.2609942 1.744456
[4,] 0.3779058 0.03130103 0.1893477 1.601181

so I want to get something like:

     [,1] [,2] [,3] [,4]
[1,]    1    1    1    2
[2,]    1    1    1    2
[3,]    2    1    2    2
[4,]    1    1    1    2

I wrote a function to do such thing:

fun <- function(x) {
 if (is.na(x)) {
 NA
 } else if (x > 0.0 && x <= 1.0) {
 1
 } else if (x > 1.0) {
 2
 } else {
 x
 }
}

Then run it as:


apply(m,2,function(i) sapply(i, fun))


However, it seems that this method is not efficient when the dimension is
large, e.g., 5000x5000 matrix.

Any suggestions? Thanks in advance!

Best regards,
Jinsong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] efficient way to replace a range of numeric with an integer in a matrix

2014-08-11 Thread William Dunlap
You can use
m[m > 0 & m <= 1.0] <- 1
m[m > 1 ] <- 2
or, if you have lots of intervals, something based on findInterval().  E.g.,
m[] <- findInterval(m, c(-Inf, 0, 1, Inf)) - 1

(What do you want to do with non-positive numbers?)

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Mon, Aug 11, 2014 at 2:40 PM, Jinsong Zhao  wrote:
> Hi there,
>
> I hope to replace a range of numeric in a matrix with a integer. For
> example, in the following matrix, I want to use 1 to replace the elements
> range from 0.0 to 1.0, and all larger than 1. with 2.
>
>> (m <- matrix(runif(16, 0, 2), nrow = 4))
>   [,1]   [,2]  [,3] [,4]
> [1,] 0.7115088 0.55370418 0.1586146 1.882931
> [2,] 0.9068198 0.38081423 0.9172629 1.713592
> [3,] 1.5210150 0.93900649 1.2609942 1.744456
> [4,] 0.3779058 0.03130103 0.1893477 1.601181
>
> so I want to get something like:
>
>      [,1] [,2] [,3] [,4]
> [1,]    1    1    1    2
> [2,]    1    1    1    2
> [3,]    2    1    2    2
> [4,]    1    1    1    2
>
> I wrote a function to do such thing:
>
> fun <- function(x) {
> if (is.na(x)) {
> NA
> } else if (x > 0.0 && x <= 1.0) {
> 1
> } else if (x > 1.0) {
> 2
> } else {
> x
> }
> }
>
> Then run it as:
>
>> apply(m,2,function(i) sapply(i, fun))
>
> However, it seems that this method is not efficient when the dimension is
> large, e.g., 5000x5000 matrix.
>
> Any suggestions? Thanks in advance!
>
> Best regards,
> Jinsong
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] efficient way to replace a range of numeric with an integer in a matrix

2014-08-11 Thread Richard M. Heiberger
(m>1)+1
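
A brief illustration of why this works (made-up values in the question's 0-2 range; note it assumes, as in the example, that there are no NAs and no values <= 0 needing special treatment):

m <- matrix(c(0.7, 1.5, 0.4, 1.9), nrow = 2)
m > 1         # a logical matrix: FALSE where m <= 1, TRUE where m > 1
(m > 1) + 1   # arithmetic coerces FALSE/TRUE to 0/1, giving 1 and 2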

On Mon, Aug 11, 2014 at 5:40 PM, Jinsong Zhao  wrote:
> Hi there,
>
> I hope to replace a range of numeric in a matrix with a integer. For
> example, in the following matrix, I want to use 1 to replace the elements
> range from 0.0 to 1.0, and all larger than 1. with 2.
>
>> (m <- matrix(runif(16, 0, 2), nrow = 4))
>   [,1]   [,2]  [,3] [,4]
> [1,] 0.7115088 0.55370418 0.1586146 1.882931
> [2,] 0.9068198 0.38081423 0.9172629 1.713592
> [3,] 1.5210150 0.93900649 1.2609942 1.744456
> [4,] 0.3779058 0.03130103 0.1893477 1.601181
>
> so I want to get something like:
>
>      [,1] [,2] [,3] [,4]
> [1,]    1    1    1    2
> [2,]    1    1    1    2
> [3,]    2    1    2    2
> [4,]    1    1    1    2
>
> I wrote a function to do such thing:
>
> fun <- function(x) {
> if (is.na(x)) {
> NA
> } else if (x > 0.0 && x <= 1.0) {
> 1
> } else if (x > 1.0) {
> 2
> } else {
> x
> }
> }
>
> Then run it as:
>
>> apply(m,2,function(i) sapply(i, fun))
>
> However, it seems that this method is not efficient when the dimension is
> large, e.g., 5000x5000 matrix.
>
> Any suggestions? Thanks in advance!
>
> Best regards,
> Jinsong
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] efficient way to replace a range of numeric with an integer in a matrix

2014-08-11 Thread Jinsong Zhao

Hi there,

I hope to replace ranges of numeric values in a matrix with integers. For 
example, in the following matrix, I want to use 1 to replace the 
elements ranging from 0.0 to 1.0, and replace all elements larger than 1.0 with 2.


> (m <- matrix(runif(16, 0, 2), nrow = 4))
  [,1]   [,2]  [,3] [,4]
[1,] 0.7115088 0.55370418 0.1586146 1.882931
[2,] 0.9068198 0.38081423 0.9172629 1.713592
[3,] 1.5210150 0.93900649 1.2609942 1.744456
[4,] 0.3779058 0.03130103 0.1893477 1.601181

so I want to get something like:

     [,1] [,2] [,3] [,4]
[1,]    1    1    1    2
[2,]    1    1    1    2
[3,]    2    1    2    2
[4,]    1    1    1    2

I wrote a function to do such a thing:

fun <- function(x) {
if (is.na(x)) {
NA
} else if (x > 0.0 && x <= 1.0) {
1
} else if (x > 1.0) {
2
} else {
x
}
}

Then run it as:

> apply(m,2,function(i) sapply(i, fun))

However, it seems that this method is not efficient when the dimension 
is large, e.g., 5000x5000 matrix.


Any suggestions? Thanks in advance!

Best regards,
Jinsong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to process multiple data files using R loop

2014-08-11 Thread Greg Snow
In addition to the solution and comments that you have already
received, here are a couple of additional comments:

This is a variant on FAQ 7.21; if you had found that FAQ, it would
have told you about the get function.

The most important part of the answer in FAQ 7.21 is the last part
where it says that it is better to use a list.  If all the objects of
interest are related and you want to do the same or similar things to
each one, then having them all stored in a single list can simplify
things for the future.  You can collect all the objects into a single
list using the mget command, e.g.:

P_objects <- mget( ls(pattern='P_'))

Now that they are in a list you can do the equivalent of your loop,
but simpler with the lapply function, e.g.:

lapply( P_objects, head, 2 )

And if you want to do other things with all these objects, such as
save them, plot them, do a regression analysis on them, delete them,
etc. then you can do that using lapply/sapply as well in a simpler way
than looping.
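
For instance (the object names are those from the example above, and this assumes the P_ objects are data frames):

P_objects <- mget(ls(pattern = "P_"))   # as above
sapply(P_objects, nrow)                 # how many rows each object has
lapply(P_objects, summary)              # a quick summary of each one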


On Fri, Aug 8, 2014 at 12:25 PM, Fix Ace  wrote:
> I have 16 files and would like to check the information of their first two 
> lines, what I did:
>
>
>> ls(pattern="P_")
>  [1] "P_3_utr_source_data"   "P_5_utr_source_data"
>  [3] "P_exon_per_gene_cds_source_data"   "P_exon_per_gene_source_data"
>  [5] "P_exon_source_data""P_first_exon_oncds_source_data"
>  [7] "P_first_intron_oncds_source_data"  "P_first_intron_ongene_source_data"
>  [9] "P_firt_exon_ongene_source_data""P_gene_cds_source_data"
> [11] "P_gene_source_data""P_intron_source_data"
> [13] "P_last_exon_oncds_source_data" "P_last_exon_ongene_source_data"
> [15] "P_last_intron_oncds_source_data"   "P_last_intron_ongene_source_data"
>
>
>
>>for(i in ls(pattern="P_")){head(i, 2)}
>
> It obviously does not work since nothing came out
>
> What I would like to see for the output is :
>
>> head(P_3_utr_source_data,2)
>   V1
> 1  1
> 2  1
>> head(P_5_utr_source_data,2)
>   V1
> 1  1
> 2  1
>>
> .
>
> .
> .
>
>
>
> Could anybody help me with this?
>
> Thank you very much for your time:)
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loops with assign() and get()

2014-08-11 Thread William Dunlap
That code will not work.  get() and assign() are troublesome for a
variety of reasons.  E.g.,

* adding made-up names to the current environment is dangerous.  They
may clobber fixed names in the environment.  You may be confused about
what the current environment is (especially when refactoring code).
You can avoid this by using dataEnv <- new.env() to make an
environment for your related objects and using the envir=dataEnv
argument to get() and assign() to put the objects in there.  However,
once you go this route, you may as well use the syntax dataEnv[[name]]
to refer to your objects instead of get(name, envir=dataEnv) and
assign(name, value, envir=dataEnv).

* replacement syntax like
names(get(someName)) <- c("One", "Two")
will not work.  You have to use kludgy code like
tmp <- get(someName)
names(tmp) <- c("One", "Two")
assign(someName, tmp)
If you use the dataEnv[[name]] syntax then you can use the more normal looking
names(dataEnv[[name]]) <- c("One", "Two")
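
A small sketch of that environment-based pattern (the object name "df1" is made up for illustration):

dataEnv <- new.env()
dataEnv[["df1"]] <- data.frame(1:3, 4:6)
names(dataEnv[["df1"]]) <- c("One", "Two")   # replacement syntax works directly
ls(dataEnv)                                  # "df1"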

By the way, I do not think your suggested code will work - you call
assign() before making a bunch of changes to dfi instead of after
making the changes.
I have not measured the memory implications of your method vs. using
lapply on lists, but I don't think there is much of a difference in
this case.  (There can be a big difference when you are replacing the
inputs by the outputs.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sun, Aug 10, 2014 at 8:22 PM, PO SU  wrote:
>
> It's a great method, but there is  a memory problem, DFS would occupy a
> large memory. So from this point of view, i prefer the loop.
>
>>> for (i in 1 : nrow(unique)){
>>> tmp=get(past0("DF",i))[1,]
>>> assign(paste0("df",i),tmp)
>>> dfi=dfi[,1:3]
>>> names(dfi)=names(tmp[c(1,4,5)])
>>> dfi=rbind(dfi,tmp[c(1,4,5)])
>>> names(dfi)=c("UID","Date","Location")
>>>}
>
> NB: The code above  without any test!
>
>
>
> --
> PO SU
> mail: desolato...@163.com
> Majored in Statistics from SJTU
>
>
> At 2014-08-10 06:32:38, "William Dunlap"  wrote:
>>> I was able to create 102 distinct dataframes (DFs1, DFs2, DFs3, etc)
>>> using
>>> the assign() in a loop.
>>
>>The first step to making things easier to do is to put those data.frames
>>into a list.  I'll call it DFS and your data.frames will now be DFs[[1]],
>>DFs[[2]], ..., DFs[[length(DFs)]].
>>DFs <- lapply(paste0("DFs", 1:102), get)
>>In the future, I think it would be easier if you skipped the 'assign()'
>>and just put the data into a list from the start.
>>
>>Now use lapply to process that list, creating a new list called 'df', where
>>df[[i]] is the result of processing DFs[[i]]:
>>
>>df <- lapply(DFs, FUN=function(DFsi) {
>>  # your code from the for loop you supplied
>>  dfi=DFsi[1,]
>>  dfi=dfi[,1:3]
>>  names(dfi)=names(DFsi[c(1,4,5)])
>>  dfi=rbind(dfi,DFsi[c(1,4,5)])
>>  names(dfi)=c("UID","Date","Location")
>>  dfi # return this to put in list that lapply is
>> making
>>  })
>>
>>(You didn't supply sample data so I did not run this - there may be typos.)
>>
>>Bill Dunlap
>>TIBCO Software
>>wdunlap tibco.com
>>
>>
>>On Sat, Aug 9, 2014 at 1:39 PM, Laura Villegas Ortiz 
>> wrote:
>>> Dear all,
>>>
>>> I was able to create 102 distinct dataframes (DFs1, DFs2, DFs3, etc)
>>> using
>>> the assign() in a loop.
>>>
>>> Now, I would like to perform the following transformation for each one of
>>> these dataframes:
>>>
>>> df1=DFs1[1,]
>>> df1=df1[,1:3]
>>> names(df1)=names(DFs1[c(1,4,5)])
>>> df1=rbind(df1,DFs1[c(1,4,5)])
>>> names(df1)=c("UID","Date","Location")
>>>
>>> something like this:
>>>
>>> for (i in 1 : nrow(unique)){
>>>
>>> dfi=DFsi[1,]
>>> dfi=dfi[,1:3]
>>> names(dfi)=names(DFsi[c(1,4,5)])
>>> dfi=rbind(dfi,DFsi[c(1,4,5)])
>>> names(dfi)=c("UID","Date","Location")
>>>
>>> }
>>>
>>> I thought it could be straightforward but has proven the opposite
>>>
>>> Many thanks
>>>
>>> Laura
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>__
>>R-help@r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Superimposing graphs

2014-08-11 Thread Rmh
whoops

P1<- plot(grouped.data)

Sent from my iPhone

> On Aug 11, 2014, at 5:06, Naser Jamil  wrote:
> 
> Dear R-user,
> May I seek your help to sort out a little problem. I have the following
> codes
> to draw two graphs. I want to superimpose the second one on each of the
> first one.
> 
> 
> 
> library(nlme)
> subject<-c(1,1,1,2,2,2,3,3,3)
> time<-c(0.0,5.4,21.0,0.0,5.4,21.0,0.0,5.4,21.0)
> con.cohort<-c(1.10971703,0.54535512,0.07176724,0.75912539,0.47825282,
> 0.10593292,1.20808375,0.47638394,0.02808967)
> 
> data.d=data.frame(subject=subject,time=time,conc=con.cohort)
> grouped.data<-groupedData(formula=conc~time | subject, data =data.d)
> 
> plot(grouped.data)
> 
> ##
> 
> f1<-function(x,v,cl,t) {
> (x/v)*exp(-(cl/v)*t)
>  }
> t<-seq(0,30, .01)
> plot(t,f1(0.5,0.5,0.06,t),type="l",pch=18, ylim=c(), xlab="time",
> ylab="conc")
> 
> 
> ###
> 
> Any suggestion will really be helpful.
> 
> 
> Regards,
> 
> Jamil.
> 
>[[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Just stumbled across this: Advanced R programming text & code - from Hadley

2014-08-11 Thread PO SU


The book is absolutely helpful to me. Any new R user should read it. I am now 
reading the Rcpp section.



--

PO SU
mail: desolato...@163.com
Majored in Statistics from SJTU



At 2014-08-11 09:02:53, "Mitchell Maltenfort"  wrote:
>Ah, what do you know anyway? -- as the book critic said to the author.
>
>Ersatzistician and Chutzpahthologist
>
>I can answer any question.  "I don't know" is an answer. "I don't know
>yet" is a better answer.
>
>"I can write better than anybody who can write faster, and I can write
>faster than anybody who can write better" AJ Leibling
>
>
>On Mon, Aug 11, 2014 at 8:38 AM, Hadley Wickham  wrote:
>> Or just go to http://adv-r.had.co.nz/ ...
>>
>> Hadley
>>
>> On Sun, Aug 10, 2014 at 9:34 PM, John McKown
>>  wrote:
>>> Well, it says that it's from Hadley Wickham.
>>>
>>> https://github.com/hadley/adv-r
>>> 
>>>
>>> This is code and text behind the Advanced R programming book.
>>>
>>> The site is built using jekyll, with a custom plugin to render .rmd
>>> files with knitr and pandoc. To create the site, you need:
>>>
>>> jekyll and s3_websiter gems: gem install jekyll s3_website
>>> pandoc
>>> knitr: install.packages("knitr")
>>>
>>> 
>>>
>>> This contains a Rstudio project file. I know because I've done a git
>>> clone on it and loaded it into Rstudio, on Linux. If you don't have
>>> git, there is a "download zip" option on the site too.
>>>
>>> --
>>> There is nothing more pleasant than traveling and meeting new people!
>>> Genghis Khan
>>>
>>> Maranatha! <><
>>> John McKown
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> http://had.co.nz/
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with assignment 1 part 1

2014-08-11 Thread John McKown
On Mon, Aug 11, 2014 at 9:00 AM, michelle maurin
 wrote:
> see code below
>
>
> pollutantmean <- function(directory, pollutant, id = 1:332) {
>   files_list <- list.files(directory, full.names=TRUE) #creates a list of
> files
>   dat <- data.frame()#creates an empty data frame
>   for (i in 1:332) {
>   dat <- rbind(dat, read.csv(files_list[i]))#loops through the files,
> rbinding them together
>   }
>
> #subsets the rows that match the 'pollutant' argument
>   median(dat_subset$pollutant, na.rm=TRUE) #identifies the median of the
> subset
> }
>
>
>
> ##I highlighted the area that I think has the problem , I helped my self
> using the tutorial found on the forum ,for assignment 1

I really think you're not where you believe you are. This is an email
list for general questions on the R language. I am not aware of any
"tutorial found on the forum". But I do think that I have an idea
of what your problem is. Basically you want to find all the rows in
"dat" which have a pollutant (dat$pollutant) of either "sulfate" or
"nitrate". The which() function isn't going to do that for you. The
which() function takes a logical vector of TRUE and FALSE values. It
returns an integer vector which has the index values of the TRUE
entries. For example:
> which(c(TRUE,FALSE,FALSE,TRUE,FALSE,TRUE))
[1] 1 4 6
I realise how this can be thought of as a way to do this. And it could
work, but it is unnecessary in this case. But the real problem is the
segment:
dat["suflate","nitrate"] == pollutant

If you would try this (I can't because I don't have the data files),
you would see that this is not asking the right question. You want to
see if dat$pollutant is either "suflate" or "nitrate". Or, expanding a
bit you want to ask: 'is dat$pollutant equal to "suflate"? If not, is
it equal to 'nitrate"?'. The answer to this question will be the
proper logical vector that you can either use in the which() function,
or directly as a row selector. The hint on how to ask this question is
to use the ifelse() function properly.

So your line (with the critical method of the proper use of ifelse)
should look something like:

dat_subset <- dat[which(ifelse(...)), ];
#or, equivalently
dat_subset <- dat[ifelse(???), ];

This latter is valid because the R language will accept a logical
vector as a "selector" and only return the data where the logical
value is TRUE.

I am deliberately leaving the challenge of how to use the ifelse() for
you. Remember, from the documentation, that the form of the ifelse()
is: ifelse(condition, result-if-condition-true, result-if-condition-false).

Hopefully this is a sufficient clew to get you going.

I won't comment on the rest of the code because I don't know the
problem. Or what "forum" you're talking about.

>
>  Best regards
>
>
>
> Michelle
>
>
>
> 
> Date: Sun, 10 Aug 2014 22:06:38 -0500
> Subject: Re: [R] Problem with assignment 1 part 1
> From: john.archie.mck...@gmail.com
> To: michimau...@hotmail.com
> CC: r-help@r-project.org
>
> What code.
>
> Also, the forum has a "no homework" policy. Your subject implies this is
> homework, so you might not get any answers. You might get a hint or two
> though.
>
> On Aug 10, 2014 10:00 PM, "michelle maurin"  wrote:
>
> I think my code is very close I can seem to be able to debug it Might be
> something very simple I know the problem is on the last 3 lines of code can
> you please help?
> Thanks
> Michelle
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Superimposing graphs

2014-08-11 Thread Richard M. Heiberger
I think this is what you are looking for.

library(latticeExtra)
t.tmp <-seq(0,30, .01)
P1 + layer(panel.xyplot(y=f1(0.5,0.5,0.06, t.tmp), x=t.tmp, type="l",
col="black"))

Notice that t is a very bad name for your variable as it is the name
of a function.
I used t.tmp instead.

Rich


On Mon, Aug 11, 2014 at 5:06 AM, Naser Jamil  wrote:
> Dear R-user,
> May I seek your help to sort out a little problem. I have the following
> codes
> to draw two graphs. I want to superimpose the second one on each of the
> first one.
>
> 
>
> library(nlme)
> subject<-c(1,1,1,2,2,2,3,3,3)
> time<-c(0.0,5.4,21.0,0.0,5.4,21.0,0.0,5.4,21.0)
> con.cohort<-c(1.10971703,0.54535512,0.07176724,0.75912539,0.47825282,
> 0.10593292,1.20808375,0.47638394,0.02808967)
>
> data.d=data.frame(subject=subject,time=time,conc=con.cohort)
> grouped.data<-groupedData(formula=conc~time | subject, data =data.d)
>
> plot(grouped.data)
>
> ##
>
> f1<-function(x,v,cl,t) {
> (x/v)*exp(-(cl/v)*t)
>   }
> t<-seq(0,30, .01)
> plot(t,f1(0.5,0.5,0.06,t),type="l",pch=18, ylim=c(), xlab="time",
> ylab="conc")
>
>
> ###
>
> Any suggestion will really be helpful.
>
>
> Regards,
>
> Jamil.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] building a BIGLM model from three tables (related)

2014-08-11 Thread Farewell, Timothy
Hi all,

I wonder if you can help me.

THE PROBLEM

I want to train and test a GLM with some large datasets. I am running into some 
problems when I flatten my tables together to feed into the GLM model, as it 
produces a very large table which is far too big for the memory on my computer.

THREE TABLES - Pipes, Weekly Weather data, Bursts

I have three tables, which are all related to each other.


(1) Pipe cohorts (114,000 rows) with a range of explanatory variables.
    Linking fields: (A) pipe cohort ID, (B) weathercell_ID

(2) Explanatory weekly weather data for 12 years (e.g. 624 weeks for each pipe cohort).
    Linking fields: (C) week, (B) weathercell_ID

(3) Bursts (40,000 bursts).
    Linking fields: (A) pipe cohort ID, (C) week

Effectively, the combination of tables (1) and (2) make the population.  Table 
(3) are the events, or failures.

JOINING THE THREE TABLES

I have previously had far fewer pipe cohort rows.

What I have been doing till now is joining the (1) pipe cohorts data  to the 
(2) weekly weather data.
This repeats the pipe cohort data, each week, for the 12 years, which, now, 
makes a very long table e.g. 624 x 114,000 rows =  71 million rows.
I would then join the (3) burst data to that to see how many bursts there were 
that week, on that pipe cohort.
This made a large, flat file, which I could feed into GLM.

This worked ok when there are not so many pipe cohorts, but now there are 
114,000 rows, when I join the data tables I produce a MASSIVE table (many, many 
GB) which kills my computers.

RELATIONAL DATABASE APPROACH?

I am thinking it would be better to have a relational database structure where, 
for each data point (row) being brought into the BIGLM model, it takes the three 
tables and looks up the appropriate values each time, using the defined join 
fields (A, B + C), feeds that into the model, then goes back and looks up the 
next point.

ADVICE?

How would you approach this problem?

I have the data prepared in the three tables.
I need to fit lots of models to see which variables give me the best AIC 
(output: lots of model fits).
Then I need to predict bursts using the best model and the available (1) pipe and (2) 
weather data.

Would you use the package BIGLM, linking to a sqlite database? (Or do something 
completely different?)
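
One possible shape for the chunked approach, sketched and untested: bigglm() in the biglm package accepts a data-source function that returns successive chunks of rows (and restarts when called with reset = TRUE), so the join can be done one block of pipe cohorts at a time instead of materialising the full 71-million-row table. All table, column and variable names below are invented placeholders.

library(biglm)

## pipes, weather and bursts are the three tables already loaded;
## chunk.size controls how many pipe-cohort rows are expanded per pass.
make.chunks <- function(pipes, weather, bursts, chunk.size = 2000) {
  i <- 0
  function(reset = FALSE) {
    if (reset) { i <<- 0; return(NULL) }
    if (i >= nrow(pipes)) return(NULL)                  # no more data
    rows <- seq(i + 1, min(i + chunk.size, nrow(pipes)))
    i <<- max(rows)
    chunk <- merge(pipes[rows, ], weather, by = "weathercell_ID")
    chunk <- merge(chunk, bursts,
                   by = c("pipe_cohort_ID", "week"), all.x = TRUE)
    chunk$n_bursts[is.na(chunk$n_bursts)] <- 0          # weeks with no recorded burst
    chunk
  }
}

fit <- bigglm(n_bursts ~ age + material + rainfall + min_temp,
              data = make.chunks(pipes, weather, bursts),
              family = poisson())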

Many thanks,

Tim



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [vegan]Envfit, pvalues and ggplot2

2014-08-11 Thread Tim Richter-Heitmann
Good Morning,

first let me thank you very much for answering my first two questions on 
this list.

Currently, I apply vegan's envfit to simple PCA ordinations. When drawing 
the biplot, one can set a cutoff so that only the parameters with 
significant p-values are shown (via p.max=0.05 in the plot command).

There is already sufficient coverage on the net for biplotting this kind 
of data with ggplot2 (with the problem being the arrow length).
http://stackoverflow.com/questions/14711470/plotting-envfit-vectors-vegan-package-in-ggplot2

However, what the solution does not cover is the exclusion of 
insignificant environmental parameters, as the score extraction process 
described in the link only works with 'display="vectors"':

Envfit_scores <- as.data.frame(scores(list_from_envfit, display = "vectors"))

envfit creates lists like this:

            PC1      PC2     r2   Pr(>r)
param1 -0.70882  0.70539 0.0994 0.000999 ***
param2 -0.60122  0.79908 0.0593 0.000999 ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
P values based on 999 permutations.

The list contains a vector called $pval containing the pvalues.

So, I need to reduce the list created by envfit to the rows meeting a 
criterion in $pval (via "unlist" and "which", I suppose). However, I am 
having difficulty working out the correct code.


Any help is much appreciated!
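
For reference, one hedged sketch of that reduction: it assumes the envfit object stores the permutation p-values in $vectors$pvals (as recent vegan versions do), and pca_result and env_data are placeholder names.

list_from_envfit <- envfit(pca_result, env_data, permutations = 999)
scrs <- as.data.frame(scores(list_from_envfit, display = "vectors"))
keep <- list_from_envfit$vectors$pvals <= 0.05
Envfit_scores <- scrs[keep, , drop = FALSE]   # keep only the significant vectors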

-- 
Tim Richter-Heitmann (M.Sc.)
PhD Candidate



International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Superimposing graphs

2014-08-11 Thread Naser Jamil
Dear R-user,
May I seek your help to sort out a little problem. I have the following
codes
to draw two graphs. I want to superimpose the second one on each of the
first one.



library(nlme)
subject<-c(1,1,1,2,2,2,3,3,3)
time<-c(0.0,5.4,21.0,0.0,5.4,21.0,0.0,5.4,21.0)
con.cohort<-c(1.10971703,0.54535512,0.07176724,0.75912539,0.47825282,
0.10593292,1.20808375,0.47638394,0.02808967)

data.d=data.frame(subject=subject,time=time,conc=con.cohort)
grouped.data<-groupedData(formula=conc~time | subject, data =data.d)

plot(grouped.data)

##

f1<-function(x,v,cl,t) {
(x/v)*exp(-(cl/v)*t)
  }
t<-seq(0,30, .01)
plot(t,f1(0.5,0.5,0.06,t),type="l",pch=18, ylim=c(), xlab="time",
ylab="conc")


###

Any suggestion will really be helpful.


Regards,

Jamil.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Just stumbled across this: Advanced R programming text & code - from Hadley

2014-08-11 Thread Mitchell Maltenfort
Ah, what do you know anyway? -- as the book critic said to the author.

Ersatzistician and Chutzpahthologist

I can answer any question.  "I don't know" is an answer. "I don't know
yet" is a better answer.

"I can write better than anybody who can write faster, and I can write
faster than anybody who can write better" AJ Leibling


On Mon, Aug 11, 2014 at 8:38 AM, Hadley Wickham  wrote:
> Or just go to http://adv-r.had.co.nz/ ...
>
> Hadley
>
> On Sun, Aug 10, 2014 at 9:34 PM, John McKown
>  wrote:
>> Well, it says that it's from Hadley Wickham.
>>
>> https://github.com/hadley/adv-r
>> 
>>
>> This is code and text behind the Advanced R programming book.
>>
>> The site is built using jekyll, with a custom plugin to render .rmd
>> files with knitr and pandoc. To create the site, you need:
>>
>> jekyll and s3_websiter gems: gem install jekyll s3_website
>> pandoc
>> knitr: install.packages("knitr")
>>
>> 
>>
>> This contains a Rstudio project file. I know because I've done a git
>> clone on it and loaded it into Rstudio, on Linux. If you don't have
>> git, there is a "download zip" option on the site too.
>>
>> --
>> There is nothing more pleasant than traveling and meeting new people!
>> Genghis Khan
>>
>> Maranatha! <><
>> John McKown
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> http://had.co.nz/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Just stumbled across this: Advanced R programming text & code - from Hadley

2014-08-11 Thread Hadley Wickham
Or just go to http://adv-r.had.co.nz/ ...

Hadley

On Sun, Aug 10, 2014 at 9:34 PM, John McKown
 wrote:
> Well, it says that it's from Hadley Wickham.
>
> https://github.com/hadley/adv-r
> 
>
> This is code and text behind the Advanced R programming book.
>
> The site is built using jekyll, with a custom plugin to render .rmd
> files with knitr and pandoc. To create the site, you need:
>
> jekyll and s3_websiter gems: gem install jekyll s3_website
> pandoc
> knitr: install.packages("knitr")
>
> 
>
> This contains a Rstudio project file. I know because I've done a git
> clone on it and loaded it into Rstudio, on Linux. If you don't have
> git, there is a "download zip" option on the site too.
>
> --
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
>
> Maranatha! <><
> John McKown
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] C.D.F

2014-08-11 Thread Rolf Turner

On 11/08/14 20:17, pari hesabi wrote:

Hello everybody,

Can anybody help me to write a program for the CDF of sum of two
independent gamma random  variables ( covolution of two gamma
distributions)  with different amounts of parameters( the shape
parameters are the same)?



Is this homework?  The list has a no homework policy.

cheers,

Rolf Turner

--
Rolf Turner
Technical Editor ANZJS

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] C.D.F

2014-08-11 Thread Prof. Dr. Matthias Kohl

Dear Diba,

you could try package distr; eg.

library(distr)
G1 <- Gammad(scale = 0.7, shape = 0.5)
G2 <- Gammad(scale = 2.1, shape = 1.7)
G3 <- G1+G2 # convolution
G3

For the convolution, exact formulas are applied if available; otherwise 
FFT is used. See also http://www.jstatsoft.org/v59/i04/ (to appear 
soon) or a previous version at http://arxiv.org/abs/1006.0764
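
If it helps, the CDF of the convolution can then be evaluated with the distr accessor p() (a small, untested sketch):

F3 <- p(G3)      # the CDF of G1 + G2 as an ordinary R function
F3(5)            # P(G1 + G2 <= 5)
curve(F3(x), from = 0, to = 20, xlab = "x", ylab = "F(x)")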


hth
Matthias

Am 11.08.2014 um 10:17 schrieb pari hesabi:

Hello everybody,

Can anybody help me to write a program for the CDF of sum of two independent 
gamma random  variables ( covolution of two gamma distributions)  with 
different amounts of parameters( the shape parameters are the same)?

Thank you

Diba

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Prof. Dr. Matthias Kohl
www.stamats.de

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Need help in using Rcpp

2014-08-11 Thread PO SU
Dear Rcpp users,
  I can't figure out what the following code does:


// Returns the 1-based position of the first element of the list x for which
// pred() returns TRUE, or 0 if no element matches (similar to base R's Position()).
int f4(Function pred, List x) {
  int n = x.size();
  for (int i = 0; i < n; ++i) {
    LogicalVector res = pred(x[i]);   // call pred() on the i-th list element
    if (res[0]) return i + 1;         // only the first element of the result is checked
  }
  return 0;
}


I investigated it, and I understand that it applies a function to every element 
of a list and gets a LogicalVector back each time, but I can't understand why, 
if the first element of that LogicalVector is true, it returns the index of 
that element in the list.
Tks!






--

PO SU
mail: desolato...@163.com
Majored in Statistics from SJTU
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loops with assign() and get()

2014-08-11 Thread PO SU

It's a great method, but there is a memory problem: DFs would occupy a large 
amount of memory. So from this point of view, I prefer the loop.

>> for (i in 1 : nrow(unique)){
>> tmp=get(past0("DF",i))[1,]
>> assign(paste0("df",i),tmp)
>> dfi=dfi[,1:3]
>> names(dfi)=names(tmp[c(1,4,5)])
>> dfi=rbind(dfi,tmp[c(1,4,5)])

>> names(dfi)=c("UID","Date","Location")
>>}
NB: The code above is untested!




--

PO SU
mail: desolato...@163.com
Majored in Statistics from SJTU



At 2014-08-10 06:32:38, "William Dunlap"  wrote:
>> I was able to create 102 distinct dataframes (DFs1, DFs2, DFs3, etc) using
>> the assign() in a loop.
>
>The first step to making things easier to do is to put those data.frames
>into a list.  I'll call it DFS and your data.frames will now be DFs[[1]],
>DFs[[2]], ..., DFs[[length(DFs)]].
>DFs <- lapply(paste0("DFs", 1:102), get)
>In the future, I think it would be easier if you skipped the 'assign()'
>and just put the data into a list from the start.
>
>Now use lapply to process that list, creating a new list called 'df', where
>df[[i]] is the result of processing DFs[[i]]:
>
>df <- lapply(DFs, FUN=function(DFsi) {
>  # your code from the for loop you supplied
>  dfi=DFsi[1,]
>  dfi=dfi[,1:3]
>  names(dfi)=names(DFsi[c(1,4,5)])
>  dfi=rbind(dfi,DFsi[c(1,4,5)])
>  names(dfi)=c("UID","Date","Location")
>  dfi # return this to put in list that lapply is making
>  })
>
>(You didn't supply sample data so I did not run this - there may be typos.)
>
>Bill Dunlap
>TIBCO Software
>wdunlap tibco.com
>
>
>On Sat, Aug 9, 2014 at 1:39 PM, Laura Villegas Ortiz  wrote:
>> Dear all,
>>
>> I was able to create 102 distinct dataframes (DFs1, DFs2, DFs3, etc) using
>> the assign() in a loop.
>>
>> Now, I would like to perform the following transformation for each one of
>> these dataframes:
>>
>> df1=DFs1[1,]
>> df1=df1[,1:3]
>> names(df1)=names(DFs1[c(1,4,5)])
>> df1=rbind(df1,DFs1[c(1,4,5)])
>> names(df1)=c("UID","Date","Location")
>>
>> something like this:
>>
>> for (i in 1 : nrow(unique)){
>>
>> dfi=DFsi[1,]
>> dfi=dfi[,1:3]
>> names(dfi)=names(DFsi[c(1,4,5)])
>> dfi=rbind(dfi,DFsi[c(1,4,5)])
>> names(dfi)=c("UID","Date","Location")
>>
>> }
>>
>> I thought it could be straightforward but has proven the opposite
>>
>> Many thanks
>>
>> Laura
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] GSoC 2014 - an R package for working with RRD files

2014-08-11 Thread Plamen Dimitrov
Hello,

I'm taking part in Google Summer of Code 2014 with Ganglia and I spent the
past few months implementing an R package that makes it possible to
directly import and work with RRD (http://oss.oetiker.ch/rrdtool/) files in
R.

There are currently three ways to use the package:

- importRRD("filename", "cf", start, stop, step) - returns a data.frame
containing the desired portion of an RRA
- importRRD('filename") - imports everything in the RRD file into a list of
data.frame objects (one per RRA)

(the metadata is read and appropriate names are given to columns and list
elements)

- getVal("filename", "cf", step, timestamp) - optimized for getting the
values at a specific timestamp, uses a cache to minimize the read frequency

Please feel free to install and test the package:
https://github.com/pldimitrov/Rrd

I'm now getting close to finishing it so any feedback is more than welcome!

I'm especially worried about getting:
stack imbalance in '.Call', 28 then 29

warnings. Perhaps my protects are not matching the unprotects? Could you
suggest a good way to debug this?


Thanks,
Plamen

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] C.D.F

2014-08-11 Thread pari hesabi
Hello everybody,

Can anybody help me to write a program for the CDF of the sum of two independent 
gamma random variables (the convolution of two gamma distributions) with 
different parameter values (the shape parameters are the same)?

Thank you

Diba 
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.