Re: [R] How to connect .mdb file
imnew jubil...@live.com.sg writes:

> Hi, I'm currently having a problem connecting an .mdb file to R. I've installed the RODBC package and wrote the code this way:
>
>     channel <- odbcConnectAccess("C:/Users/Documents/XYZ")
>
> I have a total of 5 tables in the .mdb database. Can anyone help me with how to get the tables in?

You are one step away. Use sqlFetch(channel, table_name) to fetch a table into a data.frame as is, or use sqlQuery(channel, sql) to run a query. Look those functions up in the manual.

-- 
Mikhail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] do I need plyr, apply or something else?
R. Michael Weylandt michael.weyla...@gmail.com writes:

> On Wed, Jul 11, 2012 at 10:05 AM, Russell Bowdrey russell.bowd...@justretirement.com wrote:
>> Dear all,
>>
>> This is what I'd like to do (I have an implementation using for loops, which I designed before I realised just how slow R is at executing them; this process currently takes days to run).
>>
>> I have a large dataframe containing corporate bond data. The columns are: BondID, Date (goes back 5 years), Var1, Var2, Term2Maturity.
>>
>> What I want to do is this:
>>
>> 1) For each bond, at each given date, look back over 1 year and append some statistics to each row (sd(Var1), cor(Var1,Var2) over that year, etc.)
>
> Look at the TTR package and the various run** functions. Much faster.
>
>> a. It seems I might be able to use ddply for this, but I can't work out how to code the stats function to only look back over one year, rather than the full data range
>> b. For example:
>>
>>     dfBondsWithCorr <- ddply(dfBonds, .(BondID), transform, corr=cor(Var1,Var2), .progress="text")
>>
>> returns a dataframe where each bond has the same corr for every date
>>
>> 2) On each date, subset dfBondsWithCorr by certain qualification criteria, then fit a regression of Var1 on Term2Maturity to the qualifiers, and output the regression as a df of curves (say, for each date, a curve represented by points every 0.5 years)
>>
>> a. I can do this pretty efficiently for a single date (and I suppose I could wrap that in a function), but can't quite see how to do the filtering and spitting out of curves over multiple dates without using for loops
>
> This one's harder. For simple linear regressions, you can solve the regression analytically (e.g., slope = runCov / runVar, and the mean similarly), but doing it for more complicated regressions will pretty much require a for loop of one sort or another. Can you say what sort of model you are looking to use?
>
>> Would appreciate any thoughts, many thanks in advance

I feel like PostgreSQL will do the work better. It has support for basic statistics [1], and you can use window functions [2] to limit the scope to the last year only. Then you get your data with RODBC or something similar. I suspect you have your data in some sort of DB in the first place. Perhaps it has similar features.

[1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE
[2] http://www.postgresql.org/docs/9.1/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

-- 
Mikhail
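The analytic shortcut quoted above (slope = covariance / variance) can be checked on a single window before it is wired into any rolling computation. What follows is only a minimal sketch on made-up data: `x` and `y` are placeholder names standing in for Term2Maturity and Var1, not columns from the original data set.

```r
# Made-up data standing in for one date's qualifying bonds (assumption):
# x plays the role of Term2Maturity, y the role of Var1.
set.seed(42)
x <- runif(50, min = 0, max = 30)
y <- 2 + 0.1 * x + rnorm(50, sd = 0.2)

# Closed-form simple linear regression
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same fit via lm(), for comparison
fit <- lm(y ~ x)
print(all.equal(unname(coef(fit)), c(intercept, slope)))
```

Because only covariances, variances and means are involved, the same arithmetic can be fed from rolling versions of those quantities, which is presumably what the runCov / runVar remark refers to.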
Re: [R] is it possible to insert a figure into another new figure by r script
Jie Tang totang...@gmail.com writes:

> hi R-users
>
> Now I have a figure in emf, png or tiff format that has been drawn by another tool, and I want to insert this figure into my new figure with an R script. I wonder if that is possible?

This [1] might be relevant.

[1] http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=168

-- 
Mikhail
Re: [R] How to make nice tiny sized figures on graphic devices producing scalable vector output?
Hello!

I apologize for the previous e-mail, which was unintentionally sent as non-plain text and was unreadable.

Usually, whenever I want a tiny plot, I just create it as is (or even large) and then downscale it in the end application, such as LaTeX or MS Word. However, graphics devices like postscript, pdf and win.metafile retain physical sizes, so it would be natural to insert graphics as is, provided they have the proper physical sizes embedded.

The question is: what is the best method to create plots in R at their final physical size? I would like to create a good-looking figure by starting with, let's say, win.metafile("some.emf", 3.35, 2). Of course all defaults will produce something unreadable, so I have to scale everything down with cex at least, and probably with lwd as well.

I've tried something like the code below. But dashed (and all other) lines are still too thick, plotting symbols are indistinguishable, etc. And in general it looks like a mess to override all possible values. Is there a better way to downscale whatever is being plotted on a device?

    library(lattice)
    windowsFonts(Arial=windowsFont("TT Arial"))
    cex <- .3
    win.metafile("some.emf", 3.35, 2)
    data <- data.frame(y=rep(c("a","b","c"), each=10), x=runif(30))
    bwplot(y~x, data,
           par.settings = modifyList(
             simpleTheme(cex=cex, lwd=cex),
             # list(), not c(): c() would flatten the name to "axis.line.lwd"
             list( # theme.nopadding,
               axis.line=list(lwd=cex))),
           panel=function(...) {
             panel.bwplot(..., pch="|")
             # panel.mean(..., pch=16)
           },
           cex=cex,
           ylab=list(label=expression(bold("my x label")), cex=cex),
           xlab=list(label=expression(bold("my y label")), cex=cex),
           scales=list(fontfamily="Arial", cex=cex),
           horizontal=TRUE)
    dev.off()

Mikhail
Re: [R] Needing a better solution to a lookup problem.
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Davis, Brian
Sent: Wednesday, March 14, 2012 2:28 PM
To: r-help@R-project.org
Subject: [R] Needing a better solution to a lookup problem.

> I have a solution (actually a few) to this problem, but none are computationally efficient enough to be useful. I'm hoping someone can enlighten me to a better solution.
>
> ...
>
> I have a solution that works reasonably well on small sets, but my current data set is ~100K snp entries, and my regions table has ~200K entries. I have ~1500 files to go through.
>
> I haven't found a good way to solve this problem efficiently. I've tried various versions of mapply/lapply, for loops, etc., which get the answer for small sets but take hours (per file) on my real data. Bioconductor seemed like the obvious place to look, but my GoogleFu must not be that great. I never found anything relevant.
>
> Any ideas or pointers in the right direction would be greatly appreciated.

Consider using a database. For instance, PostgreSQL can easily handle large amounts of data and can restrict the data set to only those rows that fall within a certain subset. While it requires some SQL knowledge, it will pay off. And you can query your data right from the DB using RODBC or something similar. Solve this problem in the DB and use R for further analysis.

Mikhail
Re: [R] Automating R script with Windows 7
vincent.deluard vincentdelua...@gmail.com writes:

> I am trying to automate the daily running of a simple R script from Windows 7. From previous posts, I understand that this needs to be done with the task scheduler.

That is correct.

> I can schedule my laptop to automatically open R at a certain time, but not to execute a script.

For example, you can use

    schtasks /create /tn "My R task" /sc DAILY /ST 03:00:00 /TR "C:\path_to_your_batch_file.cmd"

to start a task daily at 3 am.

> Secondary question: how do I save a list of R commands so that they get executed once the file is open?

I highly recommend reading the manual on Rscript and using it in your batch file instead of the source()-ing mentioned below.

> Right now, I save my code in a notepad doc and paste it over into R, but there has to be another way.

Consider using an IDE. If not Emacs+ESS or Eclipse, then at least Tinn-R.

> I have tried saving my code as an .r file using the editor and opening the file with R later, but this does not seem to execute the code.

You should source your file to execute it, i.e.

    source("path_to_my.R")

-- 
Mikhail
Re: [R] Running Total
Edward Patzelt patze...@umn.edu writes:

> I am trying to create a vector that has a running total that adds each time a 1 occurs. Here's the code and data:
>
>     c(1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L)
>
>     total <- {};
>     for(i in 1:length(dat$Valid)){
>       total[i] <- ifelse(dat$Valid[i-1]==1, total[i] + 1, total[i])
>     }

    total <- cumsum(dat$Valid)

-- 
Mikhail
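To see why cumsum() replaces the loop, it may help to put a working loop next to it. This is a small sketch on a shortened, made-up vector (`valid` stands in for dat$Valid); the loop here is one plausible reading of what the original code was meant to do, not a line-by-line repair of it.

```r
valid <- c(1L, 1L, 0L, 1L, 1L, 0L, 1L)   # shortened stand-in for dat$Valid

# Loop version: carry the previous total forward and add the current value
total_loop <- integer(length(valid))
running <- 0L
for (i in seq_along(valid)) {
  running <- running + valid[i]
  total_loop[i] <- running
}

# Vectorised version
total <- cumsum(valid)

print(total)                          # 1 2 2 3 4 4 5
print(identical(total, total_loop))   # TRUE
```

Note that the quoted loop reads total[i] before it has ever been assigned and indexes dat$Valid[i-1] at i = 1, which is why it cannot produce a running total as written.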
Re: [R] Running Total
Edward Patzelt patze...@umn.edu writes:

> Actually, in looking at this, I need it to only add if a 0 occurs instead of a 1.

    cumsum(1-x)

> On Mon, Mar 5, 2012 at 12:57 PM, jim holtman jholt...@gmail.com wrote:
>> cumsum is probably what you want:
>>
>>     x <- c(1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L,
>>            1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L,
>>            0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
>>            1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L,
>>            1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L,
>>            0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L,
>>            1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
>>            0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L)
>>     cbind(x, cumsum(x))
>>
>>           x
>>      [1,] 1  1
>>      [2,] 1  2
>>      [3,] 0  2
>>      [4,] 1  3
>>      [5,] 1  4
>>      [6,] 1  5
>>      [7,] 1  6
>>      [8,] 1  7
>>      [9,] 0  7
>>     [10,] 1  8
>>     [11,] 0  8
>>     [12,] 1  9
>>     [13,] 1 10
>>     [14,] 1 11
>>     [15,] 1 12
>>     [16,] 1 13
>>     [17,] 0 13
>>
>> On Mon, Mar 5, 2012 at 1:51 PM, Edward Patzelt patze...@umn.edu wrote:
>>> I am trying to create a vector that has a running total that adds each time a 1 occurs. Here's the code and data [...]

-- 
Mikhail
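The one-liner cumsum(1-x) works because x holds only 0s and 1s, so 1 - x flips them. A short sketch on a made-up vector, with an equivalent spelling that does not depend on that assumption:

```r
x <- c(1L, 1L, 0L, 1L, 0L, 0L, 1L)   # made-up 0/1 data

zeros_a <- cumsum(1 - x)    # relies on x containing only 0 and 1
zeros_b <- cumsum(x == 0)   # counts zeros explicitly

print(zeros_b)                   # 0 0 1 1 2 3 3
print(all(zeros_a == zeros_b))   # TRUE
```

If the data could ever contain values other than 0 and 1, the second form is the safer of the two.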
[R] How can I map by results to original list of indices or first difference of column of data.frame with two factors?
Hello!

I have stacked data in a data.frame with 2 factors, an ordered POSIXct column, and the actual value as numeric (as if for lattice::xyplot). I would like to calculate the first difference using the "diff" function within the corresponding subsets/partitions.

Since the data.frame is organized by factors and has sorted dates, it seems like by() is a good candidate for the job. However, it returns just a dumb list of vectors. It seems that I can either use expand.grid to remap the results of by() and hope that I won't mess up the order, or I can use

    unique(subset(x,select=c(foo,bar)))

Overall, it looks like quite a few steps for such a task, not counting the assignment of those differences back to the original data.frame starting from the 2nd position in each partition (as diff returns a shorter vector).

Am I on the right track, or is there an easier way to do that?

Mikhail
Re: [R] How can I map by results to original list of indices or first difference of column of data.frame with two factors?
R. Michael Weylandt michael.weyla...@gmail.com writes:

> It'd be doubly helpful if you could post desired output as well.

I beg everyone's pardon; I suddenly realized that in my case the solution is trivial. Here is an example with mock-up data.

Let's generate some data

#+begin_src R
qq <- expand.grid(
  day=seq(ISOdate(2011,1,1),ISOdate(2011,12,31),by='day'),
  bar=1:4,
  foo=factor(c('A','B','G','I'))
)
ww <- within(qq, val <- bar * sin(as.double(day-day[1],"days") / as.double(diff(range(day)),"days") * 2*pi + as.numeric(foo)/2 ) )
#+end_src

We can take a look at it with

#+begin_src R :results graphics :exports both :file z.png
library(lattice)
xyplot(val~day|foo,ww,group=ww$bar, type='l')
#+end_src

Now, since we ditch the first element in each partition anyway, we can apply diff to the entire data set at once. Then we should ditch the very first element in each partition.

#+begin_src R
ww[-1,"diff"] <- diff(ww$val)
ee <- subset(ww, day>ISOdate(2011,1,1))
#+end_src

And the final result

#+begin_src R :results graphics :exports both :file x.png
xyplot(diff~day|foo,ee,group=ee$bar, type='l')
#+end_src

> If you haven't seen it before, the easiest way to post R data is to use the dput() function to get a plain-text (mailing list friendly) representation. If your data is large, dput(head(DATA, 30)) should suffice. (We wouldn't want to clog those internet tubes...)
>
> Michael
>
> On Sat, Mar 3, 2012 at 8:55 PM, jim holtman jholt...@gmail.com wrote:
>> If you would post a subset of your data so that we can see what you are talking about, we could probably help you come up with a solution.
>>
>> On Sat, Mar 3, 2012 at 7:50 PM, Mikhail Titov m...@gmx.us wrote:
>>> Hello! I have stacked data in a data.frame with 2 factors, an ordered POSIXct column, and the actual value as numeric (as if for lattice::xyplot). I would like to calculate the first difference using the "diff" function within the corresponding subsets/partitions. [...]

-- 
Mikhail

Attachments: x.png, z.png
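The whole-vector trick above depends on the rows being contiguous and date-sorted within each partition. As a cross-check, and as an option when that ordering is not guaranteed, a group-wise difference can be sketched with base R's ave(). The data below are a small made-up stand-in (2 x 2 partitions, 5 days each), not the thread's data.

```r
# Small stand-in for the stacked data: 2 x 2 partitions, 5 days each
qq <- expand.grid(
  day = seq(as.Date("2011-01-01"), by = "day", length.out = 5),
  bar = 1:2,
  foo = factor(c("A", "B"))
)
qq$val <- with(qq, bar * as.numeric(day - day[1])^2 + as.numeric(foo))

# Approach from the message: diff over everything, then drop each first day
whole <- qq
whole[-1, "diff"] <- diff(whole$val)
whole <- subset(whole, day > min(day))

# Group-wise diff with ave(); the first row of each partition becomes NA
qq$diff <- ave(qq$val, qq$foo, qq$bar, FUN = function(v) c(NA, diff(v)))
grouped <- subset(qq, !is.na(diff))

print(all.equal(whole$diff, grouped$diff))
```

Both routes give the same differences here; ave() keeps the result aligned with the original rows, so no remapping of indices is needed.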
Re: [R] Removing special chars in strings?
I usually use something like "[\\]"

Mikhail

On 08/31/2011 08:32 PM, . . wrote:
> Hi all,
>
> How can I replace those \ in the str? Thanks in advance.
>
>     func <- function(str) { print(gsub("\\","",str)) }
>     func("bla\ble\bli")
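Two things are tangled together in the question: how to write a backslash in a pattern, and whether the input string contains any backslashes at all. The sketch below uses two spellings that are certain to work in base R (a doubly escaped regex and a fixed-string match); the character-class form suggested above is another way to write the same pattern. The variable names are made up for illustration.

```r
x <- "bla\\ble\\bli"   # two real backslashes; cat() shows it as bla\ble\bli
cat(x, "\n")

no_bs_regex <- gsub("\\\\", "", x)               # regex: one escaped backslash
no_bs_fixed <- gsub("\\", "", x, fixed = TRUE)   # literal match, no regex

print(no_bs_regex)   # "blableli"

# The string as typed in the question holds backspace characters ("\b"),
# not backslashes, so there is nothing for the pattern to match:
print(grepl("\\", "bla\ble\bli", fixed = TRUE))   # FALSE
```

In other words, the literal "bla\ble\bli" has to be typed as "bla\\ble\\bli" (or read from a file) before any backslash removal can be tested.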
Re: [R] Legend to the Periodogram
plot.spec uses matplot. See ?matplot for the default col and lty, and use legend as usual.

P.S. You can add plot=FALSE to spec.pgram to prevent it from plotting.

On 08/27/2011 05:39 PM, Peter Maclean wrote:
> How can I add a legend (showing x1, x2, x3, x4) to the last plot?
>
>     require(TSA)
>     require(graphics)
>     require(stats)
>     t<-1986:2011
>     x1<-cos(t*1990/2011)
>     x2<-cos(t*2000/20011)
>     x3<-sin(t*1990/2011)
>     x4<-sin(t*2000/2011)
>     y<-cbind(t,x1,x2, x3,x4)
>     y.time = ts(y.time, start=1986, frequency=1)
>     y.spc<-spec.pgram(y.time, spans = c(3,3), detrend=FALSE,log="no",plot = TRUE, kernel("modified.daniell", c(5,7)))
>     plot(y.spc, plot.type = "marginal", main="Smoothed Periodogram")
>
> Peter Maclean
> Department of Economics
> UDSM
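Put together, the advice amounts to: compute the spectrum without plotting, plot it once, and give legend() the same colours and line types that matplot cycles through by default (col 1:6, lty 1:5). A minimal sketch using only the stats and graphics packages; the series are rebuilt along the lines of the question, with the time index left out of the matrix, which is an assumption about what was intended.

```r
yr <- 1986:2011
y <- ts(cbind(x1 = cos(yr * 1990 / 2011),
              x2 = cos(yr * 2000 / 2011),
              x3 = sin(yr * 1990 / 2011),
              x4 = sin(yr * 2000 / 2011)),
        start = 1986, frequency = 1)

# plot = FALSE: compute only, so the spectrum is drawn exactly once below
sp <- spec.pgram(y, spans = c(3, 3), detrend = FALSE, plot = FALSE)

plot(sp, plot.type = "marginal", log = "no", main = "Smoothed Periodogram")

# matplot defaults: col = 1:6 and lty = 1:5, so four series use 1:4 of each
legend("topright", legend = colnames(y), col = 1:4, lty = 1:4)
```

If col or lty are passed to plot() explicitly, the same values would need to go into legend() as well.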
[R] lattice: How to get log base for each axis inside panel function?
Hello!

I'd like to have a function that draws a correct grid when using a log axis with xyplot from the lattice package. Right now I have the following code inside my panel function:

    lim <- current.panel.limits()
    v <- latticeExtra:::logTicks(2^lim$xlim, loc=1)
    h <- latticeExtra:::logTicks(2^lim$ylim, loc=1)
    panel.abline(h=log2(h), v=log2(v), col="LightGray")

Is there an easy way to get the log base used for a particular axis to transform the data, so that I can write a general-purpose panel.grid.log?

Mikhail
Re: [R] how can I read a xlsx file
I prefer RODBC and odbcConnectExcel (odbcConnectExcel2007 for .xlsx files); this way I can query subsets with SQL.

Mikhail

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ivan Calandra
Sent: Monday, August 15, 2011 10:28 AM
To: r-help@r-project.org
Subject: Re: [R] how can I read a xlsx file

> ?read.xlsx
>
> On 8/15/2011 17:19, albert coster wrote:
>> Hello,
>>
>> How can I read a xlsx file using the xlsx package?
>>
>> Thanks
>> Albert
Re: [R] Alternative and more efficient data manipulation
?reshape

You have your data in a wide format, but you want it in a long format. reshape can convert it both ways.

Mikhail

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sam Albers
Sent: Monday, August 15, 2011 6:58 PM
To: r-help@r-project.org
Subject: [R] Alternative and more efficient data manipulation

> Hello list,
>
> I have been using the following process to convert data from one form to another for a while, but it occurs to me that there is probably an easier way to do this. I am often given data whose column names are actually data, and I much prefer dealing with data that are sorted by factors. So to convert the columns I have previously made use of make.groups() in the lattice package, which works completely satisfactorily. However, it is a bit clunky for what I am using it for, and I have to carry the other variables forward. Can anyone suggest a better way of converting data like this?
>
>     library(lattice)
>     dat <- data.frame(`x1`=runif(6, 0, 125), `x2`=runif(6, 50, 75),
>                       `x3`=runif(6, 0, 100), `x4`=runif(6, 0, 200),
>                       date = as.Date(c("2009-09-25","2009-09-28","2009-10-02","2009-10-07","2009-10-15","2009-10-21")),
>                       yy= head(letters,2), check.names=FALSE)
>
>     ## Here is an example of the type of data that NEED converting
>     dat
>
>     dat.group <- with(dat, make.groups(x1,x2,x3,x4))
>
>     ## Carrying the other variables forward
>     dat.group$date <- dat$date
>     dat.group$yy <- dat$yy
>
>     ## Here is an example of what I would like the data to look like
>     dat.group
>
>     ## The point of this all is so that I can use the data in a manner such as this:
>     with(dat.group, xyplot(data ~ as.numeric(substr(which, 2,2))|yy, groups=date))
>
> So I suppose what I am asking is whether there is a more efficient way of doing this?
>
> Thanks so much in advance!
>
> Sam
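For the data in the question, the wide-to-long conversion with stats::reshape could look roughly like the sketch below. The column names `data` and `which` are chosen to mirror what make.groups() produces; using `date` as the id variable is an assumption that holds here because each date occurs once.

```r
set.seed(1)
dat <- data.frame(
  x1 = runif(6, 0, 125), x2 = runif(6, 50, 75),
  x3 = runif(6, 0, 100), x4 = runif(6, 0, 200),
  date = as.Date(c("2009-09-25", "2009-09-28", "2009-10-02",
                   "2009-10-07", "2009-10-15", "2009-10-21")),
  yy = head(letters, 2)
)

long <- reshape(dat, direction = "long",
                varying = c("x1", "x2", "x3", "x4"),
                v.names = "data",    # name of the stacked value column
                timevar = "which",   # records which x-column a value came from
                times   = 1:4,
                idvar   = "date")

str(long)
```

The other variables (`date`, `yy`) are carried along automatically, and `which` is already numeric, so the plot from the question reduces to something like xyplot(data ~ which | yy, groups = date, data = long).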
Re: [R] How do I subset a dataframe
Eric: Create another column using grep and a regular expression of your choice, then subset based on that column.

Jorge: The OP wants an inexact match.

P.S. I'd use an RDBMS and SQL to pull the data of interest.

Mikhail

On 08/14/2011 02:20 AM, Jorge Ivan Velez wrote:
> Hi eric,
>
> See ?"%in%" and try the following (untested):
>
>     subset(zeespan, !customer %in% c("ibm", "exxon", "sears"))
>
> HTH,
> Jorge
>
> On Sat, Aug 13, 2011 at 7:44 PM, eric wrote:
>> I have a dataframe zeespan. One of the columns has the name customer. The data in the customer column is text. I would like to return a subset of the dataframe with all rows that DON'T begin with either ibm, exxon, or sears in the customer column.
>>
>> I tried
>>
>>     subset(zeespan, customer != c("ibm" | "exxon" | "sears"))
>>
>> That didn't work, and even if it did, the text would have to be an exact match, where what I really want is "begins with". Suggestions on how to do this would be appreciated.
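The helper-column idea can be sketched as follows, using grepl() (the logical-valued sibling of grep) and an anchored pattern so that only names that begin with one of the prefixes are dropped. The rows below are invented for illustration, and `excluded` is a made-up column name.

```r
zeespan <- data.frame(
  customer = c("ibm global", "exxon mobil", "sears holdings",
               "acme", "big ibm reseller"),
  amount   = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# "^" anchors the match at the start of the string
zeespan$excluded <- grepl("^(ibm|exxon|sears)", zeespan$customer)

kept <- subset(zeespan, !excluded)
print(kept$customer)   # "acme" "big ibm reseller"
```

Adding ignore.case = TRUE to grepl() would also catch "IBM" and the like, if the real data mix upper and lower case.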
Re: [R] Not sure how to use aggregate, colSums, by
I hope this will help you get going:

    b <- sapply(unique(test$e2), function(x) {
      out <- aggregate(cbind(y,f)~e1, subset(test, e2==x), sum)
      out <- rbind(out, data.frame(e1="total", y=sum(out$y), f=sum(out$f)))
      out <- list(out)
      names(out) <- x
      out
    })
    b

    $std
         e1  y   f
    1   can 21 120
    2   usa 42 198
    3 total 63 318

    $con
          e1  y   f
    1    can 21 108
    2 france 21 114
    3  italy 21 126
    4  total 63 348

On 08/14/2011 12:20 PM, eric wrote:
> I have a data frame called test, shown below, that I would like to summarize in a particular way:
>
> I want to show the column sums (columns y, f) grouped by country (column e1). However, I'm looking for the data to be split according to column e2. In other words, two tables of sums by country: one table for con and one table for std, as shown in column e2. Finally, at the bottom of the two tables, I would like the overall sums/totals for all the countries for the two columns (y, f). The layouts for the two tables I'm looking for are also shown below, in case my description isn't completely clear.
>
> I would also like to be able to use the totals of y and f for the two tables in other calculations.
>
> I can get the two sets of totals with the following commands, but not the sums by country.
>
>     colSums(test[test$e2=="std", c(3,4)])
>     colSums(test[test$e2=="con", c(3,4)])
>
> I know there's an easy way to do this with a combination of colSums, by, aggregate, but I can't seem to get it.
>
>     std      y    f
>     usa      sum  sum
>     france   sum  sum
>     can      sum  sum
>     italy    sum  sum
>     Totals   sum  sum
>
>     con      y    f
>     usa      sum  sum
>     france   sum  sum
>     can      sum  sum
>     italy    sum  sum
>     Totals   sum  sum
>
>            e1  e2 y  f
>     1     usa std 1  1
>     2     usa std 1  2
>     3     can con 1  3
>     4  france con 1  4
>     5     can std 1  5
>     6   italy con 1  6
>     7     usa std 2  7
>     8     usa std 2  8
>     9     can con 2  9
>     10 france con 2 10
>     11    can std 2 11
>     12  italy con 2 12
>     13    usa std 3 13
>     14    usa std 3 14
>     15    can con 3 15
>     16 france con 3 16
>     17    can std 3 17
>     18  italy con 3 18
>     19    usa std 4 19
>     20    usa std 4 20
>     21    can con 4 21
>     22 france con 4 22
>     23    can std 4 23
>     24  italy con 4 24
>     25    usa std 5 25
>     26    usa std 5 26
>     27    can con 5 27
>     28 france con 5 28
>     29    can std 5 29
>     30  italy con 5 30
>     31    usa std 6 31
>     32    usa std 6 32
>     33    can con 6 33
>     34 france con 6 34
>     35    can std 6 35
>     36  italy con 6 36
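For readers who want to run this end to end, the 36-row table can be rebuilt from its repeating pattern, and the same summary can be written with split() and lapply(), which always returns a plain named list. This is a sketch of an equivalent formulation, not the code from the reply; `std_total` is a made-up name showing how the totals can be reused afterwards.

```r
# Rebuild the data from the question: a block of six rows repeats six times
test <- data.frame(
  e1 = rep(c("usa", "usa", "can", "france", "can", "italy"), times = 6),
  e2 = rep(c("std", "std", "con", "con", "std", "con"), times = 6),
  y  = rep(1:6, each = 6),
  f  = 1:36
)

# One summary table per level of e2, each with a "total" row appended
b <- lapply(split(test, test$e2), function(d) {
  out <- aggregate(cbind(y, f) ~ e1, data = d, FUN = sum)
  rbind(out, data.frame(e1 = "total", y = sum(out$y), f = sum(out$f)))
})

print(b$std)

# Totals are ordinary data-frame cells and can be used in later calculations
std_total <- subset(b$std, e1 == "total")
print(std_total$f / std_total$y)
```

The results agree with the tables shown in the reply (for example, the std totals are y = 63 and f = 318).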
Re: [R] Any alternatives to draw.colorkey from lattice package?
Felix: Thank you! Perhaps I should read documentation more careful as I missed that another `at`. lattice latticeExtra are so marvelous so I hardly want to use anything else. Mikhail On 08/13/2011 07:31 AM, Felix Andrews wrote: You can just specify the label positions, you don't need to give labels for every color change point: (there is an 'at' for the color changes and a 'labels$at' for the labels) levelplot(rnorm(100) ~ x * y, expand.grid(x = 1:10, y = 1:10), colorkey = list(at = seq(-3,3,length=100), labels = list(labels = paste(-3:3, units), at = -3:3))) On 13 August 2011 19:59, Jim Lemon j...@bitwrit.com.au wrote: On 08/13/2011 04:34 AM, Mikhail Titov wrote: Hello! I’d like to have a continuous color bar on my lattice xyplot with colors lets say from topo.colors such that it has ticks labels at few specific points only. Right now I use do.breaks level.colors with somewhat large number of steps. The problem is that color change point doesn’t necessary correspond to the value I’d like to label. Since I have many color steps and I don’t need high precision I generate labels like this labels- ifelse( sapply(at,function(x) any(abs(att-x).03)) , sprintf(depth= %s ft, at), ) , where `att` has mine points of interest on color scale bar and `at` corresponds to color change points used with level.colors . It is a bit inconvenient as I have to adjust threshold `.03`, number of color steps so that it labels only adjacent color change point with my labels. Q: Are there any ready to use functions that would generate some kind of GRaphical OBject with continuous color scale bar/key with custom at/labels such that it would work with `legend` argument of xyplot from lattice? Hi Mikhail, I think that color.legend in the plotrix package will do what you are asking, but it is in base graphics, and may not work with lattice. 
Jim
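Felix's point about the two separate `at` arguments can be sketched as a small self-contained example. The data, the `topo.colors` palette and the names `grid`, `breaks` and `ticks` are made up for illustration; only the `colorkey = list(at = ..., labels = list(at = ..., labels = ...))` structure comes from the thread.

```r
library(lattice)

# Made-up data on a 10 x 10 grid
set.seed(1)
grid <- expand.grid(x = 1:10, y = 1:10)
grid$z <- rnorm(100)

breaks <- seq(-3, 3, length.out = 100)  # many color change points
ticks  <- -3:3                          # the few values that get a label

p <- levelplot(z ~ x * y, data = grid,
               at = breaks,
               col.regions = topo.colors(length(breaks) - 1),
               colorkey = list(
                 at = breaks,                        # where colors change
                 labels = list(at = ticks,           # where labels go
                               labels = paste(ticks, "units"))))
# print(p)  # uncomment to draw the plot
```

The key is nearly continuous because `breaks` is fine, while the labels stay readable because `ticks` is coarse; neither has to be a subset of the other.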
Re: [R] linear regression
Don't forget to load the `lattice` package. `latticeExtra` with `panel.ablineq` can also be helpful. This was, however, for plotting. For a separate regression for each WR without plotting, you'd use something like `lapply` or `sapply`:

    ans <- sapply(unique(as.character(data$WR)), function(dir) {
      lm(PM10 ~ Ref, subset(data, WR == dir))
    }, simplify = FALSE)

`ans$West` will return one of the results. There are many ways to skin a cat. Perhaps it was not the best one.

Mikhail

On 08/13/2011 11:30 AM, Dennis Murphy wrote:

Hi: Try something like this, using dat as the name of your data frame:

    xyplot(PM10 ~ Ref | WR, data = dat, type = c('p', 'r'))

The plot looks silly with the data snippet you provided, but should hopefully look more sensible with the complete data. The code creates a four-panel plot, one per direction, with points and a least squares regression line fit in each panel. The regression line is specific to a data subset, not the entire data frame.

HTH, Dennis

On Sat, Aug 13, 2011 at 5:43 AM, maggy yan kiot...@googlemail.com wrote:

dear R users, my data looks like this

   PM10      Ref       UZ   JZ     WT         RH   FT   WR
1  10.973195  4.338874 nein Winter Dienstag   ja   nein West
2   6.381684  2.250446 nein Sommer Sonntag    nein ja   Süd
3  62.586512 66.304869 ja   Sommer Sonntag    nein nein Ost
4   5.590101  8.526152 ja   Sommer Donnerstag nein nein Nord
5  30.925054 16.073091 nein Winter Sonntag    nein nein Ost
6  10.750567  2.285075 nein Winter Mittwoch   nein nein Süd
7  39.118316 17.128691 ja   Sommer Sonntag    nein nein Ost
8   9.327564  7.038572 ja   Sommer Montag     nein nein Nord
9  52.271744 15.021977 nein Winter Montag     nein nein Ost
10 27.388416 22.449102 ja   Sommer Montag     nein nein Ost
. . . . til 200

I'm trying to make a linear regression between PM10 and Ref for each of the four WR. I've tried this:

    plot(Nord$PM10 ~ Nord$Ref, main="Nord", xlab="Ref", ylab="PM10")

but it does not work, because Nord cannot be found. What was wrong? How can I do it?
please help me
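As one more way to get a regression per wind direction, here is a minimal sketch using `split` and `lapply`. The data frame `dat` below is simulated for illustration (the thread only shows ten rows of the real data, and "Sued" is an ASCII stand-in for the fourth direction); only the column names `PM10`, `Ref` and `WR` come from the thread.

```r
# Simulated stand-in for the poster's data
set.seed(1)
dat <- data.frame(
  WR  = rep(c("Nord", "Ost", "Sued", "West"), each = 10),
  Ref = runif(40, 0, 50)
)
dat$PM10 <- 5 + 0.8 * dat$Ref + rnorm(40)

# One lm() fit per direction; split() names the list by the levels of WR
fits <- lapply(split(dat, dat$WR), function(d) lm(PM10 ~ Ref, data = d))

# Intercept and slope for each direction, one row per WR
coefs <- t(sapply(fits, coef))
coefs
```

`fits$West` then holds the model for one direction, and `summary(fits$West)` gives the usual regression output for that subset only.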
Re: [R] lattice panel.abline use
I hope this helps:

    xyplot(val ~ x | type, panel = function(...) {
      panel.xyplot(...)
      panel.abline(h = .6)
    })

Mikhail

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jjap Sent: Friday, August 12, 2011 10:41 AM To: r-help@r-project.org Subject: [R] lattice panel.abline use

Dear R-users, I am unsuccessful in trying to add a horizontal line to all graphs in the example below:

    library(lattice)
    val <- runif(15)
    x <- rep(seq(1:5), 3)
    type <- c(rep("a", 5), rep("b", 5), rep("c", 5))
    xyplot(val ~ x | type, panel.abline(h=.6))

Any hints are appreciated. Best regards, ---Jean

-- View this message in context: http://r.789695.n4.nabble.com/lattice-panel-abline-use-tp3739693p3739693.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] recode Variable in dependence of values of two other variables
?aggregate

    aggregate(X ~ ID, your.data.frame.goes.here, mean)

Mikhail

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Julia Moeller Sent: Friday, August 12, 2011 10:10 AM To: r-help@r-project.org Subject: [R] recode Variable in dependence of values of two other variables

Hi, as an R beginner, I have a recoding problem and hope you can help me. I am working on an SPSS dataset, which I loaded into R (load("C:/...")). I have two existing variables, ID and X, and one variable to be computed: meanX.dependID (= mean of X over all rows in which ID has the same value).

ID = subject ID. Since it is a longitudinal dataset, there are repeated measurement points for each subject, each of which appears in a new row. So each ID value appears in many rows (e.g. ID == 1 in rows 1:5, ID == 2 in rows 6:8, etc.).

Now: for all rows in which ID has a certain value, meanX.dependID shall be the mean of X over these rows. How can I automate that, without having to specify the row numbers each time? E.g.

ID  X  meanX.dependID
1   2  2.25
1   3  2.25
1   1  2.25
1   3  2.25
2   5  3.3
2   2  3.3
2   3  3.3
3   4  3
3   1  3
3   2  3
3   3  3
3   4  3
3   5  3

Thanks a lot! Hope this is the right place to post; if not, please tell me! best, Julia
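`aggregate` returns one row per ID. If the group mean is wanted on every row, as in the example table in the question, base R's `ave` is one option. This is a sketch on the example values from the question; the data frame name `d` is made up.

```r
# Example values from the question
d <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  X  = c(2, 3, 1, 3, 5, 2, 3, 4, 1, 2, 3, 4, 5)
)

# ave() applies mean() within each ID and returns a vector as long as X,
# so it can be stored directly as a new column
d$meanX.dependID <- ave(d$X, d$ID)

# For comparison: one row per ID
aggregate(X ~ ID, d, mean)
```

No row numbers need to be specified; the grouping comes entirely from the values of `ID`.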
Re: [R] Standard error bars on bar plots
Or ?barplot2

    barplot2(means, main="Proportion of time spent in AMU sector",
             xlab="Treatments", ylab="Proportion of time",
             names.arg=c("Solitary", "Size-matched conspecific", "Sub-adult conspecific"),
             cex.names=0.85, axis.lty=1, ylim=c(0,0.4),
             plot.ci=TRUE, ci.l=means, ci.u=means+halfSE)

Mikhail

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Uwe Ligges Sent: Friday, August 12, 2011 12:04 PM To: Christopher Crooks Cc: r-help@r-project.org Subject: Re: [R] Standard error bars on bar plots

On 12.08.2011 18:43, Christopher Crooks wrote:

Hi, I know there have been numerous posts about this but I am unable to find one, or at least carry out one, that gives me the plot I want. I have managed to add the error bars to the plot, but they end up not aligned with the centre of the bars themselves. Here is my script:

    means <- c(0.13528, 0.082829167, 0.2757625)
    SE <- c(0.036364714, 0.039582896, 0.06867608)
    halfSE <- c(0.018182357, 0.019791448, 0.03433804)
    barx <- barplot(means, main="Proportion of time spent in AMU sector",
                    xlab="Treatments", ylab="Proportion of time",
                    names.arg=c("Solitary", "Size-matched conspecific", "Sub-adult conspecific"),
                    cex.names=0.85, axis.lty=1, ylim=c(0,0.4))
    library(gplots)
    plotCI(x=means, uiw=halfSE, lty=1, gap=0, add=TRUE)

    plotCI(x=barx, y=means, uiw=halfSE, lty=1, gap=0, add=TRUE)

Uwe Ligges

Thanks for any suggestions or help you may have, CJ
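The alignment fix in Uwe's line comes from using the bar midpoints that `barplot` returns as the x positions. A minimal base-graphics sketch of the same idea, using `arrows` instead of gplots (a swapped-in technique, not what either reply proposed), with the numbers from the question:

```r
means  <- c(0.13528, 0.082829167, 0.2757625)
halfSE <- c(0.018182357, 0.019791448, 0.03433804)

# When run non-interactively, draw to a null device so no file is written
if (!interactive()) pdf(NULL)

# barplot() returns the x coordinates of the bar centres
barx <- barplot(means,
                names.arg = c("Solitary", "Size-matched conspecific",
                              "Sub-adult conspecific"),
                cex.names = 0.85, ylim = c(0, 0.4),
                xlab = "Treatments", ylab = "Proportion of time")

# Vertical segments with flat caps at both ends, centred on each bar
arrows(x0 = barx, y0 = means - halfSE,
       x1 = barx, y1 = means + halfSE,
       angle = 90, code = 3, length = 0.05)
```

Passing `means` as the x positions, as in the original script, places the bars' error marks near x = 0.1 to 0.3 rather than at the bar centres, which is why they looked misaligned.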
[R] Any alternatives to draw.colorkey from lattice package?
Hello! I'd like to have a continuous color bar on my lattice xyplot, with colors from, let's say, topo.colors, such that it has tick labels at a few specific points only. Right now I use do.breaks & level.colors with a somewhat large number of steps. The problem is that a color change point doesn't necessarily correspond to the value I'd like to label. Since I have many color steps and I don't need high precision, I generate labels like this:

    labels <- ifelse(sapply(at, function(x) any(abs(att - x) < .03)),
                     sprintf("depth= %s ft", at), "")

where `att` has my points of interest on the color scale bar and `at` corresponds to the color change points used with level.colors. It is a bit inconvenient, as I have to adjust the threshold `.03` and the number of color steps so that only the color change point adjacent to each of my labels gets labeled.

Q: Are there any ready-to-use functions that would generate some kind of GRaphical OBject with a continuous color scale bar/key with custom at/labels, such that it would work with the `legend` argument of xyplot from lattice?

Mikhail
[R] RODBC: sqlUpdate doesn't handle properly POSIXct field?
Hello all! Can someone confirm whether there is a bug or not? I was trying to use sqlUpdate in place of sqlSave, as the data set I import has duplicates. However, I get errors while using the fast=FALSE argument to safely update/ignore duplicates:

    Error while executing the query[RODBC] ERROR: Could not SQLExecDirect 'UPDATE data SET logger=1, value=0.0321584 WHERE time=2008-09-22 13:15:00'
    Error in sqlUpdate(con2, na.omit(dat), "data", fast = FALSE) : 42601 7 ERROR: syntax error at or near "13";

It looks like the POSIXct class is not escaped properly. I have R 2.12.2 running on Windows XP 32 bit, and I'm using a PostgreSQL database. The column time is supposedly of type 'timestamp without time zone'. Here is what I have in the data frame I'm pushing to the DB:

    class(dat$time)
    [1] "POSIXct" "POSIXt"

Mikhail
Re: [R] RODBC: sqlUpdate doesn't handle properly POSIXct field?
I apologize for the first e-mail, as I didn't use plain text. Here is the full message.

--8<--

Hello all! Can someone confirm whether there is a bug or not? I was trying to use sqlUpdate in place of sqlSave, as the data set I import has duplicates. However, I get errors while using the fast=FALSE argument to safely update/ignore duplicates:

    Error while executing the query[RODBC] ERROR: Could not SQLExecDirect 'UPDATE data SET logger=1, value=0.0321584 WHERE time=2008-09-22 13:15:00'
    Error in sqlUpdate(con2, na.omit(dat), "data", fast = FALSE) : 42601 7 ERROR: syntax error at or near "13";

It looks like the POSIXct class is not escaped properly. I have R 2.12.2 running on Windows XP 32 bit, and I'm using a PostgreSQL database. The column time is supposedly of type 'timestamp without time zone'. Here is what I have in the data frame I'm pushing to the DB:

    class(dat$time)
    [1] "POSIXct" "POSIXt"

Mikhail
Re: [R] RODBC: sqlUpdate doesn't handle properly POSIXct field?
I've tried to upgrade to the latest (?) http://cran.mtu.edu/bin/windows/contrib/2.12/RODBC_1.3-3.zip , but no luck. It looks like nothing is being escaped: I tried using the character class for the column, but I got the same error message.

Mikhail
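Whatever the cause inside sqlUpdate turns out to be, the failing statement in the error message differs from valid SQL only by the missing quotes around the timestamp. One possible workaround, sketched here under the assumption that building the statements by hand is acceptable, is to format and quote the POSIXct value explicitly. The helper `sql_quote` and the one-row `dat` are hypothetical; the table and column names are taken from the error message.

```r
# One-row stand-in for the poster's data frame
dat <- data.frame(
  time   = as.POSIXct("2008-09-22 13:15:00", tz = "UTC"),
  logger = 1L,
  value  = 0.0321584
)

# Hypothetical helper: wrap a string in single quotes, doubling any
# embedded single quote as standard SQL requires
sql_quote <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

stmts <- sprintf(
  "UPDATE data SET logger=%d, value=%s WHERE time=%s",
  dat$logger,
  format(dat$value, digits = 15),
  sql_quote(format(dat$time, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
)
stmts

# With an open RODBC channel (con2 in the thread), each statement could
# then be sent with sqlQuery():
# for (s in stmts) RODBC::sqlQuery(con2, s)
```

This only sidesteps the quoting problem; it does not confirm whether the behaviour reported in the thread is a bug in RODBC or in the driver.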
Re: [R] How to convert a single column into many rows
I guess matrix(x, ncol=73, byrow=TRUE) should work

Mikhail

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Zablone Owiti Sent: Wednesday, March 23, 2011 6:14 AM To: r-help@r-project.org Subject: [R] How to convert a single column into many rows

Dear users, I wish to convert a column of data containing pentads (5-day mean data) from 1962 - 2000 into rows, with each row having 73 values (i.e. 73 pentads per year).

1962 pent1 pent2 ... pent73
. . . .
2000 pent1 pent2 ... pent73

What commands should I use to achieve this? Thanks

ZABLONE OWITI GRADUATE STUDENT College of Atmospheric Science Nanjing University of Information, Science and Technology Add: 219 Ning Liu Rd, Nanjing, Jiangsu, 21004, P.R. China Tel: +86-25-58731402 Fax: +86-25-58731456 Mob. 15077895632 Website: www.nuist.edu.cn
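A small sketch of the `matrix(..., byrow=TRUE)` suggestion with row and column labels added. The input vector `x` is a placeholder (just the running index), and the names `years` and `pentads` are made up; the layout of 73 pentads per year for 1962 to 2000 is from the question.

```r
years <- 1962:2000

# Placeholder for the single column of pentad values, in time order
x <- seq_len(length(years) * 73)

# byrow = TRUE fills one row (one year) at a time
pentads <- matrix(x, ncol = 73, byrow = TRUE,
                  dimnames = list(years, paste0("pent", 1:73)))

dim(pentads)          # one row per year, one column per pentad
pentads["1963", 1:3]  # first three pentads of the second year
```

This assumes the column has no missing pentads; if a year is incomplete, the values will shift into the wrong rows.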
Re: [R] Any existing functions for reading and extracting data from path names?
I'm not sure what you are trying to achieve, but I think this can be a good starting point:

    files <- list.files("deleteme", full.names=TRUE, recursive=TRUE)
    names <- sapply(strsplit(files, "/", TRUE), "[", 2)
    x <- lapply(files, function(f) {
      out <- read.csv(f)
      out$city <- strsplit(f, "/", TRUE)[[1]][2]
      out
    })
    y <- do.call(rbind, x)

Mikhail

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Ista Zahn Sent: Friday, March 11, 2011 10:53 AM To: r-help@r-project.org Subject: [R] Any existing functions for reading and extracting data from path names?

Hi helpeRs, I have inherited a set of data files that use the file system as a sort of poor man's database, i.e., the data files are nested in directories that indicate which city they come from. For example:

    dir.create("deleteme")
    for(i in paste("deleteme", c("New York", "Los Angeles"), sep="/")) {
      dir.create(i)
      for(j in paste("data", 1:2, ".csv", sep="")) {
        write.csv(data.frame(x=1:10), file=paste(i, j, sep="/"))
      }
    }
    list.files("deleteme", recursive=TRUE)

What I want to end up with is

x  city         wave
1  New York     1
1  Los Angeles  1
1  New York     2
1  Los Angeles  2

I've started writing a simple function to do this, but it seems like a common situation and I'm wondering if there are any packages or functions that might make this easier.

Thanks! Ista

-- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org
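The reply above recovers `city` but not the `wave` column from the desired output. A sketch that takes both from the path, using `basename` and `dirname` instead of splitting on "/" by position; the temporary directory and the helper name `read_one` are made up, and the assumption is that the wave number is the only digit run in the file name.

```r
# Recreate the example layout under a temporary directory
root <- file.path(tempdir(), "deleteme")
for (city in c("New York", "Los Angeles")) {
  dir.create(file.path(root, city), recursive = TRUE, showWarnings = FALSE)
  for (wave in 1:2) {
    write.csv(data.frame(x = 1:10),
              file = file.path(root, city, paste0("data", wave, ".csv")),
              row.names = FALSE)
  }
}

files <- list.files(root, full.names = TRUE, recursive = TRUE)

# Read one file and tag its rows with information taken from the path
read_one <- function(f) {
  out <- read.csv(f)
  out$city <- basename(dirname(f))                          # parent directory
  out$wave <- as.integer(gsub("[^0-9]", "", basename(f)))   # digits in file name
  out
}

combined <- do.call(rbind, lapply(files, read_one))
head(combined)
```

Using `basename(dirname(f))` keeps working if the root directory is nested more deeply, whereas indexing the split path by a fixed position would not.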
Re: [R] How to group data by day
It depends on what you would like to get at the end. Perhaps you don't necessarily need this type of numbering, for instance if you'd just like to calculate a daily average. First convert to dates:

    london$id <- as.Date(london$id)

For the sum by day you could use, let's say, this:

    aggregate(words ~ id, london, FUN=sum)

If you really want what you've asked for:

    london$one = 1
    u = unique(london$id)
    z = aggregate(one ~ id, london, FUN=sum)
    london$day = rep(seq(along.with=z$one), z$one)

Mikhail

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Michela Ferron Sent: Monday, February 14, 2011 11:09 AM To: r-help@r-project.org Subject: [R] How to group data by day

Hi everybody, I'm a beginner in R and I'm having a hard time grouping my data by day. The data are in this format:

id; words
2005-07-07T09:59:56Z; 35
2005-07-07T10:01:39Z; 13
2005-07-08T10:02:22Z; 1
2005-07-09T10:03:16Z; 23
2005-07-10T10:04:23Z; 39
2005-07-10T10:04:39Z; 15

I've transformed the date strings into dates with the function:

    london$id <- transform(london$id, as.Date(london$id, format="%Y-%m-%d%T%H:%M:%S%Z"))

and it seems to work. Now I would like to add a new day variable to group the data by day, like this:

id; words; day
2005-07-07T09:59:56Z; 35; 1
2005-07-07T10:01:39Z; 13; 1
2005-07-08T10:02:22Z; 1; 2
2005-07-09T10:03:16Z; 23; 3
2005-07-10T10:04:23Z; 39; 4
2005-07-10T10:04:39Z; 15; 4

How can I do that? Many thanks! Michela
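The day numbering in the question can also be produced in one step with `match`, sketched here on the six example rows. The column name `date` is an addition for illustration, so that the original `id` strings stay untouched; the approach assumes the rows are already in time order, as in the example.

```r
london <- data.frame(
  id = c("2005-07-07T09:59:56Z", "2005-07-07T10:01:39Z",
         "2005-07-08T10:02:22Z", "2005-07-09T10:03:16Z",
         "2005-07-10T10:04:23Z", "2005-07-10T10:04:39Z"),
  words = c(35, 13, 1, 23, 39, 15),
  stringsAsFactors = FALSE
)

# The first ten characters are the calendar date
london$date <- as.Date(substr(london$id, 1, 10))

# Position of each date among the distinct dates, in order of appearance
london$day <- match(london$date, unique(london$date))

# Sum of words per day, as in the reply
daily <- aggregate(words ~ date, london, FUN = sum)
```

Unlike the `rep(...)` construction, `match` does not rely on equal dates being adjacent, although the numbering still follows first appearance rather than calendar order if the rows are unsorted.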
[R] lattice: dots from xyplot to xscale.components
Hello! I posted a feature request on the lattice page on R-Forge at https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1127&group_id=638&atid=2570 , but I decided to duplicate it here to make sure that I understand everything correctly. I would like to slightly change the way my plot axis labels look, based on custom extra arguments to xyplot and bwplot. Right now these arguments are passed to my panel function, but they are not part of the dots in xscale.components :( Is it a big problem to fix? Is there an easy workaround?

Mikhail
[R] typo in ts detrending implementation in spec.pgram?
Hello! I wonder if there is a typo in the detrending code of spec.pgram in spectrum.R from the stats package. One can see the code at https://svn.r-project.org/R/trunk/src/library/stats/R/spectrum.R . I am afraid there is a typo and the code should look like

    if (detrend) {
        t <- 1L:N - (N + 1)/2
        sumt2 <- N * (N^2 - 1)/12
        for (i in 1L:ncol(x))
            x[, i] <- x[, i] - mean(x[, i]) - sum((x[, i] - mean(x[, i])) * t) * t/sumt2
    }

Note x[, i] - mean(x[, i]) instead of only x[, i] as in the repository. Here is a quick reference: http://en.wikipedia.org/wiki/Simple_linear_regression#Estimating_the_regression_line . Note $\hat b$ there. Its summation uses x - mean(x), not x. Perhaps an even better solution would be resid(lm(x[,i] ~ seq(along = x[,i]))) . See http://tolstoy.newcastle.edu.au/R/help/05/01/10115.html

Mikhail
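A quick numeric check may help settle the question. Because `t` is centred, sum(t) is 0, so sum((x - mean(x)) * t) = sum(x * t) - mean(x) * sum(t) = sum(x * t), and the two forms should give the same result. The sketch below compares the form the post attributes to the repository, the form the post proposes, and the `lm` residuals on a made-up series; the variable names `repo_form`, `proposed` and `lm_resid` are for illustration only.

```r
set.seed(1)
N <- 50
x <- cumsum(rnorm(N)) + 0.3 * seq_len(N)   # made-up series with a trend

t <- 1L:N - (N + 1) / 2        # centred time index, sums to zero
sumt2 <- N * (N^2 - 1) / 12    # closed form for sum(t^2)

# Form described in the post as the repository version (plain x in the sum)
repo_form <- x - mean(x) - sum(x * t) * t / sumt2

# Form proposed in the post (x - mean(x) in the sum)
proposed <- x - mean(x) - sum((x - mean(x)) * t) * t / sumt2

# Residuals of a straight-line fit
lm_resid <- unname(resid(lm(x ~ seq_along(x))))

all.equal(repo_form, proposed)
all.equal(repo_form, lm_resid)
```

Under these assumptions all three agree up to floating-point error, which suggests the expression with plain x is an equivalent shortcut rather than a typo.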
[R] typo in Lomb-Scargle periodogram implementation in spec.ls() from cts package?
Hello! I tried to contact the author of the package, but I got no reply. That is why I write here. This might be useful for those who were using cts for spectral analysis of non-uniformly spaced data. In file spec.ls.R from cts_1.0-1.tar.gz, lines 59-60 are written as

    pgram[k, i, j] <- 0.5 * ((sum(x[1:length(ti)] * cos(2 * pi * freq.temp[k] * (ti - tao))))^2/sum((cos(2 * pi * freq.temp[k] * (ti - tao)))^2) + (sum(x[1:length(ti)] * sin(2 * pi * freq.temp[k] * (ti - tao))))^2 === ) === /sum((sin(2 * pi * freq.temp[k] * (ti - tao)))^2)

Is there a misplaced bracket (shown like === ) ===)? Should it be like the following?

    pgram[k, i, j] <- 0.5 * ((sum(x[1:length(ti)] * cos(2 * pi * freq.temp[k] * (ti - tao))))^2/sum((cos(2 * pi * freq.temp[k] * (ti - tao)))^2) + (sum(x[1:length(ti)] * sin(2 * pi * freq.temp[k] * (ti - tao))))^2/sum((sin(2 * pi * freq.temp[k] * (ti - tao)))^2) === ) ===

Here is a quick reference: http://en.wikipedia.org/wiki/Least-squares_spectral_analysis#The_Lomb.E2.80.93Scargle_periodogram . The one-half coefficient was not applied to the entire expression.

I also find the next lines (61-62) weird:

    pgram[1, i, j] <- 0.5 * (pgram[2, i, j] + pgram[N, i, j])

First of all, such things should not be in the for loop. Second, I don't quite understand the meaning of it.

P.S. Should I use tapering of my data? If I just try to fit sine and cosine, I may not need it; for FFT, however, windowing is a must. What about Lomb-Scargle?

Mikhail
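For comparison with the bracket placement discussed above, here is a minimal stand-alone sketch of the Lomb-Scargle periodogram written from the textbook formula, with the factor 0.5 applied to the sum of both terms. It is not the cts implementation; the function name `lomb_scargle`, the test signal and the frequency grid are made up for illustration, and no normalisation by the variance is applied.

```r
# P(f) = 0.5 * [ (sum x cos w(t - tau))^2 / sum cos^2 w(t - tau)
#              + (sum x sin w(t - tau))^2 / sum sin^2 w(t - tau) ],
# with w = 2 * pi * f and tan(2 w tau) = sum sin(2 w t) / sum cos(2 w t)
lomb_scargle <- function(t, x, freq) {
  x <- x - mean(x)
  sapply(freq, function(f) {
    w <- 2 * pi * f
    tau <- atan2(sum(sin(2 * w * t)), sum(cos(2 * w * t))) / (2 * w)
    cs <- cos(w * (t - tau))
    sn <- sin(w * (t - tau))
    0.5 * (sum(x * cs)^2 / sum(cs^2) + sum(x * sn)^2 / sum(sn^2))
  })
}

# Unevenly sampled sine at frequency 0.1 plus noise
set.seed(42)
t <- sort(runif(200, 0, 100))
x <- sin(2 * pi * 0.1 * t) + rnorm(200, sd = 0.5)

freq <- seq(0.01, 0.5, by = 0.001)
p <- lomb_scargle(t, x, freq)
peak <- freq[which.max(p)]   # frequency with the largest power
```

Writing the cosine and sine terms as separate named vectors makes it easy to see that each squared sum is divided by its own normaliser before the two are added and halved.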