[R] use of names() within lapply()
Dear all,

List g has 2 elements:

    names(g)
    [1] "2009-10-07" "2012-02-29"

and the list plot

    lapply(g, plot, main=names(g))

results in identical plot titles containing both list names, whereas distinct titles names(g[1]) and names(g[2]) are sought. Clearly, lapply evaluates the additional 'main' argument on the whole of 'g' instead of consecutively on g[1] and then g[2]. help(lapply) is silent on how to pass parameters element-wise. Any suggestion would be appreciated.

Kind regards,
Ivan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] use of names() within lapply()
Dear Duncan and A.K.,

Many thanks for your super quick help. The modified lapply did the trick; mapply died with an error:

    Error in dots[[2L]][[1L]] : object of type 'builtin' is not subsettable

Kind regards,
Ivan

On 17 Apr 2013, at 17:12, Duncan Murdoch murdoch.dun...@gmail.com wrote:

> On 17/04/2013 11:04 AM, Ivan Alves wrote:
>> Dear all,
>>
>> List g has 2 elements:
>>
>>     names(g)
>>     [1] "2009-10-07" "2012-02-29"
>>
>> and the list plot
>>
>>     lapply(g, plot, main=names(g))
>>
>> results in identical plot titles containing both list names, whereas distinct titles names(g[1]) and names(g[2]) are sought. Clearly, lapply evaluates the additional 'main' argument on the whole of 'g' instead of consecutively on g[1] and then g[2]. help(lapply) is silent on how to pass parameters element-wise. Any suggestion would be appreciated.
>
> I think you want mapply rather than lapply, or you could do lapply on a vector of indices. For example,
>
>     mapply(plot, g, main=names)
>
> or
>
>     lapply(1:2, function(i) plot(g[[i]], main=names(g)[i]))
>
> Duncan Murdoch
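
The mapply error reported in this thread is consistent with `main=names` handing mapply the function `names` itself (a builtin) rather than the vector `names(g)`, which mapply then tries to subset element by element. A minimal sketch of the element-wise version follows; the data in `g` are hypothetical stand-ins, and the null graphics device is only an assumption made so the example runs non-interactively.

```r
# Hypothetical stand-in for the poster's list; only the names come from the thread.
g <- list("2009-10-07" = c(1, 3, 2), "2012-02-29" = c(5, 4, 6))

# Draw to a null device so nothing is written to disk; drop this line to see the plots.
pdf(NULL)

# mapply walks g and names(g) in parallel, so each plot gets its own title.
titles <- mapply(function(x, title) {
  plot(x, main = title)
  title  # return the title that was used, for inspection
}, g, names(g))

invisible(dev.off())
print(titles)
```

The same pairing can be written as `Map(function(x, title) plot(x, main = title), g, names(g))` when a list result is preferred over a simplified vector.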
[R] recursive function on a structured list of lists (dendrogram)
Dear all,

I have been trying the following to no avail and would be very grateful for any help. From a dendrogram (a recursive list of lists with some structure), I would like to obtain information about the component lists and about the enclosing list at the same time. In dendrogram terms, I would basically like the label of each leaf and the height of the enclosing branch.

A dendrogram example (from the help file of stats::dendrogram), and some functions showing how it is structured:

    hc <- hclust(dist(USArrests), "ave")
    dend1 <- as.dendrogram(hc)
    plot(dend1)
    str(dend1)

Similarly to dendrapply(), I tried to recursively obtain from the tree a list including, for each member (leaf), the height of the list containing it. However, I fail to fully grasp how the recursion is carried out within a function that saves elements at both the leaf and the branch level. For reference, the dendrapply function is as follows:

    function (X, FUN, ...)
    {
        FUN <- match.fun(FUN)
        if (!inherits(X, "dendrogram"))
            stop("'X' is not a dendrogram")
        Napply <- function(d) {
            r <- FUN(d, ...)
            if (!is.leaf(d)) {
                if (!is.list(r))
                    r <- as.list(r)
                if (length(r) < (n <- length(d)))
                    r[seq_len(n)] <- vector("list", n)
                r[] <- lapply(d, Napply)
            }
            r
        }
        Napply(X)
    }

Essentially, I do not manage to 'save' the height of a branch (a list of lists) so that it can be used at the next iteration and attached to the leaves there. Many thanks for any guidance on how to implement such a recursive function.

Kind regards,
Ivan
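
One way to carry the branch height down to the leaves is to pass it as an extra argument of the recursive call, rather than trying to store it between iterations. The sketch below assumes that "height of the enclosing branch" means the `height` attribute of the node directly containing the leaf; `leaf_branch_heights` is a hypothetical helper name, not part of the stats package.

```r
hc <- hclust(dist(USArrests), "ave")
dend1 <- as.dendrogram(hc)

# Hypothetical helper: walk the tree, handing each child the height of the
# node that encloses it. Leaves return one row; branches bind their children.
leaf_branch_heights <- function(node, parent_height = NA_real_) {
  if (is.leaf(node)) {
    return(data.frame(label = attr(node, "label"),
                      branch_height = parent_height,
                      stringsAsFactors = FALSE))
  }
  h <- attr(node, "height")  # height of this branch, passed on to its children
  children <- lapply(seq_along(node),
                     function(i) leaf_branch_heights(node[[i]], h))
  do.call(rbind, children)
}

res <- leaf_branch_heights(dend1)
head(res)
```

Because the height travels as a function argument, each recursive call sees the height of its own parent only, which is what distinguishes this from dendrapply(), where FUN receives a node without any information about its enclosing branch.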
Re: [R] 2 (related) problems with RODBC in 64 bit Windows
Dear Uwe,

Many thanks for the reply.

On 1, the problem is that RODBC on 32 bit 'interprets' factors correctly, whereas on 64 bit it gives the error below. On both systems, forcing characters (via colClasses = "character" in read.csv) results in no problems. I still see this as a problem of the implementation on 64 bit.

On 2, many thanks; once I gather the courage to address Prof. Ripley I will send him an account of my experience.

Kind regards,
Ivan

On 29 Aug 2012, at 15:08, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:

> On 24.08.2012 21:53, Ivan Alves wrote:
>> Hi all,
>>
>> I am encountering an RODBC problem in R 2.15.1 on Windows 64 bit which I do not encounter in the same setup on Windows 32 bit (the latest binary version of RODBC in both cases, from the same repository, obtained by install.packages('RODBC'); Oracle ODBC client software installed in 64 and 32 bit respectively).
>>
>> 1. The code looks like
>>
>>     library(RODBC)
>>     credentials <- read.csv("~/credentials.csv", head=T, row.names=1)
>>     db <- odbcConnect(dsn="DSN", uid=credentials["DSN", "username"], pwd=credentials["DSN", "password"], rows_at_time=1024)
>>
>> on which the odbcConnect call fails with the following error:
>>
>>     Error in nchar(uid) : 'nchar()' requires a character vector
>>
>> (credentials are processed correctly and credentials["DSN", "username"] correctly returns, by the way, a factor:
>>
>>     [1] _username_
>>     Levels: …
>>
>> ). When I run the equivalent call with direct arguments
>>
>>     db <- odbcConnect("DSN", uid="_username_", pwd="_password_", rows_at_time=1024)
>>
>> it works just fine. Furthermore, both work just fine on Windows 32 bit, or on both systems when the colClasses = "character" option is used. Is this perhaps a problem with RODBC in 64 bit when dealing with factors that is not a problem in 32 bit?
>
> I think 32-bit and 64-bit behave the same way (but you have not compared exactly); reading
>
>     credentials <- read.csv("~/credentials.csv", head=T, row.names=1)
>
> results in factors for username and password that have to be converted to character. It is unrelated to RODBC.
>
>> 2. Furthermore (and as reported in http://stackoverflow.com/questions/3407015/querying-oracle-db-from-revolution-r-using-rodbc), there are issues with using sqlQuery without the option believeNRows=FALSE, as RODBC seems to still have issues with signed vs. unsigned integers (or sizeof(long)) between 32 and 64 bit.
>
> Don't know, but that is something you may want to report (preferably including patches) to the package maintainer.
>
> Uwe Ligges
>
>> Any chance the problems have the same source in the RODBC code and could be addressed in the near future (after apparently years of making the transition to 64 bit difficult for work with Oracle servers)? (Is there an implicit encouragement to use RJDBC when combining 64 bit R and Oracle databases?)
>>
>> Many thanks in advance for any guidance.
>> Ivan
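
The factor-versus-character point made in this exchange can be reproduced without a database. The sketch below builds a hypothetical credentials table from inline text (the column names and placeholder values mirror the thread; the CSV layout is an assumption) and shows that nchar() rejects a factor while the converted string is accepted. `stringsAsFactors = TRUE` is set explicitly because recent R versions no longer create factors by default.

```r
# Hypothetical file contents standing in for ~/credentials.csv.
csv_text <- "dsn,username,password\nDSN,_username_,_password_\n"

# Reading with factors, as older R versions did by default.
credentials <- read.csv(text = csv_text, row.names = 1, stringsAsFactors = TRUE)
uid_factor <- credentials["DSN", "username"]
class(uid_factor)

# nchar() refuses factors, which matches the error reported for odbcConnect().
nchar_fails <- inherits(try(nchar(uid_factor), silent = TRUE), "try-error")

# Option 1: convert at the point of use.
uid <- as.character(uid_factor)

# Option 2: read everything as character in the first place.
credentials_chr <- read.csv(text = csv_text, row.names = 1,
                            colClasses = "character")
uid_chr <- credentials_chr["DSN", "username"]

# Either string could then be passed on, e.g. odbcConnect("DSN", uid = uid, ...).
```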
[R] 2 (related) problems with RODBC in 64 bit Windows
Hi all,

I am encountering an RODBC problem in R 2.15.1 on Windows 64 bit which I do not encounter in the same setup on Windows 32 bit (the latest binary version of RODBC in both cases, from the same repository, obtained by install.packages("RODBC"); Oracle ODBC client software installed in 64 and 32 bit respectively).

1. The code looks like

    library(RODBC)
    credentials <- read.csv("~/credentials.csv", head=T, row.names=1)
    db <- odbcConnect(dsn="DSN", uid=credentials["DSN", "username"], pwd=credentials["DSN", "password"], rows_at_time=1024)

on which the odbcConnect call fails with the following error:

    Error in nchar(uid) : 'nchar()' requires a character vector

(credentials are processed correctly and credentials["DSN", "username"] correctly returns, by the way, a factor:

    [1] _username_
    Levels: …

). When I run the equivalent call with direct arguments

    db <- odbcConnect("DSN", uid="_username_", pwd="_password_", rows_at_time=1024)

it works just fine. Furthermore, both work just fine on Windows 32 bit, or on both systems when the colClasses = "character" option is used. Is this perhaps a problem with RODBC in 64 bit when dealing with factors that is not a problem in 32 bit?

2. Furthermore (and as reported in http://stackoverflow.com/questions/3407015/querying-oracle-db-from-revolution-r-using-rodbc), there are issues with using sqlQuery without the option believeNRows=FALSE, as RODBC seems to still have issues with signed vs. unsigned integers (or sizeof(long)) between 32 and 64 bit.

Any chance the problems have the same source in the RODBC code and could be addressed in the near future (after apparently years of making the transition to 64 bit difficult for work with Oracle servers)? (Is there an implicit encouragement to use RJDBC when combining 64 bit R and Oracle databases?)

Many thanks in advance for any guidance.
Ivan
Re: [R] Graph in R with edge weights
Hi Arthur,

I was asking myself the same thing and came across the following (you need the sna package):

http://students.washington.edu/mclarkso/documents/gplot%20Ver2.pdf

Take a look at the edge.lwd and vertex.cex examples for the function gplot. You can use vectors to give the different edges and nodes different values.

Kind regards,
Ivan

On Dec 1, 2010, at 9:31 AM, arturs.onz...@gmail.com wrote:

Can you please show a code example of how to draw a graph with some nodes and edges, but with weights? I only found http://www.bioconductor.org/packages/release/bioc/vignettes/Rgraphviz/inst/doc/Rgraphviz.pdf ("Using edge weights for labels"), but...

Here is an example:

    library(graph); library(Rgraphviz)
    myNodes = c("s", "p", "q", "r")
    myEdges = list(
        s = list(edges = c("p", "q")),
        p = list(edges = c("p", "q")),
        q = list(edges = c("p", "r")),
        r = list(edges = c("s")))
    g = new("graphNEL", nodes = myNodes, edgeL = myEdges, edgemode = "directed")
    plot(g)

but how about weights? Thanx.
Re: [R] ggplot2 adding vertical line at a certain date
Check out geom_vline:

    + geom_vline(xintercept=as.numeric(as.Date("2002-11-01")))

[you may not need to convert the date to numeric in the most recent ggplot2 version]

On 27 May 2009, at 20:31, stephen sefick wrote:

    library(ggplot2)
    melt.updn <- (structure(list(date = structure(c(11808, 11869, 11961, 11992, 12084, 12173, 12265, 12418, 12600, 12631, 12753, 12996, 13057, 13149, 11808, 11869, 11961, 11992, 12084, 12173, 12265, 12418, 12600, 12631, 12753, 12996, 13057, 13149), class = "Date"),
        variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("unrestored", "restored"), class = "factor"),
        value = c(1.1080259671261, 0.732188576856918, 0.410334408061265, 0.458980396410056, 0.429867902470711, 0.83126337241925, 0.602008712602784, 0.818751283264408, 1.12606382402475, 0.246174719479079, 0.941043753226865, 0.986511619794787, 0.291074883642735, 0.346361775752625, 1.36209038621623, 0.878561166753624, 0.525156715576168, 0.80305564765846, 1.08084449441812, 1.24906568558731, 0.970954515841768, 0.936838439269239, 1.26970090246036, 0.337831520417547, 0.909204325710795, 0.951009811036613, 0.290735620653709, 0.426683515714219)),
        .Names = c("date", "variable", "value"), row.names = c(NA, -28L), class = "data.frame"))
    qplot(date, value, data=melt.updn, shape=variable)+geom_smooth()
    #I would like to add a line at November 1, 2002
    #thanks for the help

--
Stephen Sefick

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis
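
The date passed to geom_vline in this thread has to be a quoted string before it is converted. A small base-R sketch of the conversion follows (ggplot2 itself is not loaded here, so the layer appears only in a comment); the variable names are illustrative.

```r
# Quoted, the string is parsed as a calendar date.
vline_date <- as.Date("2002-11-01")

# A Date is stored as the number of days since 1970-01-01, so the numeric
# value sits on the same scale as the 'date' column in melt.updn.
x_pos <- as.numeric(vline_date)
x_pos

# Unquoted, the same characters are plain arithmetic: 2002 minus 11 minus 1.
unquoted_value <- 2002-11-01
unquoted_value

# With ggplot2 loaded, the layer would then be:
#   geom_vline(xintercept = x_pos)
```

The value of `x_pos` also appears among the dates in the posted data, so the vertical line falls exactly on one of the sampling dates.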
Re: [R] data frames with å, ä, and ö (=non-ASCII-characters) from windows to mac os x
Hi,

On my system (see below), it works fine (inputting the code below at the R prompt). Make sure that the input file is encoded as UTF-8.

Rgds,
Ivan

    sessionInfo()
    R version 2.8.1 Patched (2009-01-14 r47602)
    i386-apple-darwin9.6.0

    locale:
    en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L, 7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L, 14L, 12L),
        .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G", "H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"), class = "factor"),
        Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L, 8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L, 19L, 11L),
        .Label = c("Blekinge län", "Dalarnas län", "Gotlands län", "Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län", "Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län", "Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län", "Västerbottens län", "Västernorrlands län", "Västmanlands län", "Västra Götalands län", "Örebro län", "Östergötlands län"), class = "factor")),
        .Names = c("LANKOD", "Län"), class = "data.frame",
        row.names = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"))
       LANKOD                  Län
    0       K         Blekinge län
    1       X       Gävleborgs län
    2       I         Gotlands län
    3       N         Hallands län
    4       Z        Jämtlands län
    5       F       Jönköpings län
    6       H           Kalmar län
    7       W         Dalarnas län
    8       G       Kronobergs län
    9      BD      Norrbottens län
    10      T           Örebro län
    11      E    Östergötlands län
    12      D    Södermanlands län
    13      C          Uppsala län
    14      S        Värmlands län
    15     AC    Västerbottens län
    16      Y  Västernorrlands län
    17      U     Västmanlands län
    18     AB       Stockholms län
    19      O Västra Götalands län
    20      M            Skåne län

    Länkarta <- structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L, 7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L, 14L, 12L),
        .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G", "H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"), class = "factor"),
        Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L, 8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L, 19L, 11L),
        .Label = c("Blekinge län", "Dalarnas län", "Gotlands län", "Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län", "Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län", "Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län", "Västerbottens län", "Västernorrlands län", "Västmanlands län", "Västra Götalands län", "Örebro län", "Östergötlands län"), class = "factor")),
        .Names = c("LANKOD", "Län"), class = "data.frame",
        row.names = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"))
    ls()
    [1] "Länkarta"

On 16 Jan 2009, at 14:13, Gustaf Rydevik wrote:

Hi,

I ran into this issue previously and managed to solve it, but I've forgotten how and am getting frustrated...

I have a data frame (see below) with Scandinavian characters in R (2.7.1) running on a Windows XP computer. I save the data frame in an RData file on a USB stick, and load() it in R (2.8.0) running on OS X 10.5. Now the name of the data frame and all factor labels with Scandinavian characters are scrambled. How do I make R on OS X read my data frame?

From what I've managed to find in the list archives and the FAQ, I should either

1) run

    Sys.setlocale("LC_ALL","en_US.UTF-8") ### Doesn't change anything

or

2) run

    defaults write org.R-project.R force.LANG en_US.UTF-8

in the terminal, which doesn't help either.

I must admit that I couldn't quite follow what documentation I found on locales, so I might have messed up somewhere along the line.

Many thanks in advance for your help!

Regards,
Gustaf

    Länkarta <- structure(list(LANKOD = structure(c(11L, 19L, 10L, 13L, 21L, 7L, 9L, 18L, 8L, 3L, 16L, 6L, 5L, 4L, 15L, 2L, 20L, 17L, 1L, 14L, 12L),
        .Label = c("AB", "AC", "BD", "C", "D", "E", "F", "G", "H", "I", "K", "M", "N", "O", "S", "T", "U", "W", "X", "Y", "Z"), class = "factor"),
        Län = structure(c(1L, 4L, 3L, 5L, 6L, 7L, 8L, 2L, 9L, 10L, 20L, 21L, 13L, 14L, 15L, 16L, 17L, 18L, 12L, 19L, 11L),
        .Label = c("Blekinge län", "Dalarnas län", "Gotlands län", "Gävleborgs län", "Hallands län", "Jämtlands län", "Jönköpings län", "Kalmar län", "Kronobergs län", "Norrbottens län", "Skåne län", "Stockholms län", "Södermanlands län", "Uppsala län", "Värmlands län", "Västerbottens län", "Västernorrlands län", "Västmanlands län", "Västra Götalands län", "Örebro län", "Östergötlands län"), class = "factor")),
        .Names = c("LANKOD", "Län"), class = "data.frame",
        row.names = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"))

--
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address: Essingetorget 40, 112 66 Stockholm, SE
skype: gustaf_rydevik
[R] Treatment of Date ODBC objects in R (RODBC)
Dear all,

Retrieving an Oracle Date data type by means of RODBC (version 1.2-4), I get different classes in R depending on the operating system:

    On Mac OS X I get class "Date"
    On Windows I get class "POSIXt" "POSIXct"

The problem is material, as converting the POSIXct object with as.Date() returns one day less (2008-12-17 00:00:00 CET is returned as 2008-12-16). I have 2 related questions:

1. Is there a way to control the conversion used by RODBC for Date types? Or is this controlled by the ODBC driver (in my case the Oracle driver on Windows and Actual on Mac OS X)?

2. What is the trick to get as.Date() to return the _intended_ date (the date that the OS X environment correctly reads)?

Many thanks in advance for any guidance.

Best regards,
Ivan
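
The one-day shift described in this message is what as.Date() produces when a POSIXct value is converted in UTC: midnight CET is 23:00 of the previous day in UTC. A minimal sketch follows, assuming the server times are in Central European Time; the zone name "Europe/Berlin" is an illustrative choice and not something stated in the post.

```r
# Midnight on 17 December 2008 in Central European Time (assumed zone name).
x <- as.POSIXct("2008-12-17 00:00:00", tz = "Europe/Berlin")

# as.Date() on a POSIXct converts in UTC unless told otherwise,
# so the result falls on the previous calendar day.
d_utc <- as.Date(x)

# Passing the time zone explicitly keeps the intended calendar day.
d_local <- as.Date(x, tz = "Europe/Berlin")

# Going through the printed representation is another option, since
# format() uses the time zone stored with the object.
d_text <- as.Date(format(x))

print(c(utc = format(d_utc), local = format(d_local), text = format(d_text)))
```

Whether RODBC or the ODBC driver decides the class of the returned column is a separate question that this sketch does not address; it only covers the conversion step.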
Re: [R] aggregating along bins and bin-quantiles
Dear Mark and all interested,

Unfortunately the code provided by Mark does not work: there is a syntax error when it is run as provided. I looked at possibly solving the problem, but without much knowledge of the output of split (it looks like a list of lists, and not a list of data frames), it is difficult to identify where in the call to lapply the problem arises. The problem both in Mark's code and in my original (with tapply) lies in the format of the output of the call to an implicit loop. In fact I find this area of R one of the most obscure to my simplistic way of thinking: I would expect the output to have the same format as the input (data.frame to data.frame), but I am certain there must be good reasons for the way implicit loop functions return what they do. Any further help would be appreciated, as I may have to resort to some (less elegant) loop...

Kind regards,
Ivan

On 22 Oct 2008, at 00:22, [EMAIL PROTECTED] wrote:

Hi Ivan: I think I understand better, so below is some new code, but I'm still not totally sure that it's what you want. If not, then I think it brings you closer anyway? The split function is very useful and I think that's what you need. Let me know if below is what you needed. If it's close but not quite right, I can look at it again; it's not a problem. If I'm totally off, maybe you should resend to the list, because that means I probably can't fix it.

    #=====================
    a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv", colClasses = c("Date", "numeric")) #beware of the path

    # SPLIT BY DATE TO CREATE A LIST OF DATAFRAMES
    DFlist <- split(a,a$Date)
    print(str(DFlist))

    # USE LAPPLY TO CALL cut AND THEN aggregate
    # ON EACH COMPONENT DATAFRAME IN THE LIST
    tempresult <- lapply(DFlist,function(.df) {
      .df$quantile <- cut(.df$value,breaks=quantile(.df$value,probs=seq(0,1,0.1),na.rm=TRUE))
      aggregate(.df$value,list(DATE=.df$Date,QUANTILE=.df$quantile),sum)
    })

    # CHECK IF IT WORKED
    print(tempresult)

    # RBIND EVERYTHING BACK TOGETHER SO THAT IT'S ONE DATAFRAME
    finalresult <- do.call(rbind,tempresult)
    print(finalresult)

On Tue, Oct 21, 2008 at 5:47 PM, Ivan Alves wrote:

Hello Mark,

Many thanks for the reply. Your suggestion is essentially equivalent to my first attempt: the quantiles are estimated for the WHOLE of the a$value column. What I would need is to first break down the value column into bins determined by the a$Date column and THEN estimate the quantiles for each bin. You see, I would need the quantiles for each date, not for all the entries together; thus if there are 12 dates (or bins), I would need 12 x 10 deciles, not just 10.

Kind regards,
Ivan

On 21 Oct 2008, at 22:20, [EMAIL PROTECTED] wrote:

Hi: I still wasn't very clear on what you wanted, but that might be because I didn't save your original email? I doubt that below helps. I used cut instead of cut2 because I didn't have Hmisc loaded, and I think cut does what you want? Jim will probably reply later with a better answer. He's the real expert with this type of thing. I just like to practice.

    a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv", colClasses = c("Date", "numeric"))
    a$quantile <- cut(a$value,breaks=quantile(a$value,probs=seq(0,1,0.1),na.rm=TRUE))
    aggregate(a$value,list(DATE=a$Date,QUANTILE=a$quantile),sum)

On 21 Oct 2008, at 09:25, Ivan Alves wrote:

Dear all,

Thanks to Jim and Mark for suggesting that I include reproducible code. Please note that the enclosed file needs to go into the home folder, or the path for reading the CSV file needs to be changed. I hope no encoding issues emerge when reading it. And the code:

    library(Hmisc) #need the cut2 function to mark the quantile a given line belongs to
    a <- read.csv(file = "~/example.csv", colClasses=c("Date","numeric")) #beware of the path
    dim(a) #should give [1] 5076    2
    aggregate(a$value, list(Date = a[,"Date"],Quantile=cut2(a$value,g=10)),sum) #should give the sum by year but on the quantiles for the whole population
    aggregate(a$value, list(Date = a[,"Date"],Quantile=tapply(a$value,use.filter$Date,cut2,g=10)),sum) #gives error mentioned below

Once again, many thanks for any help.
Ivan

On 21 Oct 2008, at 02:40, jim holtman wrote:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

You need to at least post a subset of your data so that we can understand the data structures that you are using. 'dput' will create an easily readable format for posting your data (much easier than if you post the listing of a table). Usually it is some 'type mismatch', which means you really have to have the data to run the script against.

On Mon, Oct 20, 2008 at 6:38 PM, Ivan Alves [EMAIL PROTECTED] wrote:

Dear all, I would like
[R] coalesce columns within a data frame
Dear all,

I searched the mail archives and the R site and found no guidance (I tried merge, cbind and search terms like coalesce with no success). There surely is a way to coalesce (as in SQL) columns in a data frame, right? For example, I would like to go from a data frame with two columns to one with only one, as follows:

From

    Name.x  Name.y
    nx1     ny1
    nx2     NA
    NA      ny3
    NA      NA
    ...

To

    Name
    nx1
    nx2
    ny3
    NA
    ...

where column Name.x is taken if there is a value, and column Name.y if not.

Any help would be appreciated.

Kind regards,
Ivan
Re: [R] coalesce columns within a data frame
Dear all,

Thanks for all the replies. I get something with Duncan's code (slightly more compact than the other two), but of class integer, whereas the two inputs are of class factor. Clearly the name information is lost. I did not see anything on this in the help page for ifelse.

Following this experience I also tried

    df$Name <- df$NAME.x
    df[is.na(df$NAME.x),"Name"] <- df[is.na(df$NAME.x),"NAME.y"]

but then again the factor issue was a problem (clearly the levels are not the same, and then there is a conflict).

Any further guidance?

Kind regards,
Ivan

On 22 Oct 2008, at 17:26, Duncan Murdoch wrote:

> On 10/22/2008 11:21 AM, Ivan Alves wrote:
>> Dear all,
>>
>> I searched the mail archives and the R site and found no guidance (I tried merge, cbind and search terms like coalesce with no success). There surely is a way to coalesce (as in SQL) columns in a data frame, right? For example, I would like to go from a data frame with two columns to one with only one, as follows:
>>
>> From
>>
>>     Name.x  Name.y
>>     nx1     ny1
>>     nx2     NA
>>     NA      ny3
>>     NA      NA
>>     ...
>>
>> To
>>
>>     Name
>>     nx1
>>     nx2
>>     ny3
>>     NA
>>     ...
>>
>> where column Name.x is taken if there is a value, and column Name.y if not.
>>
>> Any help would be appreciated.
>
> I don't know of any special function to do that, but ifelse() can handle it easily:
>
>     Name <- ifelse(is.na(Name.x), Name.y, Name.x)
>
> (If those are columns of a data frame named df, you'd prefix each column name by df$, or do
>
>     within(df, Name <- ifelse(is.na(Name.x), Name.y, Name.x))
>
> .)
>
> Duncan Murdoch
Re: [R] coalesce columns within a data frame
Many thanks to all for their help. Factors are indeed very tricky, and I sided with the conversion to character.

Kind regards,
Ivan

On 22 Oct 2008, at 19:01, Duncan Murdoch wrote:

> On 10/22/2008 12:09 PM, Ivan Alves wrote:
>> Dear all,
>>
>> Thanks for all the replies. I get something with Duncan's code (slightly more compact than the other two), but of class integer, whereas the two inputs are of class factor. Clearly the name information is lost. I did not see anything on this in the help page for ifelse.
>
> It is there, in this warning:
>
>     The mode of the result may depend on the value of 'test', and the class attribute of the result is taken from 'test' and may be inappropriate for the values selected from 'yes' and 'no'.
>
> You'd want the result to be a factor, but those attributes are lost. I think this is a result of two design flaws: ifelse() shouldn't base the class on the test, it should base it on the values. And factors in S and R have all sorts of problems.
>
> You can work around this by converting to character vectors:
>
>     Name <- ifelse(is.na(Name.x), as.character(Name.y), as.character(Name.x))
>
> If you really want factors, you can convert back at the end, but why would you want to?
>
> Duncan Murdoch
>
>> Following this experience I also tried
>>
>>     df$Name <- df$NAME.x
>>     df[is.na(df$NAME.x),"Name"] <- df[is.na(df$NAME.x),"NAME.y"]
>>
>> but then again the factor issue was a problem (clearly the levels are not the same, and then there is a conflict).
>>
>> Any further guidance?
>>
>> Kind regards,
>> Ivan
>>
>> On 22 Oct 2008, at 17:26, Duncan Murdoch wrote:
>>
>>> On 10/22/2008 11:21 AM, Ivan Alves wrote:
>>>> Dear all,
>>>>
>>>> I searched the mail archives and the R site and found no guidance (I tried merge, cbind and search terms like coalesce with no success). There surely is a way to coalesce (as in SQL) columns in a data frame, right? For example, I would like to go from a data frame with two columns to one with only one, as follows:
>>>>
>>>> From
>>>>
>>>>     Name.x  Name.y
>>>>     nx1     ny1
>>>>     nx2     NA
>>>>     NA      ny3
>>>>     NA      NA
>>>>     ...
>>>>
>>>> To
>>>>
>>>>     Name
>>>>     nx1
>>>>     nx2
>>>>     ny3
>>>>     NA
>>>>     ...
>>>>
>>>> where column Name.x is taken if there is a value, and column Name.y if not.
>>>>
>>>> Any help would be appreciated.
>>>
>>> I don't know of any special function to do that, but ifelse() can handle it easily:
>>>
>>>     Name <- ifelse(is.na(Name.x), Name.y, Name.x)
>>>
>>> (If those are columns of a data frame named df, you'd prefix each column name by df$, or do
>>>
>>>     within(df, Name <- ifelse(is.na(Name.x), Name.y, Name.x))
>>>
>>> .)
>>>
>>> Duncan Murdoch
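
The character-conversion workaround discussed in this thread can be tried on the example from the original question. The sketch below rebuilds that two-column table as factors (the data frame name `df` follows the thread; the values are the placeholders from the post) and compares the plain ifelse() call with the converted one.

```r
# The example table from the original question, stored as factors.
df <- data.frame(
  Name.x = factor(c("nx1", "nx2", NA, NA)),
  Name.y = factor(c("ny1", NA, "ny3", NA))
)

# Without conversion, ifelse() works on the underlying integer codes,
# so the labels are lost.
codes_only <- ifelse(is.na(df$Name.x), df$Name.y, df$Name.x)
class(codes_only)

# Converting both inputs to character first keeps the labels.
df$Name <- ifelse(is.na(df$Name.x),
                  as.character(df$Name.y),
                  as.character(df$Name.x))
df$Name
```

If a factor is wanted at the end, `factor(df$Name)` rebuilds one whose levels are the union of the values actually present, which avoids the level conflict mentioned for the assignment-based attempt.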
Re: [R] aggregating along bins and bin-quantiles
Dear all,

Thanks to Jim and Mark for suggesting that I include reproducible code. Please note that the enclosed file needs to go into the home folder, or the path for reading the CSV file needs to be changed. I hope no encoding issues emerge when reading it. And the code:

    library(Hmisc) #need the cut2 function to mark the quantile a given line belongs to
    a <- read.csv(file = "~/example.csv", colClasses=c("Date","numeric")) #beware of the path
    dim(a) #should give [1] 5076    2
    aggregate(a$value, list(Date = a[,"Date"],Quantile=cut2(a$value,g=10)),sum) #should give the sum by year but on the quantiles for the whole population
    aggregate(a$value, list(Date = a[,"Date"],Quantile=tapply(a$value,use.filter$Date,cut2,g=10)),sum) #gives error mentioned below

Once again, many thanks for any help.
Ivan

On 21 Oct 2008, at 02:40, jim holtman wrote:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

You need to at least post a subset of your data so that we can understand the data structures that you are using. 'dput' will create an easily readable format for posting your data (much easier than if you post the listing of a table). Usually it is some 'type mismatch', which means you really have to have the data to run the script against.

On Mon, Oct 20, 2008 at 6:38 PM, Ivan Alves [EMAIL PROTECTED] wrote:

Dear all,

I would like to aggregate a data frame (consisting of 2 columns, one for the bins, say factors, and one for the values) along bins and quantiles within the bins. I have tried

    aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=cut2(data.frame$bin,g=10)),sum)

but then the quantiles apply to the population as a whole and not to the individual bins. Upon this realisation I have tried

    aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=tapply(data.frame$values,data.frame$bin,cut2,g=10)),sum)

which gives the following error:

    Error in sort.list(unique.default(x), na.last = TRUE) :
      'x' must be atomic for 'sort.list'
    Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I believe the error stems either from

a. the output of tapply being a list of length equal to the number of bins, and not a vector of the same length as the values, or
b. aggregate somehow not liking that the second list (of the quantiles within the bins) is not sorted nicely.

1. Do you have a reference for doing the summation on both bins and quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong and how I can solve the sort/list issue?

Any help would be greatly appreciated.

Kind regards,
Ivan Alves

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
[R] aggregating along bins and bin-quantiles
Dear all,

I would like to aggregate a data frame (consisting of 2 columns, one for the bins, say factors, and one for the values) along bins and quantiles within the bins. I have tried

    aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=cut2(data.frame$bin,g=10)),sum)

but then the quantiles apply to the population as a whole and not to the individual bins. Upon this realisation I have tried

    aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=tapply(data.frame$values,data.frame$bin,cut2,g=10)),sum)

which gives the following error:

    Error in sort.list(unique.default(x), na.last = TRUE) :
      'x' must be atomic for 'sort.list'
    Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I believe the error stems either from

a. the output of tapply being a list of length equal to the number of bins, and not a vector of the same length as the values, or
b. aggregate somehow not liking that the second list (of the quantiles within the bins) is not sorted nicely.

1. Do you have a reference for doing the summation on both bins and quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong and how I can solve the sort/list issue?

Any help would be greatly appreciated.

Kind regards,
Ivan Alves
Re: [R] aggregating along bins and bin-quantiles
Apologies, there was just a typo in the first instruction (made when translating the names); the question is still valid.

On 21 Oct 2008, at 00:38, Ivan Alves wrote:

Dear all,

I would like to aggregate a data frame (consisting of 2 columns, one for the bins, say factors, and one for the values) along bins and quantiles within the bins. I have tried

    aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=cut2(data.frame$values,g=10)),sum)

but then the quantiles apply to the population as a whole and not to the individual bins. Upon this realisation I have tried

    aggregate(data.frame$values, list(bin = data.frame$bin,Quantile=tapply(data.frame$values,data.frame$bin,cut2,g=10)),sum)

which gives the following error:

    Error in sort.list(unique.default(x), na.last = TRUE) :
      'x' must be atomic for 'sort.list'
    Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I believe the error stems either from

a. the output of tapply being a list of length equal to the number of bins, and not a vector of the same length as the values, or
b. aggregate somehow not liking the second list (of the quantiles within the bins), which does not appear to be sorted nicely.

1. Do you have a reference for doing the summation on both bins and quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong and how I can solve the sort/list issue?

Any help would be greatly appreciated.

Kind regards,
Ivan Alves
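
One possible way to get deciles computed separately inside each bin, as asked in this thread, is ave(), which applies a function per group and returns a vector of the same length as its input, so the result can be used directly as a grouping variable in aggregate(). The sketch below uses simulated data and base R's cut() with quantile() in place of Hmisc::cut2; the names `df`, `bin`, `value` and `decile_within` are illustrative, and this is not the solution the thread itself arrived at.

```r
set.seed(1)

# Simulated stand-in data: three bins with 100 values each.
df <- data.frame(
  bin   = rep(c("a", "b", "c"), each = 100),
  value = c(rnorm(100), rnorm(100, mean = 10), rnorm(100, mean = 100))
)

# Decile number (1 to 10) of each value relative to the values passed in.
# include.lowest = TRUE keeps the minimum inside the first interval.
decile_within <- function(v) {
  breaks <- quantile(v, probs = seq(0, 1, 0.1), na.rm = TRUE)
  as.integer(cut(v, breaks = breaks, include.lowest = TRUE))
}

# ave() runs decile_within once per bin and puts the results back in row order.
df$decile <- ave(df$value, df$bin, FUN = decile_within)

# Sum of the values for every bin and within-bin decile.
res <- aggregate(value ~ bin + decile, data = df, FUN = sum)
head(res)
```

tapply() returns one list element per bin, which is why it could not be handed to aggregate() as a grouping vector; ave() sidesteps that by keeping one entry per row. Note that cut() requires unique breaks, so heavily tied data would need extra care.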