[R] apply with multiple references and database interactivity
Hi R Colleagues,

I have a small R script that relies on two for-loops to pull data from a database, edit the data returned by the query, and then insert the updated data back into the database. The script works without problems, but I am trying to move away from loops and toward the apply family of tools. In this case, though, I did not know where to start with apply. I wonder if someone more adept with apply would mind taking a look and suggesting how this could have been accomplished with apply instead of nested loops. More details on what the script does are included below. Thanks in advance for your help and consideration.

Steve

Here, I have a df that includes a list of keywords that need to be edited, and the corresponding edit. The script goes through a database of people and identifies whether any of the keywords associated with each person are in the list of keywords to edit. If so, it pulls in the list of keywords and the person details, swaps the new keyword for the old keyword, then inserts the updated keywords back into the database for that person (many keywords are associated with each person, and they are in an array, hence the somewhat complicated procedure). The if-statement writes out the keywords in the df that were not found in the database, and 'm' is just a counter that tells me how many keywords the script changed.
for (i in 1:nrow(keywords)) {
  # pull every person whose expertise array contains the old keyword
  pull <- dbGetQuery(conn = con, statement = paste0("SELECT person_id, expertise FROM people WHERE expertise RLIKE '; ", keywords[i, 2], ";'"))
  # swap the new keyword for the old keyword
  pull$expertise <- gsub(keywords[i, 2], keywords[i, 3], pull$expertise)
  if (nrow(pull) == 0) {
    # keyword not found in the database: append it to a log file
    sink('~/Desktop/r1', append = TRUE)
    print(keywords[i, ]$keyword)
    sink()
  } else {
    # write the edited keyword array back, one person at a time
    for (j in 1:nrow(pull)) {
      dbSendQuery(conn = con, statement = paste0("UPDATE people SET expertise = '", pull[j, ]$expertise, "' WHERE person_id = ", pull[j, ]$person_id))
    }
    m = m + 1
  }
}

--
View this message in context: http://r.789695.n4.nabble.com/apply-with-multiple-references-and-database-interactivity-tp4711148.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
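A minimal sketch of how the keyword swap in the script above might be written without explicit loops. It is not a tested replacement: the database is replaced here by a small in-memory data frame (`people`), and the sample keywords and the column name `replacement` are made up for illustration. The idea is to keep the pure text edits (vapply, Reduce) separate from the database writes.

```r
# Hypothetical stand-in for the 'people' table in the database
people <- data.frame(
  person_id = 1:3,
  expertise = c("; ecology; hydrology;", "; soils;", "; hydrology; gis;"),
  stringsAsFactors = FALSE
)

# Hypothetical keyword table: column 2 = old keyword, column 3 = new keyword
keywords <- data.frame(
  id          = 1:2,
  keyword     = c("hydrology", "botany"),
  replacement = c("water science", "plant science"),
  stringsAsFactors = FALSE
)

# Keywords are stored as "; keyword;" inside the expertise string
wrap <- function(k) paste0("; ", k, ";")

# Which keywords occur anywhere in the table? (replaces the nrow(pull) == 0 test)
found <- vapply(
  keywords$keyword,
  function(k) any(grepl(wrap(k), people$expertise, fixed = TRUE)),
  logical(1)
)
not_found <- keywords$keyword[!found]   # what the sink() call logged
m <- sum(found)                         # the counter from the loop

# Apply every substitution in turn to the whole expertise column
updated <- Reduce(
  function(x, i) gsub(wrap(keywords$keyword[i]), wrap(keywords$replacement[i]),
                      x, fixed = TRUE),
  which(found),
  people$expertise
)

# Only rows that actually changed need to go back to the database
changed <- updated != people$expertise
updates <- data.frame(person_id = people$person_id[changed],
                      expertise = updated[changed],
                      stringsAsFactors = FALSE)
# Each row of 'updates' would then be sent back with one UPDATE statement,
# e.g. via DBI::dbExecute(); that step is still one call per row.
```

The database round trips themselves are side effects, so wrapping them in an apply function would not make them faster; the gain here is only that all text edits happen in memory on a single pull.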
[R] modifying a package installed via GitHub
Hi Folks,

I am working with a package installed via GitHub that I would like to modify. However, I am not sure how I would go about loading a 'local' version of the package after I have modified it, and whether that process would include uninstalling the original unmodified package (and, conversely, how to uninstall my local, modified version if I wanted to go back to the unmodified version available on GitHub). Any advice would be appreciated.

Thanks,
Steve

--
View this message in context: http://r.789695.n4.nabble.com/modifying-a-package-installed-via-GitHub-tp4710016.html
[R] operations on columns when data frames are in a list
Hello R folks,

I have recently discovered the power of working with multiple data frames in lists. However, I am having trouble understanding how to perform operations on individual columns of data frames in the list. For example, I have a water quality data set (sample data included below) that consists of roughly a dozen data frames. Some of the data frames have a chr column called 'Month' that I need to convert to a date with the proper format. I would like to iterate through all of the data frames in the list and format all of those that have the 'Month' column. I can accomplish this with a for-loop (e.g., below) but I cannot figure out how to do this with the plyr or apply families. This is just one example of the formatting that I have to perform, so I would really like to avoid loops, and I would love to learn how to better work with lists as well. I would greatly appreciate any guidance.

Thank you and regards,
Stevan

A for-loop like this works, but is not an ideal solution:

for (i in 1:length(data)) {
  if ("Month" %in% names(data[[i]])) {
    data[[i]]$Month <- as.POSIXct(data[[i]]$Month, format="%Y/%m/%d")
  }
}

Sample data (head of two data frames from the list of all data frames):

structure(list(`3D_Fluorescence.csv` = structure(list(ID = 1:6,
Site_Number = c("R5", "R6a", "R8", "R9a", "R14", "R15"),
Month = c("2001/10/01", "2001/10/01", "2001/10/01", "2001/10/01", "2001/10/01", "2001/10/01"),
Exc_A = c(215L, 215L, NA, NA, 215L, 215L),
Em_A = c(422.5, 410.5, NA, NA, 408.5, 408),
Fl_A = c(303, 296.86, NA, NA, 297.62, 174.75),
Exc_B = c(325L, 325L, NA, NA, 325L, 325L),
Em_B = c(416, 413, NA, NA, 418.5, 417.5),
Fl_B = c(137.32, 116.1, NA, NA, 132.48, 77.44)),
.Names = c("ID", "Site_Number", "Month", "Exc_A", "Em_A", "Fl_A", "Exc_B", "Em_B", "Fl_B"),
row.names = c(NA, 6L), class = "data.frame"),
algae.csv = structure(list(ID = 1:6,
SiteNumber = c("R1", "R2A", "R2B", "R3", "R4", "R5"),
SiteLocation = c("CAP canal above Waddell Canal", "Lake Pleasant integrated sample", "Lake Pleasant integrated sample", "Waddell Canal", "Cap Canal at 7th St.", "Verde River btwn Horseshoe and Bartlett"),
ClusterName = c("cap", "cap", "cap", "cap", "cap", "verde"),
SiteAcronym = c("cap-siphon", "pleasant-epi", "pleasant-hypo", "waddell canal", "cap @ 7th st", "verde abv bartlett"),
Date = c("1999/08/18", "1999/08/18", "1999/08/18", "1999/08/18", "1999/08/18", "1999/08/16"),
Month = c("1999/08/01", "1999/08/01", "1999/08/01", "1999/08/01", "1999/08/01", "1999/08/01"),
SampleType = c("", "", "", "", "", ""),
Conductance = c(800, 890, 850, 870, 830, 500),
ChlA = c(0.3, 0.3, 0.6, 0.8, 1.1, 7.6),
Phaeophytin = c(0, 0, 0, 0, 0.7, 4.7),
PhaeophytinChlA = c(0.7, 0.7, 1.3, 5.3, 0.7, 4.7),
Chlorophyta = c(0L, 0L, 18L, 0L, 0L, 21L),
Cyanophyta = c(8L, 0L, 0L, 0L, 7L, 79L),
Bacillariophyta = c(135L, 76L, 0L, 18L, 54L, 195L),
Total = c(147L, 76L, 18L, 18L, 61L, 302L),
AlgaeComments = c("", "", "", "", "", "")),
.Names = c("ID", "SiteNumber", "SiteLocation", "ClusterName", "SiteAcronym", "Date", "Month", "SampleType", "Conductance", "ChlA", "Phaeophytin", "PhaeophytinChlA", "Chlorophyta", "Cyanophyta", "Bacillariophyta", "Total", "AlgaeComments"),
row.names = c(NA, 6L), class = "data.frame")),
.Names = c("3D_Fluorescence.csv", "algae.csv"))

--
View this message in context: http://r.789695.n4.nabble.com/operations-on-columns-when-data-frames-are-in-a-list-tp4705757.html
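One possible way to do the 'Month' conversion with lapply, sketched on a tiny made-up list rather than the real data. The helper name `convert_month`, the toy data frames, and the choice of tz = "UTC" are assumptions for illustration. The key point is that the function handed to lapply must return the whole data frame, modified or not.

```r
# Toy list of data frames; only the first one has a 'Month' column
data <- list(
  fluorescence = data.frame(ID = 1:2,
                            Month = c("2001/10/01", "2001/11/01"),
                            stringsAsFactors = FALSE),
  other        = data.frame(ID = 1:2, Value = c(1.5, 2.5))
)

# Convert 'Month' when present; always return the (possibly modified) data frame
convert_month <- function(df) {
  if ("Month" %in% names(df)) {
    df$Month <- as.POSIXct(df$Month, format = "%Y/%m/%d", tz = "UTC")
  }
  df
}

# lapply keeps the list names and returns a new list of data frames
data <- lapply(data, convert_month)

str(data$fluorescence$Month)
```

Other per-column formatting steps could follow the same pattern: one small function per fix, each taking and returning a data frame, applied in turn with lapply.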
Re: [R] help incorporating data subset lengths in function with ddply
Jeff - Thanks so very much for the solution and tips, all very much appreciated! Regards, Stevan

--
View this message in context: http://r.789695.n4.nabble.com/help-incorporating-data-subset-lengths-in-function-with-ddply-tp4688926p4688999.html
Re: [R] help incorporating data subset lengths in function with ddply
Hi Frede - Thank you for responding. Not quite what I am after. Notice that I included two data sets in my post: the first is the raw data, whereas the second (the desired df) is similar but has an additional column at the end containing sequential numbers. That column of sequential numbers for each storm (i.e., subset of data) is what I am after.

Thanks again,
Stevan

--
View this message in context: http://r.789695.n4.nabble.com/help-incorporating-data-subset-lengths-in-function-with-ddply-tp4688926p4688933.html
[R] help incorporating data subset lengths in function with ddply
Dear R Community,

I am having some trouble with a task that I hope you might be able to help with. I have a dataset that includes the time and corresponding stream discharge from numerous storms (example of structure with simplified data below). I would like to produce a field that details the duration of each storm, where each storm is a subset of the data and the duration runs from zero to end for each unique storm. I have been trying to accomplish this with ddply but to no avail as I am unable to provide ddply (e.g., below) with the length of the storm (i.e., subset of data). Thank you in advance, any help would be appreciated.

existing df:

storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000

desired df:

storm,Q_time,Q,duration
s1,2008-08-07 21:15:00,0.000,1
s1,2008-08-07 21:16:00,3.020,2
s1,2008-08-07 21:17:00,6.041,3
s1,2008-08-07 21:18:00,9.061,4
s1,2008-08-07 21:19:00,12.082,5
s1,2008-08-07 21:20:00,15.102,6
s1,2008-08-07 21:21:00,18.123,7
s1,2008-08-07 21:22:00,11.143,8
s1,2008-08-07 21:23:00,0.000,9
s2,2010-10-05 21:00:00,0.000,1
s2,2010-10-05 21:01:00,1.812,2
s2,2010-10-05 21:02:00,3.625,3
s2,2010-10-05 21:03:00,5.437,4
s2,2010-10-05 21:04:00,7.249,5
s2,2010-10-05 21:05:00,9.061,6
s2,2010-10-05 21:06:00,0.874,7
s2,2010-10-05 21:07:00,0.000,8

I have been trying variations of the following statement, but I cannot seem to get the length of the subset correct, as I receive an error of the type 'Error: arguments imply differing number of rows: 2401, 0'.
newdf <- ddply(df, "storm", transform, FUN = function(x) {duration=seq(from=1, by=1, length.out=nrow(x))})

I would really like to get a handle on ddply in this instance as it will be quite helpful for many other similar calculations that I need to do with this dataset.

Thanks again,
Stevan

--
View this message in context: http://r.789695.n4.nabble.com/help-incorporating-data-subset-lengths-in-function-with-ddply-tp4688926.html
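For comparison, a sketch of the same per-storm counter in base R using ave(), on a cut-down copy of the example data. The `elapsed_min` column is an extra assumption of mine, added in case a duration starting at zero is wanted. With plyr, transform takes new columns as named arguments rather than through FUN, so something like ddply(df, "storm", transform, duration = seq_along(storm)) may be what was intended; that form is not run here.

```r
# Cut-down copy of the example data (three rows of s1, two of s2)
df <- data.frame(
  storm  = c("s1", "s1", "s1", "s2", "s2"),
  Q_time = as.POSIXct(c("2008-08-07 21:15:00", "2008-08-07 21:16:00",
                        "2008-08-07 21:17:00", "2010-10-05 21:00:00",
                        "2010-10-05 21:01:00"), tz = "UTC"),
  Q      = c(0, 3.020, 6.041, 0, 1.812),
  stringsAsFactors = FALSE
)

# Sequential counter within each storm: ave() applies FUN to each group
# and puts the results back in the original row order
df$duration <- ave(seq_along(df$storm), df$storm, FUN = seq_along)

# Optional: minutes elapsed since the first record of each storm
df$elapsed_min <- ave(as.numeric(df$Q_time), df$storm,
                      FUN = function(t) (t - t[1]) / 60)

df
```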
[R] help with rle function on paired data
Dear R Community -

I hope you might be able to provide some guidance regarding the use of the rle function. I have a set of time-series data where a measured value is recorded every 30 seconds after the start of an experiment. Many of the measured values repeat, and I am interested only in the values when there is a change. If I turn the measured values into a vector, the rle function works perfectly for this, but I also need the corresponding time of each value and I am not sure how to use rle on paired data. Below is a brief example to help explain the problem. I thank you in advance for any assistance you might be able to provide.

Regards,
Steve

Original dataset:

ElpsdTime, DataValue
0, 1
30, 1
60, 1
90, 2
120, 2
150, 3
180, 2
210, 3
240, 3
. .

Desired dataset:

ElpTime, DataValue
0, 1
90, 2
150, 3
180, 2
210, 3
. .

--
View this message in context: http://r.789695.n4.nabble.com/help-with-rle-function-on-paired-data-tp4632856.html
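A small sketch of one way to carry the times along with rle, using the nine rows from the example above. The run lengths returned by rle give the row index where each run starts, and those indices can be used to subset the original data frame. The variable names (`d`, `runs`, `starts`, `changes`) are just for illustration.

```r
# The example data from the post
d <- data.frame(
  ElpsdTime = seq(0, 240, by = 30),
  DataValue = c(1, 1, 1, 2, 2, 3, 2, 3, 3)
)

# rle() gives the length of each run of repeated values
runs <- rle(d$DataValue)

# The first run starts at row 1; every later run starts one row
# after the previous runs end
starts <- cumsum(c(1, head(runs$lengths, -1)))

# Keep only the first row of each run, time and value together
changes <- d[starts, ]
changes
```

An equivalent test without rle would be `d[c(TRUE, diff(d$DataValue) != 0), ]`, which keeps a row whenever the value differs from the row before it.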
Re: [R] help wrapping findInterval into a function
Michael (and others) - Right, 'within' did work, I had placed it in the wrong location previously, which your example code made clear. I wrapped several of these functions within a function to address all the desired flags in a single pass (probably horribly inefficient but it works). Thanks again for your most generous assistance. Steve

--
View this message in context: http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4173391.html
Re: [R] help wrapping findInterval into a function
forgot to attach the data set http://r.789695.n4.nabble.com/file/n4170695/WaterData.txt WaterData.txt

--
View this message in context: http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4170695.html
Re: [R] help wrapping findInterval into a function
Thanks to everyone for continued assistance with this problem. I realize that I had not included enough information; hopefully I have done so here. I attached a dput output of a sample of the data titled 'WaterData' (and str output below). Below are dput outputs of the function I am trying to get working and the resulting array when I run it. Unfortunately, Michael, changing 'with' to 'within' did not solve the problem, as running the function in that case produced no discernible output or result. What I meant by the function producing an array of values that are not attached to the data frame is this: the values are the result I am looking for, but they show up separately in a result window (in a format similar to what you get from dput()) and are not at all associated with the data frame. Again, thanks so much!

> dput(WQFunc)
function (dataframe)
{
    dataframe$CalcFlag <- with(dataframe, ifelse(variable == "CaD_ICP", (dataqualifier <- c("Y", "Q", "", "A")[findInterval(dataframe$value, c(-Inf, 0.027, 0.1, 100, Inf))]), ""))
}

> str(WaterData)
'data.frame': 126 obs. of 5 variables:
 $ Site : Factor w/ 6 levels "BV","CB","KP",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Time : Factor w/ 84 levels "0:00:00","0:00:52",..: 1 1 1 1 2 5 16 16 19 20 ...
 $ DateCorrectFmt: Factor w/ 9 levels "2010-08-17","2010-08-21",..: 4 8 1 3 8 5 5 8 8 8 ...
 $ variable : Factor w/ 3 levels "CaD_ICP","NaD_ICP",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ value : num 0.044 0.1316 0.0101 0.0114 80.13 ...

Below is the output I get if I run WQFunc as:

Flagged <- WQFunc(WaterData)

> dput(Flagged)
c("Q", "", "Y", "Y",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "",
"", "")
>

Again, though, 'Flagged' is an array of those values in an output window; the values are not 'attached' to WaterData.

--
View this message in context: http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4170688.html
Re: [R] help wrapping findInterval into a function
Bill (and David),

Thank you very much for taking the time to respond to my query. You were right, I was creating and calling the function exactly as you had predicted. I revised the structure based on your suggestion. It runs, but the output is an array of the flags that are not attached to the data frame, not a new column in the data frame as was my intention. So, the new configuration I tried was like this (where DataFrame is not a real data frame but just the word "DataFrame"):

WQFlags <- function(DataFrame) {
  DataFrame$CalciumFlag <- with(DataFrame, ifelse(variable == "CaD_ICP", (dataqualifier <- c("Y", 'Q', "", "A") [findInterval(DataFrame$value, c(-Inf, 0.027, 0.1, 100, Inf))]),""))
}

I called it using:

WaterQualityData <- WQFlags(WaterQualityData)

Again, the output is simply an array of the flags, unattached to a data frame. Can you suggest a way to modify this to make it work as desired, or, in the worst case, can I attach the resulting array of flag values? Thank you again!

--
View this message in context: http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4166826.html
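A sketch of what may be going on in the function above, on a small made-up data frame. An R function returns the value of its last expression; when that last expression is an assignment to a column, the value handed back is the vector that was assigned, not the data frame. Ending the function with the data frame itself returns the whole object. The sample rows below are invented; the thresholds and flag letters are taken from the post.

```r
WQFlags <- function(DataFrame) {
  # Map each value to its flag by interval, then blank out non-calcium rows
  flags <- c("Y", "Q", "", "A")[findInterval(DataFrame$value,
                                             c(-Inf, 0.027, 0.1, 100, Inf))]
  DataFrame$CalciumFlag <- ifelse(DataFrame$variable == "CaD_ICP", flags, "")
  DataFrame   # last expression: return the modified data frame, not the vector
}

# Invented sample rows for illustration only
WaterQualityData <- data.frame(
  variable = c("CaD_ICP", "CaD_ICP", "NaD_ICP", "CaD_ICP"),
  value    = c(0.044, 0.0101, 0.5, 150),
  stringsAsFactors = FALSE
)

WaterQualityData <- WQFlags(WaterQualityData)
WaterQualityData
```

The function still works on its own local copy of the data frame, so the result only reaches the original name because it is assigned back in the last line.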
[R] help wrapping findInterval into a function
Dear R Community,

I hope you might be able to assist with a small problem creating a function. I am working with water-quality data sets that contain the concentration of many different elements in water samples. I need to assign quality-control flags to values that fall into various concentration ranges. Rather than a web of nested if statements, I am employing the findInterval function to identify values that need to be flagged and to assign the appropriate flag. The data consist of a sample identifier, the analysis, and the corresponding value. The findInterval function works well; however, I would like to incorporate it into a function so that I can run multiple findInterval calls for many different water-quality analyses (and I have to do this for many datasets), but it seems to fall apart when incorporated into a function.

Run directly, the findInterval call works as desired, e.g. below, creating the new CalciumFlag column with the appropriate flag for, in this case, levels of calcium in the water:

WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium", (flags <- c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.027, 0.1, 100, Inf))]),""))

However, it does not work when wrapped in a function (no error messages are thrown, it simply does not seem to do anything):

WQfunc <- function() {
  WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium", (flags <- c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.027, 0.1, 100, Inf))]),""))
}

Calling the function with WQfunc() does not produce an error, but it also does not produce the expected CalciumFlag column; it seems not to do anything.
Ultimately, what I need to get to is something like below, where multiple findInterval calls for different analyses are included in a single function; then I can concatenate the results into a single column containing all flags for all analyses, e.g.:

WQfunc <- function() {
  WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium", (flags <- c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.027, 0.1, 100, Inf))]),""))
  WQdata$SodiumFlag <- with(WQdata, ifelse(analysis == "Sodium", (flags <- c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.050, 0.125, 125, Inf))]),""))
  WQdata$MagnesiumFlag <- with(WQdata, ifelse(analysis == "Magnesium", (flags <- c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.065, 0.15, 75, Inf))]),""))
  # ...etc. for additional water-quality analyses...
}

As an aside, I started working with the findInterval tool from an example that I found online, but I am not clear as to how the multi-component configuration incorporating brackets actually works. Can anyone suggest a good resource that explains this? I thank you very much for any assistance you may be able to provide.

Regards,
Steve

--
View this message in context: http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4165464.html
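On the bracket question, a short sketch of how the pieces appear to fit together, followed by one possible way to cover several analyses with a single column of flags. findInterval returns, for each value, the number of the interval it falls in (1, 2, 3, ...), and the square brackets then use those numbers as positions into the vector of flag letters. The names `limits`, `flag_one`, and `WQdata$Flag`, and the sample rows, are assumptions for illustration; the break points are those from the post.

```r
breaks <- c(-Inf, 0.027, 0.1, 100, Inf)   # calcium thresholds from the post
labels <- c("Y", "Q", "", "A")            # one label per interval

values <- c(0.01, 0.05, 5, 250)
idx <- findInterval(values, breaks)       # interval number for each value
idx
labels[idx]                               # index the label vector by interval

# One set of breaks per analysis, held in a named list
limits <- list(
  Calcium   = c(-Inf, 0.027, 0.1, 100, Inf),
  Sodium    = c(-Inf, 0.050, 0.125, 125, Inf),
  Magnesium = c(-Inf, 0.065, 0.15, 75, Inf)
)

# Flag a single (analysis, value) pair; analyses without limits get ""
flag_one <- function(analysis, value) {
  b <- limits[[analysis]]
  if (is.null(b)) "" else labels[findInterval(value, b)]
}

# Invented sample rows for illustration only
WQdata <- data.frame(
  analysis = c("Calcium", "Sodium", "Magnesium", "Potassium"),
  value    = c(0.01, 0.1, 80, 1),
  stringsAsFactors = FALSE
)

# mapply walks the two columns in parallel and returns one flag per row
WQdata$Flag <- mapply(flag_one, as.character(WQdata$analysis), WQdata$value,
                      USE.NAMES = FALSE)
WQdata
```

Keeping the thresholds in a list means a new analysis only needs one new entry, rather than another copy of the ifelse line.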
[R] help subsetting data based on date AND time
Dear R Community,

I am new to R, and have a question that I suspect may be quite simple but is proving a formidable roadblock for me. I have a large data set that includes water-quality measurements collected over many 24-hour periods. The date and time of sample collection are in a combined Date/Time field in the format yyyy-mm-dd hh:mm:ss. I need to be able to subset the data for analysis of different date and time windows. Thus far, I have tried casting the Date/Time field using several approaches, such as:

DataSet$NewDateTime <- strptime(DataSet$DateTime, '%Y-%m-%d %H:%M:%S')
DataSet$NewDateTime <- as.POSIXlt(strptime(DataSet$DateTime, '%Y-%m-%d %H:%M:S'))

These instructions seem to cast the NewDateTime field correctly (at least it appears to be in the correct format, and I assume R sees the field as a date and a time), but I am then unable to subset the data using instructions such as:

with(DataSet, subset(DataSet, DataSet$NewDateTime < '2004-08-05 14:15:00'))
DataSubset <- subset(DataSet, DataSet$NewDateTime < '2004-08-05 14:00:00', select = DataSet)

I have also tried separating the date and time fields in the input file, and casting with instructions such as:

DataSet$NewTime <- strptime(DataSet$Time, '%H:%M:%S')
DataSet$NewTime <- as.POSIXct(strptime(DataSet$Time, '%H:%M:%S'))

but these seem to generate a NewTime field that contains today's date plus the time data, and the result also will not subset based on date/time.

I greatly appreciate any help and advice,
Steve

--
View this message in context: http://r.789695.n4.nabble.com/help-subsetting-data-based-on-date-AND-time-tp3799933p3799933.html
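A minimal sketch of one approach that usually works for this kind of subsetting, on a few invented rows: store the field as POSIXct (a single number per timestamp, which sits comfortably in a data frame column) and compare it against a cutoff that is itself converted to POSIXct rather than left as a character string. The sample values, the `Temp` column, and tz = "UTC" are assumptions. Note that the seconds field in the format string needs its percent sign ('%S').

```r
# Invented sample rows for illustration only
DataSet <- data.frame(
  DateTime = c("2004-08-05 13:45:00", "2004-08-05 14:10:00",
               "2004-08-05 14:30:00", "2004-08-06 09:00:00"),
  Temp     = c(21.5, 22.1, 22.4, 19.8),
  stringsAsFactors = FALSE
)

# Parse once into POSIXct, with an explicit time zone
DataSet$NewDateTime <- as.POSIXct(DataSet$DateTime,
                                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Cutoffs are converted the same way before comparing
cutoff <- as.POSIXct("2004-08-05 14:15:00", tz = "UTC")
early  <- subset(DataSet, NewDateTime < cutoff)

# A window bounded on both sides
from   <- as.POSIXct("2004-08-05 14:00:00", tz = "UTC")
to     <- as.POSIXct("2004-08-06 00:00:00", tz = "UTC")
window <- subset(DataSet, NewDateTime >= from & NewDateTime < to)

early
window
```

Inside subset(), columns can be referred to by name alone, and the select argument takes column names, not the data frame itself.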