[R] apply with multiple references and database interactivity

2015-08-15 Thread Steve E.
Hi R Colleagues,

I have a small R script that relies on two for-loops to pull data from a
database, make some edits to the data returned from the query, then insert
the updated data back into the database. The script works just fine, no
problems, except that I am striving to get away from loops, and to focus on
the apply family of tools. In this case, though, I did not know quite where
to start with apply. I wonder if someone more adept with apply would not
mind taking a look at this, and suggesting some tips as to how this could
have been accomplished with apply instead of nested loops. More details on
what the script is accomplishing are included below.

Thanks in advance for your help and consideration.


Steve

Here, I have a df that includes a list of keywords that need to be edited,
and the corresponding edit. The script goes through a database of people,
identifies whether any of the keywords associated with each person are in
the list of keywords to edit, and, if so, pulls in the list of keywords and
the person details, swaps the new keyword for the old keyword, then inserts
the updated keywords back into the database for that person (many keywords
are associated with each person, and they are in an array, hence the
somewhat complicated procedure). The if-statement provides a list of
keywords in the df that were not found in the database, and 'm' is just a
counter to help me know how many keywords the script changed.

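m <- 0  # counter: number of keywords the script changed
for (i in seq_len(nrow(keywords))) {
  # find people whose expertise array contains the old keyword
  pull <- dbGetQuery(conn = con, statement = paste0(
    "SELECT person_id, expertise FROM people WHERE expertise RLIKE '; ",
    keywords[i, 2], ";'"))
  if (nrow(pull) == 0) {
    # log keywords from the df that were not found in the database
    sink('~/Desktop/r1', append = TRUE)
    print(keywords[i, ]$keyword)
    sink()
  } else {
    # swap the new keyword for the old one, then write each person back
    pull$expertise <- gsub(keywords[i, 2], keywords[i, 3], pull$expertise)
    for (j in seq_len(nrow(pull))) {
      dbSendQuery(conn = con, statement = paste0(
        "UPDATE people SET expertise = '", pull[j, ]$expertise,
        "' WHERE person_id = ", pull[j, ]$person_id))
    }
    m <- m + 1
  }
}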
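
One way the nested loops might be restructured with the apply family is sketched below. To keep it self-contained it uses no database: the `people` and `keywords` data frames, their column names, and the helper `replace_keyword` are hypothetical stand-ins, and the matching is done in R rather than with RLIKE. It is only a sketch of the pattern, not a drop-in replacement for the script above.

```r
# Hypothetical in-memory stand-ins for the database table and the keyword list.
people <- data.frame(
  person_id = 1:3,
  expertise = c("; ecology; hydrology;", "; hydrology;", "; botany;"),
  stringsAsFactors = FALSE
)
keywords <- data.frame(
  id = 1:2,
  keyword = c("hydrology", "geology"),
  replacement = c("water science", "earth science"),
  stringsAsFactors = FALSE
)

# Handle one keyword: return the rows that would be updated (possibly zero rows).
# fixed = TRUE treats the keyword as a literal string, not a regular expression.
replace_keyword <- function(old, new, tbl) {
  hit <- grepl(paste0("; ", old, ";"), tbl$expertise, fixed = TRUE)
  out <- tbl[hit, , drop = FALSE]
  out$expertise <- gsub(old, new, out$expertise, fixed = TRUE)
  out
}

# mapply() walks the old and new keyword columns in parallel (the outer loop).
changes <- mapply(replace_keyword, keywords$keyword, keywords$replacement,
                  MoreArgs = list(tbl = people), SIMPLIFY = FALSE)

# Keywords with no match, and the count of keywords that changed something.
n_hits <- vapply(changes, nrow, integer(1))
not_found <- keywords$keyword[n_hits == 0]
m <- sum(n_hits > 0)

# sprintf() is vectorized, so the inner loop becomes one call that builds
# every UPDATE statement; each could then be sent with the DBI call of choice.
updates <- do.call(rbind, changes)
statements <- sprintf("UPDATE people SET expertise = '%s' WHERE person_id = %d",
                      updates$expertise, updates$person_id)
```

Separating "work out what changes" from "send it to the database" means the first part can be checked on plain data frames before anything is written back.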




--
View this message in context: 
http://r.789695.n4.nabble.com/apply-with-multiple-references-and-database-interactivity-tp4711148.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] modifying a package installed via GitHub

2015-07-17 Thread Steve E.
Hi Folks,

I am working with a package installed via GitHub that I would like to
modify. However, I am not sure how I would go about loading a 'local'
version of the package after I have modified it, and whether that process
would include uninstalling the original unmodified package (and,
conversely, how to uninstall my local, modified version if I wanted to go
back to the unmodified version available on GitHub).

Any advice would be appreciated.


Thanks,
Steve



--
View this message in context: 
http://r.789695.n4.nabble.com/modifying-a-package-installed-via-GitHub-tp4710016.html


[R] operations on columns when data frames are in a list

2015-04-13 Thread Steve E.
Hello R folks,

I have recently discovered the power of working with multiple data frames in
lists. However, I am having trouble understanding how to perform operations
on individual columns of data frames in the list. For example, I have a
water quality data set (sample data included below) that consists of roughly
a dozen data frames. Some of the data frames have a chr column called
'Month' that I need to convert to a date with the proper format. I would
like to iterate through all of the data frames in the list and format all of
those that have the 'Month' column. I can accomplish this with a for-loop
(e.g., below) but I cannot figure out how to do this with the plyr or apply
families. This is just one example of the formatting that I have to perform
so I would really like to avoid loops, and I would love to learn how to
better work with lists as well.

I would appreciate greatly any guidance.


Thank you and regards,
Stevan


A for-loop like this works, but is not an ideal solution:

for (i in seq_along(data)) {
  if ("Month" %in% names(data[[i]]))
    data[[i]]$Month <- as.POSIXct(data[[i]]$Month, format = "%Y/%m/%d")
}
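
For comparison, the same conversion might be written with lapply along the following lines. The two small data frames and the helper name `fix_month` are hypothetical, and the `tz = "UTC"` argument is an assumption added so the result does not depend on the local time zone.

```r
# Hypothetical miniature of the list: one data frame has 'Month', one does not.
data <- list(
  a = data.frame(ID = 1:2, Month = c("2001/10/01", "2001/11/01"),
                 stringsAsFactors = FALSE),
  b = data.frame(ID = 1:2, Value = c(3.5, 4.1))
)

# Convert 'Month' where present and always return the data frame,
# so that lapply() rebuilds the whole list, names included.
fix_month <- function(d) {
  if ("Month" %in% names(d)) {
    d$Month <- as.POSIXct(d$Month, format = "%Y/%m/%d", tz = "UTC")
  }
  d
}

data <- lapply(data, fix_month)
```

The key point is that the function returns the whole data frame in both branches; returning only the converted column would replace each list element with a vector.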



sample data (head of two data frames from the list of all data frames):

structure(list(`3D_Fluorescence.csv` = structure(list(ID = 1:6, 
Site_Number = c("R5", "R6a", "R8", "R9a", "R14", "R15"), 
Month = c("2001/10/01", "2001/10/01", "2001/10/01", "2001/10/01", 
"2001/10/01", "2001/10/01"), Exc_A = c(215L, 215L, NA, NA, 
215L, 215L), Em_A = c(422.5, 410.5, NA, NA, 408.5, 408), 
Fl_A = c(303, 296.86, NA, NA, 297.62, 174.75), Exc_B = c(325L, 
325L, NA, NA, 325L, 325L), Em_B = c(416, 413, NA, NA, 418.5, 
417.5), Fl_B = c(137.32, 116.1, NA, NA, 132.48, 77.44)), .Names =
c("ID", 
"Site_Number", "Month", "Exc_A", "Em_A", "Fl_A", "Exc_B", "Em_B", 
"Fl_B"), row.names = c(NA, 6L), class = "data.frame"), algae.csv =
structure(list(
ID = 1:6, SiteNumber = c("R1", "R2A", "R2B", "R3", "R4", 
"R5"), SiteLocation = c("CAP canal above Waddell Canal", 
"Lake Pleasant integrated sample", "Lake Pleasant integrated sample", 
"Waddell Canal", "Cap Canal at 7th St.", "Verde River btwn Horseshoe and Bartlett"
), ClusterName = c("cap", "cap", "cap", "cap", "cap", "verde"
), SiteAcronym = c("cap-siphon", "pleasant-epi", "pleasant-hypo", 
"waddell canal", "cap @ 7th st", "verde abv bartlett"), Date =
c("1999/08/18", 
"1999/08/18", "1999/08/18", "1999/08/18", "1999/08/18", "1999/08/16"
), Month = c("1999/08/01", "1999/08/01", "1999/08/01", "1999/08/01", 
"1999/08/01", "1999/08/01"), SampleType = c("", "", "", "", 
"", ""), Conductance = c(800, 890, 850, 870, 830, 500), ChlA = c(0.3, 
0.3, 0.6, 0.8, 1.1, 7.6), Phaeophytin = c(0, 0, 0, 0, 0.7, 
4.7), PhaeophytinChlA = c(0.7, 0.7, 1.3, 5.3, 0.7, 4.7), 
Chlorophyta = c(0L, 0L, 18L, 0L, 0L, 21L), Cyanophyta = c(8L, 
0L, 0L, 0L, 7L, 79L), Bacillariophyta = c(135L, 76L, 0L, 
18L, 54L, 195L), Total = c(147L, 76L, 18L, 18L, 61L, 302L
), AlgaeComments = c("", "", "", "", "", "")), .Names = c("ID", 
"SiteNumber", "SiteLocation", "ClusterName", "SiteAcronym", "Date", 
"Month", "SampleType", "Conductance", "ChlA", "Phaeophytin", 
"PhaeophytinChlA", "Chlorophyta", "Cyanophyta", "Bacillariophyta", 
"Total", "AlgaeComments"), row.names = c(NA, 6L), class = "data.frame")),
.Names = c("3D_Fluorescence.csv", 
"algae.csv")) 



--
View this message in context: 
http://r.789695.n4.nabble.com/operations-on-columns-when-data-frames-are-in-a-list-tp4705757.html


Re: [R] help incorporating data subset lengths in function with ddply

2014-04-17 Thread Steve E.
Jeff - Thanks so very much for the solution and tips, all very much
appreciated! Regards, Stevan



--
View this message in context: 
http://r.789695.n4.nabble.com/help-incorporating-data-subset-lengths-in-function-with-ddply-tp4688926p4688999.html


Re: [R] help incorporating data subset lengths in function with ddply

2014-04-16 Thread Steve E.
Hi Frede - Thank you for responding. That is not quite what I am after. Notice
that I included two data sets in my post: the first is the raw data, whereas
the second (the desired df) is similar but has an additional column at the end
containing sequential numbers. That column of sequential numbers for each storm
(i.e., subset of data) is what I am after. Thanks again, Stevan



--
View this message in context: 
http://r.789695.n4.nabble.com/help-incorporating-data-subset-lengths-in-function-with-ddply-tp4688926p4688933.html


[R] help incorporating data subset lengths in function with ddply

2014-04-16 Thread Steve E.
Dear R Community,

I am having some trouble with a task that I hope you might be able to help
with. I have a dataset that includes the time and corresponding stream
discharge from numerous storms (example of structure with simplified data
below). I would like to produce a field that details the duration of each
storm, where each storm is a subset of the data and the duration runs from
zero to end for each unique storm. I have been trying to accomplish this
with ddply but to no avail as I am unable to provide ddply (e.g., below)
with the length of the storm (i.e., subset of data). Thank you in advance,
any help would be appreciated.


existing df:
storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000

desired df:
storm,Q_time,Q,duration
s1,2008-08-07 21:15:00,0.000,1
s1,2008-08-07 21:16:00,3.020,2
s1,2008-08-07 21:17:00,6.041,3
s1,2008-08-07 21:18:00,9.061,4
s1,2008-08-07 21:19:00,12.082,5
s1,2008-08-07 21:20:00,15.102,6
s1,2008-08-07 21:21:00,18.123,7
s1,2008-08-07 21:22:00,11.143,8
s1,2008-08-07 21:23:00,0.000,9
s2,2010-10-05 21:00:00,0.000,1
s2,2010-10-05 21:01:00,1.812,2
s2,2010-10-05 21:02:00,3.625,3
s2,2010-10-05 21:03:00,5.437,4
s2,2010-10-05 21:04:00,7.249,5
s2,2010-10-05 21:05:00,9.061,6
s2,2010-10-05 21:06:00,0.874,7
s2,2010-10-05 21:07:00,0.000,8

I have been trying variations of the following statement, but I cannot seem
to get the length of the subset correct as I receive an error of the type
'Error: arguments imply differing number of rows: 2401, 0'.

newdf <- ddply(df, "storm", transform, FUN = function(x)
{duration=seq(from=1, by=1, length.out=nrow(x))})

I would really like to get a handle on ddply in this instance as it will be
quite helpful for many other similar calculations that I need to do with
this dataset.

Thanks again,
Stevan
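
A minimal sketch of the per-storm counter using base R's ave(), shown on a hypothetical five-row subset of the data above. With plyr loaded, a call of the form ddply(df, "storm", transform, duration = seq_along(Q)) is the usual way to express the same thing, but that variant is not run here.

```r
# Hypothetical subset of the storm data (Q_time omitted for brevity).
df <- data.frame(
  storm = c("s1", "s1", "s1", "s2", "s2"),
  Q = c(0, 3.02, 6.041, 0, 1.812),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each storm and returns the results in the
# original row order, so the counter restarts at 1 for every storm.
df$duration <- ave(seq_along(df$storm), df$storm, FUN = seq_along)
```

Because the counter is computed inside each group, the length of each storm never has to be supplied explicitly.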




--
View this message in context: 
http://r.789695.n4.nabble.com/help-incorporating-data-subset-lengths-in-function-with-ddply-tp4688926.html


[R] help with rle function on paired data

2012-06-08 Thread Steve E.
Dear R Community - I hope you might be able to provide some guidance
regarding the use of the rle function. I have a set of time-series data
where a measured value is recorded every 30 seconds after the start of an
experiment. Many of the measured values repeat and I am interested only in
the values when there is a change. If I turn the measured values into a
vector, the rle function works perfectly for this but I need also the
corresponding time of the value and I am not sure how to use rle on paired
data. Below is a brief example to help explain the problem. I thank you in
advance for any assistance you might be able to provide. Regards, Steve

Original dataset:

ElpsdTime, DataValue
0, 1
30, 1
60, 1
90, 2
120, 2
150, 3
180, 2
210, 3
240, 3
.
.

Desired dataset:

ElpsdTime, DataValue
0, 1
90, 2
150, 3
180, 2
210, 3
.
.
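
One possible way to carry the times along is to use the run lengths from rle() to find the first row of each run and then index the data frame with those row numbers. The sketch below uses the nine example rows above; the object names `d`, `runs`, `starts`, and `changes` are hypothetical.

```r
# The example data from the post.
d <- data.frame(
  ElpsdTime = seq(0, 240, by = 30),
  DataValue = c(1, 1, 1, 2, 2, 3, 2, 3, 3)
)

runs <- rle(d$DataValue)

# Row index where each run starts: the first run starts at row 1, and every
# later run starts one row after the cumulative length of the runs before it.
starts <- cumsum(c(1, head(runs$lengths, -1)))

# Keep only the first row of each run, so time and value stay paired.
changes <- d[starts, ]
```

For the example data this keeps the rows at 0, 90, 150, 180, and 210 seconds, matching the desired dataset.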


--
View this message in context: 
http://r.789695.n4.nabble.com/help-with-rle-function-on-paired-data-tp4632856.html


Re: [R] help wrapping findInterval into a function

2011-12-08 Thread Steve E.
Michael (and others) - Right, 'within' did work; I had placed it in the wrong
location previously, which your example code made clear.  I wrapped several
of these functions within a function to address all the desired flags in a
single pass (probably horribly inefficient, but it works).  Thanks again for
your most generous assistance.  Steve

--
View this message in context: 
http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4173391.html


Re: [R] help wrapping findInterval into a function

2011-12-07 Thread Steve E.
Forgot to attach the data set:
http://r.789695.n4.nabble.com/file/n4170695/WaterData.txt

--
View this message in context: 
http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4170695.html


Re: [R] help wrapping findInterval into a function

2011-12-07 Thread Steve E.
Thanks to everyone for continued assistance with this problem.  I realize
that I had not included enough information; hopefully I have done so here.
I attached a dput output of a sample of the data titled 'WaterData' (and str
output below).  Below are dput outputs of the function I am trying to get
working and the resulting array when I run it.  Unfortunately, Michael,
changing 'with' to 'within' did not solve the problem, as running the
function in that case produced no discernible output or result.  What I
meant by the function producing an array of values that are not attached to
the data frame is that the values (which are the result I am looking for)
show up separately in a result window, in a format similar to what you get
from dput(), and are not at all associated with the data frame.  Again,
thanks so much!

> dput(WQFunc)
function (dataframe) 
{
    dataframe$CalcFlag <- with(dataframe, ifelse(variable == "CaD_ICP",
        (dataqualifier <- c("Y", "Q", "", "A")[findInterval(dataframe$value,
            c(-Inf, 0.027, 0.1, 100, Inf))]), ""))
}

> str(WaterData)
'data.frame':   126 obs. of  5 variables:
 $ Site          : Factor w/ 6 levels "BV","CB","KP",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Time          : Factor w/ 84 levels "0:00:00","0:00:52",..: 1 1 1 1 2 5 16 16 19 20 ...
 $ DateCorrectFmt: Factor w/ 9 levels "2010-08-17","2010-08-21",..: 4 8 1 3 8 5 5 8 8 8 ...
 $ variable      : Factor w/ 3 levels "CaD_ICP","NaD_ICP",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ value         : num  0.044 0.1316 0.0101 0.0114 80.13 ...

Below is the output I get if I run the WQFunc as:
Flagged <- WQFunc(WaterData)

> dput(Flagged)
c("Q", "", "Y", "Y", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""
)
> 
 

Again, though, 'Flagged' is an array of those values in an output window but
is not 'attached' to WaterData.

--
View this message in context: 
http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4170688.html


Re: [R] help wrapping findInterval into a function

2011-12-06 Thread Steve E.
Bill (and David),

Thank you very much for taking the time to respond to my query.

You were right, I was creating and calling the function exactly as you had
predicted.  I revised the structure based on your suggestion.  It runs but
the output is an array of the flags that are not attached to the data frame,
not a new column in the data frame as was my intention.

So, the new configuration I tried was like this (where DataFrame is not a
real data frame but just the word "DataFrame"):

WQFlags <- function(DataFrame) {
  DataFrame$CalciumFlag <- with(DataFrame, ifelse(variable == "CaD_ICP",
    (dataqualifier <- c("Y", 'Q', "", "A")[findInterval(DataFrame$value,
      c(-Inf, 0.027, 0.1, 100, Inf))]), ""))
}

I called it using:

WaterQualityData <- WQFlags(WaterQualityData)

Again, the output is simply an array of the flags, unattached to a data
frame.  Can you suggest a way to modify this to make it work as desired, or,
in the worst case, can I attach the resulting array of flag values?


Thank you again!
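
A likely explanation for the unattached array is that a function returns the value of its last expression, and the value of an assignment is the right-hand side, here the flag vector. The sketch below ends the function with the data frame itself so that the reassignment at the call site receives the whole object. The four-row `WaterQualityData` is hypothetical sample data, and the intermediate `flags` variable is a simplification of the original one-liner.

```r
# Hypothetical sample in the shape described by the str() output in this thread.
WaterQualityData <- data.frame(
  variable = c("CaD_ICP", "CaD_ICP", "NaD_ICP", "CaD_ICP"),
  value = c(0.0101, 0.044, 0.5, 80.13),
  stringsAsFactors = FALSE
)

WQFlags <- function(DataFrame) {
  # Look up one flag per row from the concentration ranges.
  flags <- c("Y", "Q", "", "A")[findInterval(DataFrame$value,
                                             c(-Inf, 0.027, 0.1, 100, Inf))]
  # Keep the flag only for calcium rows; other analyses get an empty string.
  DataFrame$CalciumFlag <- ifelse(DataFrame$variable == "CaD_ICP", flags, "")
  # Return the modified copy; the caller's object changes only on reassignment.
  DataFrame
}

WaterQualityData <- WQFlags(WaterQualityData)
```

Further flag columns could be added inside the same function before the final line, since each assignment modifies the local copy that is eventually returned.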

--
View this message in context: 
http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4166826.html


[R] help wrapping findInterval into a function

2011-12-06 Thread Steve E.
Dear R Community,

I hope you might be able to assist with a small problem creating a function. 
I am working with water-quality data sets that contain the concentration of
many different elements in water samples.  I need to assign quality-control
flags to values that fall into various concentration ranges.  Rather than a
web of nested if statements, I am employing the findInterval function to
identify values that need to be flagged and to assign the appropriate flag. 
The data consist of a sample identifier, the analysis, and corresponding
value.  The findInterval function works well; however, I would like to
incorporate it into a function so that I can run multiple findInterval
functions for many different water-quality analyses (and I have to do this
for many datasets) but it seems to fall apart when incorporated into a
function.

Run on its own, the findInterval function works as desired, e.g. below,
creating the new CalciumFlag column with the appropriate flag for, in this
case, levels of calcium in the water:

WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium",
  (flags <- c("Y", 'Q', "", "A")[findInterval(WQdata$value,
    c(-Inf, 0.027, 0.1, 100, Inf))]), ""))

However, it does not work when wrapped in a function (no error messages
are thrown; it simply does not seem to do anything):

WQfunc <- function() {
  WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium",
    (flags <- c("Y", 'Q', "", "A")[findInterval(WQdata$value,
      c(-Inf, 0.027, 0.1, 100, Inf))]), ""))
}

Calling the function WQfunc() does not produce an error but also does not
produce the expected CalciumFlag; it seems not to do anything.

Ultimately, what I need to get to is something like below where multiple
findInterval functions for different analyses are included in a single
function, then I can concatenate the results into a single column containing
all flags for all analyses, e.g.:

WQfunc <- function() {
  WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium",
    (flags <- c("Y", 'Q', "", "A")[findInterval(WQdata$value,
      c(-Inf, 0.027, 0.1, 100, Inf))]), ""))

  WQdata$SodiumFlag <- with(WQdata, ifelse(analysis == "Sodium",
    (flags <- c("Y", 'Q', "", "A")[findInterval(WQdata$value,
      c(-Inf, 0.050, 0.125, 125, Inf))]), ""))

  WQdata$MagnesiumFlag <- with(WQdata, ifelse(analysis == "Magnesium",
    (flags <- c("Y", 'Q', "", "A")[findInterval(WQdata$value,
      c(-Inf, 0.065, 0.15, 75, Inf))]), ""))

  # ...etc. for additional water-quality analyses...
}

As an aside, I started working with the findInterval tool from an example
that I found online but am not clear as to how the multi-component
configuration incorporating brackets actually works, can anyone suggest a
good resource that explains this?


I thank you very much for any assistance you may be able to provide.


Regards,
Steve
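
On the bracket configuration: it can be read as two separate steps. findInterval() returns, for each value, the number of the interval it falls into, and those integers are then used to index a vector of labels. A small sketch with the calcium breakpoints from above follows; the names `breaks`, `labels`, `values`, `idx`, and `flags` are hypothetical.

```r
breaks <- c(-Inf, 0.027, 0.1, 100, Inf)   # interval boundaries for calcium
labels <- c("Y", "Q", "", "A")            # one label per interval

values <- c(0.01, 0.05, 5, 250)           # made-up concentrations

# Step 1: which interval does each value fall into?
# An index i means breaks[i] <= value < breaks[i + 1].
idx <- findInterval(values, breaks)

# Step 2: use those integers to pick the matching label.
flags <- labels[idx]
```

Here `idx` is 1, 2, 3, 4 and `flags` is "Y", "Q", "", "A"; the one-line form in the post simply places the findInterval() call directly inside the square brackets.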

--
View this message in context: 
http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4165464.html


[R] help subsetting data based on date AND time

2011-09-08 Thread Steve E.
Dear R Community,

I am new to R, and have a question that I suspect may be quite simple but is
proving a formidable roadblock for me.  I have a large data set that
includes water-quality measurements collected over many 24-hour periods. 
The date and time of sample collection are in a combined Date/Time field in
the format yyyy-mm-dd hh:mm:ss.  I need to be able to subset the data for
analysis of different date and time windows.  Thus far, I have tried casting
the Date/Time field using several approaches, such as:

DataSet$NewDateTime <- strptime(DataSet$DateTime, '%Y-%m-%d %H:%M:%S')
DataSet$NewDateTime <- as.POSIXlt(strptime(DataSet$DateTime, '%Y-%m-%d %H:%M:%S'))

These instructions seem to cast the NewDateTime field correctly (at least it
appears to be in the correct format, and I assume R sees the field as a date
and a time) but I am then unable to subset the data using instructions such
as:

with(DataSet, subset(DataSet, DataSet$NewDateTime < '2004-08-05 14:15:00'))
DataSubset <- subset(DataSet, DataSet$NewDateTime < '2004-08-05 14:00:00',
select = DataSet)

I have tried also separating the date and time fields in the input file, and
casting with instructions such as:

DataSet$NewTime <- strptime(DataSet$Time, '%H:%M:%S')
DataSet$NewTime <- as.POSIXct(strptime(DataSet$Time, '%H:%M:%S'))

but these seem to generate a NewTime field that contains today's date + the
time data, and also will not subset based on date/time.

I appreciate greatly any help and advice,
Steve
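
A minimal sketch of one approach that tends to work for this kind of subsetting: store the column as POSIXct (a single number per timestamp, which sits comfortably in a data frame) and compare it against a cutoff that is itself a POSIXct value rather than a character string. The three-row `DataSet`, the `Value` column, and the choice of `tz = "UTC"` are assumptions for illustration.

```r
# Hypothetical sample with the combined date/time stored as text.
DataSet <- data.frame(
  DateTime = c("2004-08-05 13:30:00", "2004-08-05 14:15:00",
               "2004-08-06 09:00:00"),
  Value = c(7.1, 7.4, 6.9),
  stringsAsFactors = FALSE
)

# Parse once into POSIXct, fixing the time zone so comparisons are unambiguous.
DataSet$NewDateTime <- as.POSIXct(DataSet$DateTime,
                                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Build the cutoff as a date-time in the same time zone, then subset.
cutoff <- as.POSIXct("2004-08-05 14:15:00", tz = "UTC")
DataSubset <- subset(DataSet, NewDateTime < cutoff)
```

A window between two times can be expressed the same way by combining two comparisons with `&` inside subset().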

--
View this message in context: 
http://r.789695.n4.nabble.com/help-subsetting-data-based-on-date-AND-time-tp3799933p3799933.html