I am using R as a data manipulation tool for a SQL database. In some of my R scripts I use the RODBC package to retrieve data, run an analysis, and then use the sqlSave function from RODBC to store the results in a database.
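A minimal sketch of that workflow, assuming the RODBC package. The DSN, the input query, the output table name and the analysis function passed in as `analyse` are all hypothetical placeholders, not part of any real setup:

```r
# Sketch of the retrieve -> analyse -> store loop described above.
# Assumes: RODBC is installed, `channel` is an open RODBC connection, and
# `analyse` is a hypothetical user function that returns a data frame whose
# columns match the output table (Value, date, version, a, b, ...).
run_and_store <- function(channel, input_sql, analyse, out_table) {
  # Pull the input data from the database.
  input <- RODBC::sqlQuery(channel, input_sql)
  # Run the (hypothetical) analysis on the retrieved data.
  results <- analyse(input)
  # Append the results; rownames = FALSE stops sqlSave from adding a
  # "rownames" column to the output table.
  RODBC::sqlSave(channel, results, tablename = out_table,
                 append = TRUE, rownames = FALSE)
  invisible(results)
}

# Usage with a live connection (DSN and table names are hypothetical):
# channel <- RODBC::odbcConnect("mydsn")
# run_and_store(channel, "SELECT * FROM inputs", my_analysis, "results")
# RODBC::odbcClose(channel)
```

Note that nothing in this loop checks whether a given combination of inputs has already been saved, which is exactly where both problems below come from.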
There are two closely related problems I want to avoid: (1) having R rerun an analysis whose results have already been saved to the output database table, and (2) ending up with more than one identical row in that output table.
-------------------------------------
The analysis I am running lets the user supply a large number of input variables, for example: date, version, a, b, c, d, e, f, g, ... After R completes its analysis, I write the results to a database table with the columns:

Value, date, version, a, b, c, d, e, f, g, ...

where Value is the result of the R analysis and the remaining columns are the criteria that were used to obtain that value.
--------------------------------------
Can anyone think of a way to address these problems? The only idea I have so far is to run an sqlQuery at the start to get a table of all the variable combinations that are already saved, and then simply skip computing and re-outputting those results. However, my results table already has over 200K rows (and will grow quickly as the project continues), so I doubt that is the most efficient answer: I expect the query to download 200K rows x 10+ columns to be time-consuming in itself.

I know this is kind of a weird problem, and I am open to all sorts of ideas.

Thanks!
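The "fetch what is already saved, then skip it" idea can be sketched in base R as below. The helper names `combo_key` and `new_combinations` are hypothetical, and the in-memory data frames stand in for the database table. The sketch assumes the criteria values come back from the database in a form whose `as.character()` matches the candidates' (dates and floating-point values may need explicit formatting first):

```r
# Build one string key per row from the criteria columns; "\x1f" (the ASCII
# unit separator) is used so values containing commas or spaces stay distinct.
combo_key <- function(df, key_cols) {
  do.call(paste, c(lapply(df[key_cols], as.character), sep = "\x1f"))
}

# Keep only the candidate combinations that are not already in `existing`.
new_combinations <- function(candidates, existing, key_cols) {
  done <- combo_key(existing, key_cols)
  candidates[!(combo_key(candidates, key_cols) %in% done), , drop = FALSE]
}

# In-memory example: two combinations are already "saved".
existing <- data.frame(version = c(1, 1), a = c("x", "y"),
                       stringsAsFactors = FALSE)
candidates <- expand.grid(version = c(1, 2), a = c("x", "y"),
                          stringsAsFactors = FALSE)
todo <- new_combinations(candidates, existing, c("version", "a"))
print(todo)
```

With a live connection, `existing` would instead come from something like `RODBC::sqlQuery(channel, "SELECT DISTINCT date, version, a, b FROM results")` (table and column names hypothetical), which fetches only the criteria columns rather than the Value column as well.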