I am using R as a data manipulation tool for a SQL database. In some of
my R scripts I use the RODBC package to retrieve data, then run an analysis,
and use the sqlSave function in the RODBC package to store the results in a
database table.

There are two problems I want to avoid, and they are closely related: (1)
having R rerun an analysis that has already been done and saved into the
output database table, and (2) ending up with more than one identical row in
my output database table.

-------------------------------------
The analysis I am running allows the user to input a large number of
variables, for example:
date, version, a, b, c, d, e, f, g, ...

After R completes its analysis, I write the results to a database table in
the format:
Value, date, version, a, b, c, d, e, f, g, ...

where Value is the result of the R analysis, and the rest of the columns are
the criteria that were used to get that value.
--------------------------------------
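One common way to rule out problem (2) at the database level is a UNIQUE constraint over the criteria columns, so the database itself refuses a second row with the same combination. Below is a minimal sketch, assuming a hypothetical table named results in which a, b, c stand in for the full column list. It uses Python's built-in sqlite3 module only so that it runs on its own; the CREATE TABLE ... UNIQUE part is ordinary SQL that could be run once from R with sqlQuery. INSERT OR IGNORE is SQLite syntax, and other databases have their own equivalents (for example INSERT ... ON CONFLICT DO NOTHING in PostgreSQL or INSERT IGNORE in MySQL).

```python
import sqlite3

# Hypothetical schema: "results" and the criteria columns a, b, c stand in
# for the real table and its full column list (date, version, a, b, c, ...).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        value   REAL,
        date    TEXT,
        version TEXT,
        a       INTEGER,
        b       INTEGER,
        c       INTEGER,
        -- At most one row per combination of criteria, enforced by the DB.
        UNIQUE (date, version, a, b, c)
    )
""")

# Made-up example rows; the second is an exact duplicate of the first.
rows = [
    (0.42, "2013-05-01", "v1", 1, 2, 3),
    (0.42, "2013-05-01", "v1", 1, 2, 3),
    (0.17, "2013-05-01", "v1", 1, 2, 4),
]

# INSERT OR IGNORE (SQLite syntax) skips rows that would violate the
# UNIQUE constraint instead of raising an error.
conn.executemany("INSERT OR IGNORE INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # prints 2
```

One caveat with this design: in SQLite and several other databases, NULLs count as distinct in a UNIQUE constraint, so combinations where some criterion is NULL would not be deduplicated this way.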

Can anyone think of a way to address these problems? The only thing I can
think of so far is to run sqlQuery at the start to get a table of all the
variable combinations that are already saved, and then simply skip computing
and re-outputting those results. However, my results table currently has
over 200K rows (and will grow very quickly as I continue with this project),
so I doubt that would be the most efficient approach: the SQL query to
download 200K rows x 10+ columns is likely to be time-consuming in and of
itself.
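A possible alternative to downloading every saved combination is to send only the combinations the next run would compute and let the database return the ones that are missing (an anti-join). The sketch below again uses Python's sqlite3 so it is self-contained, with the same hypothetical results table; the staging table name candidates and the example rows are also hypothetical. From R, the analogous steps would be writing the candidate grid to a staging table (for example with sqlSave) and running the SELECT with sqlQuery, so only the rows still to be computed come back over the connection. A unique index or constraint on the criteria columns should let the database answer the lookup without scanning the whole results table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical results table; the UNIQUE constraint also gives SQLite an
# index on the criteria columns to use for the lookup below.
conn.execute("""
    CREATE TABLE results (
        value REAL, date TEXT, version TEXT, a INTEGER, b INTEGER, c INTEGER,
        UNIQUE (date, version, a, b, c)
    )
""")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
    [(0.42, "2013-05-01", "v1", 1, 2, 3),
     (0.17, "2013-05-01", "v1", 1, 2, 4)],
)

# Combinations the next run would like to compute (hypothetical grid).
candidates = [
    ("2013-05-01", "v1", 1, 2, 3),  # already saved
    ("2013-05-01", "v1", 1, 2, 4),  # already saved
    ("2013-05-01", "v1", 1, 2, 5),  # not yet computed
]

# Upload only the small candidate set to a staging table.
conn.execute("""CREATE TEMP TABLE candidates (
    date TEXT, version TEXT, a INTEGER, b INTEGER, c INTEGER)""")
conn.executemany("INSERT INTO candidates VALUES (?, ?, ?, ?, ?)", candidates)

# Anti-join: keep candidates with no matching row in results, so only the
# work still to be done is returned from the database.
todo = conn.execute("""
    SELECT cd.date, cd.version, cd.a, cd.b, cd.c
    FROM candidates AS cd
    WHERE NOT EXISTS (
        SELECT 1 FROM results AS r
        WHERE r.date = cd.date AND r.version = cd.version
          AND r.a = cd.a AND r.b = cd.b AND r.c = cd.c
    )
""").fetchall()

print(todo)  # prints [('2013-05-01', 'v1', 1, 2, 5)]
```

The design choice here is to move the comparison to the database side: the amount of data transferred scales with the size of the candidate grid, not with the size of the results table.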

I know this is kind of a weird problem, and I am open to all sorts of ideas...

Thanks!


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
