[R] tsmp::find_discord help
147171, 234.080896296026, 238.081307704421, 240.863466976117, 240.980620885221, 242.865790697373, 242.192567156209, 244.400966976397, 230.117940277327, 246.551293436438, 248.382975894492, 251.654291217308, 256.004255625047, 257.742864777334, 262.281011258299, 261.284388574678, 261.408284877567, 262.63402390047, 262.027830331167, 263.04610984805, 264.671093973145, 264.421757659875, 265.237929494493, 266.872099533025, 264.696907634661, 264.260035745148, 263.499976769043, 262.425159591483, 260.981390716648, 261.082354369201, 259.278915649932, 259.796834945399, 257.282395334402, 255.692769738659, 256.477281617327, 256.18586372775, 255.027394031128, 253.672028350737, 253.725117554283, 253.678845994268, 252.442158376286, 252.079511253303, 254.582988659106, 254.874491709983, 252.931852529524, 252.177873732708, 251.521926319506, 252.912911842065, 253.201235300861, 251.230596607132, 250.384639365599, 251.930171640497, 247.943138212385, 227.360327759245, 236.807827285165, 236.256607593177, 245.676176491845, 248.029262333829, 246.607130076038, 247.609178481344, 247.744748076377, 247.872489046957, 247.48352803234, 249.194618296623, 249.265455424832, 249.594569674553, 250.408728755265, 247.515878390195, 248.641169324564, 249.340687829629, 249.36502006175, 250.620096849697, 250.924085881794, 251.446678319806, 252.231693277322, 253.722321051266, 251.152605426591, 254.144238363625, 255.721436826838, 255.942811810924, 256.764821865549, 257.00394097222, 258.391524989996, 259.436215976905, 260.193549101287, 261.534455659939, 261.899602734251, 262.616657174518, 263.481273697829, 264.543112079194, 266.521162025398, 267.961329632392, 266.7146434505, 264.785401400272, 265.351171001932, 270.092530136788, 262.900027005747, 258.104321945971, 243.820629691239, 251.983970767446, 255.462414786033, 260.955431247223, 262.743916866882, 262.536868907092, 263.482801351137, 260.690328582004, 259.374907453964, 259.448360515013, 260.766381761152, 266.51212834334, 261.178649726696, 262.13204070949, 
260.329801393952, 257.072608946357, 256.699290173501, 259.76781842229, 259.634321800619, 256.142043520836, 253.331642591441, 252.409818426427, 253.606348613091, 254.796218211763, 255.09623971465, 256.665105978632, 255.078059232282, 254.524220743589, 252.710077736247, 252.699360275641, 252.984852506546, 251.256105145533, 249.920269056596, 244.926144783245, 233.956022891542, 220.813007885031, 215.895915163588, 219.629850208992, 226.529124786425, 235.032123494009, 250.546880197711, 254.095666445652, 253.991281213844, 257.473874318134, 255.711023276206, 254.706592350919, 255.897299568681, 255.423667051829, 254.964414633531, 252.309596570162, 250.816389805404, 252.822825341718, 252.433744078688, 253.788538636919, 253.051533051301, 252.259801258752, 251.700941357203, 252.480709904758, 253.304639213951, 253.813290173747, 252.224179178895, 252.5256305716, 252.953270357102, 253.056596744107, 250.887811907567, 249.785344008962, 249.608008671692, 250.198318963079, 249.545803866489, 248.918121550092, 250.413202223321, 248.905568386056, 249.915943725407, 249.996626693197, 248.072932397947, 254.216837446764, 253.47859208961, 254.138987503108, 252.998968019197, 252.145336908707, 253.886762010027, 256.153072158387, 253.416947685881, 256.800148054864, 259.224442075146, 259.926382599212, 260.744060715055, 261.333067704411, 261.70551071479, 260.628314612014, 260.506542761438, 263.354614044726, 266.13290756708, 262.400702435337, 268.301353736501, 267.563565765135, 265.179775401112, 267.50084469174, 265.084311419213, 265.91631323141, 265.908458636794, 267.454699323839, 270.140777570102, 270.156144841528, 269.984118177928, 269.369715720695, 269.112561377231, 267.868800598569, 270.406678517722, 269.055867271265, 265.774697500886, 265.295737018483, 264.470831104601, 265.450849219086, 267.017599059781, 266.64912298019, 263.305864860443, 261.791948233685, 260.177485275595, 261.369886779133, 259.69575348706, 259.332128965948, 255.718119134055, 256.938581692474, 256.615531389322 ) mp.obj <- 
stomp(myData, window_size = windowSize)
find_discord(mp.obj, data=input.dt$MeasurementValue)
## Warning in dist_profile(data, data, nn, window_size = .mp$w, index = discord_idx) :
## Warning: Result may be inconsistent if the size of the queries are different.
## Error in fast_avg_sd(data, window_size) :
## 'window_size' must be at least 2.

*David Katz*, TIBCO Data Science
1.541.324.7417

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
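The post does not show how windowSize or input.dt were built, so the cause of the error cannot be read off the transcript. As a hedged starting point, the sketch below runs a few base-R sanity checks on the inputs before any matrix profile call. The helper name check_mp_inputs is hypothetical (it is not part of tsmp), and the idea that the series given to stomp() and the one given to find_discord() should match is an assumption suggested by the warning, not something the post confirms.

```r
# Hypothetical helper (not part of tsmp): sanity checks on the inputs
# before building a matrix profile and searching it for discords.
check_mp_inputs <- function(series, query_data, window_size) {
  problems <- character(0)

  # The error message says window_size must be at least 2.
  if (!is.numeric(window_size) || length(window_size) != 1 ||
      is.na(window_size) || window_size < 2) {
    problems <- c(problems, "window_size must be a single number >= 2")
  }

  # Assumption: both calls are meant to see the same numeric series.
  if (!is.numeric(series) || !is.numeric(query_data)) {
    problems <- c(problems, "both series must be numeric vectors")
  } else if (length(series) != length(query_data)) {
    problems <- c(problems, "series lengths differ")
  } else if (!isTRUE(all.equal(series, query_data))) {
    problems <- c(problems, "series values differ")
  }

  problems
}

x <- c(234.08, 238.08, 240.86, 240.98, 242.87)
ok  <- check_mp_inputs(x, x, 3)       # no problems reported
bad <- check_mp_inputs(x, x[-1], 1)   # window too small, lengths differ
print(bad)
```

In the post's terms, the call would be check_mp_inputs(myData, input.dt$MeasurementValue, windowSize); an empty result only rules out these particular mismatches.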
[R] tsmp::find_discord failure
555921, 226.583060863288, 229.061297126161, 231.837136147171, 234.080896296026, 238.081307704421, 240.863466976117, 240.980620885221, 242.865790697373, 242.192567156209, 244.400966976397, 230.117940277327, 246.551293436438, 248.382975894492, 251.654291217308, 256.004255625047, 257.742864777334, 262.281011258299, 261.284388574678, 261.408284877567, 262.63402390047, 262.027830331167, 263.04610984805, 264.671093973145, 264.421757659875, 265.237929494493, 266.872099533025, 264.696907634661, 264.260035745148, 263.499976769043, 262.425159591483, 260.981390716648, 261.082354369201, 259.278915649932, 259.796834945399, 257.282395334402, 255.692769738659, 256.477281617327, 256.18586372775, 255.027394031128, 253.672028350737, 253.725117554283, 253.678845994268, 252.442158376286, 252.079511253303, 254.582988659106, 254.874491709983, 252.931852529524, 252.177873732708, 251.521926319506, 252.912911842065, 253.201235300861, 251.230596607132, 250.384639365599, 251.930171640497, 247.943138212385, 227.360327759245, 236.807827285165, 236.256607593177, 245.676176491845, 248.029262333829, 246.607130076038, 247.609178481344, 247.744748076377, 247.872489046957, 247.48352803234, 249.194618296623, 249.265455424832, 249.594569674553, 250.408728755265, 247.515878390195, 248.641169324564, 249.340687829629, 249.36502006175, 250.620096849697, 250.924085881794, 251.446678319806, 252.231693277322, 253.722321051266, 251.152605426591, 254.144238363625, 255.721436826838, 255.942811810924, 256.764821865549, 257.00394097222, 258.391524989996, 259.436215976905, 260.193549101287, 261.534455659939, 261.899602734251, 262.616657174518, 263.481273697829, 264.543112079194, 266.521162025398, 267.961329632392, 266.7146434505, 264.785401400272, 265.351171001932, 270.092530136788, 262.900027005747, 258.104321945971, 243.820629691239, 251.983970767446, 255.462414786033, 260.955431247223, 262.743916866882, 262.536868907092, 263.482801351137, 260.690328582004, 259.374907453964, 259.448360515013, 260.766381761152, 
266.51212834334, 261.178649726696, 262.13204070949, 260.329801393952, 257.072608946357, 256.699290173501, 259.76781842229, 259.634321800619, 256.142043520836, 253.331642591441, 252.409818426427, 253.606348613091, 254.796218211763, 255.09623971465, 256.665105978632, 255.078059232282, 254.524220743589, 252.710077736247, 252.699360275641, 252.984852506546, 251.256105145533, 249.920269056596, 244.926144783245, 233.956022891542, 220.813007885031, 215.895915163588, 219.629850208992, 226.529124786425, 235.032123494009, 250.546880197711, 254.095666445652, 253.991281213844, 257.473874318134, 255.711023276206, 254.706592350919, 255.897299568681, 255.423667051829, 254.964414633531, 252.309596570162, 250.816389805404, 252.822825341718, 252.433744078688, 253.788538636919, 253.051533051301, 252.259801258752, 251.700941357203, 252.480709904758, 253.304639213951, 253.813290173747, 252.224179178895, 252.5256305716, 252.953270357102, 253.056596744107, 250.887811907567, 249.785344008962, 249.608008671692, 250.198318963079, 249.545803866489, 248.918121550092, 250.413202223321, 248.905568386056, 249.915943725407, 249.996626693197, 248.072932397947, 254.216837446764, 253.47859208961, 254.138987503108, 252.998968019197, 252.145336908707, 253.886762010027, 256.153072158387, 253.416947685881, 256.800148054864, 259.224442075146, 259.926382599212, 260.744060715055, 261.333067704411, 261.70551071479, 260.628314612014, 260.506542761438, 263.354614044726, 266.13290756708, 262.400702435337, 268.301353736501, 267.563565765135, 265.179775401112, 267.50084469174, 265.084311419213, 265.91631323141, 265.908458636794, 267.454699323839, 270.140777570102, 270.156144841528, 269.984118177928, 269.369715720695, 269.112561377231, 267.868800598569, 270.406678517722, 269.055867271265, 265.774697500886, 265.295737018483, 264.470831104601, 265.450849219086, 267.017599059781, 266.64912298019, 263.305864860443, 261.791948233685, 260.177485275595, 261.369886779133, 259.69575348706, 259.332128965948, 
255.718119134055, 256.938581692474, 256.615531389322 )

library(tsmp)
mp.obj <- stomp(myData, window_size = windowSize) #ok
find_discord(mp.obj, data=input.dt$MeasurementValue)
## Warning in dist_profile(data, data, nn, window_size = .mp$w, index = discord_idx) :
## Warning: Result may be inconsistent if the size of the queries are different.
## Error in fast_avg_sd(data, window_size) :
## 'window_size' must be at least 2.

*David Katz*, TIBCO Data Science
1.541.324.7417
[R] problem using tsmp::find_discord
214007, 221.640786242951, 219.97023160113, 210.878755430737, 179.209233482601, 144.494342460437, 163.888149251789, 186.003606780246, 189.352081675269, 194.529081804026, 196.759613684891, 223.580900584348, 227.607943509612, 218.162016380718, 211.594753752323, 214.163046110841, 215.744377558026, 227.310642507114, 227.799354057526, 227.551336807385, 226.132788368966, 221.52987925848, 218.445829530153, 218.89121471229, 220.106824906403, 224.207053555921, 226.583060863288, 229.061297126161, 231.837136147171, 234.080896296026, 238.081307704421, 240.863466976117, 240.980620885221, 242.865790697373, 242.192567156209, 244.400966976397, 230.117940277327, 246.551293436438, 248.382975894492, 251.654291217308, 256.004255625047, 257.742864777334, 262.281011258299, 261.284388574678, 261.408284877567, 262.63402390047, 262.027830331167, 263.04610984805, 264.671093973145, 264.421757659875, 265.237929494493, 266.872099533025, 264.696907634661, 264.260035745148, 263.499976769043, 262.425159591483, 260.981390716648, 261.082354369201, 259.278915649932, 259.796834945399, 257.282395334402, 255.692769738659, 256.477281617327, 256.18586372775, 255.027394031128, 253.672028350737, 253.725117554283, 253.678845994268, 252.442158376286, 252.079511253303, 254.582988659106, 254.874491709983, 252.931852529524, 252.177873732708, 251.521926319506, 252.912911842065, 253.201235300861, 251.230596607132, 250.384639365599, 251.930171640497, 247.943138212385, 227.360327759245, 236.807827285165, 236.256607593177, 245.676176491845, 248.029262333829, 246.607130076038, 247.609178481344, 247.744748076377, 247.872489046957, 247.48352803234, 249.194618296623, 249.265455424832, 249.594569674553, 250.408728755265, 247.515878390195, 248.641169324564, 249.340687829629, 249.36502006175, 250.620096849697, 250.924085881794, 251.446678319806, 252.231693277322, 253.722321051266, 251.152605426591, 254.144238363625, 255.721436826838, 255.942811810924, 256.764821865549, 257.00394097222, 258.391524989996, 259.436215976905, 
260.193549101287, 261.534455659939, 261.899602734251, 262.616657174518, 263.481273697829, 264.543112079194, 266.521162025398, 267.961329632392, 266.7146434505, 264.785401400272, 265.351171001932, 270.092530136788, 262.900027005747, 258.104321945971, 243.820629691239, 251.983970767446, 255.462414786033, 260.955431247223, 262.743916866882, 262.536868907092, 263.482801351137, 260.690328582004, 259.374907453964, 259.448360515013, 260.766381761152, 266.51212834334, 261.178649726696, 262.13204070949, 260.329801393952, 257.072608946357, 256.699290173501, 259.76781842229, 259.634321800619, 256.142043520836, 253.331642591441, 252.409818426427, 253.606348613091, 254.796218211763, 255.09623971465, 256.665105978632, 255.078059232282, 254.524220743589, 252.710077736247, 252.699360275641, 252.984852506546, 251.256105145533, 249.920269056596, 244.926144783245, 233.956022891542, 220.813007885031, 215.895915163588, 219.629850208992, 226.529124786425, 235.032123494009, 250.546880197711, 254.095666445652, 253.991281213844, 257.473874318134, 255.711023276206, 254.706592350919, 255.897299568681, 255.423667051829, 254.964414633531, 252.309596570162, 250.816389805404, 252.822825341718, 252.433744078688, 253.788538636919, 253.051533051301, 252.259801258752, 251.700941357203, 252.480709904758, 253.304639213951, 253.813290173747, 252.224179178895, 252.5256305716, 252.953270357102, 253.056596744107, 250.887811907567, 249.785344008962, 249.608008671692, 250.198318963079, 249.545803866489, 248.918121550092, 250.413202223321, 248.905568386056, 249.915943725407, 249.996626693197, 248.072932397947, 254.216837446764, 253.47859208961, 254.138987503108, 252.998968019197, 252.145336908707, 253.886762010027, 256.153072158387, 253.416947685881, 256.800148054864, 259.224442075146, 259.926382599212, 260.744060715055, 261.333067704411, 261.70551071479, 260.628314612014, 260.506542761438, 263.354614044726, 266.13290756708, 262.400702435337, 268.301353736501, 267.563565765135, 265.179775401112, 
267.50084469174, 265.084311419213, 265.91631323141, 265.908458636794, 267.454699323839, 270.140777570102, 270.156144841528, 269.984118177928, 269.369715720695, 269.112561377231, 267.868800598569, 270.406678517722, 269.055867271265, 265.774697500886, 265.295737018483, 264.470831104601, 265.450849219086, 267.017599059781, 266.64912298019, 263.305864860443, 261.791948233685, 260.177485275595, 261.369886779133, 259.69575348706, 259.332128965948, 255.718119134055, 256.938581692474, 256.615531389322 )

mp.obj <- stomp(myData, window_size = windowSize)
find_discord(mp.obj, data=input.dt$MeasurementValue)
## Warning in dist_profile(data, data, nn, window_size = .mp$w, index = discord_idx) :
## Warning: Result may be inconsistent if the size of the queries are different.
## Error in fast_avg_sd(data, window_size) :
## 'window_size' must be at least 2.

*David Katz*, TIBCO Data Science
1.541.324.7417
[R] gbm question
R-Help,

Please help me understand why these models and predictions are different:

library(gbm)

set.seed(32321)
N <- 1000
X1 <- runif(N)
X2 <- 2*runif(N)
X3 <- ordered(sample(letters[1:4],N,replace=TRUE),levels=letters[4:1])
X4 <- factor(sample(letters[1:6],N,replace=TRUE))
X5 <- factor(sample(letters[1:3],N,replace=TRUE))
X6 <- 3*runif(N)
mu <- c(-1,0,1,2)[as.numeric(X3)]

SNR <- 10 # signal-to-noise ratio
Y <- X1**1.5 + 2 * (X2**.5) + mu
sigma <- sqrt(var(Y)/SNR)
Y <- Y + rnorm(N,0,sigma)

# introduce some missing values
X1[sample(1:N,size=500)] <- NA
X4[sample(1:N,size=300)] <- NA

data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)

set.seed(32321)
gbm.formula <- gbm(Y~X1+X2+X3+X4+X5+X6,  # formula
    data=data,                # dataset
    distribution="gaussian",  # see the help for other choices
    n.trees=1000,             # number of trees
    shrinkage=0.05,           # shrinkage or learning rate,
                              # 0.001 to 0.1 usually work
    interaction.depth=3,      # 1: additive model, 2: two-way interactions, etc.
    bag.fraction = 0.5,       # subsampling fraction, 0.5 is probably best
    train.fraction = 1,       # fraction of data for training,
                              # first train.fraction*N used for training
    n.minobsinnode = 10,      # minimum total weight needed in each node
    keep.data=TRUE,           # keep a copy of the dataset with the object
    verbose=FALSE)            # don't print out progress

set.seed(32321)
gbm.Fit <- gbm.fit(x=data[,-1],y=Y,
    distribution="gaussian",  # see the help for other choices
    n.trees=1000,             # number of trees
    shrinkage=0.05,           # shrinkage or learning rate,
                              # 0.001 to 0.1 usually work
    interaction.depth=3,      # 1: additive model, 2: two-way interactions, etc.
    bag.fraction = 0.5,       # subsampling fraction, 0.5 is probably best
    nTrain=length(Y),         # first train.fraction*N used for training
    n.minobsinnode = 10,      # minimum total weight needed in each node
    keep.data=TRUE,           # keep a copy of the dataset with the object
    verbose=FALSE)            # don't print out progress

all.equal(predict(gbm.formula,n.trees=100),
          predict(gbm.Fit,n.trees=100))
[1] "Mean relative difference: 0.3585409"

#all.equal(gbm.formula,gbm.Fit) no!

(Based on the package examples)

Thanks

*David Katz*| IAG, TIBCO Spotfire
Re: [R] FlexBayes installation from R-Forge Problem R 3.2.2
Pascal,

Oops. Thanks!

*David Katz*| IAG, TIBCO Spotfire
O: 1.541.203.7084 | M: 1.541.324.7417

On Mon, Sep 28, 2015 at 5:47 PM, Pascal Oettli <kri...@ymail.com> wrote:

> You misspelled the web address. It is "R-project", not "R.project".
> Thus, the command line should be:
>
> install.packages("FlexBayes", repos="http://R-Forge.R-project.org")
>
> Regards,
> Pascal
>
> On Mon, Sep 28, 2015 at 9:17 AM, Davidwkatz <dk...@tibco.com> wrote:
> > I tried to install FlexBayes like this:
> >
> > install.packages("FlexBayes", repos="http://R-Forge.R.project.org")
> >
> > but got errors. Here's the transcript in R:
> >
> > R version 3.2.2 (2015-08-14) -- "Fire Safety"
> > Copyright (C) 2015 The R Foundation for Statistical Computing
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> > R is free software and comes with ABSOLUTELY NO WARRANTY.
> > You are welcome to redistribute it under certain conditions.
> > Type 'license()' or 'licence()' for distribution details.
> >
> > Natural language support but running in an English locale
> >
> > R is a collaborative project with many contributors.
> > Type 'contributors()' for more information and
> > 'citation()' on how to cite R or R packages in publications.
> >
> > Type 'demo()' for some demos, 'help()' for on-line help, or
> > 'help.start()' for an HTML browser interface to help.
> > Type 'q()' to quit R.
> >
> > > install.packages("FlexBayes", repos="http://R-Forge.R.project.org")
> > Installing package into ‘C:/Users/dkatz/R/win-library/3.2’
> > (as ‘lib’ is unspecified)
> > Error: Line starting '
> >
> > Any help will be much appreciated!
> >
> > Thanks,
> >
> > --
> > View this message in context:
> > http://r.789695.n4.nabble.com/FlexBayes-installation-from-R-Forge-Problem-R-3-2-2-tp4712861.html
> > Sent from the R help mailing list archive at Nabble.com.
>
> --
> Pascal Oettli
> Project Scientist
> JAMSTEC
> Yokohama, Japan
[R] mgcv::gam in splus?
Is mgcv, and particularly its gam, available for Splus? I've been using it happily in R and need to implement something in Splus for which the automatic smoothing parameter selection is needed.

Thanks for any guidance,
David Katz
da...@davidkatzconsulting.com

--
View this message in context: http://r.789695.n4.nabble.com/mgcv-gam-in-splus-tp4546261p4546261.html
Re: [R] Question about glmnet
I believe you can, in this sense: use model.matrix to create X for glmnet(X,y,...). However, when dropping variables, this will drop the indicators individually, not per factor, which may not be what you are looking for.

Good luck,
David Katz

Axel Urbiz wrote:
> Hi,
>
> Is it possible to include factor variables as model inputs using this package? I'm quite sure it is not possible, but would like to double check.
>
> Thanks,
> Axel.

--
View this message in context: http://r.789695.n4.nabble.com/Question-about-glmnet-tp3006439p3517635.html
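To make the model.matrix suggestion concrete, here is a minimal sketch on made-up data. The data frame d and its columns are hypothetical; the glmnet call is guarded so the example still runs where glmnet is not installed.

```r
# Toy data (hypothetical): one numeric and one three-level factor predictor.
set.seed(1)
d <- data.frame(
  y   = rnorm(12),
  x1  = runif(12),
  grp = factor(rep(c("a", "b", "c"), times = 4))
)

# model.matrix() expands the factor into indicator (dummy) columns.
# The intercept column is dropped because glmnet fits its own intercept.
X <- model.matrix(y ~ x1 + grp, data = d)[, -1]
print(colnames(X))

# With glmnet installed, the fit would look like this.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(X, d$y)
  # Each indicator column is penalized on its own, so one level of grp
  # can leave the model while another stays.
}
```

The column names show the point made in the reply: the factor grp enters as separate columns, and the penalty treats each column independently rather than keeping or dropping the factor as a whole.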
Re: [R] list concatenation
Without any error checking, I'd try this:

recurse <- function(l1,l2){
  if(is(l1[[1]],"list"))
    mapply(recurse,l1,l2,SIMPLIFY=FALSE)
  else {mapply(c,l1,l2,SIMPLIFY=FALSE)}
}

recurse(list.1,list.2)

which recursively traverses each tree (list) in tandem until one is not composed of lists and then concatenates. If the structure of the lists does not agree, this will fail.

Regards,
David Katz
da...@davidkatzconsulting.com

--
View this message in context: http://r.789695.n4.nabble.com/list-concatenation-tp3209182p3209324.html
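A small worked example of the tandem traversal may help. The two input lists below are invented for illustration, and the sketch uses is.list() in place of is(, "list") so that it depends on base R only; that substitution is mine, not the poster's.

```r
# Same idea as the function in the post, written with is.list().
recurse <- function(l1, l2) {
  if (is.list(l1[[1]])) {
    mapply(recurse, l1, l2, SIMPLIFY = FALSE)   # descend one level in both lists
  } else {
    mapply(c, l1, l2, SIMPLIFY = FALSE)         # leaves reached: concatenate pairwise
  }
}

# Two lists with identical structure (hypothetical data).
list.1 <- list(a = list(x = 1, y = "p"), b = list(x = 2, y = "q"))
list.2 <- list(a = list(x = 3, y = "r"), b = list(x = 4, y = "s"))

out <- recurse(list.1, list.2)
str(out)
```

Each leaf of the result holds the values from both inputs, for example out$a$x combines 1 and 3. As the reply warns, lists whose shapes differ are not handled.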
Re: [R] creating a list of dataframes
1) You have redefined the command list, which creates lists - not a great idea.

2) See lapply. For example, try something like:

list.of.df <- lapply(list.of.filenames,read.csv)
list.of.results <- lapply(list.of.df,your.application.function)

Regards,
David

--
View this message in context: http://r.789695.n4.nabble.com/creating-a-list-of-dataframes-tp3071598p3071710.html
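The two lapply calls can be tried end to end with temporary files. Everything below (file names, columns, and the summary function standing in for your.application.function) is made up for illustration.

```r
# Write two small CSV files to a temporary directory (hypothetical data).
tmp <- tempfile("dfs")
dir.create(tmp)
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, value = c(5, 15)),
          file.path(tmp, "b.csv"), row.names = FALSE)

# Read every file into one list of data frames, named by file.
list.of.filenames <- sort(list.files(tmp, pattern = "\\.csv$", full.names = TRUE))
list.of.df <- lapply(list.of.filenames, read.csv)
names(list.of.df) <- basename(list.of.filenames)

# Stand-in for your.application.function: the mean of the value column.
list.of.results <- lapply(list.of.df, function(df) mean(df$value))
print(list.of.results)
```

Naming the list elements after the files keeps results traceable to their source without a separate index variable.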
Re: [R] the first. from SAS in R
Often the purpose of first/last in SAS is to facilitate grouping of observations in a sequential algorithm. This purpose is better served in R by using vectorized methods like those in package plyr.

Also, note that first/last has different meanings in the context of "by x;" versus "by x notsorted;". R's duplicated does not address the latter, which splits noncontiguous records with equal x into separate groups.

Regards,
David

--
View this message in context: http://r.789695.n4.nabble.com/the-first-from-SAS-in-R-tp3055417p3057476.html
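The difference between the two meanings can be seen on a short vector in which the value "a" appears in two separate runs. This is a base-R sketch with invented data; the variable names are illustrative only.

```r
x <- c("a", "a", "b", "b", "a")
n <- length(x)

# Value-based flags: first/last occurrence of each value anywhere in x.
# This matches "by x;" only when the data are sorted by x.
first.by.value <- !duplicated(x)
last.by.value  <- !duplicated(x, fromLast = TRUE)

# Run-based flags (the "by x notsorted;" reading): a group starts
# whenever x differs from the previous record.
first.by.run <- c(TRUE, x[-1] != x[-n])
last.by.run  <- c(x[-1] != x[-n], TRUE)

print(data.frame(x, first.by.value, first.by.run, last.by.run))
```

The fifth record is flagged as a first only by the run-based version, because duplicated() treats it as a repeat of the earlier "a" records.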
[R] Identifying integers (as opposed to real #s) in matrix
Is there a way to identify (for subsequent replacement) which rows in a matrix are comprised entirely of *integers*? I have a large set of n x 3 matrices where each row consists of either a set of 3 integers or a set of 3 real numbers. A given matrix might look something like this:

       [ ,1]      [ ,2]      [ ,3]
[1, ]  121.       -98.       276.
[2, ]  10.1234    25.4573    -188.9204
[3, ]  121.       -98.       276.
[4, ]  -214.4982  -99.1043   -312.0495
.
[n, ]  99.        1.         -222.

Ultimately, I'm going to replace the values in the integer-only rows with NAs. But first I need R to recognize the integer-only rows. I assume whatever function I write will be keyed off of the .s, but have no clue how to write that function.

Any ideas?

David Katz
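One way to approach this, sketched under the assumption that the matrix is stored as ordinary doubles, is to test values rather than printed decimal points: a value is treated as an integer when rounding leaves it unchanged. The tolerance tol is an assumption added to absorb floating-point noise, and the matrix reuses the rows shown in the question.

```r
m <- rbind(
  c( 121,       -98,       276),
  c(  10.1234,   25.4573, -188.9204),
  c( 121,       -98,       276),
  c(-214.4982,  -99.1043, -312.0495),
  c(  99,         1,      -222)
)

# A row is integer-only when no entry differs from its rounded value.
tol <- 1e-8
integer.rows <- rowSums(abs(m - round(m)) > tol) == 0

# Replace the integer-only rows with NA.
m[integer.rows, ] <- NA
print(m)
```

The same logical vector can be computed once per matrix and applied across the whole set with lapply.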
[R] subset function unexpected behavior
I was surprised by this behavior of subset in a for loop. I looked in subset.data.frame and it seemed to me that both versions should work, since the subset call should be evaluated in the global environment - but perhaps I don't understand environments well enough. Can someone enlighten me?

In any case, this is a bit of a gotcha for naive users of subset.

input.data <- data.frame(sch=c(1,1,2,2), pop=c(100,200,300,400))
school.var <- "sch"
school.list <- 1:2

for(sch in school.list){
  print(sch)
  #do this before subset!:
  right.sch.p <- input.data[,school.var] == sch
  print(subset(input.data,right.sch.p)) #this is what I expected
}
## [1] 1
##   sch pop
## 1   1 100
## 2   1 200
## [1] 2
##   sch pop
## 3   2 300
## 4   2 400

for(sch in school.list){
  print(sch)
  print(subset(input.data,input.data[,school.var] == sch)) #note - compact version fails!
}
## [1] 1
##   sch pop
## 1   1 100
## 2   1 200
## 3   2 300
## 4   2 400
## [1] 2
##   sch pop
## 1   1 100
## 2   1 200
## 3   2 300
## 4   2 400

--
View this message in context: http://n4.nabble.com/subset-function-unexpected-behavior-tp1459535p1459535.html
Re: [R] subset function unexpected behavior
Thanks, that helps! Subset creates a new context where a name clash can occur. So if I don't want to check for that possibility, I should use a special kind of index like .sch, or avoid subset:

for(sch in school.list){
  print(sch)
  print(input.data[input.data[,school.var] == sch,])
}

which works no matter what variable names I use. That seems like a reasonable requirement for good code. (Checking for a name clash would be at least theoretically needed, since school.var is a parameter that can be any character name.)

Although subset conveniently avoids extra typing in many cases (not here), this suggests to me that it's not ideal for code that can be used in a variety of contexts. Note that unlike attach, subset does not issue a warning!

> Hi:
>
> Try this for your second loop instead:
>
> for(s in school.list){
>   print(s)
>   print(subset(input.data, sch == s))
> }
> [1] 1
>   sch pop
> 1   1 100
> 2   1 200
> [1] 2
>   sch pop
> 3   2 300
> 4   2 400
>
> Don't confound the 'sch' variable in your data frame with the index in your loop :)
>
> HTH,
> Dennis
>
> On Mon, Feb 1, 2010 at 8:17 PM, David Katz [hidden email] wrote:
> > I was surprised to see this unexpected behavior of subset in a for loop. I looked in subset.data.frame and it seemed to me that both versions should work, since the subset call should be evaluated in the global environment - but perhaps I don't understand environments well enough. Can someone enlighten me?
> >
> > In any case, this is a bit of a gotcha for naive users of subset.
> >
> > input.data <- data.frame(sch=c(1,1,2,2), pop=c(100,200,300,400))
> > school.var <- "sch"
> > school.list <- 1:2
> >
> > for(sch in school.list){
> >   print(sch)
> >   #do this before subset!:
> >   right.sch.p <- input.data[,school.var] == sch
> >   print(subset(input.data,right.sch.p)) #this is what I expected
> > }
> > ## [1] 1
> > ##   sch pop
> > ## 1   1 100
> > ## 2   1 200
> > ## [1] 2
> > ##   sch pop
> > ## 3   2 300
> > ## 4   2 400
> >
> > for(sch in school.list){
> >   print(sch)
> >   print(subset(input.data,input.data[,school.var] == sch)) #note - compact version fails!
> > }
> > ## [1] 1
> > ##   sch pop
> > ## 1   1 100
> > ## 2   1 200
> > ## 3   2 300
> > ## 4   2 400
> > ## [1] 2
> > ##   sch pop
> > ## 1   1 100
> > ## 2   1 200
> > ## 3   2 300
> > ## 4   2 400

--
View this message in context: http://n4.nabble.com/subset-function-unexpected-behavior-tp1459535p1460057.html
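The name clash discussed in this thread can be reproduced without a loop. The sketch below reuses the data frame from the original post and compares three ways of selecting rows; only the variable names clash, no.clash, renamed and s are new.

```r
input.data <- data.frame(sch = c(1, 1, 2, 2), pop = c(100, 200, 300, 400))
school.var <- "sch"

sch <- 1  # an index that happens to share its name with a column

# Inside subset(), `sch` resolves to the column input.data$sch first, so
# the condition compares the column with itself and keeps every row.
clash <- subset(input.data, input.data[, school.var] == sch)

# Plain indexing evaluates `sch` in the calling environment only.
no.clash <- input.data[input.data[, school.var] == sch, ]

# Renaming the index, as Dennis suggested, also avoids the clash.
s <- 1
renamed <- subset(input.data, sch == s)

print(c(clash = nrow(clash), no.clash = nrow(no.clash), renamed = nrow(renamed)))
```

Bracket indexing is the variant that does not depend on which column names the data frame happens to have, which is the property the reply asks of reusable code.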
[R] Windows Graphics Device Lockups with Rterm
I've been using Rterm with ESS to run R for some time. Recently I've experienced lockups when displaying graphics: the first display seems to work, but then the device refuses to respond and must be killed with dev.off(). Rgui has no problems. I've tried eliminating all other processes that might cause conflicts, to no avail. I'm using Windows XP and R 2.9.0.

Here's a transcript using Rterm:

R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

if(!exists("baseenv", mode="function")) baseenv <- function() NULL
options(STERM='iESS', editor='gnuclient.exe')
plot(1:5) #locked graphics device!
dev.off()
null device
          1

Thanks for any suggestions.

--
View this message in context: http://www.nabble.com/Windows-Graphics-Device-Lockups-with-Rterm-tp24428960p24428960.html
[R] www.rpad.org
I've noticed this website has been down for several days. Does anyone have any information on whether/when it is coming back?

Thanks.

--
View this message in context: http://www.nabble.com/www.rpad.org-tp24175392p24175392.html
Re: [R] how to subsample all possible combinations of n species taken 1:n at a time?
If I understand your problem properly, you could just note that selecting 1:n of n objects is the same as deciding separately whether each one is included or not (excluding the case where none are selected). Take 1000 of these and you are there, except that some are duplicates. So generate extras, eliminate the duplicates, and discard the extras. Something like this (not tested):

p <- 0.5 #with p = 0.5 all combinations are equally likely once rows with all zeros are removed
result <- matrix(0,nrow=1200,ncol=n) #plenty of extras for duplicates
for(i in 1:1200) result[i,] <- rbinom(n,1,p)
result <- subset(result,apply(result,1,sum) > 0) #cases which have at least 1 species
result <- unique(result)[1:1000,]

Might be interesting to see the effect of varying p on the rest of your analysis. Further memory might be saved by using sparse matrices - see the Matrix package.

David Katz
www.davidkatzconsulting.com

jasper slingsby wrote:
> Hello
>
> I apologise for the length of this entry but please bear with me.
>
> In short: I need a way of subsampling communities from all possible communities of n taxa taken 1:n at a time without having to calculate all possible combinations (because this gives me a memory error - using combn() or expand.grid() at least). Does anyone know of a function? Or can you help me edit the combn or expand.grid functions to generate subsamples?

--
View this message in context: http://www.nabble.com/how-to-subsample-all-possible-combinations-of-n-species-taken-1%3An-at-a-time--tp22911399p22919388.html
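A runnable version of this idea, as a sketch: the value n = 25 and the counts 1200 and 1000 are placeholders, and an inclusion probability of 0.5 is used because independent fair coin flips make every presence/absence pattern equally likely, so discarding empty rows leaves a uniform draw over nonempty combinations.

```r
set.seed(42)
n      <- 25     # number of species (hypothetical value)
n.draw <- 1200   # draw extras so duplicates and empty rows can be discarded
n.keep <- 1000

# Each row is one community: 1 = species present, 0 = absent.
result <- matrix(rbinom(n.draw * n, 1, 0.5), nrow = n.draw, ncol = n)

result <- result[rowSums(result) > 0, , drop = FALSE]  # at least one species
result <- unique(result)                               # drop duplicate communities

# Fail loudly if too few distinct communities remain (likely for small n).
stopifnot(nrow(result) >= n.keep)
result <- result[seq_len(n.keep), , drop = FALSE]
dim(result)
```

Filling the matrix in one rbinom call avoids the row-by-row loop. For small n, where fewer than 1000 distinct nonempty communities exist, the stopifnot check will stop the script rather than return a short sample.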
Re: [R] how to subsample all possible combinations of n species taken 1:n at a time?
This is very cool indeed until you want to use more than 32 or so terms and most operating systems force you to go to floating point.

    > x=sample(2^34,1000)
    Error in sample(2^34, 1000) : invalid 'x' argument
    In addition: Warning message:
    In sample(2^34, 1000) : NAs introduced by coercion

jholtman wrote:
Are you just trying to obtain a combination from 25 possible terms? If so, then just sample the number you want and convert the number to binary:

    > sample(33554432,100)
    [1] 6911360 5924262 23052661 12888381 25831589 16700013 24079278 33282839 12751862 26086726 31363494 7118320 21866536 4212929

David Katz www.davidkatzconsulting.com
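The "sample a number and convert it to binary" step might look like the sketch below. The helper `to_bits` is hypothetical; it uses plain arithmetic (integer division and remainder) rather than integer bit operations, on the assumption that this keeps working when the sampled values are stored as doubles.

```r
n_terms <- 25
set.seed(1)
ids <- sample(2^n_terms - 1, 100)   # distinct integers in 1 .. 2^25 - 1

# Hypothetical helper: element j of the result is bit j of `id`,
# i.e. 1 if term j is included in the combination and 0 otherwise.
to_bits <- function(id, n) as.integer((id %/% 2^(0:(n - 1))) %% 2)

# One row per sampled combination, one column per term
inclusion <- t(vapply(ids, to_bits, integer(n_terms), n = n_terms))
dim(inclusion)
```

Sampling from 1 upwards excludes the value 0, which would correspond to the empty combination.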
Re: [R] autologistic modelling in R
Curiosity and Google led me to this paper, which may be of interest:

Assessing the validity of autologistic regression
Carsten F. Dormann
Department of Computational Landscape Ecology, UFZ Centre for Environmental Research, Permoserstr. 15, 04318 Leipzig, Germany
Received 11 July 2006; revised 30 April 2007; accepted 7 May 2007. Available online 20 June 2007.

Abstract
In autologistic regression models employed in the analysis of species’ spatial distributions, an additional explanatory variable, the autocovariate, is used to correct the effect of spatial autocorrelation. The values of the autocovariate depend on the values of the response variable in the neighbourhood. While this approach has been widely used over the last ten years in biogeographical analyses, it has not been assessed for its validity and performance against artificial simulation data with known properties. I here present such an assessment, varying the range and strength of spatial autocorrelation in the data as well as the prevalence of the focal species. Autologistic regression models consistently underestimate the effect of the environmental variable in the model and give biased estimates compared to a non-spatial logistic regression. A comparison with other methods available for the correction of spatial autocorrelation shows that autologistic regression is more biased and less reliable and hence should be used only in concert with other reference methods.

charlotte.bell wrote:
Hi, I have spatially autocorrelated data (with a binary response variable and continuous predictor variables). I believe I need to do an autologistic model, does anyone know a method for doing this in R?
Many thanks
C Bell
Re: [R] Population Decay in R
    barplot(5000*((1-.26)^(0:49)))

jimdare wrote:
Hi, I am new to R. I am trying to plot the decay of a population over time (0-50yrs). I have the initial population value (5000) and the mortality rate (0.26/yr) and I can't figure out how to apply this so I get a remaining population value each year. In Excel (ignoring headings) I would put 5000 in A1, in B2 I would enter the formula A1*0.26, and then in A2 (the next year's population) I would subtract B2 from A1. I would continue this process until I had calculated the population for the 50th year. Any ideas of how to do this in R? :-/
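To connect the one-liner to the spreadsheet description, here is a small sketch that computes the population both ways. The variable names are illustrative, and it assumes years 0 through 50 are wanted (51 values), whereas the one-liner above covers 0 through 49.

```r
n0 <- 5000        # initial population
m  <- 0.26        # annual mortality rate
years <- 0:50

# Year-by-year recurrence, as in the spreadsheet: next = current - m * current
pop_loop <- numeric(length(years))
pop_loop[1] <- n0
for (i in seq_along(years)[-1]) {
  pop_loop[i] <- pop_loop[i - 1] - m * pop_loop[i - 1]
}

# Closed form of the same recurrence: n0 * (1 - m)^t
pop_closed <- n0 * (1 - m)^years

all.equal(pop_loop, pop_closed)
# barplot(pop_closed, names.arg = years)   # uncomment to plot
```

The vectorised closed form avoids the loop entirely, which is why a single expression is enough.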
[R] tcl/tk example in batch
The example for learning tcl/tk under R at http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/OKtoplevel.html suggests running it from batch - but when I do, the window flashes by and the example ends. I'm under XP pro. Is there a workaround? Should I create a modal window instead so it persists? Thanks.
Re: [R] Case statements in R
See ?cut for creating a factor based on ranges of values.

Regards,

Wade Wall wrote:
Hi all, I am trying to convert geometric means in a matrix to cover classes. My values are as such:

    perc <- c(0, 0.025136418, 0.316227766, 1.414213562, 3.16227766, 7.071067812, 15.8113883, 35.35533906, 61.23724357, 84.40971508, 97.46794345)
    cover <- c(0,1,2,3,4,5,6,7,8,9,10)

This is what I am trying to accomplish:

    veg_mean[veg_mean > 0 & veg_mean < .1] <- 1
    veg_mean[veg_mean >= .1 & veg_mean < 1.0] <- 2
    veg_mean[veg_mean >= 1.0 & veg_mean < 2.0] <- 3
    veg_mean[veg_mean >= 2.0 & veg_mean < 5.0] <- 4
    veg_mean[veg_mean >= 5.0 & veg_mean < 10.0] <- 5
    veg_mean[veg_mean >= 10.0 & veg_mean < 25] <- 6
    veg_mean[veg_mean >= 25.0 & veg_mean < 50.0] <- 7
    veg_mean[veg_mean >= 50.0 & veg_mean < 75.0] <- 8
    veg_mean[veg_mean >= 75.0 & veg_mean < 95.0] <- 9
    veg_mean[veg_mean >= 95.0 & veg_mean <= 100] <- 10
    veg_mean[veg_mean > 100] <- NA

where values are assigned based on the geometric means. However, I think that my syntax for the operator is wrong and I can't find a reference to proper syntax. I basically want to bin the geometric means. Any help would be greatly appreciated.

Thanks, Wade
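A sketch of the `cut()` approach for the bins in the question follows. The helper name `to_cover_class` is hypothetical, and the treatment of exact zeros (class 0) is an assumption taken from the `cover` vector in the question. Binning in one pass also avoids re-binning values that an earlier assignment has already replaced with a class number.

```r
# Lower edges of cover classes 1..10, plus the upper limit of 100
breaks <- c(0, 0.1, 1, 2, 5, 10, 25, 50, 75, 95, 100)

# Hypothetical helper: map geometric means to cover classes 0..10
to_cover_class <- function(x) {
  # right = FALSE gives intervals [a, b); include.lowest = TRUE closes the
  # last one, so it becomes [95, 100]; labels = FALSE returns integer codes
  cls <- cut(x, breaks = breaks, labels = FALSE,
             right = FALSE, include.lowest = TRUE)
  cls[!is.na(x) & x == 0] <- 0L   # exact zeros get class 0 (assumption)
  dim(cls) <- dim(x)              # keep matrix shape if x is a matrix
  cls
}

x <- c(0, 0.05, 0.1, 0.5, 1, 3, 7, 20, 30, 60, 80, 95, 100, 101)
classes <- to_cover_class(x)
classes
```

Values above 100 (and any negative values) fall outside the breaks and come back as NA without a separate assignment.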
[R] Random Forest %var(y)
The verbose option gives a display like:

    > rf.500 <-
    +   randomForest(new.x,trn.y,do.trace=20,ntree=100,nodesize=500,
    +     importance=T)
         |      Out-of-bag   |
    Tree |      MSE  %Var(y) |
      20 |   0.9279   100.84 |

What is the meaning of %Var(y) > 100%? I expected that to correspond to a model that was worse than random, but the predictions seem much better than that on the out-of-bag estimates from predict(rf.500).
[R] mgcv::gam error message for predict.gam
Sometimes, for specific models, I get this error from predict.gam in library mgcv:

    Error in complete.cases(object) : negative length vectors are not allowed

Here's an example:

    > model.calibrate <- gam(meansalesw ~ s(tscore,bs="cs",k=4), data=toplot, weights=weight, gam.method="perf.magic")
    > test <- predict(model.calibrate,newdata)
    Error in complete.cases(object) : negative length vectors are not allowed

The data is shown below:

    > toplot[,c("meansalesw","tscore","weight")]
       meansalesw      tscore weight
    1   0.1275841 0.003446797  15224
    2   0.1495748 0.004017158  15523
    3   0.2245844 0.004375278  15520
    4   0.2197668 0.004753941  15525
    5   0.1317830 0.005049050  15524
    6   0.2809621 0.005403199  15498
    7   0.2933119 0.005764413  15529
    8   0.4791150 0.006335145  15514
    9   0.1833688 0.006617095  15528
    10  0.3200599 0.007135850  15527
    11  0.4931882 0.007781095  15529
    12  0.4207684 0.008766088  15512
    13  0.5928568 0.009731357  15514
    14  0.8025296 0.010927579  15520
    15  0.6286192 0.012004714  15513
    16  0.7477922 0.014083143  15527
    17  0.7251362 0.017382274  15531
    18  1.1871948 0.025481173  15521
    19  1.6495832 0.048264689  15524
    20  5.1180227 0.131198022  15218

    > newdata
         tscore
    1 0.5059341
    2 0.4125522
    3 1.4335818
    4 0.7060673
    5 0.3229316

Thanks!
[R] proto naming clash?
Trying to learn Proto. This threw me:

    #startup r...
    > library(proto)
    > a <- proto(x=10)
    > a$x
    [1] 10
    > x <- proto(x=100)
    > x$x
    Error in get(x, env = x, inherits = TRUE) : invalid 'envir' argument

Do I simply need to be careful to name proto objects and proto components uniquely? Is this the desired behavior for proto objects? Thanks.
[R] mgcv::gam shrinkage of smooths
In Dr. Wood's book on GAM, he suggests in section 4.1.6 that it might be useful to shrink a single smooth by adding epsilon*I to its penalty matrix S, i.e. replacing S with S + epsilon*I. The context was the need to be able to shrink the term to zero if appropriate. I'd like to do this in order to shrink the coefficients towards zero (irrespective of the penalty for wiggliness) - but not necessarily all the way to zero. That is, my informal prior is to keep the contribution of a specific term small.

1) Is adding eps*I to the penalty matrix an effective way to achieve this goal?
2) How do I accomplish this in practice using mgcv::gam?

Thanks.
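As a purely illustrative sketch of what adding eps*I does to a penalty matrix, the code below uses a generic second-difference penalty. This is an assumption for illustration only, not the penalty mgcv constructs for any particular basis. The unmodified penalty leaves a two-dimensional null space unpenalized, and the added ridge term shifts every eigenvalue up by eps, so no direction in coefficient space escapes shrinkage.

```r
k <- 8
D <- diff(diag(k), differences = 2)   # second-difference operator, (k - 2) x k
S <- crossprod(D)                     # wiggliness-type penalty, rank k - 2

eps <- 0.1
S_shrunk <- S + eps * diag(k)         # penalty with the ridge term added

ev  <- eigen(S, symmetric = TRUE)$values
ev2 <- eigen(S_shrunk, symmetric = TRUE)$values

round(ev, 6)    # the two smallest eigenvalues are (numerically) zero
round(ev2, 6)   # every eigenvalue is now at least eps
```

How strongly the term is pulled towards zero still depends on the smoothing parameter that multiplies the penalty during fitting.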
[R] mgcv::predict.gam lpmatrix for prediction outside of R
This is in regards to the suggested use of type="lpmatrix" in the documentation for mgcv::predict.gam. Could one not get the same result more simply by using type="terms" and interpolating each term directly? What is the advantage of the lpmatrix approach for prediction outside R? Thanks.
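For readers unfamiliar with the lpmatrix, a minimal sketch on simulated data (all names and data here are made up for illustration) shows what it contains: the matrix returned by `predict(..., type = "lpmatrix")` multiplied by the coefficient vector gives the linear predictor at the new points.

```r
library(mgcv)

set.seed(1)
dat <- data.frame(x = runif(200))
dat$y <- sin(2 * pi * dat$x) + rnorm(200, sd = 0.2)   # simulated response
fit <- gam(y ~ s(x), data = dat)

newd <- data.frame(x = c(0.1, 0.3, 0.5, 0.7, 0.9))
Xp  <- predict(fit, newd, type = "lpmatrix")   # one column per model coefficient
eta <- drop(Xp %*% coef(fit))                  # linear predictor via matrix product

all.equal(unname(eta), unname(as.numeric(predict(fit, newd))))
```

Because prediction reduces to a matrix-vector product, only the coefficient vector and a way to build rows of the matrix need to be carried outside R.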
[R] mgcv::gam prediction using lpmatrix
The documentation for predict.gam in library mgcv gives an example of using an lpmatrix to do approximate prediction via interpolation. However, the code is specific to the example with respect to the number of smooth terms, the degrees of freedom for each, etc. (which is entirely appropriate for an example). Has anyone generalized this to directly generate code from a gam object (e.g. SAS or C code)? I wanted to check before I reinvent the wheel. Thanks.
Re: [R] regression trees: mean square vs. absolute errors
You need to think through the application of your model. Is it more important to get more cases classified correctly, or to avoid bigger errors versus a probability prediction? You should optimize your choice of a loss function so that it is appropriate to the way in which the model will be used.

lubaroz wrote:
Hi, I am working with CART regression now to predict a probability; the response is binary. Could anyone tell me in which cases it is better to use mean square error for splitting nodes and when mean absolute error should be preferred. I am now using the default (MSE) version and I can see that the obtained optimal tree is very different from the tree with the least mean absolute error. Thanks in advance, Luba
Re: [R] running balance down a dataframe referring back to previous row
Try:

    cs <- with(txns, cumsum(cr - dr))

You could if needed adjust the starting value to zero by concatenating a zero in front and dropping the last entry.

    txns$running.bal <- c(0, cs[seq_len(length(cs) - 1)])

Good luck.

seanpor wrote:
Good morning, I've searched high and low and I've tried many different ways of doing this, but I can't seem to get it to work. I'm looking for a way of vectorising a running balance; i.e. the value in the first row of the dataframe is zero, and following rows add to this running balance. This is easy to write in a loop, but I can't seem to get it working in vectorised code. Hopefully the example below will explain what I'm trying to do... Many thanks in advance, Best regards, Sean O'Riordain
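A small worked example of the suggestion above, using a made-up `txns` data frame (the column names `cr` and `dr` follow the reply, the values are invented):

```r
# Made-up transactions: credits and debits per row
txns <- data.frame(cr = c(100, 0, 50, 0),
                   dr = c(0, 30, 0, 20))

cs <- with(txns, cumsum(cr - dr))         # balance after each transaction
txns$closing.bal <- cs
txns$running.bal <- c(0, head(cs, -1))    # balance before each transaction, starting at 0

txns
```

Here `head(cs, -1)` drops the last entry, which is the same shift as concatenating a zero in front and discarding the final value.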
Re: [R] error in random forest
I've had the same problem and solved it by removing the cases with the new levels - they need to be handled some other way, either by building a new model or reassigning the factor level to one in the training set.

Nagu wrote:
Hi, I get the following error when I try to predict the probabilities of a test sample:

    Error in predict.randomForest(fit.EBA.OM.rf.50, x.OM, type = "prob") : New factor levels not present in the training data

I have about 630 predictor variables in the dataset x.OM (25 factor variables and the remaining are continuous variables). Any ideas on how to trace it? Thank you, Nagu
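One way to trace which rows carry unseen levels, and to set them aside as suggested above, is sketched below in base R. The helper `new_level_rows` and the toy data frames are hypothetical; the sketch assumes the training and test data frames share column names.

```r
# Hypothetical helper: TRUE for test rows with a factor level unseen in training
new_level_rows <- function(train, test) {
  is_fac <- vapply(train, is.factor, logical(1))
  bad <- rep(FALSE, nrow(test))
  for (v in names(train)[is_fac]) {
    bad <- bad | !(as.character(test[[v]]) %in% levels(train[[v]]))
  }
  bad
}

train <- data.frame(f = factor(c("a", "b", "a", "c")), z = 1:4)
test  <- data.frame(f = factor(c("a", "d", "c", "e")), z = 5:8)

unseen <- new_level_rows(train, test)
test_known <- test[!unseen, , drop = FALSE]

# Re-declare the factor with the training levels so the two sets agree
test_known$f <- factor(test_known$f, levels = levels(train$f))
test_known
```

Rows flagged in `unseen` still need separate handling, for example by refitting or by mapping them to an existing level, as the reply notes.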
Re: [R] randomForest() for regression produces offset predictions
I would expect this regression towards the mean behavior on a new or hold-out dataset, not on the training data. In RF terminology, this means that the model prediction from predict is the in-bag estimate, but the out-of-bag estimate is what you want for prediction. In Joshua's example, rf.rf$predicted is an out-of-bag estimate, but since newdata is given, it appears that the result is the in-bag estimate, which still needs an adjustment like Joshua's (and perhaps a more complex one might be needed in some cases).

This is a bit confusing since predict() usually matches what's in model$fitted.values. I imagine that's why the author used predicted as the component name instead of the standard fitted.values. The documentation for predict.randomForest explains: newdata - a data frame or matrix containing new data. (Note: If not given, the out-of-bag prediction in object is returned.)

Patrick Burns wrote:
What I see is the predictions being less extreme than the actual values -- predictions for large actual values are smaller than the actual, and predictions for small actual values are larger than the actual. That makes sense to me. The object is to maximize out-of-sample predictive power, not in-sample predictive power. Or am I missing something in what you are saying?

Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User)

Joshua Knowles wrote:
Hi all, I have observed that when using the randomForest package to do regression, the predicted values of the dependent variable given by a trained forest are not centred and have the wrong slope when plotted against the true values. This means that the R^2 value obtained by squaring the Pearson correlation is better than that obtained by computing the coefficient of determination directly. The R^2 value obtained by squaring the Pearson correlation can, however, be exactly reproduced by the coefficient of determination if the predicted values are first linearly transformed (using lm() to find the required intercept and slope).

Does anyone know why randomForest behaves in this way - producing offset predictions? Does anyone know a fix for the problem? (By the way, the feature is there even if the original dependent variable values are initially transformed to have zero mean and unit variance.)

As an example, here is some simple R code that uses the available swiss dataset to show the effect I am observing. Thanks for any help.

    ## EXAMPLE OF RANDOM FOREST REGRESSION
    library(randomForest)
    data(swiss)
    swiss

    # Build the random forest to predict Infant Mortality
    rf.rf <- randomForest(Infant.Mortality ~ ., data=swiss)

    # And predict the training set again
    pred <- c(predict(rf.rf, swiss))
    actual <- swiss$Infant.Mortality

    # Plotting predicted against actual values shows the effect (uncomment to see this)
    #plot(pred,actual)
    #abline(0,1)

    # calculate R^2 as pearson coefficient squared
    R2one <- cor(pred,actual)^2

    # calculate R^2 value as fraction of variance explained
    residOpt <- (actual-pred)
    residnone <- (actual-mean(actual))
    R2two <- 1-var(residOpt,na.rm=TRUE)/var(residnone, na.rm=TRUE)

    # now fit a line through the predicted and true values and
    # use this to normalize the data before calculating R^2
    fit <- lm(actual ~ pred)
    coef(fit)
    pred2 <- pred*coef(fit)[2]+coef(fit)[1]
    residOpt <- (actual-pred2)
    R2three <- 1-var(residOpt,na.rm=TRUE)/var(residnone, na.rm=TRUE)

    cat("Pearson squared = ",R2one,"\n")
    cat("Coeff of determination = ", R2two, "\n")
    cat("Coeff of determination after linear fitting = ", R2three, "\n")
    ## END
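The relationship between the three R^2 values discussed in this thread does not depend on random forests, and can be reproduced with simulated predictions that are shrunk towards the mean. The sketch below uses made-up data and sums of squares rather than `var()`; it illustrates the arithmetic only and makes no claim about randomForest internals.

```r
set.seed(1)
actual <- rnorm(100, mean = 20, sd = 3)
# Simulated predictions: compressed towards the mean, plus a little noise
pred <- 20 + 0.5 * (actual - 20) + rnorm(100, sd = 0.5)

ss_total <- sum((actual - mean(actual))^2)

r2_pearson <- cor(pred, actual)^2                      # squared correlation
r2_direct  <- 1 - sum((actual - pred)^2) / ss_total    # coefficient of determination

fit <- lm(actual ~ pred)                               # best linear rescaling of pred
r2_after_fit <- 1 - sum(residuals(fit)^2) / ss_total

c(pearson = r2_pearson, direct = r2_direct, after_fit = r2_after_fit)
```

The squared correlation is unchanged by any linear rescaling of the predictions, while the direct coefficient of determination penalizes a wrong slope or offset, so the direct value can never exceed the other two.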
Re: [R] R-help google group archive
Also see www.nabble.com for a very nice interface to current and archived posts.

vince-28 wrote:
I made a google group archive of current and future R-help posts at http://groups.google.com/group/r-help-archive If you are signed-up for the R-help mailing list with a gmail account you can post/reply through the google group pages. Note that this is not a separate mailing-list, just a copy of the original. Only posts after December 2nd 2007 will be available. I assume there are no objections to this. In case I am wrong please let me know. Vincent
Re: [R] R as server application
This sounds useful, but can you give more info on "forward X"? Thanks.

Scionforbai wrote:
Do you need something more than a simple ssh connection to a remote host in which you run R (trivial when the server is Linux)? My advice is to run R in a screen session on the remote host (it protects from sudden disconnections). Then you have a window on your screen with the R command line, which you can copy/paste your scripts to (from whichever editor you want) as if it was running locally. Of course, on-screen graphics works (if you forward X... if you see what I mean) but it depends on connection speed (in LAN no problem, through internet it can be a pain, I usually don't use it then) and you need an X server running locally (if the 'client' is windows, cygwin highly recommended). A real R 'server' could be very useful though.
Re: [R] Largest N Values Efficiently?
x is a 1 x N sparse matrix of numerics. I am using the Matrix package to represent it as a sparse matrix; the representation has a numeric vector representing the positions within the matrix. My goal is to find the columns with the n largest values, here positive correlations. Part of my strategy is to only sort the nonzeros, which are available as a numeric vector. Thanks for your interest and input.

Prof Brian Ripley wrote:
What is 'x' here? What type? Does it contain NAs? Are there ties? R's ordering functions are rather general, and you can gain efficiency by ruling some of these out. See ?sort, look at the 'partial' argument, including the comments in the Details. And also look at ?sort.list. sort.int(x) is more efficient than x[order(x)], and x[order(x)[1:n]] is more efficient than x[order(x)][1:n] for most types.

Finally, does efficiency matter? As the examples in ?sort show, R can sort a vector of length 2000 in well under 1ms, and 1e7 random normals in less time than they take to generate. There are not many tasks where gaining efficiency over x[order(x)][1:n] will be important. E.g.

    > system.time(x <- rnorm(1e6))
       user  system elapsed
       0.44    0.00    0.44
    > system.time(x[order(x)][1:4])
       user  system elapsed
       1.72    0.00    1.72
    > system.time(x2 <- sort.int(x, method = "quick")[1:4])
       user  system elapsed
       0.31    0.00    0.32
    > system.time(min(x))
       user  system elapsed
       0.02    0.00    0.02
    > system.time(x2 <- sort.int(x, partial=1)[1])
       user  system elapsed
       0.07    0.00    0.07

and do savings of tenths of a second matter? (There is also quantreg::kselect, if you work out how to use it, which apparently is a bit faster at partial sorting on MacOS X but not elsewhere.)

On Sun, 11 Nov 2007, David Katz wrote:
What is the most efficient alternative to x[order(x)][1:n] where length(x) >> n?

That is the smallest n values, pace your subject line.

I also need the positions of the mins/maxs perhaps by preserving names. Thanks for any suggestions.

-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
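The strategy described at the top of this post (sort only the positive entries and keep their positions) can be sketched in base R as follows. The data are made up, and the sketch assumes the values have already been extracted into a plain named vector and that at least n of them are positive; it does not use the Matrix package.

```r
# Made-up correlations; names stand in for column labels
x <- c(a = 0.2, b = 0, c = 0.9, d = 0, e = 0.5, f = -0.3, g = 0.7)
n <- 3

nz <- which(x > 0)   # positions (with names) of the positive entries only

# Order just the positive entries, largest first, and map back to positions in x
top <- nz[order(x[nz], decreasing = TRUE)[seq_len(n)]]

top      # positions of the n largest values, names preserved
x[top]   # the values themselves
```

Indexing the order before subsetting follows the advice quoted above that x[order(x)[1:n]] is cheaper than x[order(x)][1:n].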