[R] tsmp::find_discord help

2021-11-20 Thread David Katz via R-help
147171, 234.080896296026, 238.081307704421,
240.863466976117, 240.980620885221, 242.865790697373, 242.192567156209,
244.400966976397, 230.117940277327, 246.551293436438, 248.382975894492,
251.654291217308, 256.004255625047, 257.742864777334, 262.281011258299,
261.284388574678, 261.408284877567, 262.63402390047, 262.027830331167,
263.04610984805, 264.671093973145, 264.421757659875, 265.237929494493,
266.872099533025, 264.696907634661, 264.260035745148, 263.499976769043,
262.425159591483, 260.981390716648, 261.082354369201, 259.278915649932,
259.796834945399, 257.282395334402, 255.692769738659, 256.477281617327,
256.18586372775, 255.027394031128, 253.672028350737, 253.725117554283,
253.678845994268, 252.442158376286, 252.079511253303, 254.582988659106,
254.874491709983, 252.931852529524, 252.177873732708, 251.521926319506,
252.912911842065, 253.201235300861, 251.230596607132, 250.384639365599,
251.930171640497, 247.943138212385, 227.360327759245, 236.807827285165,
236.256607593177, 245.676176491845, 248.029262333829, 246.607130076038,
247.609178481344, 247.744748076377, 247.872489046957, 247.48352803234,
249.194618296623, 249.265455424832, 249.594569674553, 250.408728755265,
247.515878390195, 248.641169324564, 249.340687829629, 249.36502006175,
250.620096849697, 250.924085881794, 251.446678319806, 252.231693277322,
253.722321051266, 251.152605426591, 254.144238363625, 255.721436826838,
255.942811810924, 256.764821865549, 257.00394097222, 258.391524989996,
259.436215976905, 260.193549101287, 261.534455659939, 261.899602734251,
262.616657174518, 263.481273697829, 264.543112079194, 266.521162025398,
267.961329632392, 266.7146434505, 264.785401400272, 265.351171001932,
270.092530136788, 262.900027005747, 258.104321945971, 243.820629691239,
251.983970767446, 255.462414786033, 260.955431247223, 262.743916866882,
262.536868907092, 263.482801351137, 260.690328582004, 259.374907453964,
259.448360515013, 260.766381761152, 266.51212834334, 261.178649726696,
262.13204070949, 260.329801393952, 257.072608946357, 256.699290173501,
259.76781842229, 259.634321800619, 256.142043520836, 253.331642591441,
252.409818426427, 253.606348613091, 254.796218211763, 255.09623971465,
256.665105978632, 255.078059232282, 254.524220743589, 252.710077736247,
252.699360275641, 252.984852506546, 251.256105145533, 249.920269056596,
244.926144783245, 233.956022891542, 220.813007885031, 215.895915163588,
219.629850208992, 226.529124786425, 235.032123494009, 250.546880197711,
254.095666445652, 253.991281213844, 257.473874318134, 255.711023276206,
254.706592350919, 255.897299568681, 255.423667051829, 254.964414633531,
252.309596570162, 250.816389805404, 252.822825341718, 252.433744078688,
253.788538636919, 253.051533051301, 252.259801258752, 251.700941357203,
252.480709904758, 253.304639213951, 253.813290173747, 252.224179178895,
252.5256305716, 252.953270357102, 253.056596744107, 250.887811907567,
249.785344008962, 249.608008671692, 250.198318963079, 249.545803866489,
248.918121550092, 250.413202223321, 248.905568386056, 249.915943725407,
249.996626693197, 248.072932397947, 254.216837446764, 253.47859208961,
254.138987503108, 252.998968019197, 252.145336908707, 253.886762010027,
256.153072158387, 253.416947685881, 256.800148054864, 259.224442075146,
259.926382599212, 260.744060715055, 261.333067704411, 261.70551071479,
260.628314612014, 260.506542761438, 263.354614044726, 266.13290756708,
262.400702435337, 268.301353736501, 267.563565765135, 265.179775401112,
267.50084469174, 265.084311419213, 265.91631323141, 265.908458636794,
267.454699323839, 270.140777570102, 270.156144841528, 269.984118177928,
269.369715720695, 269.112561377231, 267.868800598569, 270.406678517722,
269.055867271265, 265.774697500886, 265.295737018483, 264.470831104601,
265.450849219086, 267.017599059781, 266.64912298019, 263.305864860443,
261.791948233685, 260.177485275595, 261.369886779133, 259.69575348706,
259.332128965948, 255.718119134055, 256.938581692474, 256.615531389322
)

library(tsmp)
mp.obj <- stomp(myData, window_size = windowSize)

find_discord(mp.obj, data=input.dt$MeasurementValue)



## Warning in dist_profile(data, data, nn, window_size = .mp$w, index = discord_idx) :
##   Warning: Result may be inconsistent if the size of the queries are different.
## Error in fast_avg_sd(data, window_size) :
##   'window_size' must be at least 2.
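[For reference, a minimal self-contained run that does not trigger these messages. This is an illustrative sketch, not the poster's data: the series, window size, and variable names here are made up, and it assumes the stomp()/find_discord() signatures exactly as used in the post. The key points are that window_size must be at least 2 (and shorter than the series) and that the series passed to find_discord should be the same one the profile was built from, not a different vector such as input.dt$MeasurementValue.]

```r
library(tsmp)

set.seed(1)
# illustrative series, standing in for the poster's measurement data
myData <- sin(seq(0, 8 * pi, length.out = 400)) + rnorm(400, sd = 0.1)
windowSize <- 30                  # must be >= 2 and less than length(myData)

mp.obj <- stomp(myData, window_size = windowSize)

# pass the same series the profile was computed from:
disc <- find_discord(mp.obj, data = myData)
```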

*David Katz*, TIBCO Data Science


1.541.324.7417

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] gbm question

2016-11-21 Thread David Katz via R-help
R-Help,

Please help me understand why these models and predictions are different:


library(gbm)

set.seed(32321)
 N <- 1000
 X1 <- runif(N)
 X2 <- 2*runif(N)
 X3 <- ordered(sample(letters[1:4],N,replace=TRUE),levels=letters[4:1])
 X4 <- factor(sample(letters[1:6],N,replace=TRUE))
 X5 <- factor(sample(letters[1:3],N,replace=TRUE))
 X6 <- 3*runif(N)
 mu <- c(-1,0,1,2)[as.numeric(X3)]

 SNR <- 10 # signal-to-noise ratio
 Y <- X1**1.5 + 2 * (X2**.5) + mu
 sigma <- sqrt(var(Y)/SNR)
 Y <- Y + rnorm(N,0,sigma)

 # introduce some missing values
 X1[sample(1:N,size=500)] <- NA
 X4[sample(1:N,size=300)] <- NA

 data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)

set.seed(32321)
gbm.formula <-
 gbm(Y~X1+X2+X3+X4+X5+X6,     # formula
     data=data,               # dataset
     distribution="gaussian", # see the help for other choices
     n.trees=1000,            # number of trees
     shrinkage=0.05,          # shrinkage or learning rate; 0.001 to 0.1 usually work
     interaction.depth=3,     # 1: additive model, 2: two-way interactions, etc.
     bag.fraction = 0.5,      # subsampling fraction, 0.5 is probably best
     train.fraction = 1,      # fraction of data for training;
                              # first train.fraction*N used for training
     n.minobsinnode = 10,     # minimum total weight needed in each node
     keep.data=TRUE,          # keep a copy of the dataset with the object
     verbose=FALSE)           # don't print out progress



set.seed(32321)
gbm.Fit <-
 gbm.fit(x=data[,-1], y=Y,
     distribution="gaussian", # see the help for other choices
     n.trees=1000,            # number of trees
     shrinkage=0.05,          # shrinkage or learning rate; 0.001 to 0.1 usually work
     interaction.depth=3,     # 1: additive model, 2: two-way interactions, etc.
     bag.fraction = 0.5,      # subsampling fraction, 0.5 is probably best
     nTrain=length(Y),        # first nTrain observations used for training
     n.minobsinnode = 10,     # minimum total weight needed in each node
     keep.data=TRUE,          # keep a copy of the dataset with the object
     verbose=FALSE)           # don't print out progress





all.equal(predict(gbm.formula,n.trees=100), predict(gbm.Fit,n.trees=100))

> [1] "Mean relative difference: 0.3585409"

#all.equal(gbm.formula,gbm.Fit) no!

(Based on the package examples)
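[One thing worth checking - an assumption on my part, not something the post establishes: with bag.fraction = 0.5 every tree draws a random subsample, so both fits consume the random number stream. If the two interfaces do any different preprocessing before or between those draws, the streams diverge and the trees differ even with identical seeds. A base-R analogy of that effect:]

```r
# Identical seeds do not guarantee identical results if any extra
# computation consumes random numbers before the draws being compared.
set.seed(32321)
a <- sample(10)        # draw directly after seeding

set.seed(32321)
invisible(runif(1))    # e.g. a hidden preprocessing draw
b <- sample(10)        # same call, but the stream is now offset

identical(a, b)        # the permutations differ because of the offset
```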

Thanks

*David Katz*| IAG, TIBCO Spotfire



Re: [R] FlexBayes installation from R-Forge Problem R 3.2.2

2015-09-28 Thread David Katz
Pascal,

Oops. Thanks!


*David Katz*| IAG, TIBCO Spotfire

O: 1.541.203.7084 | M: 1.541.324.7417



On Mon, Sep 28, 2015 at 5:47 PM, Pascal Oettli <kri...@ymail.com> wrote:

> You misspelled the web address. It is "R-project", not "R.project".
> Thus, the command line should be:
>
> install.packages("FlexBayes", repos="http://R-Forge.R-project.org")
>
> Regards,
> Pascal
>
> On Mon, Sep 28, 2015 at 9:17 AM, Davidwkatz <dk...@tibco.com> wrote:
> > I tried to install FlexBayes like this:
> >
> > install.packages("FlexBayes", repos="http://R-Forge.R.project.org") but
> got
> > errors:
> >
> > Here's the transcript in R:
> >
> > R version 3.2.2 (2015-08-14) -- "Fire Safety"
> > Copyright (C) 2015 The R Foundation for Statistical Computing
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> > R is free software and comes with ABSOLUTELY NO WARRANTY.
> > You are welcome to redistribute it under certain conditions.
> > Type 'license()' or 'licence()' for distribution details.
> >
> >   Natural language support but running in an English locale
> >
> > R is a collaborative project with many contributors.
> > Type 'contributors()' for more information and
> > 'citation()' on how to cite R or R packages in publications.
> >
> > Type 'demo()' for some demos, 'help()' for on-line help, or
> > 'help.start()' for an HTML browser interface to help.
> > Type 'q()' to quit R.
> >
> >> install.packages("FlexBayes", repos="http://R-Forge.R.project.org")
> > Installing package into ‘C:/Users/dkatz/R/win-library/3.2’
> > (as ‘lib’ is unspecified)
> > Error: Line starting ' >
> >
> > Any help will be much appreciated!
> >
> > Thanks,
> >
> >
> >
> > --
> > View this message in context:
> http://r.789695.n4.nabble.com/FlexBayes-installation-from-R-Forge-Problem-R-3-2-2-tp4712861.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Pascal Oettli
> Project Scientist
> JAMSTEC
> Yokohama, Japan
>


[R] mgcv::gam in splus?

2012-04-10 Thread David Katz
Is mgcv, and particularly its gam function, available for S-PLUS? I've been
using it happily in R and need to implement something in S-PLUS for which the
automatic smoothing parameter selection is needed.

Thanks for any guidance,

David Katz

da...@davidkatzconsulting.com

--
View this message in context: 
http://r.789695.n4.nabble.com/mgcv-gam-in-splus-tp4546261p4546261.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Question about glmnet

2011-05-12 Thread David Katz
I believe you can in this sense: use model.matrix to create X for
glmnet(X,y,...).

However, when dropping variables this will drop the indicators individually,
not per factor, which may not be what you are looking for.
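A minimal sketch of the model.matrix route (illustrative data; this assumes glmnet's usual (x, y) interface with a numeric matrix x):

```r
library(glmnet)

set.seed(1)
df <- data.frame(y = rnorm(50),
                 f = factor(sample(letters[1:3], 50, replace = TRUE)),
                 x = rnorm(50))

# expand the factor into dummy indicator columns; drop the intercept column
X <- model.matrix(y ~ f + x, df)[, -1]
fit <- glmnet(X, df$y)
```

Note that the lasso may then zero out individual dummy columns rather than the whole factor, which is the per-indicator behavior mentioned above.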

Good luck,
David Katz


Axel Urbiz wrote:
 
 Hi,
 
 Is it possible to include factor variables as model inputs using this
 package? I'm quite sure it is not possible, but would like to double
 check.
 
 Thanks,
 
 Axel.
 
 


--
View this message in context: 
http://r.789695.n4.nabble.com/Question-about-glmnet-tp3006439p3517635.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] list concatenation

2011-01-11 Thread David Katz

Without any error checking, I'd try this:

recurse <-
  function(l1, l2){
    if(is(l1[[1]], "list"))
      mapply(recurse, l1, l2, SIMPLIFY = FALSE)
    else
      mapply(c, l1, l2, SIMPLIFY = FALSE)
  }

recurse(list.1, list.2)

which recursively traverses each tree (list) in tandem until one is not
composed of lists and then concatenates. If the structure of the lists does
not agree, this will fail.

Regards,

David Katz
da...@davidkatzconsulting.com
-- 
View this message in context: 
http://r.789695.n4.nabble.com/list-concatenation-tp3209182p3209324.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] creating a list of dataframes

2010-12-03 Thread David Katz

1) You have redefined the function list, which creates lists - not a great
idea.

2) See lapply, for example. Try something like:

list.of.df <- lapply(list.of.filenames, read.csv)
list.of.results <- lapply(list.of.df, your.application.function)

Regards,
David
-- 
View this message in context: 
http://r.789695.n4.nabble.com/creating-a-list-of-dataframes-tp3071598p3071710.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] the first. from SAS in R

2010-11-24 Thread David Katz

Often the purpose of first/last in sas is to facilitate grouping of
observations in a sequential algorithm. This purpose is better served in R
by using vectorized methods like those in package plyr.

Also, note that first/last has different meanings in the context of "by x;"
versus "by x notsorted;". R's duplicated() does not address the latter, which
splits noncontiguous records with equal x.
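A base-R sketch of the two semantics, on an illustrative vector:

```r
x <- c(1, 1, 2, 2, 1, 1)   # noncontiguous runs of equal x

# "by x" style first.: first time each value appears anywhere
!duplicated(x)
#> [1]  TRUE FALSE  TRUE FALSE FALSE FALSE

# "by x notsorted" style first.: start of each contiguous run
c(TRUE, x[-1] != x[-length(x)])
#> [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
```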

Regards,
David
-- 
View this message in context: 
http://r.789695.n4.nabble.com/the-first-from-SAS-in-R-tp3055417p3057476.html
Sent from the R help mailing list archive at Nabble.com.



[R] Identifying integers (as opposed to real #s) in matrix

2010-08-10 Thread David Katz
Is there a way to identify (for subsequent replacement) which rows in a
matrix are comprised entirely of integers? I have a large set of n x 3
matrices where each row consists of either 3 integers or 3 real numbers.
A given matrix might look something like this:

          [,1]      [,2]      [,3]
[1,]  121.0000  -98.0000  276.0000
[2,]   10.1234   25.4573 -188.9204
[3,]  121.0000  -98.0000  276.0000
[4,] -214.4982  -99.1043 -312.0495
...
[n,]   99.0000    1.0000 -222.0000

Ultimately, I'm going to replace the values in the integer-only rows with
NAs. But first I need R to recognize the integer-only rows. I assume
whatever function I write will be keyed off of the .0000s, but have no
clue how to write that function. Any ideas?
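[One base-R possibility - a sketch only. It treats "integer row" as "every entry equals its floor", since the values are stored as doubles either way:]

```r
# illustrative matrix in the shape described above
m <- rbind(c( 121.0000,  -98.0000,  276.0000),
           c(  10.1234,   25.4573, -188.9204),
           c(-214.4982,  -99.1043, -312.0495))

# TRUE for rows whose entries are all whole numbers
int.rows <- apply(m, 1, function(r) all(r == floor(r)))
m[int.rows, ] <- NA
```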

David Katz



[R] subset function unexpected behavior

2010-02-02 Thread David Katz

I was surprised to see this unexpected behavior of subset in a for loop. I
looked in subset.data.frame and it seemed to me that both versions should
work, since the subset call should be evaluated in the global environment -
but perhaps I don't understand environments well enough. Can someone
enlighten me? In any case, this is a bit of a gotcha for naive users of
subset.

input.data <-
  data.frame(sch=c(1,1,2,2),
             pop=c(100,200,300,400))

school.var <- "sch"

school.list <- 1:2

for(sch in school.list){
  print(sch)
  #do this before subset!:
  right.sch.p <-
    input.data[,school.var] == sch
  print(subset(input.data,right.sch.p)) #this is what I expected
}

## [1] 1
##   sch pop
## 1   1 100
## 2   1 200
## [1] 2
##   sch pop
## 3   2 300
## 4   2 400


for(sch in school.list){
  print(sch)
  print(subset(input.data,input.data[,school.var] == sch)) #note - compact version fails!
}

## [1] 1
##   sch pop
## 1   1 100
## 2   1 200
## 3   2 300
## 4   2 400
## [1] 2
##   sch pop
## 1   1 100
## 2   1 200
## 3   2 300
## 4   2 400

-- 
View this message in context: 
http://n4.nabble.com/subset-function-unexpected-behavior-tp1459535p1459535.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] subset function unexpected behavior

2010-02-02 Thread David Katz

Thanks, that helps! Subset creates a new context where a name clash can
occur. So if I don't want to check for that possibility, I should use a
special kind of index like .sch, or avoid subset:

for(sch in school.list){
  print(sch)
  print(input.data[input.data[,school.var] == sch,])}

which works no matter what variable names I use. That seems like a
reasonable requirement for good code.

(Checking for a name clash would be at least theoretically needed since
school.var is a parameter that can be any character name.)

Although subset conveniently avoids extra typing in many cases (not here),
this suggests to me that it's not ideal for code that can be used in a
variety of contexts. Note that unlike attach, subset does not issue a
warning! 
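Another way to sidestep the clash entirely (base R, using the same example data) is to split the data frame once instead of looping and subsetting:

```r
input.data <- data.frame(sch = c(1, 1, 2, 2),
                         pop = c(100, 200, 300, 400))
school.var <- "sch"

# one data frame per school; no loop variable to collide with 'sch'
by.school <- split(input.data, input.data[[school.var]])
by.school[["1"]]   # rows for school 1
```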

-

Hi: 

Try this for your second loop instead: 

for(s in school.list){ 
  print(s) 
  print(subset(input.data, sch == s)) 
 } 
[1] 1 
  sch pop 
1   1 100 
2   1 200 
[1] 2 
  sch pop 
3   2 300 
4   2 400 

Don't confound the 'sch' variable in your data frame with the 
index in your loop :) 

HTH, 
Dennis 


-- 
View this message in context: 
http://n4.nabble.com/subset-function-unexpected-behavior-tp1459535p1460057.html
Sent from the R help mailing list archive at Nabble.com.



[R] Windows Graphics Device Lockups with Rterm

2009-07-10 Thread David Katz

I've been using Rterm with ESS to run R for some time. Recently I've
experienced lockups when displaying graphics; the first display seems to
work, but then refuses to respond and must be killed with dev.off(). Rgui
has no problems. I've tried eliminating all other processes that might cause
conflicts, to no avail.

I'm using win XP and R 2.9.0. Here's a transcript using rterm:


R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> if(!exists("baseenv", mode="function")) baseenv <- function() NULL
> options(STERM='iESS', editor='gnuclient.exe')
> plot(1:5)
# locked graphics device!
> dev.off()
null device
          1

Thanks for any suggestions.
-- 
View this message in context: 
http://www.nabble.com/Windows-Graphics-Device-Lockups-with-Rterm-tp24428960p24428960.html
Sent from the R help mailing list archive at Nabble.com.



[R] www.rpad.org

2009-06-23 Thread David Katz

I've noticed this website has been down for several days. Does anyone have
any information on whether/when it is coming back? Thanks.
-- 
View this message in context: 
http://www.nabble.com/www.rpad.org-tp24175392p24175392.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to subsample all possible combinations of n species taken 1:n at a time?

2009-04-06 Thread David Katz

If I understand your problem properly, you could just note that selecting 1:n
of n objects is the same as deciding separately whether each one is included
or not (excluding the case where none are selected).

Take 1000 of these and you are there - except some may be duplicates - so
generate extras, eliminate the duplicates, and discard the surplus.

Something like this (not tested):

p <- 2^(n-1) / (2^n - 1) # all combinations have equal probability after
                         # removing rows with all zeros
result <- matrix(0, nrow = 1200, ncol = n) # plenty of extras for duplicates
for(i in 1:1200) result[i,] <- rbinom(n, 1, p)
result <- subset(result, apply(result, 1, sum) > 0) # cases which have at
                                                    # least 1 species
result <- unique(result)[1:1000,]

Might be interesting to see the effect of varying p on the rest of your
analysis.

Further memory might be saved by using sparse matrices - see the Matrix
package.
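As a sketch of that suggestion (the Matrix package ships with R; the variable names here are mine, standing in for the 0/1 community matrix above):

```r
library(Matrix)

# A small dense 0/1 presence/absence matrix, mostly zeros
set.seed(42)
m <- matrix(rbinom(5000, 1, 0.05), nrow = 100)

# Store it sparsely: only the nonzero entries (and their positions) are kept
sm <- Matrix(m, sparse = TRUE)

print(object.size(m))   # dense storage
print(object.size(sm))  # typically much smaller when most entries are zero
```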

David Katz
www.davidkatzconsulting.com


jasper slingsby wrote:
 
 Hello
 
 I apologise for the length of this entry but please bear with me.
 
 In short:
 I need a way of subsampling communities from all possible communities of n
 taxa taken 1:n at a time without having to calculate all possible
 combinations (because this gives me a memory error - using 
 combn() or expand.grid() at least). Does anyone know of a function? Or can
 you help me edit the 
 combn
 or 
 expand.grid 
 functions to generate subsamples?
 
 

-- 
View this message in context: 
http://www.nabble.com/how-to-subsample-all-possible-combinations-of-n-species-taken-1%3An-at-a-time--tp22911399p22919388.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to subsample all possible combinations of n species taken 1:n at a time?

2009-04-06 Thread David Katz

This is very cool indeed until you want to use more than 32 or so terms, at
which point integer overflow forces you into floating point.

> x = sample(2^34, 1000)
Error in sample(2^34, 1000) : invalid 'x' argument
In addition: Warning message:
In sample(2^34, 1000) : NAs introduced by coercion
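The convert-to-binary step in the quoted suggestion below can be sketched in base R while n is at most 31, so the subset codes still fit in an integer (the helper name int_to_members is mine, not from the thread):

```r
# Map an integer in 1 .. 2^n - 1 to a 0/1 species-inclusion vector of length n.
# intToBits() returns the 32 low-order bits, least significant first.
int_to_members <- function(i, n) as.integer(intToBits(i))[1:n]

set.seed(1)
n <- 25
picks <- sample(2^n - 1, 5)                # five distinct nonzero subset codes
subsets <- t(sapply(picks, int_to_members, n = n))
rowSums(subsets)                           # community sizes, each at least 1
```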



jholtman wrote:
 
 Are you just trying to obtain a combination from 25 possible terms?
 If so, then just sample the number you want and convert the number to
 binary:
 
 sample(33554432,100)
   [1]  6911360  5924262 23052661 12888381 25831589 16700013 24079278
 33282839 12751862 26086726 31363494  7118320 21866536  4212929
  

David Katz
www.davidkatzconsulting.com

-- 
View this message in context: 
http://www.nabble.com/how-to-subsample-all-possible-combinations-of-n-species-taken-1%3An-at-a-time--tp22911399p22919597.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] autologistic modelling in R

2008-12-20 Thread David Katz

Curiosity and Google led me to this paper, which may be of interest:

Assessing the validity of autologistic regression

Carsten F. Dormann

Department of Computational Landscape Ecology, UFZ Centre for Environmental
Research, Permoserstr. 15, 04318 Leipzig, Germany

Received 11 July 2006; 
revised 30 April 2007; 
accepted 7 May 2007. 
Available online 20 June 2007.

Abstract

In autologistic regression models employed in the analysis of species’
spatial distributions, an additional explanatory variable, the
autocovariate, is used to correct the effect of spatial autocorrelation. The
values of the autocovariate depend on the values of the response variable in
the neighbourhood. While this approach has been widely used over the last
ten years in biogeographical analyses, it has not been assessed for its
validity and performance against artificial simulation data with known
properties. I here present such an assessment, varying the range and
strength of spatial autocorrelation in the data as well as the prevalence of
the focal species. Autologistic regression models consistently underestimate
the effect of the environmental variable in the model and give biased
estimates compared to a non-spatial logistic regression. A comparison with
other methods available for the correction of spatial autocorrelation shows
that autologistic regression is more biased and less reliable and hence
should be used only in concert with other reference methods.



charlotte.bell wrote:
 
 Hi,
 
 I have spatially autocorrelated data (with a binary response variable and
 continuous predictor variables). I believe I need to do an autologistic
 model, does anyone know a method for doing this in R?
 
 Many thanks
 
 C Bell
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/autologistic-modelling-in-R-tp21072582p21108851.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Population Decay in R

2008-12-15 Thread David Katz

barplot(5000*((1-.26)^(0:49)))
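Spelled out as the year-by-year recurrence from the question below (a sketch; it reproduces the same numbers as the one-liner):

```r
pop <- numeric(50)
pop[1] <- 5000                        # starting population
for (t in 2:50) {
  deaths <- 0.26 * pop[t - 1]         # the Excel column B
  pop[t] <- pop[t - 1] - deaths       # next year's population
}
all.equal(pop, 5000 * ((1 - .26)^(0:49)))  # TRUE: matches the closed form
```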


jimdare wrote:
 
 Hi,
 
 I am new to R.  I am trying to plot the decay of a population over time
 (0-50yrs).  I have the initial population value (5000) and the mortality
 rate (0.26/yr) and I can't figure out how to apply this so I get a
 remaining population value each year.  In excel (ignoring headings) I
 would put 5000 in A1, in B2 I would enter the formula A1*0.26, and then in
 A2 (the next years population) I would subtract B2 from A1.  I would
 continue this process until I had calculated the population for the 50th
 year.  Any ideas of how to do this in R?  :-/
 

-- 
View this message in context: 
http://www.nabble.com/Population-Decay-in-R-tp21024561p21025051.html
Sent from the R help mailing list archive at Nabble.com.



[R] tcl/tk example in batch

2008-08-13 Thread David Katz

The example for learning tcl/tk under R at
http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/OKtoplevel.html
suggests running it from batch - but when I do, the window flashes by and
the example ends. I'm under XP pro. Is there a workaround? Should I create a
modal window instead so it persists? Thanks.
-- 
View this message in context: 
http://www.nabble.com/tcl-tk-example-in-batch-tp18964294p18964294.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Case statements in R

2008-07-28 Thread David Katz

See ?cut for creating a factor based on ranges of values.
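A sketch for the bins in the question below (break points taken from the veg_mean assignments; note that cut() as written here puts an exact 0 into class 1, whereas the original assignments leave 0 untouched):

```r
breaks <- c(0, .1, 1, 2, 5, 10, 25, 50, 75, 95, 100)
veg_mean <- c(0.05, 0.5, 3, 60, 99, 101)     # toy values for illustration

cover_class <- cut(veg_mean, breaks = breaks, labels = 1:10,
                   right = FALSE, include.lowest = TRUE)
cover_class   # classes 1, 2, 4, 8, 10, and NA for the value above 100
```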

Regards,


Wade Wall wrote:
 
 Hi all,
 
 I am trying to convert geometric means in a matrix to cover classes.  My
 values are as such:
 
 perc <- c(0, 0.025136418, 0.316227766, 1.414213562, 3.16227766, 7.071067812,
   15.8113883, 35.35533906, 61.23724357, 84.40971508, 97.46794345)
 cover <- c(0,1,2,3,4,5,6,7,8,9,10)
 
 This is what I am trying to accomplish
 
 veg_mean[veg_mean > 0 & veg_mean < .1] <- 1
 veg_mean[veg_mean >= .1 & veg_mean < 1.0] <- 2
 veg_mean[veg_mean >= 1.0 & veg_mean < 2.0] <- 3
 veg_mean[veg_mean >= 2.0 & veg_mean < 5.0] <- 4
 veg_mean[veg_mean >= 5.0 & veg_mean < 10.0] <- 5
 veg_mean[veg_mean >= 10.0 & veg_mean < 25] <- 6
 veg_mean[veg_mean >= 25.0 & veg_mean < 50.0] <- 7
 veg_mean[veg_mean >= 50.0 & veg_mean < 75.0] <- 8
 veg_mean[veg_mean >= 75.0 & veg_mean < 95.0] <- 9
 veg_mean[veg_mean >= 95.0 & veg_mean <= 100] <- 10
 veg_mean[veg_mean > 100] <- NA
 
 where values are assigned based on the geometric means.  However, I think
 that my syntax for the & operator is wrong and I can't find a reference to
 proper syntax.  I basically want to bin the geometric means.
 
 Any help would be greatly appreciated.
 
 Thanks,
 
 Wade
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/Case-statements-in-R-tp18695725p18696107.html
Sent from the R help mailing list archive at Nabble.com.



[R] Random Forest %var(y)

2008-07-05 Thread David Katz

The verbose option gives a display like:

> rf.500 <-
+   randomForest(new.x, trn.y, do.trace=20, ntree=100, nodesize=500,
+                importance=T)
 |  Out-of-bag   |
Tree |  MSE  %Var(y) |
  20 |   0.9279   100.84 |


What is the meaning of %Var(y) > 100? I expected that to correspond to a
model that was worse than random, but the predictions seem much better than
that on the out-of-bag estimates from predict(rf.500).
-- 
View this message in context: 
http://www.nabble.com/Random-Forest--var%28y%29-tp18295412p18295412.html
Sent from the R help mailing list archive at Nabble.com.



[R] mgcv::gam error message for predict.gam

2008-06-11 Thread David Katz

Sometimes, for specific models, I get this error from predict.gam in library
mgcv:

Error in complete.cases(object) : negative length vectors are not allowed

 Here's an example:

model.calibrate <-
  gam(meansalesw ~ s(tscore, bs="cs", k=4),
      data=toplot,
      weights=weight,
      gam.method="perf.magic")


> test <- predict(model.calibrate, newdata)
Error in complete.cases(object) : negative length vectors are not allowed
 

The data is shown below:

> toplot[, c("meansalesw","tscore","weight")]
   meansalesw  tscore weight
1   0.1275841 0.003446797  15224
2   0.1495748 0.004017158  15523
3   0.2245844 0.004375278  15520
4   0.2197668 0.004753941  15525
5   0.1317830 0.005049050  15524
6   0.2809621 0.005403199  15498
7   0.2933119 0.005764413  15529
8   0.4791150 0.006335145  15514
9   0.1833688 0.006617095  15528
10  0.3200599 0.007135850  15527
11  0.4931882 0.007781095  15529
12  0.4207684 0.008766088  15512
13  0.5928568 0.009731357  15514
14  0.8025296 0.010927579  15520
15  0.6286192 0.012004714  15513
16  0.7477922 0.014083143  15527
17  0.7251362 0.017382274  15531
18  1.1871948 0.025481173  15521
19  1.6495832 0.048264689  15524
20  5.1180227 0.131198022  15218

> newdata
 tscore
1 0.5059341
2 0.4125522
3 1.4335818
4 0.7060673
5 0.3229316

Thanks!
-- 
View this message in context: 
http://www.nabble.com/mgcv%3A%3Agam-error-message-for-predict.gam-tp17789318p17789318.html
Sent from the R help mailing list archive at Nabble.com.



[R] proto naming clash?

2008-05-15 Thread David Katz

Trying to learn Proto. This threw me:

#startup r...
> library(proto)
> a <- proto(x=10)
> a$x
[1] 10
> x <- proto(x=100)
> x$x
Error in get("x", env = x, inherits = TRUE) : invalid 'envir' argument
 

Do I simply need to be careful to name proto objects and proto components
uniquely? Is this the desired behavior for proto objects?

Thanks.
-- 
View this message in context: 
http://www.nabble.com/proto-naming-clash--tp17258403p17258403.html
Sent from the R help mailing list archive at Nabble.com.



[R] mgcv::gam shrinkage of smooths

2008-05-06 Thread David Katz

In Dr. Wood's book on GAM, he suggests in section 4.1.6 that it might be
useful to shrink a single smooth by adding S=S+epsilon*I to the penalty
matrix S. The context was the need to be able to shrink the term to zero if
appropriate. I'd like to do this in order to shrink the coefficients towards
zero (irrespective of the penalty for wiggliness) - but not necessarily
all the way to zero. IE, my informal prior is to keep the contribution of a
specific term small.

1) Is adding eps*I to the penalty matrix an effective way to achieve this
goal?

2) How do I accomplish this in practice using mgcv::gam?

Thanks.
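The S + epsilon*I idea can be illustrated outside mgcv with a plain penalized least-squares fit (a sketch of the principle only, not of mgcv internals; all names are mine):

```r
set.seed(3)
x <- rnorm(60)
X <- cbind(1, x)                      # design matrix: intercept + slope
y <- 1 + 2 * x + rnorm(60)
S <- diag(c(0, 1))                    # a wiggliness-style penalty on the slope

# Penalized least-squares solution with the augmented penalty S + eps * I
beta_hat <- function(eps) drop(solve(t(X) %*% X + S + eps * diag(2), t(X) %*% y))

beta_hat(0)      # near the true c(1, 2)
beta_hat(1000)   # the eps * I term pulls both coefficients toward zero
```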
-- 
View this message in context: 
http://www.nabble.com/mgcv%3A%3Agam-shrinkage-of-smooths-tp17093645p17093645.html
Sent from the R help mailing list archive at Nabble.com.



[R] mgcv::predict.gam lpmatrix for prediction outside of R

2008-04-09 Thread David Katz

This is in regards to the suggested use of type="lpmatrix" in the
documentation for mgcv::predict.gam. Could one not get the same result more
simply by using type="terms" and interpolating each term directly? What is
the advantage of the lpmatrix approach for prediction outside R? Thanks.
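For context, the lpmatrix route itself is short (mgcv ships with R; gamSim just generates toy data): the linear predictor matrix times the coefficient vector reproduces predict(), and it is that matrix which would be interpolated outside R.

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 200, verbose = FALSE)  # toy data bundled with mgcv
b <- gam(y ~ s(x0) + s(x1), data = dat)

Xp <- predict(b, newdata = dat[1:5, ], type = "lpmatrix")
max(abs(drop(Xp %*% coef(b)) - predict(b, dat[1:5, ])))  # essentially zero
```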
-- 
View this message in context: 
http://www.nabble.com/mgcv%3A%3Apredict.gam-lpmatrix-for-prediction-outside-of-R-tp16587009p16587009.html
Sent from the R help mailing list archive at Nabble.com.



[R] mgcv::gam prediction using lpmatrix

2008-04-06 Thread David Katz

The documentation for predict.gam in library mgcv gives an example of using
an lpmatrix to do approximate prediction via interpolation. However, the
code is specific to the example with respect to the number of smooth terms,
df's for each, etc. (which is entirely appropriate for an example).

Has anyone generalized this to directly generate code from a gam object (eg
SAS or C code)? I wanted to check before I reinvent the wheel. Thanks.
-- 
View this message in context: 
http://www.nabble.com/mgcv%3A%3Agam-prediction-using-lpmatrix-tp16531418p16531418.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] re gression trees: mean square vs. absolute errors

2008-03-25 Thread David Katz

You need to think through the application of your model. Is it more important
to get more cases classified correctly, or to avoid bigger errors versus a
probability prediction? You should optimize your choice of a loss function
so that it is appropriate to the way in which the model will be used.


lubaroz wrote:
 
 Hi,
 I am working with CART regression now to predict a probability; the
 response is binary. Could anyone tell me in which cases it is better to
 use mean square error for splitting nodes and when mean absolute error
 should be preferred.
 I am now using the default (MSE) version and I can see that the obtained
 optimal tree is very different from the tree with the least mean absolute
 error.
 
 Thanks in advance,
   Luba
 

-- 
View this message in context: 
http://www.nabble.com/regression-trees%3A-mean-square-vs.-absolute-errors-tp16274094p16286639.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] running balance down a dataframe referring back to previous row

2008-03-19 Thread David Katz

Try:

cs <- with(txns, cumsum(cr - dr))

You could if needed adjust the starting value to zero by concatenating a
zero in front and dropping the last entry.

txns$running.bal <- c(0, cs[seq(length(cs) - 1)])

Good luck.
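A tiny worked example (the column names cr and dr are assumed from the original post):

```r
txns <- data.frame(cr = c(100, 0, 50),    # credits
                   dr = c(0, 30, 20))     # debits

cs <- with(txns, cumsum(cr - dr))         # balance after each row: 100, 70, 100
txns$running.bal <- c(0, cs[seq(length(cs) - 1)])  # balance brought forward
txns$running.bal                          # 0, 100, 70
```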


seanpor wrote:
 
 Good morning, I've searched high and low and I've tried many different
 ways
 of doing this, but I can't seem to get it to work.
 
 I'm looking for a way of vectorising a running balance; i.e. the value
 in
 the first row of the dataframe is zero, and following rows add to this
 running balance.  This is easy to write in a loop, but I can't seem to get
 it working in vectorised code.  Hopefully the example below will explain
 what I'm trying to do...
 
 Many thanks in advance,
 
 Best regards,
 Sean O'Riordain
 
 

-- 
View this message in context: 
http://www.nabble.com/running-balance-down-a-dataframe-referring-back-to-previous-row-tp16142263p16145133.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] error in random forest

2008-03-08 Thread David Katz

I've had the same problem and solved it by removing the cases with the new
levels - they need to be handled some other way, either by building a new
model or reassigning the factor level to one in the training set.



Nagu wrote:
 
 Hi,
 
 I get the following error when I try to predict the probabilities of a
 test sample:
 
 Error in predict.randomForest(fit.EBA.OM.rf.50, x.OM, type = "prob") :
   New factor levels not present in the training data
 
 I have about 630 predictor variables in the dataset x.OM (25 factor
 variables and the remaining are continuous variables). Any ideas on
 how to trace it?
 
 Thank you,
 Nagu
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/error-in-random-forest-tp15904235p15922797.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] randomForest() for regression produces offset predictions

2007-12-20 Thread David Katz

I would expect this regression towards the mean behavior on a new or hold out
dataset, not on the training data. In RF terminology, this means that the
model prediction from predict is the in-bag estimate, but the out-of-bag
estimate is what you want for prediction. In Joshua's example,
rf.rf$predicted is an out-of-bag estimate, but since newdata is given, it
appears that the result is the in-bag estimate, which still needs an
adjustment like Joshua's  (and perhaps a more complex one might be needed in
some cases). This is a bit confusing since predict() usually matches what's
in model$fitted.values. I imagine that's why the author used predicted as
the component name instead of the standard fitted.values.

The documentation for predict.randomForest explains:

newdata - a data frame or matrix containing new data. (Note: If not given,
the out-of-bag prediction in object is returned.)
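The difference is easy to see on a toy fit (a sketch; requires the randomForest package and uses the swiss data from Joshua's example below):

```r
library(randomForest)

set.seed(7)
rf <- randomForest(Infant.Mortality ~ ., data = swiss)

oob   <- rf$predicted                  # out-of-bag: honest for training rows
inbag <- predict(rf, newdata = swiss)  # in-bag: optimistic on training rows

# In-bag predictions hug the training response much more closely:
cor(inbag, swiss$Infant.Mortality) > cor(oob, swiss$Infant.Mortality)
```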



Patrick Burns wrote:
 
 What I see is the predictions being less extreme than the
 actual values -- predictions for large actual values are smaller
 than the actual, and predictions for small actual values are
 larger than the actual.  That makes sense to me.  The object
 is to maximize out-of-sample predictive power, not in-sample
 predictive power.
 
 Or am I missing something in what you are saying?
 
 
 Patrick Burns
 [EMAIL PROTECTED]
 +44 (0)20 8525 0696
 http://www.burns-stat.com
 (home of S Poetry and A Guide for the Unwilling S User)
 
 
 Joshua Knowles wrote:
 
Hi all,
 
I have observed that when using the randomForest package to do regression,
the 
predicted values of the dependent variable given by a trained forest are
not 
centred and have the wrong slope when plotted against the true values.
 
This means that the R^2 value obtained by squaring the Pearson correlation
are 
better than those obtained by computing the coefficient of determination 
directly. The R^2 value obtained by squaring the Pearson can, however, be 
exactly reproduced by the coeff. of det. if the predicted values are first 
linearly transformed (using lm() to find the required intercept and
slope).
 
Does anyone know why the randomForest behaves in this way - producing
offset 
predictions? Does anyone know a fix for the problem?
 
(By the way, the feature is there even if the original dependent variable 
values are initially transformed to have zero mean and unit variance.)
 
As an example, here is some simple R code that uses the available swiss 
dataset to show the effect I am observing.

Thanks for any help.
 
--
 EXAMPLE OF RANDOM FOREST REGRESSION
 
library(randomForest)
data(swiss)
swiss
 
#Build the random forest to predict Infant Mortality
rf.rf <- randomForest(Infant.Mortality ~ ., data=swiss)
 
#And predict the training set again
pred <- c(predict(rf.rf, swiss))
actual <- swiss$Infant.Mortality
 
#Plotting predicted against actual values shows the effect (uncomment to see this)
#plot(pred,actual)
#abline(0,1)
 
# calculate R^2 as pearson coefficient squared
R2one <- cor(pred, actual)^2
 
# calculate R^2 value as fraction of variance explained
residOpt <- (actual - pred)
residnone <- (actual - mean(actual))
R2two <- 1 - var(residOpt, na.rm=TRUE)/var(residnone, na.rm=TRUE)
 
# now fit a line through the predicted and true values and
# use this to normalize the data before calculating R^2
 
fit <- lm(actual ~ pred)
coef(fit)
pred2 <- pred*coef(fit)[2] + coef(fit)[1]
residOpt <- (actual - pred2)
R2three <- 1 - var(residOpt, na.rm=TRUE)/var(residnone, na.rm=TRUE)
 
cat("Pearson squared = ", R2one, "\n")
cat("Coeff of determination = ", R2two, "\n")
cat("Coeff of determination after linear fitting = ", R2three, "\n")
 
## END
 

  

 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/randomForest%28%29-for-regression-produces-offset-predictions-tp14415517p14447468.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R-help google group archive

2007-12-04 Thread David Katz

Also see www.nabble.com for a very nice interface to current and archived
posts.



vince-28 wrote:
 
 I made a google group archive of current and future R-help posts at
 http://groups.google.com/group/r-help-archive
 
 If you are signed-up for the R-help mailing list with a gmail account
 you can post/reply through the google group pages. Note that this is
 not a separate mailing-list, just a copy of the original. Only posts
 after December 2nd 2007 will be available.
 
 I assume there are no objections to this. In case I am wrong please
 let me know.
 
 Vincent
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/R-help-google-group-archive-tf494.html#a14152914
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R as server application

2007-11-25 Thread David Katz

This sounds useful, but can you give more info on forwarding X? Thanks.


Scionforbai wrote:
 
 Do you need something more than a simple ssh connection to a remote
 host in which you run R (trivial when the server is Linux)?
 
 My advice is to run R in a screen session on the remote host (it
 protects from sudden disconnections). Then you have a window on your
 screen with the R command line, which you can copy/paste your scripts
 to (from whichever editor you want) as if it was running locally. Of
 course, on-screen graphics works (if you forward X... if you see what
 I mean) but it depends on connection speed (in LAN no problem, through
 internet it can be a pain, I usually don't use it then) and you need
 an X server running locally (if the 'client' is windows, cygwin highly
 recommended).
 
 A real R 'server' could be very useful though.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/R-as-server-application-tf4849719.html#a13944110
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Largest N Values Efficiently?

2007-11-12 Thread David Katz

x is a 1xN sparse matrix of numerics. I am using the Matrix package to
represent it as a sparse matrix; the representation holds a numeric vector
of the nonzero entries. My goal is to find the columns
with the n largest values, here positive correlations. Part of my strategy
is to sort only the nonzeros, which are available as a numeric vector.

Thanks for your interest and input.
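The nonzeros-only strategy reduces to a small helper on a named numeric vector (the function name is mine; names record the original positions):

```r
# Top-n entries of a named numeric vector, largest first, names preserved
top_n <- function(x, n) {
  idx <- order(x, decreasing = TRUE)[seq_len(min(n, length(x)))]
  x[idx]
}

v <- c(a = 0.2, b = 0.9, c = -0.1, d = 0.5)
top_n(v, 2)   # b (0.9) then d (0.5)
```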



Prof Brian Ripley wrote:
 
 What is 'x' here?  What type?  Does it contain NAs?  Are there ties?  R's 
 ordering functions are rather general, and you can gain efficiency by 
 ruling some of these out.
 
 See ?sort, look at the 'partial' argument, including the comments in the 
 Details.  And also look at ?sort.list.
 
 sort.int(x) is more efficient than x[order(x)], and x[order(x)[1:n]] is 
 more efficient than x[order(x)][1:n] for most types.
 
 Finally, does efficiency matter?  As the examples in ?sort show, R can 
 sort a vector of length 2000 in well under 1ms, and 1e7 random normals in 
 less time than they take to generate.  There are not many tasks where 
 gaining efficiency over x[order(x)][1:n] will be important.  E.g.
 
 system.time(x <- rnorm(1e6))
    user  system elapsed
    0.44    0.00    0.44
 system.time(x[order(x)][1:4])
    user  system elapsed
    1.72    0.00    1.72
 system.time(x2 <- sort.int(x, method = "quick")[1:4])
    user  system elapsed
    0.31    0.00    0.32
 system.time(min(x))
    user  system elapsed
    0.02    0.00    0.02
 system.time(x2 <- sort.int(x, partial=1)[1])
    user  system elapsed
    0.07    0.00    0.07
 
 and do savings of tenths of a second matter?  (There is also 
 quantreg::kselect, if you work out how to use it, which apparently is 
 a bit faster at partial sorting on MacOS X but not elsewhere.)
 
 
 On Sun, 11 Nov 2007, David Katz wrote:
 

  What is the most efficient alternative to x[order(x)][1:n] where
  length(x) >> n?
 
 That is the smallest n values, pace your subject line.
 
 I also need the positions of the mins/maxs perhaps by preserving names.

 Thanks for any suggestions.

 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/Largest-N-Values-Efficiently--tf4788033.html#a13708965
Sent from the R help mailing list archive at Nabble.com.
