[R] MOOC on Statistical Learning with R
Rob Tibshirani and I are offering a MOOC in January on Statistical Learning. This "massive open online course" is free, and is based entirely on our new book, An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie and Tibshirani 2013, Springer), http://www-bcf.usc.edu/~gareth/ISL/ . The pdf of the book will also be free.

The course, hosted on Open edX, consists of video lecture segments, quizzes, video R sessions, interviews with famous statisticians, lecture notes, and more. The course starts on January 22 and runs for 10 weeks. Please consult the course webpage http://statlearning.class.stanford.edu/ to enroll and for further details.

Trevor Hastie  has...@stanford.edu
Professor, Department of Statistics, Stanford University
Phone: (650) 725-2231  Fax: (650) 725-8977
URL: http://www.stanford.edu/~hastie
Address: Room 104, Department of Statistics, Sequoia Hall, 390 Serra Mall, Stanford University, CA 94305-4065
[R] [R-pkgs] Some improvements in gam package
I have posted a new version of the gam package, gam_1.09, to CRAN. This update improves the step.gam function considerably and gives it a parallel option. I am posting this update announcement along with the original package announcement below, which may be of interest to those new to the list.

Trevor Hastie

Begin forwarded message:

> From: "Trevor Hastie"
> Subject: gam --- a new contributed package
> Date: August 6, 2004 10:35:36 AM PDT
>
> I have contributed a "gam" library to CRAN, which implements Generalized Additive Models.
>
> This implementation follows closely the description in the GAM chapter 7 of the "white" book "Statistical Models in S" (Chambers & Hastie (eds), 1992, Wadsworth), as well as the philosophy in "Generalized Additive Models" (Hastie & Tibshirani 1990, Chapman and Hall). Hence it behaves pretty much like the S-PLUS version of gam.
>
> Note: this gam library and the functions therein are different from the gam function in package mgcv, and the two should not be loaded simultaneously.
>
> The gam library allows both local regression (loess) and smoothing-spline smoothers, and uses backfitting and local scoring to fit gams. It also allows users to supply their own smoothing methods, which can then be included in gam fits. The gam function in mgcv uses only smoothing-spline smoothers, with a focus on automatic smoothing-parameter selection via GCV.
>
> Some of the features of the gam library:
>
> * full compatibility with the R functions glm and lm - a fitted gam inherits from classes "glm" and "lm"
> * print, summary, anova, predict and plot methods are provided, as well as the usual extractor methods like coefficients, residuals, etc.
> * the method step.gam provides a flexible and customizable approach to model selection.
>
> Some differences from the S-PLUS version of gam:
>
> * predictions with new data are improved, without need for the "safe.predict.gam" function. This was partly facilitated by the improved prediction strategy used in R for GLMs and LMs.
> * Currently the only backfitting algorithm is all.wam. In the earlier versions of gam, dedicated Fortran routines fit models that had only smoothing-spline terms (s.wam) or only local-regression terms (lo.wam), which in fact made calls back to S-PLUS to update the working response and weights. These were designed for efficiency. It seems now, with much faster computers, this efficiency is no longer needed, and all.wam is modular and "visible".

Trevor Hastie
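A minimal sketch of typical usage, on simulated data (the smoother settings below are illustrative, not from the announcement; the new parallel option of step.gam is only noted in a comment, since it presumably needs a parallel backend):

  library(gam)

  set.seed(1)
  n  <- 200
  x1 <- runif(n); x2 <- runif(n)
  y  <- sin(2 * pi * x1) + x2^2 + rnorm(n, sd = 0.3)
  dat <- data.frame(y = y, x1 = x1, x2 = x2)

  # smoothing-spline term for x1, local-regression (loess) term for x2
  fit <- gam(y ~ s(x1, df = 4) + lo(x2, span = 0.5), data = dat)
  summary(fit)
  plot(fit, se = TRUE)   # partial-effect plots with standard-error bands

  # stepwise term selection; gam.scope() builds candidate formulas per variable
  scope <- gam.scope(dat, response = 1, arg = "df=4")
  fit2  <- step.gam(fit, scope = scope, trace = TRUE)
  # gam_1.09 also adds a parallel option to step.gam; see ?step.gam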
[R] [R-pkgs] glmnet webinar Friday May 3 at 10am PDT
I will be giving a webinar on glmnet on Friday May 3, 2013 at 10am PDT (Pacific Daylight Time). The one-hour webinar will consist of:

- Intro to lasso and elastic net regularization, and coefficient paths
- Why glmnet is so efficient and flexible
- New features of the latest version of glmnet
- Live glmnet demonstration
- Question and answer period

To sign up for the webinar, please go to https://www3.gotomeeting.com/register/77950

The webinar is hosted by the Orange County R User Group, and will be moderated by its president, Ray DiGiacomo.

Trevor Hastie
[R] [R-pkgs] softImpute_1.0 uploaded to CRAN
softImpute is a new package for matrix completion, i.e. for imputing missing values in matrices. softImpute was written by Rahul Mazumder and myself. It uses squared-error loss with nuclear-norm regularization - one can think of it as the "lasso" for matrix approximation - to find a low-rank approximation to the observed entries in the matrix. This low-rank approximation is then used to impute the missing entries.

softImpute works in an "EM" fashion. Given a current guess, it fills in the missing entries. Then it computes a soft-thresholded SVD of this complete matrix, which yields the next guess. These steps are iterated until convergence to the solution of the convex optimization problem.

The algorithm can work with large matrices, such as the "netflix" matrix (400K x 20K), by making heavy use of sparse-matrix methods in the Matrix package. It creates new S4 classes such as "Incomplete" for storing the large data matrix, and "SparseplusLowRank" for representing the completed matrix. SVD computations are done using a specially built block-alternating algorithm, svd.als, that exploits these structures and uses warm starts.

Some of the methods used are described in: Rahul Mazumder, Trevor Hastie and Rob Tibshirani (2010), "Spectral Regularization Algorithms for Learning Large Incomplete Matrices", JMLR 11, 2287-2322. Other newer and more efficient methods, which interweave the alternating block-algorithm steps with imputation steps, will be described in a forthcoming article.

Some of the features of softImpute:

* works with large matrices using sparse-matrix methods, or smaller matrices using standard SVD methods
* one can control the maximum rank of the solution, to avoid overly expensive operations
* warm starts can be used to move from one solution to a new solution with a different value of the nuclear-norm regularization parameter lambda (and/or a different rank)
* with lambda=0 and a specified rank, one automatically gets an implementation of "hardImpute" - iterative SVD imputation
* softImpute has an option "type", which can be "svd" or "als" (alternating least squares), specifying which of the two approaches above should be used
* included in the package is svd.als, an efficient rank-restricted SVD algorithm that can exploit sparsity and other special structure, and accepts warm starts
* a function biScale is provided, for centering and scaling both rows and columns of a matrix to have mean zero and variance one. The centering and scaling constants are stored on the object. For sparse matrices with centering, the centered object is stored in "SparseplusLowRank" form to preserve its special structure
* prediction functions impute and complete are provided.

Trevor Hastie
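A small usage sketch on simulated data (the lambda and rank.max values are arbitrary choices for illustration):

  library(softImpute)

  set.seed(1)
  m <- 200; n <- 100
  X <- matrix(rnorm(m * n), m, n)
  X[sample(m * n, 0.3 * m * n)] <- NA        # make 30% of the entries missing

  # nuclear-norm-regularized fit; type="als" is the alternating algorithm
  fit  <- softImpute(X, rank.max = 10, lambda = 2, type = "als")
  Xhat <- complete(X, fit)                   # X with the missing entries imputed

  # for very large problems, store the data as an "Incomplete" sparse object
  ij   <- which(!is.na(X), arr.ind = TRUE)
  Xinc <- Incomplete(i = ij[, 1], j = ij[, 2], x = X[!is.na(X)])
  fit2 <- softImpute(Xinc, rank.max = 10, lambda = 2)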
[R] [R-pkgs] glmnet 1.9-3 uploaded to CRAN (with intercept option)
This update adds an intercept option (by popular request): now one can fit a model with or without an intercept.

glmnet is a package that fits the regularization path for a number of generalized linear models, with "elastic net" regularization (a tunable mixture of L1 and L2 penalties). glmnet uses pathwise coordinate descent, and is very fast. The current list of models covered is:

* least-squares linear regression
* binary logistic regression
* multinomial logistic regression (grouped and ungrouped)
* Poisson regression
* multi-response linear regression (grouped)
* Cox proportional-hazards model

Some of the features of glmnet:

* By default it computes the path at 100 uniformly spaced (on the log scale) values of the regularization parameter lambda. Alternatively, users can provide their own values of lambda.
* Recognizes and exploits sparse input matrices (a la the Matrix package; this feature is not yet implemented for the Cox family).
* Coefficient matrices are output in sparse-matrix representation.
* The penalty is (1-a)*||\beta||_2^2 + a*||\beta||_1, where a is between 0 and 1; a=1 is the lasso penalty, a=0 is the ridge penalty. For many correlated predictors, a=.95 or thereabouts improves the performance of the lasso.
* Convenient predict, plot, print, and coef methods.
* Variable-wise penalty modulation allows each variable to be penalized by a scalable amount; if zero, that variable always enters.
* Some variables can be excluded (a convenience option).
* glmnet uses a symmetric parametrization for the multinomial model, with constraints enforced by the penalization. When the "grouped" option is used, it selects in or out all the class coefficients for a variable together.
* A comprehensive set of cross-validation routines is provided for all models and several error measures; these include deviance, mean absolute error, misclassification error and "auc" for logistic or multinomial models.
* Offsets and weights can be provided for all models.
* Upper and lower bounds can be imposed on each of the coefficients.
* An intercept option allows models to be fit with or without intercepts.
* A standardize option allows for variable standardization.
* A number of control parameters can be set in the calling function. In addition, the function glmnet.control allows users to set some internal control variables for the entire session.
* Uses strong rules for speeding up convergence (by temporarily limiting the active set).

Examples of glmnet speed trials:

* Newsgroup data: N=11,000, p=0.75 million, two-class logistic, 100 values along the lasso path. Time = 2 mins.
* 14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values along the lasso path. Time = 30 secs.

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani and Noah Simon.

References:

Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software 33(1), 1-22. http://www.jstatsoft.org/v33/i01/ (preprint: http://www.stanford.edu/~hastie/Papers/glmnet.pdf)

Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011). Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent. Journal of Statistical Software 39(5), 1-13. http://www.jstatsoft.org/v39/i05/

Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and Tibshirani, Ryan (2010). Strong Rules for Discarding Predictors in Lasso-type Problems. http://www-stat.stanford.edu/~tibs/ftp/strong.pdf

Trevor Hastie
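For example, on simulated data:

  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(100 * 20), 100, 20)
  y <- x[, 1] - 2 * x[, 2] + rnorm(100)

  fit0 <- glmnet(x, y, intercept = FALSE)   # force the fit through the origin
  fit1 <- glmnet(x, y)                      # default: intercept estimated
  coef(fit0, s = 0.1)[1, ]                  # the intercept row is exactly zero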
[R] [R-pkgs] glmnet_1.9-1 submitted to CRAN
This new version of glmnet has some bug fixes and some new features.

* New arguments lower.limits=-Inf and upper.limits=Inf (defaults shown) for all the coefficients in glmnet. Users can provide limits on the coefficients; see the documentation for glmnet. Typical usage: glmnet(x,y,lower=0). Here the argument is abbreviated, and by giving a single value, the same value is used for all parameters. This fits a positive lasso.
* New function glmnet.control() allows one to set internal parameters in glmnet, previously not under user control. These are for knowledgeable users. Once changed, the settings persist for the session. glmnet.control has a useful factory=TRUE argument, which resets the "factory" defaults.
* A memory bug in coxnet has been fixed.

Trevor Hastie
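A short sketch of the new options on simulated data (the fdev setting below is just one example of an internal parameter; see ?glmnet.control for the full list):

  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(100 * 10), 100, 10)
  y <- rnorm(100)

  fitp <- glmnet(x, y, lower.limits = 0)                         # positive lasso
  fitb <- glmnet(x, y, lower.limits = -0.5, upper.limits = 0.5)  # box constraints

  glmnet.control(fdev = 0)          # change an internal convergence setting
  glmnet.control(factory = TRUE)    # restore the factory defaults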
[R] [R-pkgs] glmnet_1.8-4 on CRAN
This version has some minor bug fixes, plus some new features.

* The exact=TRUE option in the predict and coef methods now works. In earlier versions of glmnet, if you supplied a value of s different from the sequence of lambdas used to compute the fit, predict used interpolation. This is exact for the lasso (alpha=1) with family="gaussian", and an approximation otherwise; outside the range it used the closest member of the range. The most frequently requested value was typically s=0, and that was (a) never in the range, and (b) always a little off. Now predict.glmnet returns the exact values.

In case you missed earlier announcements, glmnet now has additional families:

* "mgaussian" is a multi-response gaussian model that uses a group-lasso penalty on the set of coefficients for each predictor.
* For the family="multinomial" model, there is an additional argument type.multinomial=c("ungrouped","grouped"). In the grouped case, again a group-lasso penalty is used on the set of class coefficients for each predictor.

Trevor Hastie
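For example, on simulated data (note that in much later glmnet releases, exact=TRUE also requires re-supplying x and y to the call):

  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(100 * 10), 100, 10)
  y <- rnorm(100)
  fit <- glmnet(x, y)

  # s=0 is typically not on the computed lambda sequence;
  # exact=TRUE refits at s instead of interpolating
  coef(fit, s = 0, exact = TRUE)
  predict(fit, newx = x[1:5, ], s = 0, exact = TRUE)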
[R] Glmnet_1.8 uploaded to CRAN
This is a major revision, with two additional models included.

1) Multiresponse regression: family="mgaussian". Here we have a matrix of M responses, and we fit a series of linear models in parallel. We use a group-lasso penalty on the set of M coefficients for each variable. This means they are all in or out together.

2) family="multinomial", type.multinomial="grouped". Same story: multinomial regression, but now the group-lasso penalty ensures that all the class coefficients for a variable are in or out at the same time. We have left the default as type.multinomial="ungrouped", because currently the grouped version is about 10 times slower. We will be looking to improve this aspect.

Thanks to Noah Simon for his work on developing the algorithms for both these options. A report is in the works.

Trevor Hastie
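For example, on simulated data (the plot option type.coef="2norm" is taken from the multinomial plot method's documentation):

  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(100 * 20), 100, 20)

  # multiresponse gaussian: group lasso across the 3 response coefficients
  ym   <- x[, 1:3] %*% matrix(rnorm(9), 3, 3) + matrix(rnorm(300), 100, 3)
  fitm <- glmnet(x, ym, family = "mgaussian")

  # grouped multinomial: a variable enters or leaves for all classes at once
  g    <- sample(1:4, 100, replace = TRUE)
  fitg <- glmnet(x, g, family = "multinomial", type.multinomial = "grouped")
  plot(fitg, type.coef = "2norm")   # one curve per variable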
[R] glmnet_1.7.4
A new version of glmnet has been uploaded to CRAN. This should take care of the problem on PCs that caused it to fail there. Many thanks to B. Narasimhan for his stoic efforts in debugging this problem, which was a real nasty idiosyncrasy in the gfortran compiler that exists on Windows but not on Linux or MacOS platforms.

Trevor Hastie
[R] glmnet_1.7.3 on windows
We are aware that glmnet_1.7.3 does not pass checks on Windows, and we are looking into the problem. It has something to do with the gcc compiler being slightly different on Windows versus Linux/Mac platforms. As soon as we have resolved the issue, we will post a new version to CRAN.

Trevor Hastie
[R] [R-pkgs] sparsenet: a new package for sparse model selection
We have put a new package, sparsenet, on CRAN. sparsenet fits regularization paths for sparse model selection via coordinate descent, using a penalized least-squares framework and a non-convex penalty. The package is based on our JASA paper: Rahul Mazumder, Jerome Friedman and Trevor Hastie (2011), "SparseNet: Coordinate Descent with Non-Convex Penalties", JASA. http://www.stanford.edu/~hastie/Papers/Sparsenet/jasa_MFH_final.pdf

We use Zhang's MC+ penalty to impose sparsity in model selection. This penalty parametrizes a family ranging between L1 and L0 regularization. One nice feature of this family is that the single-coordinate optimization problems are convex, making it ideal for coordinate descent. The package fits the regularization surface for each coefficient: a surface over the two-dimensional space of tuning parameters. The concavity parameter gamma indexes the member of the family, and lambda is the usual Lagrange penalty parameter, which determines the strength of the penalty.

sparsenet is extremely fast. For example, with 10K variables and 1K samples, the entire surface with 10 values of gamma and 50 values of lambda takes under a second on a MacBook Pro. The package includes functions for fitting, plotting and cross-validating the models, as well as methods for prediction.

Trevor Hastie, with Jerome Friedman and Rahul Mazumder
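A small usage sketch on simulated data (sized down from the timings quoted above so it runs in a blink):

  library(sparsenet)

  set.seed(1)
  n <- 200; p <- 1000
  x <- matrix(rnorm(n * p), n, p)
  y <- drop(x[, 1:5] %*% rep(2, 5)) + rnorm(n)

  fit <- sparsenet(x, y)        # fits the surface over (lambda, gamma)
  plot(fit)                     # coefficient paths, one panel per gamma

  cvfit <- cv.sparsenet(x, y)   # cross-validation over the surface
  plot(cvfit)
  predict(cvfit, newx = x[1:5, ])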
Re: [R] differences between 1.7 and 1.7.1 glmnet versions
I have just started using changelogs, and am clearly not disciplined enough at it. The big change that occurred was the convergence criterion, which would account for the difference. At some point I will put up details of this.

Trevor Hastie

On Dec 26, 2011, at 11:55 PM, Damjan Krstajic wrote:

> Dear All,
>
> I have found differences between glmnet versions 1.7 and 1.7.1 which, in my opinion, are not cosmetic and do not appear in the ChangeLog. If I am not mistaken, glmnet appears to return a different number of selected input variables, i.e. nonzeroCoef(fit$beta[[1]]) differs between versions. The code below is the same for 1.7.1 and 1.7, but you can see that the outputs differ. I would automatically use the latest version, but by looking at the ChangeLog I wonder if this is a bug or expected behaviour, as this change is not documented.
>
> Thanks in advance.
> DK
>
>> # glmnet 1.7.1
>> library(glmnet)
> Loading required package: Matrix
> Loading required package: lattice
> Loaded glmnet 1.7.1
>> set.seed(1)
>> x=matrix(rnorm(40*500),40,500)
>> g4=sample(1:7,40,replace=TRUE)
>> fit=glmnet(x,g4,family="multinomial",alpha=0.1)
>> dgcBeta <- fit$beta[[1]]
>> which=nonzeroCoef(dgcBeta)
>> which
>   [1]   1  12  15  17  19  20  34  39  42  58  60  62  63  65  71  72  73  77
>  [19]  80  82  85  86  95  97  98  99 106 110 113 114 119 120 123 124 128 130
>  [37] 136 138 139 143 148 149 151 160 161 162 173 174 175 176 177 183 186 187
>  [55] 188 190 193 194 195 198 199 204 206 218 224 238 239 240 241 245 247 250
>  [73] 252 255 256 258 265 266 270 277 278 281 287 293 294 296 297 300 306 308
>  [91] 311 316 317 321 326 329 336 337 341 349 354 356 363 365 368 374 376 377
> [109] 379 384 385 389 397 398 400 403 404 407 415 417 418 423 424 430 432 437
> [127] 440 442 446 450 451 454 456 459 463 467 470 472 474 478 481 488 496 497
> [145] 498 500
>> # just to check that inputs to glmnet are the same
>> g4
>  [1] 5 4 5 3 2 6 1 6 6 1 3 6 1 2 6 3 7 2 6 7 6 7 5 1 3 2 2 3 2 3 3 1 5 6 7 4 6 3
> [39] 2 7
>> x[,1]
>  [1] -0.62645381  0.18364332 -0.83562861  1.59528080  0.32950777 -0.82046838
>  [7]  0.48742905  0.73832471  0.57578135 -0.30538839  1.51178117  0.38984324
> [13] -0.62124058 -2.21469989  1.12493092 -0.04493361 -0.01619026  0.94383621
> [19]  0.82122120  0.59390132  0.91897737  0.78213630  0.07456498 -1.98935170
> [25]  0.61982575 -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
> [31]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956 -0.41499456
> [37] -0.39428995 -0.05931340  1.10002537  0.76317575
>
>> # glmnet 1.7
>> library(glmnet)
> Loading required package: Matrix
> Loading required package: lattice
> Loaded glmnet 1.7
>> set.seed(1)
>> x=matrix(rnorm(40*500),40,500)
>> g4=sample(1:7,40,replace=TRUE)
>> fit=glmnet(x,g4,family="multinomial",alpha=0.1)
>> dgcBeta <- fit$beta[[1]]
>> which=nonzeroCoef(dgcBeta)
>> which
>   [1]   1   2   3   4   6   7   8   9  10  11  12  13  14  15  16  17  18  19
>  [19]  20  21  22  23  24  25  26  27  28  30  31  32  33  34  35  36  37  38
>  [37]  39  41  42  43  44  45  46  47  48  50  51  52  53  54  55  56  57  58
>  [55]  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
>  [73]  77  78  79  80  81  82  83  84  85  86  87  88  89  91  93  94  95  97
>  [91]  98  99 100 101 102 104 105 106 107 109 110 111 112 113 114 115 116 119
> [109] 120 121 122 123 124 126 127 128 130 131 132 133 134 135 136 137 138 139
> [127] 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 156 157 159
> [145] 160 161 162 163 164 165 167 168 170 171 172 173 174 175 176 177 178 179
> [163] 180 181 182 183 184 185 186 187 188 189 190 191 193 194 195 196 197 198
> [181] 199 200 203 204 205 206 207 208 209 211 212 213 215 216 217 218 219 220
> [199] 221 222 223 224 225 226 227 228 229 231 232 233 234 235 236 237 238 239
> [217] 240 241 242 243 244 245 246 247 248 249 250 251 252 253 255 256 257 258
> [235] 259 261 262 263 264 265 266 268 269 270 271 272 273 274 275 276 277 278
> [253] 279 280 281 282 283 285 286 287 288 289 290 291 292 293 294 295 296 297
> [271] 298 299 300 301 302 304 305 306 307 308 309 310 311 312 313 314 315 316
> [289] 317 318 319 321 323 324 325 326 327 328 329 330 331 332 333 334 336 337
> [307] 338 339 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356
> [325] 357 358 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376
> [343] 377 378 379 380 381 382 384 38
[R] [R-pkgs] svmpath_0.95 uploaded to CRAN
This new version includes a plot method for plotting a particular instance (step) along the path.

Trevor Hastie
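For example, on simulated two-class data (labels must be coded -1/+1; the step number passed to plot is illustrative):

  library(svmpath)

  set.seed(1)
  x <- matrix(rnorm(40 * 2), 40, 2)
  y <- rep(c(-1, 1), 20)
  x[y == 1, ] <- x[y == 1, ] + 1.5     # separate the classes a little

  fit <- svmpath(x, y, kernel.function = radial.kernel)
  plot(fit, step = 10)                 # the solution at one point on the path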
[R] [R-pkgs] glmnet_1.6 uploaded to CRAN
We have submitted glmnet_1.6 to CRAN. This version has an improved convergence criterion, and it also uses a variable-screening algorithm that dramatically reduces the time to convergence (while still producing the exact solutions). The speedups in some cases are by factors of 20 to 50, depending on the particular problem and loss function. See our paper "Strong Rules for Discarding Predictors in Lasso-type Problems", http://www-stat.stanford.edu/~tibs/ftp/strong.pdf, for details of this screening method.

Trevor Hastie
[R] glmnet_1.5.1 uploaded to CRAN
In glmnet_1.5 a poor default was set for the argument type, which caused the program to be very slow or even crash when nvar (p) is very large.

The argument type (now called type.gaussian) has two options, "covariance" or "naive", and is used for the default family="gaussian" model (squared-error loss). When type.gaussian="covariance", all inner products between variables in the active set and all other variables are cached, which can give a considerable speedup when nobs is large. However, when nvar is large (>500), the matrix to be stored gets large, and this strategy becomes counterproductive. In addition, when nvar is very large, glmnet tries to allocate storage for this matrix that can exceed the machine's memory. When type.gaussian="naive", nothing is cached, and inner products (a loop over nobs) are computed whenever needed.

In this minor upgrade, the default is "covariance" if nvar<500, else it is "naive". We established this rule after conducting extensive simulations. In addition, the argument was renamed so as not to collide with the argument type to cv.glmnet, which is now renamed type.measure. In both cases, abbreviations work.

Trevor Hastie
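For example, on wide simulated data (where the "naive" default now applies):

  library(glmnet)

  set.seed(1)
  n <- 500; p <- 2000
  x <- matrix(rnorm(n * p), n, p)
  y <- rnorm(n)

  fit <- glmnet(x, y, type.gaussian = "naive")   # override the automatic choice

  # the old cv.glmnet argument "type" is now "type.measure"
  cvfit <- cv.glmnet(x, y, type.measure = "mse")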
[R] glmnet_1.5 uploaded to CRAN
This is a new version of glmnet that incorporates some bug fixes and speedups.

* a new convergence criterion, which offers 10x or more speedups for saturated fits (this mainly affects logistic, Poisson and Cox models)
* one can now predict directly from a cv.glmnet object; see the help files for cv.glmnet and predict.cv.glmnet
* other new methods are deviance() for "glmnet" and coef() for "cv.glmnet"

Here is the description of the package. glmnet is a package that fits the regularization path for linear, two- and multi-class logistic regression models, Poisson regression and the Cox model, with "elastic net" regularization (a tunable mixture of L1 and L2 penalties). glmnet uses pathwise coordinate descent, and is very fast.

Some of the features of glmnet:

* by default it computes the path at 100 uniformly spaced (on the log scale) values of the regularization parameter
* glmnet is very fast, even for large data sets
* recognizes and exploits sparse input matrices (a la the Matrix package); coefficient matrices are output in sparse-matrix representation
* the penalty is (1-a)*||\beta||_2^2 + a*||\beta||_1, where a is between 0 and 1; a=1 is the lasso penalty, a=0 is the ridge penalty. For many correlated predictors, a=.95 or thereabouts improves the performance of the lasso
* convenient predict, plot, print, and coef methods
* variable-wise penalty modulation allows each variable to be penalized by a scalable amount; if zero, that variable always enters
* glmnet uses a symmetric parametrization for the multinomial model, with constraints enforced by the penalization
* a comprehensive set of cross-validation routines is provided for all models and several error measures
* offsets and weights can be provided for all models

Examples of glmnet speed trials:

* Newsgroup data: N=11,000, p=0.75 million, two-class logistic, 100 values along the lasso path. Time = 2 mins.
* 14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values along the lasso path. Time = 30 secs.

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani. See our paper http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf for implementation details and comparisons with other related software.

Trevor Hastie
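For example, on simulated data:

  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(200 * 50), 200, 50)
  y <- rbinom(200, 1, 0.5)

  cvfit <- cv.glmnet(x, y, family = "binomial")
  predict(cvfit, newx = x[1:5, ], s = "lambda.min", type = "response")
  coef(cvfit)                      # coef() now works on a cv.glmnet object
  deviance(cvfit$glmnet.fit)       # deviance() for the underlying glmnet fit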
[R] Statistical Learning and Datamining Course October 2010 Washington DC
Short course: Statistical Learning and Data Mining III: Ten Hot Ideas for Learning from Data
Trevor Hastie and Robert Tibshirani, Stanford University
Georgetown University Conference Center, Washington DC, October 11-12, 2010

This two-day course gives a detailed overview of statistical models for data mining, inference and prediction. With the rapid developments in internet technology, genomics, financial risk modeling, and other high-tech industries, we rely increasingly on data analysis and statistical models to exploit the vast amounts of data at our fingertips.

In this course we emphasize the tools useful for tackling modern-day data analysis problems. From the vast array of tools available, we have selected what we consider the most relevant and exciting. Our top-ten list of topics is:

* Regression and Logistic Regression (two golden oldies)
* Lasso and Related Methods
* Support Vector and Kernel Methodology
* Principal Components (SVD) and Variations: sparse SVD, supervised PCA, Nonnegative Matrix Factorization
* Boosting, Random Forests and Ensemble Methods
* Rule-based Methods (PRIM)
* Graphical Models
* Cross-Validation
* Bootstrap
* Feature Selection, False Discovery Rates and Permutation Tests

Our earlier courses are not a prerequisite for this new course. Although there is some overlap with past courses, this course contains many topics not covered by us before. The material is based on recent papers by the authors and other researchers, as well as the new second edition of our best-selling book: The Elements of Statistical Learning: Data Mining, Inference and Prediction (Hastie, Tibshirani & Friedman, Springer-Verlag, 2009), http://www-stat.stanford.edu/ElemStatLearn/. A copy of this book will be given to all attendees.

The lectures will consist of video-projected presentations and discussion. Go to http://www-stat.stanford.edu/~hastie/sldm.html for more information and online registration.
[R] help needed with help
I installed R version 2.11.0 (2010-04-22) on my MacBook (Snow Leopard) and run R from within Emacs. Now when I try to get help, I get (in the new "help" window):

> ?lm
Error in help("lm", htmlhelp = FALSE) : unused argument(s) (htmlhelp = FALSE)

Help!

p.s. I am running: GNU Emacs 22.2.50.1 (i386-apple-darwin9.4.0, Carbon Version 1.6.0) of 2008-07-17 on seijiz.local

Trevor Hastie
[R] [R-pkgs] New package for ICA uploaded to CRAN
I have uploaded a new package to CRAN called ProDenICA. It fits ICA models directly via product-density estimation of the source densities. This package was promised on page 567 of the 2nd edition of our book "The Elements of Statistical Learning" (Hastie, Tibshirani and Friedman, 2009, Springer). Apologies that it is so late.

The method fits each source density by a tilted Gaussian density, where the log of the tilting function is modeled by a smoothing spline. This function is then used as a contrast function for computing the negentropy measure for this source component. The estimation is achieved by fitting a Poisson GAM for each component, with the log-Gaussian as an offset.

The method was first described in: Hastie, T. and Tibshirani, R. (2003). Independent component analysis through product density estimation. In S. Becker, S. Thrun and K. Obermayer (eds), Advances in Neural Information Processing Systems 15, MIT Press, Cambridge, MA, pp. 649-656.

Trevor Hastie
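A minimal sketch, along the lines of the package's own examples (mixmat and GPois are functions supplied by ProDenICA; the uniform sources here are just a convenient non-Gaussian choice):

  library(ProDenICA)

  set.seed(1)
  p <- 2
  s <- scale(cbind(runif(1000), runif(1000)))   # two independent sources
  A <- mixmat(p)                                # random mixing matrix
  x <- s %*% A

  fit <- ProDenICA(x, k = p, whiten = TRUE, Gfunc = GPois)
  fit$W                                         # estimated unmixing matrix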
[R] [R-pkgs] Major glmnet upgrade on CRAN
glmnet_1.2 has been uploaded to CRAN. This is a major upgrade, with the following additional features:

* Poisson family, with dense or sparse x
* Cox proportional-hazards family, for dense x
* a wide range of cross-validation features: all models have several criteria for cross-validation, including deviance, mean absolute error, misclassification error and "auc" for logistic or multinomial models; observation weights are incorporated
* an offset is allowed in fitting the model

Here is the description of the package. glmnet is a package that fits the regularization path for linear, two- and multi-class logistic regression models, Poisson regression and the Cox model, with "elastic net" regularization (a tunable mixture of L1 and L2 penalties). glmnet uses pathwise coordinate descent, and is very fast.

Some of the features of glmnet:

* by default it computes the path at 100 uniformly spaced (on the log scale) values of the regularization parameter
* glmnet appears to be faster than any of the packages that are freely available, in some cases by two orders of magnitude
* recognizes and exploits sparse input matrices (a la the Matrix package); coefficient matrices are output in sparse-matrix representation
* the penalty is (1-a)*||\beta||_2^2 + a*||\beta||_1, where a is between 0 and 1; a=1 is the lasso penalty, a=0 is the ridge penalty. For many correlated predictors, a=.95 or thereabouts improves the performance of the lasso
* convenient predict, plot, print, and coef methods
* variable-wise penalty modulation allows each variable to be penalized by a scalable amount; if zero, that variable always enters
* glmnet uses a symmetric parametrization for the multinomial model, with constraints enforced by the penalization
* a comprehensive set of cross-validation routines is provided for all models and several error measures
* offsets and weights can be provided for all models

Examples of glmnet speed trials:

* Newsgroup data: N=11,000, p=0.75 million, two-class logistic, 100 values along the lasso path. Time = 2 mins.
* 14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values along the lasso path. Time = 30 secs.

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani. See our paper http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf for implementation details and comparisons with other related software.

Trevor Hastie
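For example, on simulated data (argument names as in the glmnet documentation):

  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(100 * 20), 100, 20)

  # Poisson family, with an offset (e.g. log exposure)
  y    <- rpois(100, exp(x[, 1]))
  off  <- log(runif(100, 1, 2))
  fitp <- glmnet(x, y, family = "poisson", offset = off)

  # Cox family: y is a two-column (time, status) matrix
  yc   <- cbind(time = rexp(100), status = rbinom(100, 1, 0.7))
  fitc <- glmnet(x, yc, family = "cox")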
[R] New version of package mda
mda 0.1-4 is on CRAN. Many thanks to Friedrich Leisch, Kurt Hornik and Brian Ripley for their early work in porting the mda package to R, and to Kurt for maintaining the package. I have "taken back" mda and will maintain it from now on.

The package fits flexible, penalized and mixture discriminant models. For a brief introduction, see Sections 12.4-7 of "The Elements of Statistical Learning" (first or second edition).

This new version has documentation for the plot method, and has improved functionality for the regression method "gen.ridge". The "laplacian" penalty works and is documented, for implementing penalized discriminant analysis via the function fda(). The mars function has not changed, but users are encouraged to use the "earth" package of Stephen Milborrow, which fits "MARS" models; in particular, earth works as a regression method for fda() and mda().

Trevor Hastie
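For example, on the iris data (the lambda value for gen.ridge is illustrative):

  library(mda)

  fit <- fda(Species ~ ., data = iris)   # flexible discriminant analysis
  confusion(fit, iris)                   # training confusion matrix

  # penalized discriminant analysis via the gen.ridge regression method
  fitp <- fda(Species ~ ., data = iris, method = gen.ridge, lambda = 1)

  # mixture discriminant analysis, 3 subclasses per class
  fitm <- mda(Species ~ ., data = iris, subclasses = 3)

  # with library(earth) loaded, earth can serve as the regression method:
  # fite <- fda(Species ~ ., data = iris, method = earth)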
[R] new version of glmnet
glmnet_1.1-4 is on CRAN now. This version includes cross-validation functions to assist in picking a good value for lambda. These functions are preliminary, in that they can only handle gaussian models, or logistic models for binary data; the complete range will appear in the future.

For those unfamiliar with glmnet, here is the original blurb: glmnet fits lasso and elastic net regularization paths for squared-error, binomial and multinomial models via coordinate descent. It is extremely fast and can work on large-scale problems. See the paper "Regularization Paths for Generalized Linear Models via Coordinate Descent" by Friedman, Hastie and Tibshirani, on my website, for details.

glmnet can accommodate sparse data matrices efficiently, and can thereby handle even larger problems. For example, for a two-class logistic model with 11K observations and 750K variables (with >99% zeros in the X matrix), glmnet takes less than two minutes to fit the entire regularization path on a grid of 100 values of the regularization parameter lambda. For a 14-class gene-expression dataset (144 observations, 16K variables, not sparse), it takes 15 seconds to fit the path at 100 values of lambda.

Trevor Hastie
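For example, on simulated data (this sketch assumes the cv.glmnet interface as it appears in later releases):

  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(200 * 100), 200, 100)
  y <- rbinom(200, 1, plogis(x[, 1]))

  cvfit <- cv.glmnet(x, y, family = "binomial", nfolds = 10)
  plot(cvfit)            # CV curve with error bars against log(lambda)
  cvfit$lambda.min       # value of lambda minimizing the CV error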
[R] Austria, September, 2009: Statistical Learning and Data Mining Course
Short course: Statistical Learning and Data Mining III: Ten Hot Ideas for Learning from Data
Trevor Hastie and Robert Tibshirani, Stanford University
Danube University Krems, Austria, 25-26 September 2009

This two-day course gives a detailed overview of statistical models for data mining, inference and prediction. With the rapid developments in internet technology, genomics, financial risk modeling, and other high-tech industries, we rely increasingly on data analysis and statistical models to exploit the vast amounts of data at our fingertips.

In this course we emphasize the tools useful for tackling modern-day data analysis problems. From the vast array of tools available, we have selected what we consider the most relevant and exciting. Our top-ten list of topics is:

* Regression and Logistic Regression (two golden oldies)
* Lasso and Related Methods
* Support Vector and Kernel Methodology
* Principal Components (SVD) and Variations: sparse SVD, supervised PCA, Multidimensional Scaling and Isomap, Nonnegative Matrix Factorization, and Local Linear Embedding
* Boosting, Random Forests and Ensemble Methods
* Rule-based Methods (PRIM)
* Graphical Models
* Cross-Validation
* Bootstrap
* Feature Selection, False Discovery Rates and Permutation Tests

The material is based on recent papers by ourselves and other researchers, as well as the new second edition of our book: The Elements of Statistical Learning: Data Mining, Inference and Prediction (Hastie, Tibshirani & Friedman, Springer-Verlag, 2009, second edition), http://www-stat.stanford.edu/ElemStatLearn/. A copy of this book will be given to all attendees.

The lectures will consist of video-projected presentations and discussion. This European edition of our course is organized by Prof. Michael G. Schimek, who has been teaching in this field for about 10 years at various universities in Europe.

Visit http://www-stat.stanford.edu/~hastie/SLDM/Austria.htm for more information and registration instructions.
[R] new version of glmnet
glmnet_1.1-3 is on CRAN now. glmnet fits lasso and elastic net regularization paths for squared-error, binomial and multinomial models via coordinate descent. It is extremely fast and can work on large-scale problems. See the paper "Regularization Paths for Generalized Linear Models via Coordinate Descent" by Friedman, Hastie and Tibshirani, on my website, for details.

glmnet can accommodate sparse data matrices efficiently, and can thereby handle even larger problems. For example, for a two-class logistic model with 11K observations and 750K variables (with >99% zeros in the X matrix), glmnet takes less than two minutes to fit the entire regularization path on a grid of 100 values of the regularization parameter lambda. For a 14-class gene-expression dataset (144 observations, 16K variables, not sparse), it takes 15 seconds to fit the path at 100 values of lambda.

This release has several minor fixes, as well as two more serious fixes:

1) predict(..., type="class") was returning flipped labels for a two-class logistic model.
2) if a weights argument was supplied to a binomial/multinomial model with some zero weight entries, the program bombed with an unhelpful message. Now it works as expected.

Thanks to many users, especially Tim Hesterberg, for notifying us of the errors.

Trevor Hastie
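For example, exercising both fixed features on simulated data:

  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(100 * 10), 100, 10)
  y <- rbinom(100, 1, 0.5)
  w <- runif(100); w[1:5] <- 0      # some zero weights: now handled gracefully

  fit <- glmnet(x, y, family = "binomial", weights = w)
  predict(fit, newx = x[1:5, ], s = 0.05, type = "class")   # labels no longer flipped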
[R] New Statistical Learning and Data Mining Course
Short course: Statistical Learning and Data Mining III: Ten Hot Ideas for Learning from Data
Trevor Hastie and Robert Tibshirani, Stanford University
Sheraton Hotel, Palo Alto, CA, March 16-17, 2009

This two-day course gives a detailed overview of statistical models for data mining, inference and prediction. With the rapid developments in internet technology, genomics, financial risk modeling, and other high-tech industries, we rely increasingly on data analysis and statistical models to exploit the vast amounts of data at our fingertips.

In this course we emphasize the tools useful for tackling modern-day data analysis problems. From the vast array of tools available, we have selected what we consider the most relevant and exciting. Our top-ten list of topics is:

* Regression and Logistic Regression (two golden oldies)
* Lasso and Related Methods
* Support Vector and Kernel Methodology
* Principal Components (SVD) and Variations: sparse SVD, supervised PCA, Multidimensional Scaling and Isomap, Nonnegative Matrix Factorization, and Local Linear Embedding
* Boosting, Random Forests and Ensemble Methods
* Rule-based Methods (PRIM)
* Graphical Models
* Cross-Validation
* Bootstrap
* Feature Selection, False Discovery Rates and Permutation Tests

Our earlier courses are not a prerequisite for this new course. Although there is some overlap with past courses, this course contains many topics not covered by us before. The material is based on recent papers by the authors and other researchers, as well as the new second edition of our best-selling book: The Elements of Statistical Learning: Data Mining, Inference and Prediction (Hastie, Tibshirani & Friedman, Springer-Verlag, 2009), http://www-stat.stanford.edu/ElemStatLearn/. A copy of this book will be given to all attendees.

The lectures will consist of video-projected presentations and discussion. Go to http://www-stat.stanford.edu/~hastie/sldm.html for more information and online registration.
[R] [R-pkgs] New glmnet package on CRAN
glmnet is a package that fits the regularization path for linear, and two- and multi-class logistic regression models with "elastic net" regularization (a tunable mixture of L1 and L2 penalties). glmnet uses pathwise coordinate descent, and is very fast.

Some of the features of glmnet:

* by default it computes the path at 100 uniformly spaced (on the log scale) values of the regularization parameter
* glmnet appears to be faster than any of the packages that are freely available, in some cases by two orders of magnitude
* recognizes and exploits sparse input matrices (a la the Matrix package); coefficient matrices are output in sparse-matrix representation
* the penalty is (1-a)*||\beta||_2^2 + a*||\beta||_1, where a is between 0 and 1; a=1 is the lasso penalty, a=0 is the ridge penalty. For many correlated predictors, a=.95 or thereabouts improves the performance of the lasso
* convenient predict, plot, print, and coef methods
* variable-wise penalty modulation allows each variable to be penalized by a scalable amount; if zero, that variable always enters
* glmnet uses a symmetric parametrization for the multinomial model, with constraints enforced by the penalization

Other families, such as Poisson, may appear in later versions of glmnet.

Examples of glmnet speed trials:

* Newsgroup data: N=11,000, p=0.75 million, two-class logistic, 100 values along the lasso path. Time = 2 mins.
* 14-class cancer data: N=144, p=16K, 14-class multinomial, 100 values along the lasso path. Time = 30 secs.

Authors: Jerome Friedman, Trevor Hastie, Rob Tibshirani. See our paper http://www-stat.stanford.edu/~hastie/Papers/glmnet.pdf for implementation details and comparisons with other related software.

Trevor Hastie [EMAIL PROTECTED]
Professor & Chair, Department of Statistics, Stanford University
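For example, on simulated data:

  library(glmnet)

  set.seed(1)
  x <- matrix(rnorm(100 * 20), 100, 20)
  y <- drop(x %*% c(rep(1, 5), rep(0, 15))) + rnorm(100)

  fit <- glmnet(x, y, alpha = 0.95)    # elastic net, mostly lasso
  plot(fit)                            # coefficient paths
  coef(fit, s = 0.1)                   # coefficients at lambda = 0.1

  g <- rbinom(100, 1, 0.5)
  fitb <- glmnet(x, g, family = "binomial")   # two-class logistic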
[R] Correction: Short Course: Statistical Learning and Data Mining
Apologies, my last email announcing this course had the wrong dates. Here is the corrected header:

Short course: Statistical Learning and Data Mining II: tools for tall and wide data
Trevor Hastie and Robert Tibshirani, Stanford University
Sheraton Hotel, Palo Alto, California, March 6-7, 2008

Trevor Hastie
[R] Short Course: Statistical Learning and Data Mining
Short course: Statistical Learning and Data Mining II: tools for tall and wide data
Trevor Hastie and Robert Tibshirani, Stanford University
Sheraton Hotel, Palo Alto, California, April 3-4, 2006

This two-day course gives a detailed overview of statistical models for data mining, inference and prediction. With the rapid developments in internet technology, genomics, financial risk modeling, and other high-tech industries, we rely increasingly on data analysis and statistical models to exploit the vast amounts of data at our fingertips.

This course is the third in a series, and follows our popular past offerings "Modern Regression and Classification" and "Statistical Learning and Data Mining". The two earlier courses are not a prerequisite for this new course.

In this course we emphasize the tools useful for tackling modern-day data analysis problems. We focus on both "tall" data (N>p, where N=#cases and p=#features) and "wide" data (p>N). The tools include gradient boosting, SVMs and kernel methods, random forests, lasso and LARS, ridge regression and GAMs, supervised principal components, and cross-validation. We also present some interesting case studies in a variety of application areas. All our examples are developed using the S language, and most of the procedures we discuss are implemented in publicly available R packages.

Please visit http://www-stat.stanford.edu/~hastie/sldm.html for more information and registration details.