On Wed, Nov 09, 2011 at 10:05:53AM -0500, josef.p...@gmail.com wrote:
> graph_lasso(X,....) takes the data array as an argument, but except
> calculating the empirical_covariance at the beginning X is not used
> anymore, as far as I could see.
> The algorithm looks very interesting, but I would have cases where I
> need to calculate the empirical_covariance myself (e.g. long run
> covariance which is a weighted average of covariance and covariance
> with lags).
> Would it be possible to use an empirical covariance instead of X as
> the main argument, or would you get design inconsistencies?

That's a very good remark, and there are other situations in which it
arises. Indeed, the empirical covariance matrix is a sufficient
statistic for the population covariance matrix in the case of Gaussian
models, so there are many models that only need the empirical
covariance, for instance the oracle approximating shrinkage (OAS).

On the other hand, some models don't rely on the Gaussian assumption.
They therefore use the full X data, and not just the empirical
covariance. The Ledoit-Wolf estimator is one example.

My gut feeling is that the estimator object should really take X by
default, but I don't see why the function itself could not take a
covariance matrix as an input. Of course, people can misuse it and put
in a shrunk covariance matrix (my guess is that they will), and we just
have to accept it.

Actually, I would almost favor an optional argument to the estimator so
that it can take a covariance matrix as an input. This would be similar
to the behavior of the kernel PCA with kernel='precomputed'. I used to
have a 'data_is_cov' boolean keyword argument in my codebase. I could
turn it into an 'X_is_cov' one.

There are situations in which I would be interested in using the
estimator object and, like you, I cannot afford to carry around the full
time series. This can be useful, for instance, to use the
cross-validated estimator, which carries a fair amount of logic to do
the parameter search, or to compare different estimators. This sort of
breaks the cross validation in the scikit, but not completely, as tricks
can be used, such as passing in lists of empirical covariances.

What do people think? Should I:

1. change graph_lasso to take the empirical covariance as an input
2. add an 'X_is_cov' parameter to the estimators

Gael

PS: As noted by Josef, cov_init doesn't answer this use case.

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
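
To make the 'X_is_cov' option discussed in this message concrete, here is a minimal sketch of what such a switch could look like. It is only an illustration under stated assumptions: `ToyCovEstimator`, `empirical_cov` and the `shrinkage` parameter are hypothetical names that do not exist in the scikit, and the fit step is a plain shrinkage toward a scaled identity standing in for the real graph lasso solver. The point is only that, once the empirical covariance is available, the rest of the fit does not touch X.

```python
import numpy as np


def empirical_cov(X):
    """Maximum-likelihood covariance of X, shape (n_samples, n_features).

    Hypothetical helper written for this sketch.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


class ToyCovEstimator:
    """Hypothetical estimator illustrating an 'X_is_cov' switch.

    The fit is a simple shrinkage toward a scaled identity. It is a
    stand-in for the graph lasso solver, not an implementation of it.
    """

    def __init__(self, shrinkage=0.1, X_is_cov=False):
        self.shrinkage = shrinkage
        self.X_is_cov = X_is_cov

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        if self.X_is_cov:
            # X is already a covariance matrix: it must be square.
            if X.ndim != 2 or X.shape[0] != X.shape[1]:
                raise ValueError("X_is_cov=True expects a square matrix")
            emp_cov = X
        else:
            emp_cov = empirical_cov(X)
        # From here on, only emp_cov is used; X is no longer needed.
        p = emp_cov.shape[0]
        mu = np.trace(emp_cov) / p
        self.covariance_ = ((1.0 - self.shrinkage) * emp_cov
                            + self.shrinkage * mu * np.eye(p))
        self.precision_ = np.linalg.inv(self.covariance_)
        return self


# Fitting from the data and from its empirical covariance should agree.
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
from_data = ToyCovEstimator().fit(X)
from_cov = ToyCovEstimator(X_is_cov=True).fit(empirical_cov(X))
same = np.allclose(from_data.covariance_, from_cov.covariance_)
```

With a flag like this, a user who computes a long run covariance by hand could pass it in directly. As noted in the message, nothing stops someone from passing an already shrunk matrix, and the sketch makes no attempt to detect that.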