[R] Training and testing on Unbalanced Data Set

2014-07-04 Thread Vijay goel
I used the SMOTE (Synthetic Minority Oversampling Technique) algorithm in R
for class balancing. My data set has 13000 rows, with a 7% minority class.
After applying SMOTE, the share of the minority class rose to 42% and the
number of rows in the sample became 12655. I now need to fit a logistic
regression on this data set, and for that I need to divide the sample for
cross-validation and testing. I tried two approaches:

a.) Train on the sample obtained after SMOTE and test on the
original sample of 13000 rows.

b.) Divide the sample obtained after SMOTE into train and test sets, and do
the fitting and testing on this data set only.

With the first approach my results might be skewed, so which approach should
I take, and why?
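
For comparison with (a) and (b), a third arrangement that is often suggested
is to hold out a test set from the original data first and rebalance only the
training part, so that the test set keeps the original class ratio and shares
no rows with the rebalanced training set. Below is a minimal Python sketch of
that ordering. It rests on several assumptions: the data are simulated (13000
rows, 7% minority), plain random duplication stands in for SMOTE, and the
helper names stratified_split, oversample_minority and minority_share are
hypothetical.

```python
import random

def stratified_split(rows, test_fraction, rng):
    """Split (features, label) rows into train/test, keeping the class ratio."""
    train, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r[1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

def oversample_minority(rows, target_ratio, rng):
    """Duplicate random minority rows until they form target_ratio of the rows.

    Assumption: this is a stand-in for SMOTE, which would synthesise new
    points between minority neighbours instead of copying existing rows.
    """
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    # Solve m / (m + len(majority)) = target_ratio for the minority count m.
    needed = round(target_ratio * len(majority) / (1 - target_ratio))
    extra = [rng.choice(minority) for _ in range(max(0, needed - len(minority)))]
    return majority + minority + extra

def minority_share(rows):
    """Fraction of rows carrying the minority label 1."""
    return sum(label for _, label in rows) / len(rows)

rng = random.Random(0)
# Simulated data: 13000 rows, the first 910 (7%) are the minority class.
rows = [((rng.gauss(1.0 if i < 910 else 0.0, 1.0),), 1 if i < 910 else 0)
        for i in range(13000)]

# 1. Hold out the test set from the ORIGINAL data.
train, test = stratified_split(rows, 0.3, rng)
# 2. Rebalance the training part only.
balanced_train = oversample_minority(train, 0.42, rng)

print(len(test), minority_share(test))
print(len(balanced_train), minority_share(balanced_train))
```

In this sketch the model would be fitted on balanced_train and evaluated on
test, which is never touched by the oversampling step.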
-- 
Vijay Goel
*+91-7501378852*

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fisher Scoring v/s Coordinate Descent for MLE in R

2014-07-03 Thread Vijay goel
The base R function glm() uses Fisher scoring for maximum likelihood
estimation, while glmnet uses coordinate descent. Do they solve the same
equations? As I understand it, coordinate descent is more time-efficient than
Fisher scoring: Fisher scoring computes the second-order derivative matrix
and performs further matrix operations on it, which makes it expensive in
both space and time, while coordinate descent can do the same task in O(np)
time.

Why does the base R function use Fisher scoring? Does this method have
advantages over other optimization methods? How do coordinate descent and
Fisher scoring compare? I am relatively new to this field, so any help or
pointers to resources would be appreciated.
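
The contrast raised above can be made concrete on a toy problem. Below is a
minimal Python sketch that fits an unpenalized logistic regression (intercept
plus one predictor) in two ways: Fisher scoring, which builds and solves the
full information matrix at each iteration, and a cyclic coordinate-wise
update, which changes one coefficient at a time using only scalar quantities.
This is an illustration under assumptions, not the code of glm() or glmnet:
glmnet applies coordinate descent to a penalized, quadratically approximated
likelihood, whereas the coordinate update here uses a fixed curvature bound
(p(1-p) <= 1/4) on the plain likelihood. The data and the function names
fisher_scoring and coordinate_ascent are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fisher_scoring(x, y, iters=25):
    """Fisher scoring for logistic regression with intercept b0 and slope b1."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # 2x2 Fisher information matrix X' W X.
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Update: beta += I^{-1} g, with the 2x2 inverse written out.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

def coordinate_ascent(x, y, sweeps=2000):
    """Cyclic coordinate-wise ascent on the same log-likelihood."""
    cols = [[1.0] * len(x), list(x)]  # intercept column, predictor column
    beta = [0.0, 0.0]
    # Upper bound on each coordinate's curvature, since p(1-p) <= 1/4.
    bound = [sum(v * v for v in col) / 4.0 for col in cols]
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            p = [sigmoid(beta[0] + beta[1] * xi) for xi in x]
            # Partial derivative with respect to coefficient j only.
            g = sum(v * (yi - pi) for v, yi, pi in zip(col, y, p))
            beta[j] += g / bound[j]
    return tuple(beta)

# Made-up, non-separable toy data.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

fs = fisher_scoring(x, y)
cd = coordinate_ascent(x, y)
print("Fisher scoring:   ", fs)
print("Coordinate ascent:", cd)
```

On this toy problem both routines maximize the same likelihood, so they
should agree on the estimates. Fisher scoring needs few iterations but each
one forms and solves a p-by-p system, while the coordinate update needs many
cheap sweeps. Which one is faster in practice depends on n, p and on whether
a penalty is involved.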

Regards
Vij
