> > Dear Scikit-Learn team,
> >
> > Feature engineering is a big task ahead of building machine learning
> > models. It involves imputation of missing values, encoding of categorical
> > variables, discretisation, variable transformation etc.
> >
> > Sklearn includes some functionality for feature engineering, which is
> > useful, but it has a few limitations:
> >
> > 1) It does not allow for feature specification - it will apply the same
> > process to all variables, for example SimpleImputer. Typically, we want
> > to impute different columns with different values.
> > 2) It does not capture information from the training set, that is, it
> > does not learn, and therefore it is not able to carry the values learnt
> > from the train set over to unseen data.
> >
> > The 2 limitations above apply to all the feature transformers in sklearn,
> > I believe.
> >
> > Therefore, if these transformers are used as part of a pipeline, we could
> > end up applying different transformations to train and test, depending on
> > the characteristics of the datasets. For business purposes, this is not
> > desirable.
> >
> > I think that building transformers that learn from the train set would be
> > of much use to the community.
> >
> > To this end, I built a Python package called feature engine
> > <https://pypi.org/project/feature-engine/> which expands the sklearn API
> > with additional feature engineering techniques, and the functionality that
> > allows the transformer to learn from data and store the parameters learnt.
> >
> > The techniques included have been used worldwide, both in business and in
> > data competitions, and reported in KDD reports and other articles. I also
> > cover them in a Udemy course
> > <https://www.udemy.com/feature-engineering-for-machine-learning> which
> > has enrolled several thousand students.
> >
> > The package relies on pandas to capture the features, but I am confident
> > that the column names could be captured and the df converted to a numpy
> > array to comply with sklearn requirements.
> >
> > I wondered whether it would be of interest to include the functionality of
> > this package within sklearn?
> > If you would consider extending the sklearn API to include these
> > transformers, I would be happy to help.
> >
> > Alternatively, would you consider adding the package to the page on your
> > website where you mention the libraries that extend sklearn functionality?
> >
> > All feedback is welcome.
> >
> > Many thanks and I look forward to hearing from you
> >
> > Thank you so much for such an awesome contribution through the sklearn API
> >
> > Kind regards
> >
> > Sole
> >
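
To make the behaviour requested above concrete, here is a minimal sketch of a
transformer that takes a per-column specification, learns its fill values from
the train set, and reuses them on unseen data. It is only an illustration
under stated assumptions: the class name PerColumnImputer, its strategies
argument, and the toy columns are hypothetical and are taken from neither
feature engine nor scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class PerColumnImputer(TransformerMixin, BaseEstimator):
    """Hypothetical example: impute chosen columns with values learnt in fit."""

    def __init__(self, strategies):
        # strategies maps a column name to "mean" or "median" (illustrative only)
        self.strategies = strategies

    def fit(self, X, y=None):
        # Learn one fill value per listed column, from the training data only.
        self.fill_values_ = {
            col: getattr(X[col], how)() for col, how in self.strategies.items()
        }
        return self

    def transform(self, X):
        # Reuse the stored training values on whatever data is passed in.
        return X.fillna(self.fill_values_)


# Toy data, invented for this sketch.
train = pd.DataFrame(
    {"age": [20.0, 30.0, np.nan, 40.0], "income": [1.0, np.nan, 3.0, 100.0]}
)
test = pd.DataFrame({"age": [np.nan, 50.0], "income": [np.nan, 5.0]})

imputer = PerColumnImputer({"age": "mean", "income": "median"}).fit(train)
test_imputed = imputer.transform(test)
```

With this toy data the missing test values are filled with the train-set mean
of age (30.0) and the train-set median of income (3.0), regardless of what the
test set itself contains.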
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn