On Wed, Apr 5, 2017 at 12:55 PM, Aaron Halfaker <[email protected]> wrote:
> Link to code?

No code yet, although there is proof of concept code which will inform this work at stat1002.eqiad.wmnet:/a/ebernhardson/spark_feature_log/code

> "ltr" means "left to right" to me. Maybe you could do something like
> "ltrank"

Sounds like LTR is out, as the term is already used elsewhere and is more widely known. LTRank isn't a bad compromise compared with spelling out the whole thing.

> On Wed, Apr 5, 2017 at 2:28 PM, Erik Bernhardson <
> [email protected]> wrote:
>
>> We seem to have some consensus that for the upcoming learning to rank
>> work we will build out a python library to handle the bulk of the backend
>> data plumbing work. The library will primarily be code integrating with
>> pyspark to do various pieces such as:
>>
>> # Sampling from the click logs to generate the set of queries + pages
>> that will be labeled with click models
>> # Distributing the work of running click models against those sampled
>> data sets
>> # Pushing queries we use for feature generation into kafka, and reading
>> back the resulting feature vectors (the other end of this will run those
>> generated queries against either the hot-spare elasticsearch cluster or the
>> relforge cluster to get feature scores)
>> # Merging feature vectors with labeled data, splitting into
>> test/train/validate sets, and writing out files formatted for whichever
>> training library we decide on (xgboost, lightgbm and ranklib are in the
>> running currently)
>> # Whatever plumbing is necessary to run the actual model training and do
>> hyperparameter optimization
>> # Converting the resulting models into a format suitable for use with the
>> elasticsearch learn to rank plugin
>> # Reporting on the quality of models vs some baseline
>>
>> The high level goal is that we would have relatively simple python
>> scripts in our analytics repository that are called from oozie. Those
>> scripts would know the appropriate locations to load/store data and pass
>> into this library for the bulk of the processing. There will also be some
>> script, probably within the library, that combines many of these steps for
>> feature engineering purposes to take some set of features and run the whole
>> thing.
>>
>> So, what do we call this thing? Horrible first attempts:
>>
>> * ltr-pipeline
>> * learn-to-rank-pipeline
>> * bob
>> * cirrussearch-ltr
>> * ???
>>
>>
>> _______________________________________________
>> AI mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/ai
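
To make the "merge feature vectors with labeled data, split, and write out" step from the quoted list a bit more concrete, here is a rough sketch in plain Python. It is only an illustration under several assumptions, not the proof of concept code: the record types `Label` and `Features`, the function names, and the 80/10/10 split are all hypothetical, and the real library would presumably do this with pyspark rather than in-memory lists. The output rows use the `label qid:N index:value` text layout that ranklib reads; whether xgboost or lightgbm would want the same files is left open.

```python
import hashlib
from collections import namedtuple

# Hypothetical record types; the field names are assumptions for illustration.
Label = namedtuple("Label", ["query", "page_id", "relevance"])
Features = namedtuple("Features", ["query", "page_id", "vector"])


def assign_split(query, train_pct=80, validate_pct=10):
    """Deterministically map a query to 'train', 'validate' or 'test'.

    Hashing the query string (instead of sampling rows) keeps every page
    for a given query in the same split.
    """
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + validate_pct:
        return "validate"
    return "test"


def format_row(relevance, qid, vector):
    """Render one row as '<label> qid:<id> 1:<v1> 2:<v2> ...'.

    %g keeps the sketch short but only prints 6 significant digits.
    """
    feats = " ".join("%d:%g" % (i, v) for i, v in enumerate(vector, start=1))
    return "%d qid:%d %s" % (relevance, qid, feats)


def merge_and_split(labels, features):
    """Join labels to feature vectors on (query, page_id), grouped by split."""
    by_key = {(f.query, f.page_id): f.vector for f in features}
    qids = {}
    splits = {"train": [], "validate": [], "test": []}
    # Sort so that all rows for one query end up next to each other.
    for lab in sorted(labels, key=lambda l: (l.query, l.page_id)):
        vector = by_key.get((lab.query, lab.page_id))
        if vector is None:
            continue  # no feature vector came back for this pair; drop it
        qid = qids.setdefault(lab.query, len(qids) + 1)
        row = format_row(lab.relevance, qid, vector)
        splits[assign_split(lab.query)].append(row)
    return splits
```

Each list in the returned dict could then be written to its own file, one row per line. Splitting by query rather than by row is a design choice in this sketch: it avoids the same query showing up in both training and test data.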
_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
