How about Horace? I heard that it's a name that isn't being used much anymore by parents naming their kids. It could be:
Horace Learns to Rank

joking, but not really :)

--
deb tankersley
irc: debt
Product Manager, Discovery
Wikimedia Foundation

On Wed, Apr 5, 2017 at 2:53 PM, Chris Koerner <[email protected]> wrote:

> SnakePipe - get it? Python and 'plumbing'?
>
> Yours,
> Chris Koerner
> Community Liaison - Discovery
> Wikimedia Foundation
>
> On Wed, Apr 5, 2017 at 2:28 PM, Erik Bernhardson <[email protected]> wrote:
>
>> We seem to have some consensus that for the upcoming learning to rank
>> work we will build out a python library to handle the bulk of the
>> backend data plumbing work. The library will primarily be code
>> integrating with pyspark to do various pieces such as:
>>
>> # Sampling from the click logs to generate the set of queries + page's
>>   that will be labeled with click models
>> # Distributing the work of running click models against those sampled
>>   data sets
>> # Pushing queries we use for feature generation into kafka, and reading
>>   back the resulting feature vectors (the other end of this will run
>>   those generated queries against either the hot-spare elasticsearch
>>   cluster or the relforge cluster to get feature scores)
>> # Merging feature vectors with labeled data, splitting into
>>   test/train/validate sets, and writing out files formatted for
>>   whichever training library we decide on (xgboost, lightgbm and
>>   ranklib are in the running currently)
>> # Whatever plumbing is necessary to run the actual model training and
>>   do hyper parameter optimization
>> # Converting the resulting models into a format suitable for use with
>>   the elasticsearch learn to rank plugin
>> # Reporting on the quality of models vs some baseline
>>
>> The high level goal is that we would have relatively simple python
>> scripts in our analytics repository that are called from oozie, those
>> scripts would know the appropriate locations to load/store data and
>> pass into this library for the bulk of the processing. There will also
>> be some script, probably within the library, that combines many of
>> these steps for feature engineering purposes to take some set of
>> features and run the whole thing.
>>
>> So, what do we call this thing? Horrible first attempts:
>>
>> * ltr-pipeline
>> * learn-to-rank-pipeline
>> * bob
>> * cirrussearch-ltr
>> * ???
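
For anyone trying to picture the "merge feature vectors with labeled data, split into test/train/validate, write out training files" step from Erik's list, here is a rough sketch. It is only an illustration under several assumptions: it uses plain Python dicts rather than pyspark, every function name is made up for this example, the 80/10/10 ratio and the hash-the-query split are my own choices, and the row layout is the RankLib-style `label qid:N i:value` text format (xgboost or lightgbm may want something different).

```python
import hashlib
from collections import defaultdict


def assign_split(query, train=0.8, test=0.1):
    """Deterministically map a query string to 'train', 'test' or 'validate'.

    Hashing the query (an assumption, not the agreed design) keeps every
    row for a given query in the same split.
    """
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    bucket = (int(digest, 16) % 1000) / 1000.0
    if bucket < train:
        return "train"
    if bucket < train + test:
        return "test"
    return "validate"


def format_row(label, qid, features):
    """Render one row as '<label> qid:<id> 1:<v1> 2:<v2> ...'."""
    feats = " ".join("%d:%g" % (i, v) for i, v in enumerate(features, start=1))
    return "%d qid:%d %s" % (label, qid, feats)


def build_splits(labels, vectors):
    """Join labels and feature vectors on (query, page_id), then split by query.

    labels:  dict mapping (query, page_id) -> integer relevance label
    vectors: dict mapping (query, page_id) -> list of float feature scores
    Returns a dict mapping split name -> list of formatted rows.
    """
    qids = {}
    splits = defaultdict(list)
    # Sorting by (query, page_id) keeps rows for one query contiguous.
    for key in sorted(labels):
        if key not in vectors:
            continue  # no feature vector came back for this pair; skip it
        query, _page_id = key
        qid = qids.setdefault(query, len(qids) + 1)
        row = format_row(labels[key], qid, vectors[key])
        splits[assign_split(query)].append(row)
    return dict(splits)
```

Each list in the returned dict could then be written to its own file, one row per line. In the real library the join and split would presumably be pyspark operations over much larger data; this only shows the shape of the output.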
_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
