Build-A-Rank Workshop? Rank-Pipe?
Omg-look-at-that-great-search-result-page-pipeline? (OLAT-GrSERPP)

On Wed, Apr 5, 2017 at 12:33 PM, Trey Jones <[email protected]> wrote:

> You can't call it Bob for historical reasons
> <https://en.wikipedia.org/wiki/Microsoft_Bob>! I don't think
> cirrussearch-ltr is too bad. (Though "LTR" always makes me think we're
> neglecting RTL languages somehow.)
>
> Trey Jones
> Software Engineer, Discovery
> Wikimedia Foundation
>
> On Wed, Apr 5, 2017 at 3:28 PM, Erik Bernhardson
> <[email protected]> wrote:
>
>> We seem to have some consensus that for the upcoming learning-to-rank
>> work we will build out a Python library to handle the bulk of the
>> backend data-plumbing work. The library will primarily be code
>> integrating with PySpark to do various pieces such as:
>>
>> # Sampling from the click logs to generate the set of queries + pages
>> that will be labeled with click models
>> # Distributing the work of running click models against those sampled
>> data sets
>> # Pushing the queries we use for feature generation into Kafka, and
>> reading back the resulting feature vectors (the other end of this will
>> run those generated queries against either the hot-spare Elasticsearch
>> cluster or the relforge cluster to get feature scores)
>> # Merging feature vectors with labeled data, splitting into
>> test/train/validate sets, and writing out files formatted for whichever
>> training library we decide on (xgboost, lightgbm, and ranklib are
>> currently in the running)
>> # Whatever plumbing is necessary to run the actual model training and
>> do hyperparameter optimization
>> # Converting the resulting models into a format suitable for use with
>> the Elasticsearch learning-to-rank plugin
>> # Reporting on the quality of models vs. some baseline
>>
>> The high-level goal is that we would have relatively simple Python
>> scripts in our analytics repository that are called from Oozie; those
>> scripts would know the appropriate locations to load/store data and
>> pass into this library for the bulk of the processing. There will also
>> be some script, probably within the library, that combines many of
>> these steps for feature-engineering purposes, to take some set of
>> features and run the whole thing.
>>
>> So, what do we call this thing? Horrible first attempts:
>>
>> * ltr-pipeline
>> * learn-to-rank-pipeline
>> * bob
>> * cirrussearch-ltr
>> * ???
>>
>>
>> _______________________________________________
>> discovery mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/discovery
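The "merging, splitting, and writing training files" step Erik describes could be sketched in plain Python along these lines. Everything here is illustrative: the function names are hypothetical, the per-query deterministic hash split is one possible scheme (it keeps all rows for a query in the same split, which matters for ranking evaluation), and the output line follows the RankLib/SVMrank-style `<label> qid:<id> <idx>:<value> ...` convention:

```python
import hashlib

def split_bucket(query_id, train=0.8, test=0.1):
    """Deterministically assign a query to train/test/validate by hashing
    its id, so every row for one query lands in the same split.
    (A hypothetical scheme, not the actual library's API.)"""
    h = int(hashlib.md5(str(query_id).encode()).hexdigest(), 16) % 100 / 100.0
    if h < train:
        return "train"
    if h < train + test:
        return "test"
    return "validate"

def to_ranklib_line(label, query_id, features):
    """Format one labeled feature vector as a RankLib/SVMrank-style line:
    <label> qid:<qid> 1:<f1> 2:<f2> ..."""
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{label} qid:{query_id} {feats}"

# Example: route one labeled, feature-joined row to its output file.
row = {"query_id": 42, "label": 2, "features": [0.5, 1.0]}
bucket = split_bucket(row["query_id"])          # "train", "test", or "validate"
line = to_ranklib_line(row["label"], row["query_id"], row["features"])
```

In the real pipeline the join and split would presumably run inside PySpark over the full sampled click-log data rather than row by row like this.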
