Build-A-Rank Workshop?

Rank-Pipe?

Omg-look-at-that-great-search-result-page-pipeline? (OLAT-GrSERPP)


On Wed, Apr 5, 2017 at 12:33 PM, Trey Jones <[email protected]> wrote:

> You can't call it Bob for historical reasons
> <https://en.wikipedia.org/wiki/Microsoft_Bob>! I don't
> think cirrussearch-ltr is too bad. (Though "LTR" always makes me think
> we're neglecting RTL languages somehow.)
>
> Trey Jones
> Software Engineer, Discovery
> Wikimedia Foundation
>
> On Wed, Apr 5, 2017 at 3:28 PM, Erik Bernhardson <
> [email protected]> wrote:
>
>> We seem to have some consensus that for the upcoming learning to rank
>> work, we will build out a Python library to handle the bulk of the backend
>> data plumbing work. The library will primarily be code integrating with
>> pyspark to do various pieces such as:
>>
>> # Sampling from the click logs to generate the set of queries + pages
>> that will be labeled with click models
>> # Distributing the work of running click models against those sampled
>> data sets
>> # Pushing queries we use for feature generation into Kafka, and reading
>> back the resulting feature vectors (the other end of this will run those
>> generated queries against either the hot-spare Elasticsearch cluster or the
>> relforge cluster to get feature scores)
>> # Merging feature vectors with labeled data, splitting into
>> test/train/validate sets, and writing out files formatted for whichever
>> training library we decide on (xgboost, lightgbm and ranklib are in the
>> running currently)
>> # Whatever plumbing is necessary to run the actual model training and do
>> hyperparameter optimization
>> # Converting the resulting models into a format suitable for use with the
>> Elasticsearch Learning to Rank plugin
>> # Reporting on the quality of models vs some baseline
>>
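A minimal sketch of the merge-and-split step above, in plain Python rather than pyspark so it stands alone; the hash-based bucket assignment and the 80/10/10 fractions are illustrative assumptions, not anything decided in this thread:

```python
import hashlib

# Illustrative 80/10/10 split fractions (not decided in the thread).
SPLITS = [("train", 0.8), ("validate", 0.1), ("test", 0.1)]

def assign_split(query_id: str) -> str:
    """Deterministically map a query id onto [0, 1) and pick its bucket."""
    digest = int(hashlib.md5(query_id.encode("utf-8")).hexdigest(), 16)
    point = (digest % 10**8) / 10**8
    cumulative = 0.0
    for name, fraction in SPLITS:
        cumulative += fraction
        if point < cumulative:
            return name
    return SPLITS[-1][0]

# Toy stand-in for the merged (click-model label + feature vector) rows.
rows = [{"query_id": "q%d" % i, "label": i % 4, "features": [i, i * 2]}
        for i in range(1000)]
buckets = {}
for row in rows:
    buckets.setdefault(assign_split(row["query_id"]), []).append(row)
```

Hashing the query ID (rather than sampling at random) keeps all rows for a query in one bucket across reruns, which avoids leaking the same query into both train and test.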
>> The high-level goal is that we would have relatively simple Python
>> scripts in our analytics repository that are called from Oozie; those
>> scripts would know the appropriate locations to load/store data and pass
>> them into this library for the bulk of the processing. There will also be
>> some script, probably within the library, that combines many of these
>> steps for feature-engineering purposes: take some set of features and run
>> the whole thing.
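A sketch of what one of those thin driver scripts might look like: it knows only the data locations and delegates every stage to the library. All function names, stage boundaries, and paths below are hypothetical; the real library API is exactly what this thread is trying to name.

```python
# Hypothetical driver: knows locations, delegates each stage to the library.
def run_pipeline(lib, paths):
    sampled = lib.sample_click_logs(paths["click_logs"])                   # step 1
    labeled = lib.apply_click_models(sampled)                              # step 2
    features = lib.collect_feature_vectors(labeled, paths["kafka_topic"])  # step 3
    datasets = lib.merge_and_split(labeled, features)                      # step 4
    model = lib.train(datasets)                                            # step 5
    lib.upload_model(model, paths["model_output"])                         # step 6
    return model

# Stub library that just records which stages were invoked, so the sketch
# runs without Spark, Kafka, or Elasticsearch.
class StubLib:
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args, **kwargs):
            self.calls.append(name)
            return name
        return record

paths = {
    "click_logs": "hdfs://example/click_logs",  # hypothetical location
    "kafka_topic": "ltr_feature_queries",       # hypothetical topic
    "model_output": "hdfs://example/models",    # hypothetical location
}
lib = StubLib()
run_pipeline(lib, paths)
```

Keeping the driver this thin means Oozie scheduling details and HDFS paths stay in the analytics repo, while everything reusable lives in the library.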
>>
>> So, what do we call this thing? Horrible first attempts:
>>
>> * ltr-pipeline
>> * learn-to-rank-pipeline
>> * bob
>> * cirrussearch-ltr
>> * ???
>>
>>
>> _______________________________________________
>> discovery mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/discovery
>>
>>
>
>