How about Horace? I heard that it's a name that isn't being used much
anymore by parents naming their kids. It could be:

Horace Learns to Rank

joking, but not really :)

--
deb tankersley
irc: debt
Product Manager, Discovery
Wikimedia Foundation

On Wed, Apr 5, 2017 at 2:53 PM, Chris Koerner <[email protected]>
wrote:

> SnakePipe - get it? Python and 'plumbing'?
>
>
> Yours,
> Chris Koerner
> Community Liaison - Discovery
> Wikimedia Foundation
>
> On Wed, Apr 5, 2017 at 2:28 PM, Erik Bernhardson <
> [email protected]> wrote:
>
>> We seem to have some consensus that for the upcoming learning to rank
>> work we will build out a python library to handle the bulk of the backend
>> data plumbing. The library will primarily be code integrating with
>> pyspark to do various pieces such as:
>>
>> # Sampling from the click logs to generate the set of queries + pages
>> that will be labeled with click models
>> # Distributing the work of running click models against those sampled
>> data sets
>> # Pushing queries we use for feature generation into kafka, and reading
>> back the resulting feature vectors (the other end of this will run those
>> generated queries against either the hot-spare elasticsearch cluster or the
>> relforge cluster to get feature scores)
>> # Merging feature vectors with labeled data, splitting into
>> test/train/validate sets, and writing out files formatted for whichever
>> training library we decide on (xgboost, lightgbm and ranklib are in the
>> running currently)
>> # Whatever plumbing is necessary to run the actual model training and do
>> hyperparameter optimization
>> # Converting the resulting models into a format suitable for use with the
>> Elasticsearch learning-to-rank plugin
>> # Reporting on the quality of models vs some baseline
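A minimal sketch of the merge/split step above, in plain Python without pyspark (the real code would do this over pyspark dataframes). The function names `split_by_query` and `to_ranklib` are illustrative, not part of any existing library; the split is done per query rather than per row, since learning-to-rank training groups examples by query.

```python
import hashlib

def split_by_query(rows, weights=(0.8, 0.1, 0.1)):
    """Deterministically split labeled feature rows into train/test/validate.

    Splits on the query string (not per row) so every document judged for a
    given query lands in the same partition -- required for learning to rank,
    where examples are grouped by query. Hash-based so the split is stable
    across reruns.
    """
    train, test, validate = [], [], []
    bounds = (weights[0], weights[0] + weights[1])
    for row in rows:
        # Hash the query to a stable fraction in [0, 1).
        digest = hashlib.sha1(row["query"].encode("utf-8")).digest()
        frac = int.from_bytes(digest[:8], "big") / 2**64
        if frac < bounds[0]:
            train.append(row)
        elif frac < bounds[1]:
            test.append(row)
        else:
            validate.append(row)
    return train, test, validate

def to_ranklib(rows):
    """Format rows in the SVMlight-style text format that ranklib and
    xgboost's ranking objectives accept: <label> qid:<qid> <feat>:<value> ...
    """
    qids = {}
    lines = []
    for row in rows:
        # Assign query ids in first-seen order.
        qid = qids.setdefault(row["query"], len(qids) + 1)
        feats = " ".join(
            f"{i}:{v}" for i, v in enumerate(row["features"], start=1))
        lines.append(f'{row["label"]} qid:{qid} {feats}')
    return "\n".join(lines)
```

A lightgbm-format writer would look much the same, just with a separate query-group file.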
>>
>> The high-level goal is that we would have relatively simple python
>> scripts in our analytics repository that are called from oozie; those
>> scripts would know the appropriate locations to load/store data and
>> hand off to this library for the bulk of the processing. There will
>> also be a script, probably within the library, that chains many of
>> these steps together for feature engineering: take some set of
>> features and run the whole thing end to end.
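One way the oozie-facing side could look, as a sketch only: a thin entry point that knows the HDFS paths and hands everything to the library. The flag names and the trainer choices are assumptions drawn from the options above, not a settled interface.

```python
import argparse

def parse_args(argv=None):
    """Parse the paths an oozie action would pass to a driver script.

    Illustrative only: the real script would import the (as yet unnamed)
    library and call into it with these paths, keeping no pipeline logic
    of its own.
    """
    parser = argparse.ArgumentParser(description="LTR data pipeline driver")
    parser.add_argument("--click-logs", required=True,
                        help="HDFS path to the sampled click logs")
    parser.add_argument("--output-dir", required=True,
                        help="HDFS path for the train/test/validate output")
    parser.add_argument("--trainer", default="xgboost",
                        choices=["xgboost", "lightgbm", "ranklib"],
                        help="which training library to format output for")
    return parser.parse_args(argv)
```

Keeping all path and format knowledge in the script, and all processing in the library, is what lets the same library back both the oozie jobs and the ad-hoc feature-engineering runs.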
>>
>> So, what do we call this thing? Horrible first attempts:
>>
>> * ltr-pipeline
>> * learn-to-rank-pipeline
>> * bob
>> * cirrussearch-ltr
>> * ???
>>
>>
>> _______________________________________________
>> discovery mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/discovery
>>
>>
>