[BangPypers] Question on making hive python UDF object persistent

Pramod R Wed, 23 Jan 2019 09:08:15 -0800

Hi All

I'm trying to write a Hive UDF in Python that loads a pickle object
(basically a set of linear model weights). The weights read from the
pickle are used to score a set of observations from a Hive table. Once
I have computed the scores, I also want to update the weights based
on the truth value that I receive from the same Hive table, so that the
next observation is scored with the updated weights.


Something like this:

Python UDF code:

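import pickle
import sys

import numpy as np


def sigmoid(z):
    # Logistic function; needed by the scoring step below.
    return 1.0 / (1.0 + np.exp(-z))


# Load the linear model weights once, before reading any rows.
with open('B.pkl', 'rb') as f:
    betas = pickle.load(f)

for line in sys.stdin:
    # Each row arrives as tab-separated text; the last field is the truth value.
    data = line.strip().split('\t')
    X = np.array(data[:-1], dtype=float).reshape(1, -1)
    y = np.array(data[-1:], dtype=float)
    ycap = sigmoid(np.dot(X, betas))
    # Least-squares update; pinv because X.T X is singular for a single row.
    new_beta = np.dot(np.dot(np.linalg.pinv(np.dot(X.T, X)), X.T), y)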
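To make the intended score-then-update loop concrete, here is a minimal sketch under stated assumptions. It swaps the closed-form least-squares step for a single stochastic-gradient step on the logistic loss, which is well defined for one row at a time. The function name score_and_update and the learning rate lr are hypothetical, chosen only for illustration, and each row is assumed to be tab-separated features followed by a 0/1 label.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def score_and_update(lines, betas, lr=0.1):
    """Score each row with the current weights, then update the weights.

    lines: iterable of tab-separated strings (features..., label).
    betas: initial weights, one per feature.
    lr:    hypothetical learning rate for the gradient step.
    Returns the list of scores and the final weights.
    """
    betas = np.asarray(betas, dtype=float).copy()
    scores = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        x = np.array(fields[:-1], dtype=float)
        y = float(fields[-1])
        p = sigmoid(np.dot(betas, x))  # score with the weights seen so far
        scores.append(p)
        betas += lr * (y - p) * x      # one SGD step on the logistic loss
    return scores, betas
```

In a streaming script, lines would be sys.stdin and each score would be printed back to Hive. The weights in this sketch persist only inside one Python process, so every mapper or reducer that runs the script keeps its own copy; sharing them across cores is the part I still need help with.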

I have read that a Python object in a Hive UDF can be made persistent across
all the cores (stateful UDTF). Can anyone help me with sample code?

Thanks in advance!

Pramod
_______________________________________________
BangPypers mailing list
BangPypers@python.org
https://mail.python.org/mailman/listinfo/bangpypers
