Hi all,

I'm trying to write a Hive UDF in Python that loads a pickled object (basically a set of linear model weights). The weights read from the pickle are used to score a set of observations from a Hive table. Once I have computed a score, I also want to update the weights based on the truth value that I receive from the same Hive table, so that the next observation is scored with the updated weights.
Something like this:

Python UDF code:

    import pickle
    import sys

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Load the initial linear model weights
    with open('B.pkl', 'rb') as f:
        betas = pickle.load(f)

    for line in sys.stdin:
        data = line.strip().split('\t')
        X = np.array(data[:-1], dtype=float).reshape(1, -1)  # features, one row
        y = np.array([float(data[-1])])                      # truth value
        ycap = sigmoid(np.dot(X, betas))                     # score with current weights
        # Least-squares update; pinv because X.T X from a single row is singular
        new_beta = np.dot(np.dot(np.linalg.pinv(np.dot(X.T, X)), X.T), y)

I did read about making a Python object in a Hive UDF persistent across all the cores (stateful UDTF). Can anyone help me with some sample code?

Thanks in advance!
Pramod

_______________________________________________
BangPypers mailing list
BangPypers@python.org
https://mail.python.org/mailman/listinfo/bangpypers
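
One possible shape for the score-then-update loop described above is sketched here. It is not a tested Hive UDF: it assumes tab-separated rows with the truth value in the last column, it swaps the normal-equation update for a single stochastic-gradient step on the logistic loss (a different technique, chosen because it works one row at a time), and it uses plain Python lists so it has no dependencies. The names `score_stream` and `lr`, and the starting weights, are hypothetical; a real script would unpickle `B.pkl` instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(lines, betas, lr=0.1):
    """Yield a score per row, updating betas in place after each row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(betas) + 1:
            continue  # skip rows that do not match the weight vector
        x = [float(v) for v in fields[:-1]]   # features
        y = float(fields[-1])                 # truth value
        ycap = sigmoid(sum(b * xi for b, xi in zip(betas, x)))
        yield ycap
        # Assumed update rule: one SGD step on the logistic loss,
        # applied before the next row is scored.
        for i, xi in enumerate(x):
            betas[i] += lr * (y - ycap) * xi


# Small in-memory demo; in a streaming script, `rows` would be sys.stdin
# and each score would be printed to stdout.
rows = ["1.0\t0.0\t1\n", "1.0\t0.0\t1\n"]
betas = [0.0, 0.0]  # hypothetical starting weights
scores = list(score_stream(rows, betas))
```

Note that the weights here live only inside one running process. If Hive starts several script processes in parallel, each would hold its own copy of `betas`, so sharing the updated weights across all of them would need something beyond this sketch.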