Stopping criteria for gradient descent

2015-09-17 Thread nishanthps
Hi, I am running LogisticRegressionWithSGD in Spark 1.4.1, and training always takes 100 iterations (the default). It never meets the convergence criteria. Shouldn't the convergence criteria for SGD be based on the difference in log loss, or on the difference in accuracy on a held-out test set?
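To make the question concrete, here is a minimal sketch of the kind of stopping rule being asked about: plain batch gradient descent for logistic regression that stops once the relative change in log loss falls below a tolerance. It is written in plain Python and is not Spark's implementation. The names `train`, `step_size`, `max_iterations` and `loss_tol` are made up for illustration and are not MLlib parameters.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(weights, data):
    """Mean logistic loss over (features, label) pairs with labels in {0, 1}."""
    total = 0.0
    for x, y in data:
        z = sum(w * xi for w, xi in zip(weights, x))
        # log(1 + exp(z)) - y * z, with the log term computed stably
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
    return total / len(data)


def gradient(weights, data):
    """Mean gradient of the logistic loss with respect to the weights."""
    grad = [0.0] * len(weights)
    for x, y in data:
        z = sum(w * xi for w, xi in zip(weights, x))
        error = sigmoid(z) - y
        for j, xi in enumerate(x):
            grad[j] += error * xi
    return [g / len(data) for g in grad]


def train(data, step_size=1.0, max_iterations=100, loss_tol=1e-4):
    """Gradient descent that stops early when the log loss stops improving.

    Returns (weights, iterations_run). All parameter names are hypothetical.
    """
    weights = [0.0] * len(data[0][0])
    prev_loss = log_loss(weights, data)
    for iteration in range(1, max_iterations + 1):
        grad = gradient(weights, data)
        weights = [w - step_size * g for w, g in zip(weights, grad)]
        loss = log_loss(weights, data)
        # Stop when the relative change in log loss is below the tolerance.
        if abs(prev_loss - loss) <= loss_tol * max(prev_loss, 1e-12):
            return weights, iteration
        prev_loss = loss
    return weights, max_iterations
```

A held-out accuracy criterion would have the same shape: evaluate the metric on a validation set after each iteration and stop when it no longer improves by more than a tolerance.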

Hive Custom Transform Scripts (read from stdin and print to stdout) in Spark

2015-06-10 Thread nishanthps
What is the best way to reuse, in Spark, Hive custom transform scripts written in Python, awk, or C++ that read data from stdin and print to stdout? These scripts are typically invoked through the Transform syntax in Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform
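For reference, the stdin/stdout contract such scripts follow can be sketched in plain Python with `subprocess`: write one record per line to the command's stdin and collect the lines it prints. Spark's RDD API has a `pipe` method that streams a partition's elements through an external command in a similar spirit, which may be a starting point. The helper `pipe_through` and the inline uppercase script below are hypothetical stand-ins for a real transform script.

```python
import subprocess
import sys


def pipe_through(command, lines):
    """Feed `lines` to `command` on stdin and return the lines it prints."""
    payload = "".join(line + "\n" for line in lines)
    result = subprocess.run(
        command,
        input=payload,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # exchange text rather than bytes
        check=True,               # raise if the script exits non-zero
    )
    return result.stdout.splitlines()


# Hypothetical stand-in for a Hive transform script: uppercases each row.
UPPERCASE_SCRIPT = "import sys\nfor row in sys.stdin:\n    sys.stdout.write(row.upper())\n"

if __name__ == "__main__":
    rows = ["1\talice", "2\tbob"]
    for out in pipe_through([sys.executable, "-c", UPPERCASE_SCRIPT], rows):
        print(out)
```

The command is passed as an argument list, so a real script would be invoked as, for example, `["python", "my_transform.py"]`, where the file name is again only a placeholder.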

How to use BigInteger for userId and productId in collaborative Filtering?

2015-01-09 Thread nishanthps
Hi, The userIds and productIds in my data are bigints. What is the best way to run collaborative filtering on this data? Should I modify MLlib's implementation to support more types, or is there an easier way? Thanks, Nishanth
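One workaround that avoids modifying MLlib, sketched here in plain Python, is to map each distinct big ID to a small consecutive integer before training and to keep the mapping for translating results back. This assumes the number of distinct users and products each fits in a 32-bit integer, which is what MLlib's `Rating` class uses for its `user` and `product` fields. The helpers `build_index` and `reindex_ratings` are hypothetical names, not part of Spark.

```python
def build_index(ids):
    """Assign each distinct ID a dense integer index, in first-seen order."""
    index = {}
    for raw_id in ids:
        if raw_id not in index:
            index[raw_id] = len(index)
    return index


def reindex_ratings(ratings):
    """Replace big user/product IDs with dense integer indices.

    `ratings` is a list of (user_id, product_id, rating) tuples.
    Returns the compact ratings plus both lookup tables.
    """
    user_index = build_index(user for user, _, _ in ratings)
    product_index = build_index(product for _, product, _ in ratings)
    compact = [
        (user_index[user], product_index[product], rating)
        for user, product, rating in ratings
    ]
    return compact, user_index, product_index


if __name__ == "__main__":
    # IDs well beyond the 32-bit integer range.
    ratings = [
        (9007199254740993, 123456789012345678, 5.0),
        (9007199254740994, 123456789012345678, 3.0),
        (9007199254740993, 876543210987654321, 1.0),
    ]
    compact, users, products = reindex_ratings(ratings)
    print(compact)
```

On a large dataset the same idea would need a distributed mapping (for example, assigning an index to each distinct ID and joining it back onto the ratings) rather than an in-memory dictionary.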