Hi,
I am running LogisticRegressionWithSGD in Spark 1.4.1, and it always takes
100 iterations to train (which is the default). It never meets the
convergence criteria. Shouldn't the convergence criterion for SGD be based
on the difference in log-loss, or the difference in accuracy on a held-out
test set?
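As far as I can tell, in Spark 1.4 the GradientDescent optimizer behind LogisticRegressionWithSGD simply runs the requested number of iterations (100 by default); a convergence-tolerance knob was only exposed on the optimizer in a later release, so checking that before upgrading is worthwhile. To illustrate the kind of stopping rule described above, here is a minimal pure-Python sketch (not Spark's implementation) of logistic SGD that stops once the change in mean log-loss falls below a tolerance:

```python
import math

def log_loss(w, b, data):
    # Mean logistic loss over (x, y) pairs with scalar feature x, label y in {0, 1}.
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(data)

def train(data, lr=0.5, max_iter=100, tol=1e-6):
    # SGD with a convergence check on the *change* in log-loss per epoch,
    # rather than a fixed iteration count.
    w = b = 0.0
    prev = log_loss(w, b, data)
    for i in range(1, max_iter + 1):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            g = p - y          # gradient of the logistic loss w.r.t. the margin
            w -= lr * g * x
            b -= lr * g
        cur = log_loss(w, b, data)
        if abs(prev - cur) < tol:  # stop when log-loss stops improving
            return w, b, i
        prev = cur
    return w, b, max_iter

data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, iters = train(data)
```

The same idea with a held-out set would just evaluate `log_loss` (or accuracy) on validation data instead of the training data inside the loop.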
--
What is the best way to reuse Hive custom transform scripts (written in
Python, awk, or C++) that read data from stdin and print to stdout in
Spark? These scripts typically use Hive's TRANSFORM syntax:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform
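Spark's `RDD.pipe(command)` uses the same contract as Hive TRANSFORM: each partition's records are written to the external process's stdin, one per line, and its stdout lines become the output RDD. The sketch below demonstrates that contract in plain Python with `subprocess` (no Spark required); the inline upper-casing script is a hypothetical stand-in for your awk/Python/C++ transform:

```python
import subprocess
import sys

def pipe_partition(lines, command):
    # Feed newline-delimited records to the command's stdin and collect its
    # stdout lines -- the same stdin/stdout contract RDD.pipe relies on.
    proc = subprocess.run(
        command,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.splitlines()

# Hypothetical stand-in "transform script": upper-cases each input line.
script = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())",
]

out = pipe_partition(["spark", "hive"], script)
```

In PySpark the equivalent would be something like `sc.textFile(path).pipe("./my_transform.py")`, shipping the script to executors with `sc.addFile` (or `--files` on spark-submit). Your script stays unchanged; only the serialization of records to tab- or newline-delimited text is your responsibility on both sides.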
--
Hi,
The userIds and productIds in my data are bigints. What is the best way to
run collaborative filtering on this data? Should I modify MLlib's
implementation to support more types, or is there an easier way?
Thanks!
Nishanth
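MLlib's ALS in the 1.x line takes Int user and product IDs (the `Rating` class), so the usual approach is not to modify MLlib but to remap each distinct 64-bit ID to a dense int index before training and map back afterwards. Hashing to int also works but risks collisions. A minimal sketch of the remapping (names are illustrative):

```python
def build_id_maps(ids):
    # Assign each distinct large id a dense int index (0, 1, 2, ...),
    # plus the reverse mapping to recover the original ids later.
    to_int = {}
    to_orig = []
    for i in ids:
        if i not in to_int:
            to_int[i] = len(to_orig)
            to_orig.append(i)
    return to_int, to_orig

user_ids = [5_000_000_001, 5_000_000_002, 5_000_000_001]
u_to_int, int_to_u = build_id_maps(user_ids)
```

On a real dataset you would do the same thing distributedly, e.g. `ratings.map(lambda r: r[0]).distinct().zipWithIndex()` to build the mapping as an RDD and then join it back onto the ratings before calling ALS.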