Dear Apache Community,

I am looking to perform a linear regression on a rather large amount of data in my Hadoop cluster. It is part of my master's thesis at Harvard University.
After perusing the docs on the Mahout site, it seems the following algorithms haven't been implemented yet:

  Locally-Weighted Linear Regression
  Linear Regression
  Logistic Regression

Basically, there is a stock market phenomenon which I'm trying to predict. It is called a short squeeze. I have about 16,000 data points, each a stock and a point in time where the phenomenon has occurred. I'm trying to develop a predictive model in a Hadoop cluster.

The accuracy of the model doesn't matter much at this point. The goal, and what would make my prof happy, is to see the cluster grinding away, doing some relevant but perhaps not totally correct mathematical operations. Read: if it's a linear regression I'll be happy, but if it isn't possible yet I don't mind.

Can anyone suggest something I can use? I've downloaded Mahout 0.2 and searched through it, but nothing for performing regressions has jumped out at me.

Thank you.

Best,
Rajat
