Re: [MLLib] Logistic Regression and standardization

2018-04-13 Thread Yanbo Liang
Hi Filipp, MLlib’s LR implementation handles standardization the same way as R’s glmnet. You don’t actually need to care about this implementation detail: the coefficients are always returned on the original scale, so it should return the same results as other popular ML libraries. Could
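A minimal sketch of why coefficients fitted on standardized features can still be reported on the original scale. It assumes, as a simplification, that each feature is only divided by its sample standard deviation (no centering), so the intercept is unchanged; the function names are hypothetical and this is not MLlib's code. Regularization affects which coefficients the optimizer finds, but mapping them back is just a per-feature division.

```python
import statistics

def column_stds(rows):
    """Sample standard deviation of each feature column (needs >= 2 rows)."""
    return [statistics.stdev(col) for col in zip(*rows)]

def to_original_scale(coef_std, stds):
    """Map coefficients learned on x_j / sigma_j back to the raw x_j scale.

    Assumption of this sketch: a constant feature (sigma_j == 0) gets a
    coefficient of 0.0 instead of causing a division by zero.
    """
    return [b / s if s != 0 else 0.0 for b, s in zip(coef_std, stds)]

def margin(x, coef, intercept):
    """Linear predictor (log-odds) of a logistic regression model."""
    return intercept + sum(xi * bi for xi, bi in zip(x, coef))

# Because (x_j / sigma_j) * b_j == x_j * (b_j / sigma_j), both models give the
# same margin, and therefore the same predicted probability, for every row.
rows = [[1.0, 200.0], [2.0, 180.0], [3.0, 260.0]]
stds = column_stds(rows)
coef_std, intercept = [0.8, -0.5], 0.1   # hypothetical fitted values
coef = to_original_scale(coef_std, stds)
for x in rows:
    scaled = [xi / s for xi, s in zip(x, stds)]
    assert abs(margin(scaled, coef_std, intercept) - margin(x, coef, intercept)) < 1e-9
```

The coefficients reported to the user are the `coef` values, which is why the choice to standardize internally does not change how the model should be read.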

Re: Live Stream Code Reviews :)

2018-04-13 Thread Holden Karau
Thank you :) Just a reminder that this is going to start in under 20 minutes. If anyone has a PR they'd like reviewed live, please respond and I'll add it to the list (otherwise I'll stick to the normal list of folks who have opted in to live reviews). On Thu, Apr 12, 2018 at 2:08 PM, Gourav Sengupta

Re: Sorting on a streaming dataframe

2018-04-13 Thread Hemant Bhanawat
Well, we want to assign snapshot ids (incrementing counters) to the incoming records. To do that, we zip the streaming RDDs with that counter using a modified version of ZippedWithIndexRDD. We are OK if the records in the streaming dataframe get counters in random order, but the counter
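A plain-Python sketch of the idea behind Spark's `RDD.zipWithIndex`, which first counts the elements in each partition and then numbers each partition starting from the running total of the partitions before it. Carrying the counter over from one micro-batch to the next (the `start` argument) is an assumption about what a modified version for streaming might do; the thread does not show the actual modification, and `zip_with_index` here is a hypothetical helper, not a Spark API.

```python
from itertools import accumulate

def zip_with_index(partitions, start=0):
    """Assign consecutive ids to records spread across ordered partitions.

    Returns the indexed partitions and the next unused id, so a caller can
    continue numbering in the following micro-batch (assumed behaviour).
    """
    counts = [len(p) for p in partitions]
    # Exclusive prefix sum of the counts: where each partition's numbering begins.
    offsets = [start + o for o in accumulate([0] + counts[:-1])]
    indexed = [
        [(record, offset + i) for i, record in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]
    return indexed, start + sum(counts)
```

For example, `zip_with_index([["a", "b"], ["c"]])` numbers the records 0, 1 and 2 and returns 3 as the next id; passing that 3 as `start` for the next batch keeps the ids increasing across batches, while the order within a batch still follows partition order rather than any sort.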

Re: Sorting on a streaming dataframe

2018-04-13 Thread Reynold Xin
Can you describe your use case more?

On Thu, Apr 12, 2018 at 11:12 PM Hemant Bhanawat wrote:
> Hi Guys,
>
> Why is sorting on streaming dataframes not supported (unless it is complete
> mode)? My downstream needs me to sort the streaming dataframe.
>
> Hemant

Sorting on a streaming dataframe

2018-04-13 Thread Hemant Bhanawat
Hi Guys, Why is sorting on streaming dataframes not supported (unless in complete mode)? My downstream needs me to sort the streaming dataframe. Hemant