Does RDD.cartesian involve shuffling?

2015-08-03 Thread Meihua Wu
Does RDD.cartesian involve shuffling? Thanks!

Re: Does RDD.cartesian involve shuffling?

2015-08-04 Thread Meihua Wu
is relatively small to fit)? On Tue, Aug 4, 2015 at 8:23 AM, Richard Marscher rmarsc...@localytics.com wrote: Yes it does; in fact, it's probably going to be one of the more expensive shuffles you could trigger. On Mon, Aug 3, 2015 at 12:56 PM, Meihua Wu rotationsymmetr...@gmail.com wrote: Does
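For a sense of why the cartesian product is costly, here is a plain-Python sketch, not Spark code. It assumes the result has one partition per pair of input partitions, so every partition on one side is read once for each partition on the other. The function `cartesian_partitions` and the sample data are hypothetical names used only for illustration.

```python
from itertools import product

def cartesian_partitions(left_parts, right_parts):
    """Build one output partition for every (left partition, right partition) pair."""
    return [
        [(a, b) for a in left_part for b in right_part]
        for left_part, right_part in product(left_parts, right_parts)
    ]

# Two small "RDDs", each stored as a list of partitions (illustrative data).
left = [[1, 2], [3]]
right = [["a"], ["b", "c"]]

result = cartesian_partitions(left, right)
num_output_partitions = len(result)            # 2 * 2 partition pairs
num_output_rows = sum(len(p) for p in result)  # 3 * 3 element pairs
```

Under this assumption the output grows with the product of both sizes, in partitions as well as in rows, which is what makes the operation expensive on large inputs.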

Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Meihua Wu
I think in the SGD algorithm, the mini batch sample is done without replacement. So with fraction=1, all the rows will be sampled exactly once to form the miniBatch, resulting in the deterministic/classical case. On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang fli...@databricks.com wrote:
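The point about fraction=1 can be checked with a small plain-Python sketch. It is not MLlib code: it assumes that sampling without replacement keeps each row independently with probability equal to the fraction, and `sample_mini_batch`, `mean_squared_error_gradient`, and the data are hypothetical names chosen for illustration.

```python
import random

def sample_mini_batch(rows, fraction, seed):
    """Keep each row independently with probability `fraction`; no row is picked twice."""
    rng = random.Random(seed)
    # rng.random() is always in [0.0, 1.0), so fraction=1.0 keeps every row.
    return [row for row in rows if rng.random() < fraction]

def mean_squared_error_gradient(batch, weight):
    """Gradient of the mean of 0.5 * (weight * x - y)^2 over the batch."""
    return sum((weight * x - y) * x for x, y in batch) / len(batch)

rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]

# With fraction=1.0 the "mini batch" is the whole data set, whatever the seed.
full_batch = sample_mini_batch(rows, fraction=1.0, seed=42)
full_gradient = mean_squared_error_gradient(rows, weight=1.5)
batch_gradient = mean_squared_error_gradient(full_batch, weight=1.5)
```

Here `batch_gradient` equals `full_gradient`, so each step is an ordinary full-batch gradient descent step with no sampling noise.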

Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Meihua Wu
at 11:16 AM, Meihua Wu rotationsymmetr...@gmail.com wrote: I think in the SGD algorithm, the mini batch sample is done without replacement. So with fraction=1, all the rows will be sampled exactly once to form the miniBatch, resulting in the deterministic/classical case. On Fri, Aug 7

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-25 Thread Meihua Wu
Please add "setFitIntercept(false)" to your LinearRegression. LinearRegression by default includes an intercept in the model, e.g. label = intercept + features dot weight. To get the result you want, you need to force the intercept to be zero. Just curious, are you trying to solve systems of
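To illustrate why the intercept changes the fitted weight, here is a plain-Python least-squares sketch for a single feature. It is not Spark's solver, and the helper names and data are hypothetical; it only contrasts fitting label = intercept + feature * weight with fitting label = feature * weight.

```python
def fit_with_intercept(xs, ys):
    """Ordinary least squares for y = intercept + weight * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    weight = sxy / sxx
    return mean_y - weight * mean_x, weight

def fit_through_origin(xs, ys):
    """Least squares for y = weight * x, i.e. the intercept is forced to zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Illustrative data generated from y = 1 + 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

intercept, weight = fit_with_intercept(xs, ys)  # recovers intercept 1 and weight 2
weight_no_intercept = fit_through_origin(xs, ys)  # 34 / 14, a different weight
```

The two fits give different weights on the same data, so a result that looks wrong under one model can be exactly what the other model is expected to return.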

Re: Spark Implementation of XGBoost

2015-10-26 Thread Meihua Wu
>>> DB Tsai
>>> ------
>>> Web: https://www.dbtsai.com
>>> PGP Key ID: 0xAF08DF8D
>>>
>>> On Mon, Oct 26, 2015 at 11:42 AM, Meihua Wu <rotationsymmetr...@gma

Re: Spark Implementation of XGBoost

2015-10-26 Thread Meihua Wu
> Sincerely,
>
> DB Tsai
> --
> Web: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
> On Mon, Oct 26, 2015 at 11:42 AM, Meihua Wu <rotationsymmetr...@gmail.com> wrote:
>> Hi Spark User/Dev,

Re: Spark Implementation of XGBoost

2015-10-27 Thread Meihua Wu
> ore than shrinkage).
>
> Thanks.
>
> Sincerely,
>
> DB Tsai
> --
> Web: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
> On Mon, Oct 26, 2015 at 8:37 PM, Meihua Wu <rotationsymmetr...@gmail.com> wrote:
>> Hi DB

Spark Implementation of XGBoost

2015-10-26 Thread Meihua Wu
Hi Spark User/Dev, Inspired by the success of XGBoost, I have created a Spark package for gradient boosted trees with a 2nd order approximation of arbitrary user-defined loss functions: https://github.com/rotationsymmetry/SparkXGBoost Currently linear (normal) regression, binary classification,
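As a rough illustration of what a 2nd order approximation of a loss means in gradient boosting, the plain-Python sketch below computes the first and second derivatives of a loss and the resulting Newton-style leaf weight, -G / (H + lambda), as described for XGBoost. It is not the package's API: every function name, the `reg_lambda` parameter, and the data are assumptions made for this example.

```python
import math

def logistic_grad_hess(label, margin):
    """First and second derivative of the logistic loss with respect to the margin."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - label, p * (1.0 - p)

def squared_grad_hess(label, prediction):
    """First and second derivative of 0.5 * (prediction - label)^2."""
    return prediction - label, 1.0

def newton_leaf_weight(grads, hessians, reg_lambda=1.0):
    """Leaf value minimizing the 2nd order approximation: -G / (H + lambda)."""
    return -sum(grads) / (sum(hessians) + reg_lambda)

# Three rows falling in one leaf, all with current margin 0.0 (illustrative data).
labels = [1.0, 1.0, 0.0]
pairs = [logistic_grad_hess(y, 0.0) for y in labels]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]

leaf_weight = newton_leaf_weight(grads, hessians)  # 0.5 / 1.75
```

Any loss that supplies these two derivatives can be plugged into the same leaf-weight formula, which is what makes user-defined losses possible in this style of boosting.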