[jira] [Commented] (SPARK-8547) xgboost exploration

2015-11-02 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986288#comment-14986288 ] Meihua Wu commented on SPARK-8547: -- I have created a Spark Package to implement the XGBoost algorithm

Re: Spark Implementation of XGBoost

2015-10-27 Thread Meihua Wu
ore than > shrinkage). > > Thanks. > > Sincerely, > > DB Tsai > -- > Web: https://www.dbtsai.com > PGP Key ID: 0xAF08DF8D > > > On Mon, Oct 26, 2015 at 8:37 PM, Meihua Wu <rotationsymmetr...@gmail.com> > wrote: >> Hi DB

Re: Spark Implementation of XGBoost

2015-10-26 Thread Meihua Wu
>>> >>> DB Tsai >>> ------ >>> Web: https://www.dbtsai.com >>> PGP Key ID: 0xAF08DF8D >>> >>> >>> On Mon, Oct 26, 2015 at 11:42 AM, Meihua Wu >>> <rotationsymmetr...@gma

Re: Spark Implementation of XGBoost

2015-10-26 Thread Meihua Wu
> Sincerely, > > DB Tsai > -- > Web: https://www.dbtsai.com > PGP Key ID: 0xAF08DF8D > > > On Mon, Oct 26, 2015 at 11:42 AM, Meihua Wu > <rotationsymmetr...@gmail.com> wrote: >> Hi Spark User/Dev, >

Spark Implementation of XGBoost

2015-10-26 Thread Meihua Wu
Hi Spark User/Dev, Inspired by the success of XGBoost, I have created a Spark package for gradient boosting tree with 2nd order approximation of arbitrary user-defined loss functions. https://github.com/rotationsymmetry/SparkXGBoost Currently linear (normal) regression, binary classification,
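For readers unfamiliar with the second-order approximation mentioned in this announcement, here is a minimal sketch in plain Python. It is not taken from the SparkXGBoost package. It assumes the standard XGBoost-style Newton step, in which a leaf's value is -G / (H + lambda), where G and H are the sums of the first and second derivatives of the loss over the rows in the leaf. The function names and the `reg_lambda` parameter are hypothetical.

```python
def squared_loss_derivs(label, prediction):
    """First and second derivative of 0.5 * (prediction - label)**2
    with respect to the prediction."""
    return prediction - label, 1.0


def newton_leaf_value(labels, predictions, derivs, reg_lambda=0.0):
    """Leaf value from a second-order (Newton) approximation of the loss.

    `derivs` is any user-defined function returning (gradient, hessian)
    for one (label, prediction) pair. `reg_lambda` is a hypothetical
    L2 penalty on the leaf value.
    """
    grad_sum = 0.0
    hess_sum = 0.0
    for label, prediction in zip(labels, predictions):
        g, h = derivs(label, prediction)
        grad_sum += g
        hess_sum += h
    return -grad_sum / (hess_sum + reg_lambda)


# Three rows in one leaf, all starting from a prediction of 0.
leaf = newton_leaf_value([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], squared_loss_derivs)
```

With squared loss and no regularization, the Newton step from a zero prediction is simply the mean of the labels in the leaf. Any other loss can be plugged in by supplying a different `derivs` function.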

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-25 Thread Meihua Wu
Please add "setFitIntercept(false)" to your LinearRegression. By default, LinearRegression includes an intercept in the model, e.g. label = intercept + features dot weight. To get the result you want, you need to force the intercept to be zero. Just curious, are you trying to solve systems of
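To illustrate why the intercept setting changes the fitted weight, here is a minimal single-feature least-squares sketch in plain Python. It does not call Spark; the `fit_line` function and its `fit_intercept` flag are hypothetical stand-ins for the behaviour that "setFitIntercept(false)" is described as controlling.

```python
def fit_line(xs, ys, fit_intercept=True):
    """Least-squares fit of y = intercept + weight * x for one feature."""
    n = len(xs)
    if fit_intercept:
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        weight = sxy / sxx
        intercept = mean_y - weight * mean_x
    else:
        # Force the line through the origin: minimize sum((y - weight * x)**2).
        weight = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        intercept = 0.0
    return weight, intercept


xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]  # generated from y = 1 + 2 * x
with_intercept = fit_line(xs, ys, fit_intercept=True)
through_origin = fit_line(xs, ys, fit_intercept=False)
```

On this data the fit with an intercept recovers weight 2 and intercept 1, while the fit through the origin yields a different weight (34/14) because the constant term has to be absorbed by the slope. The two settings only agree when the data really pass through the origin.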

Flaky Jenkins tests?

2015-10-12 Thread Meihua Wu
Hi Spark Devs, I recently encountered several cases in which Jenkins failed tests that should be unrelated to my patch. For example, I made a patch to the Spark ML Scala API, but some Scala RDD tests failed due to timeout, or the java_gateway in PySpark failed. Just wondering if these are

Re: Flaky Jenkins tests?

2015-10-12 Thread Meihua Wu
on, Oct 12, 2015 at 1:36 PM, Ted Yu <yuzhih...@gmail.com> wrote: > You can go to: > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN > > and see if the test failure(s) you encountered appeared there. > > FYI > > On Mon, Oct 12, 2015 at 1:24 PM,

[jira] [Commented] (SPARK-7129) Add generic boosting algorithm to spark.ml

2015-10-05 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942971#comment-14942971 ] Meihua Wu commented on SPARK-7129: -- Currently I am not aware of a straightforward way to impose the weak

[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2015-10-04 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942754#comment-14942754 ] Meihua Wu commented on SPARK-9478: -- [~pcrenshaw] Are you working on this? If not, I can send a PR based

[jira] [Commented] (SPARK-7129) Add generic boosting algorithm to spark.ml

2015-09-26 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909512#comment-14909512 ] Meihua Wu commented on SPARK-7129: -- [~sethah] Thank you for your comments! I have updated the design doc

[jira] [Commented] (SPARK-7129) Add generic boosting algorithm to spark.ml

2015-09-25 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908395#comment-14908395 ] Meihua Wu commented on SPARK-7129: -- [~josephkb] [~sethah] I have compiled a doc for AdaBoost. https

[jira] [Commented] (SPARK-7129) Add generic boosting algorithm to spark.ml

2015-09-19 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877275#comment-14877275 ] Meihua Wu commented on SPARK-7129: -- [~josephkb] As weighting has been added to logistic regression

[jira] [Comment Edited] (SPARK-10706) Add java wrapper for random vector rdd

2015-09-19 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877220#comment-14877220 ] Meihua Wu edited comment on SPARK-10706 at 9/19/15 6:48 PM: [~mengxr] I

[jira] [Commented] (SPARK-10706) Add java wrapper for random vector rdd

2015-09-19 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877220#comment-14877220 ] Meihua Wu commented on SPARK-10706: --- I will work on this. > Add java wrapper for random vector

[jira] [Commented] (SPARK-9834) Normal equation solver and summary statistics for ordinary least squares

2015-09-02 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727568#comment-14727568 ] Meihua Wu commented on SPARK-9834: -- I would like to work on this. > Normal equation solver and summ

[jira] [Commented] (SPARK-9834) Normal equation solver and summary statistics for ordinary least squares

2015-09-02 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728065#comment-14728065 ] Meihua Wu commented on SPARK-9834: -- [~mengxr] Sure. Just let me know if there is anything I can help

[jira] [Commented] (SPARK-8518) Log-linear models for survival analysis

2015-09-01 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725935#comment-14725935 ] Meihua Wu commented on SPARK-8518: -- For the reference implementations, I recommend we consider this R

[jira] [Commented] (SPARK-9642) LinearRegression should supported weighted data

2015-08-30 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721801#comment-14721801 ] Meihua Wu commented on SPARK-9642: -- [~sethah] Thank you for your help. I worked

[jira] [Commented] (SPARK-8518) Log-linear models for survival analysis

2015-08-19 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702676#comment-14702676 ] Meihua Wu commented on SPARK-8518: -- [~mengxr] [~yanbo] Either way works for me. In R

[jira] [Commented] (SPARK-9245) DistributedLDAModel predict top topic per doc-term instance

2015-08-18 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702155#comment-14702155 ] Meihua Wu commented on SPARK-9245: -- [~josephkb] Thank you for clarifying my question. I

[jira] [Commented] (SPARK-8518) Log-linear models for survival analysis

2015-08-17 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700302#comment-14700302 ] Meihua Wu commented on SPARK-8518: -- [~yanbo] Thank you very much for the update

[jira] [Commented] (SPARK-8520) Improve GLM's scalability on number of features

2015-08-17 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700357#comment-14700357 ] Meihua Wu commented on SPARK-8520: -- For 1, how about migrating to treeReduce

Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Meihua Wu
I think in the SGD algorithm, the mini-batch sampling is done without replacement. So with fraction=1, all the rows will be sampled exactly once to form the miniBatch, resulting in the deterministic/classical case. On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang fli...@databricks.com wrote:
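The point about fraction=1 can be sketched in plain Python. This is an illustration only: it assumes per-row Bernoulli sampling (each row is kept independently with probability `fraction`), which is one way to sample without replacement; the thread itself does not show the sampler. The `sample_mini_batch` helper and its parameters are hypothetical.

```python
import random


def sample_mini_batch(rows, fraction, seed):
    """Bernoulli sampling without replacement: keep each row with
    probability `fraction`, so no row can appear twice."""
    rng = random.Random(seed)
    # random() returns a float in [0.0, 1.0), so fraction=1.0 keeps every row.
    return [row for row in rows if rng.random() < fraction]


rows = list(range(10))
full_batch = sample_mini_batch(rows, fraction=1.0, seed=42)
half_batch = sample_mini_batch(rows, fraction=0.5, seed=42)
```

With `fraction=1.0` the mini-batch equals the full data set, so each step uses the exact gradient and the procedure reduces to ordinary (deterministic) gradient descent. With a smaller fraction the batch is a random subset in which every row appears at most once.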

Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Meihua Wu
at 11:16 AM, Meihua Wu rotationsymmetr...@gmail.com wrote: I think in the SGD algorithm, the mini-batch sampling is done without replacement. So with fraction=1, all the rows will be sampled exactly once to form the miniBatch, resulting in the deterministic/classical case. On Fri, Aug 7

[jira] [Created] (SPARK-9642) LinearRegression should supported weighted data

2015-08-05 Thread Meihua Wu (JIRA)
Meihua Wu created SPARK-9642: Summary: LinearRegression should supported weighted data Key: SPARK-9642 URL: https://issues.apache.org/jira/browse/SPARK-9642 Project: Spark Issue Type: New

[jira] [Updated] (SPARK-9642) LinearRegression should supported weighted data

2015-08-05 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meihua Wu updated SPARK-9642: - Description: In many modeling applications, data points are not necessarily sampled with equal

How to help for 1.5 release?

2015-08-04 Thread Meihua Wu
I think the team is preparing for the 1.5 release. Is there anything I can help with, such as QA or testing? Thanks, MW

Re: Does RDD.cartesian involve shuffling?

2015-08-04 Thread Meihua Wu
is relatively small to fit)? On Tue, Aug 4, 2015 at 8:23 AM, Richard Marscher rmarsc...@localytics.com wrote: Yes, it does; in fact, it's probably going to be one of the more expensive shuffles you could trigger. On Mon, Aug 3, 2015 at 12:56 PM, Meihua Wu rotationsymmetr...@gmail.com wrote: Does

Does RDD.cartesian involve shuffling?

2015-08-03 Thread Meihua Wu
Does RDD.cartesian involve shuffling? Thanks! - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

[jira] [Commented] (SPARK-9245) DistributedLDAModel predict top topic per doc-term instance

2015-08-02 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651319#comment-14651319 ] Meihua Wu commented on SPARK-9245: -- [~josephkb]: would like to confirm: (using notation

[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meihua Wu updated SPARK-9530: - Summary: ScalaDoc should not indicate LDAModel.describeTopics

[jira] [Created] (SPARK-9530) ScalaDoc should not indicate LDAModel.descripeTopic and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Meihua Wu (JIRA)
Meihua Wu created SPARK-9530: Summary: ScalaDoc should not indicate LDAModel.descripeTopic and DistributedLDAModel.topDocumentsPerTopic as approximate. Key: SPARK-9530 URL: https://issues.apache.org/jira/browse/SPARK

[jira] [Updated] (SPARK-9530) ScalaDoc should not indicate LDAModel.describeTopics and DistributedLDAModel.topDocumentsPerTopic as approximate.

2015-08-01 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meihua Wu updated SPARK-9530: - Description: Currently the ScalaDoc for LDAModel.describeTopics

[jira] [Commented] (SPARK-9246) DistributedLDAModel predict top docs per topic

2015-07-29 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646312#comment-14646312 ] Meihua Wu commented on SPARK-9246: -- Cool, I see. I will keep you updated on the progress

[jira] [Commented] (SPARK-9246) DistributedLDAModel predict top docs per topic

2015-07-29 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646570#comment-14646570 ] Meihua Wu commented on SPARK-9246: -- Got it. Thanks! DistributedLDAModel predict top

[jira] [Commented] (SPARK-9246) DistributedLDAModel predict top docs per topic

2015-07-29 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646571#comment-14646571 ] Meihua Wu commented on SPARK-9246: -- Got it. Thanks! DistributedLDAModel predict top

[jira] [Issue Comment Deleted] (SPARK-9246) DistributedLDAModel predict top docs per topic

2015-07-29 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Meihua Wu updated SPARK-9246: - Comment: was deleted (was: Got it. Thanks!) DistributedLDAModel predict top docs per topic

[jira] [Commented] (SPARK-9246) DistributedLDAModel predict top docs per topic

2015-07-29 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647096#comment-14647096 ] Meihua Wu commented on SPARK-9246: -- I have submitted a PR including the code, ScalaDoc

[jira] [Commented] (SPARK-9246) DistributedLDAModel predict top docs per topic

2015-07-28 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645201#comment-14645201 ] Meihua Wu commented on SPARK-9246: -- [~josephkb] I would like to work

Re: Rebase and Squash Commits to Revise PR?

2015-07-28 Thread Meihua Wu
think it would help clean up your intent, but, often it's clearer to leave the review and commit history of your branch since the review comments go along with it. On Tue, Jul 28, 2015 at 9:46 PM, Meihua Wu rotationsymmetr...@gmail.com wrote: I am planning to update my PR to incorporate

Rebase and Squash Commits to Revise PR?

2015-07-28 Thread Meihua Wu
I am planning to update my PR to incorporate comments from reviewers. Do I need to rebase/squash the commits into a single one? Thanks! -MW - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands,

[jira] [Commented] (SPARK-9225) LDASuite needs unit tests for empty documents

2015-07-23 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638992#comment-14638992 ] Meihua Wu commented on SPARK-9225: -- working on this. LDASuite needs unit tests

[jira] [Commented] (SPARK-8518) Log-linear models for survival analysis

2015-07-21 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634621#comment-14634621 ] Meihua Wu commented on SPARK-8518: -- [~mengxr] [~yanboliang] Sounds like a plan. We would

[jira] [Commented] (SPARK-8518) Log-linear models for survival analysis

2015-07-19 Thread Meihua Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632965#comment-14632965 ] Meihua Wu commented on SPARK-8518: -- Hi [~mengxr] and [~yanboliang], For the log-linear

[jira] [Created] (SPARK-9175) BLAS.gemm fails to update matrix C when alpha==0 and beta!=1

2015-07-18 Thread Meihua Wu (JIRA)
Meihua Wu created SPARK-9175: Summary: BLAS.gemm fails to update matrix C when alpha==0 and beta!=1 Key: SPARK-9175 URL: https://issues.apache.org/jira/browse/SPARK-9175 Project: Spark Issue
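As background for the issue title, the standard BLAS definition of gemm is C := alpha * A * B + beta * C, so when alpha is 0 the result should still be C scaled by beta. The following plain-Python sketch shows that expected behaviour; it is not Spark's implementation, and the list-of-rows matrix layout and the `gemm` function here are assumptions made for illustration.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            product = sum(a[i][k] * b[k][j] for k in range(inner))
            out_row.append(alpha * product + beta * c[i][j])
        result.append(out_row)
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

# alpha == 0 and beta != 1: the product term vanishes, but C must still be scaled.
scaled_only = gemm(0.0, a, b, 2.0, c)
# alpha == 1 and beta == 0: the plain matrix product.
product_only = gemm(1.0, a, b, 0.0, c)
```

An implementation that returns early whenever alpha is 0 would leave C unchanged, which only matches this definition when beta is 1.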