[jira] [Commented] (SPARK-27352) Apply for translation of the Chinese version, I hope to get authorization!

2019-04-07 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811835#comment-16811835 ] Teng Peng commented on SPARK-27352: --- I would say go ahead and send a PR to add the link to the doc.

[jira] [Commented] (SPARK-27352) Apply for translation of the Chinese version, I hope to get authorization!

2019-04-07 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811765#comment-16811765 ] Teng Peng commented on SPARK-27352: --- Correct me if I am wrong. I do not think any authorization is

[jira] [Created] (SPARK-24907) Migrate JDBC data source to DataSource API v2

2018-07-24 Thread Teng Peng (JIRA)
Teng Peng created SPARK-24907: - Summary: Migrate JDBC data source to DataSource API v2 Key: SPARK-24907 URL: https://issues.apache.org/jira/browse/SPARK-24907 Project: Spark Issue Type: New

[jira] [Updated] (SPARK-23507) Migrate file-based data sources to data source v2

2018-06-23 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-23507: -- Issue Type: Umbrella (was: Improvement) > Migrate file-based data sources to data source v2 >

[jira] [Issue Comment Deleted] (SPARK-18755) Add Randomized Grid Search to Spark ML

2018-06-23 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-18755: -- Comment: was deleted (was: [~yuhaoyan] Is this what you are looking for: after we build the grid, we

[jira] [Updated] (SPARK-22911) Migrate structured streaming sources to new DataSourceV2 APIs

2018-06-21 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-22911: -- Issue Type: Umbrella (was: Improvement) > Migrate structured streaming sources to new DataSourceV2

[jira] [Commented] (SPARK-24516) PySpark Bindings for K8S - make Python 3 the default

2018-06-11 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508945#comment-16508945 ] Teng Peng commented on SPARK-24516: --- +1. > PySpark Bindings for K8S - make Python 3 the default >

[jira] [Updated] (SPARK-21199) Its not possible to impute Vector types

2018-06-10 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-21199: -- Component/s: (was: Spark Core) ML > Its not possible to impute Vector types >

[jira] [Issue Comment Deleted] (SPARK-22943) OneHotEncoder supports manual specification of categorySizes

2018-06-10 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-22943: -- Comment: was deleted (was: This issue looks quite interesting, but can you be more specific about

[jira] [Commented] (SPARK-24431) wrong areaUnderPR calculation in BinaryClassificationEvaluator

2018-06-06 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503217#comment-16503217 ] Teng Peng commented on SPARK-24431: --- [~Ben2018] The article makes sense to me. It seems the current

[jira] [Comment Edited] (SPARK-24431) wrong areaUnderPR calculation in BinaryClassificationEvaluator

2018-06-02 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499109#comment-16499109 ] Teng Peng edited comment on SPARK-24431 at 6/2/18 6:48 PM: --- I am trying to

[jira] [Comment Edited] (SPARK-24431) wrong areaUnderPR calculation in BinaryClassificationEvaluator

2018-06-02 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499109#comment-16499109 ] Teng Peng edited comment on SPARK-24431 at 6/2/18 6:47 PM: --- I am trying to

[jira] [Comment Edited] (SPARK-24431) wrong areaUnderPR calculation in BinaryClassificationEvaluator

2018-06-02 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499109#comment-16499109 ] Teng Peng edited comment on SPARK-24431 at 6/2/18 6:46 PM: --- I am trying to

[jira] [Commented] (SPARK-24431) wrong areaUnderPR calculation in BinaryClassificationEvaluator

2018-06-02 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499109#comment-16499109 ] Teng Peng commented on SPARK-24431: --- I am trying to understand this description. What's your

[jira] [Issue Comment Deleted] (SPARK-24391) to_json/from_json should support arrays of primitives, and more generally all JSON

2018-05-26 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-24391: -- Comment: was deleted (was: My plan is to follow SPARK-19849 and SPARK-21513 to support more

[jira] [Commented] (SPARK-24391) to_json/from_json should support arrays of primitives, and more generally all JSON

2018-05-26 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491745#comment-16491745 ] Teng Peng commented on SPARK-24391: --- My plan is to follow SPARK-19849 and SPARK-21513 to support more

[jira] [Commented] (SPARK-24269) Infer nullability rather than declaring all columns as nullable

2018-05-20 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482156#comment-16482156 ] Teng Peng commented on SPARK-24269: --- Yes, I understand this rationale too. However, I am inclined to a

[jira] [Commented] (SPARK-24269) Infer nullability rather than declaring all columns as nullable

2018-05-19 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481812#comment-16481812 ] Teng Peng commented on SPARK-24269: --- Does it make sense to infer nullability from JSON and CSV?  >

[jira] [Commented] (SPARK-22943) OneHotEncoder supports manual specification of categorySizes

2018-05-02 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461006#comment-16461006 ] Teng Peng commented on SPARK-22943: --- This issue looks quite interesting, but can you be more specific

[jira] [Commented] (SPARK-23180) RFormulaModel should have labels member

2018-05-02 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460967#comment-16460967 ] Teng Peng commented on SPARK-23180: --- Can you give me an example for 1. the current workaround 2. the

[jira] [Updated] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans

2018-04-28 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-23171: -- Issue Type: Improvement (was: Umbrella) > Reduce the time costs of the rule runs that do not change

[jira] [Updated] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans

2018-04-28 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-23171: -- Issue Type: Umbrella (was: Improvement) > Reduce the time costs of the rule runs that do not change

[jira] [Commented] (SPARK-24024) Fix deviance calculations in GLM to handle corner cases

2018-04-19 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443892#comment-16443892 ] Teng Peng commented on SPARK-24024: --- I will first reproduce the issue he has, check how R handles

[jira] [Created] (SPARK-24024) Fix deviance calculations in GLM to handle corner cases

2018-04-19 Thread Teng Peng (JIRA)
Teng Peng created SPARK-24024: - Summary: Fix deviance calculations in GLM to handle corner cases Key: SPARK-24024 URL: https://issues.apache.org/jira/browse/SPARK-24024 Project: Spark Issue

[jira] [Commented] (SPARK-23740) Add FPGrowth Param for filtering out very common items

2018-03-25 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413199#comment-16413199 ] Teng Peng commented on SPARK-23740: --- I suppose `beforehand` means before itemsets have been generated,

[jira] [Comment Edited] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2018-03-20 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407451#comment-16407451 ] Teng Peng edited comment on SPARK-19208 at 3/21/18 4:44 AM: [~timhunter] Has

[jira] [Commented] (SPARK-19208) MultivariateOnlineSummarizer performance optimization

2018-03-20 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407451#comment-16407451 ] Teng Peng commented on SPARK-19208: --- [~timhunter] Has the Jira ticket been opened? I believe this would

[jira] [Commented] (SPARK-23537) Logistic Regression without standardization

2018-03-05 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387276#comment-16387276 ] Teng Peng commented on SPARK-23537: --- This is quite an interesting question, and I do not have an answer yet:

[jira] [Created] (SPARK-23578) Add multicolumn support for Binarizer

2018-03-03 Thread Teng Peng (JIRA)
Teng Peng created SPARK-23578: - Summary: Add multicolumn support for Binarizer Key: SPARK-23578 URL: https://issues.apache.org/jira/browse/SPARK-23578 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-20133) User guide for spark.ml.stat.ChiSquareTest

2017-11-20 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260095#comment-16260095 ] Teng Peng commented on SPARK-20133: --- I believe the documentation, including user guide and example

[jira] [Resolved] (SPARK-22449) Add BIC for GLM

2017-11-20 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng resolved SPARK-22449. --- Resolution: Later > Add BIC for GLM > --- > > Key: SPARK-22449 >

[jira] [Issue Comment Deleted] (SPARK-22359) Improve the test coverage of window functions

2017-11-06 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-22359: -- Comment: was deleted (was: [~jiangxb] If I can have one test as a reference, I will figure out the rest

[jira] [Commented] (SPARK-22359) Improve the test coverage of window functions

2017-11-06 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241392#comment-16241392 ] Teng Peng commented on SPARK-22359: --- [~jiangxb] If I can have one test as a reference, I will figure out

[jira] [Issue Comment Deleted] (SPARK-11502) Word2VecSuite needs appropriate checks

2017-11-05 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-11502: -- Comment: was deleted (was: I am interested in this one. My plan is to compare the test against 1.

[jira] [Commented] (SPARK-20077) Documentation for ml.stats.Correlation

2017-11-05 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239846#comment-16239846 ] Teng Peng commented on SPARK-20077: --- [~srowen] On this

[jira] [Updated] (SPARK-22449) Add BIC for GLM

2017-11-04 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-22449: -- Description: Currently, we only have AIC for GLM. BIC is another "similar" criterion widely used and
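For reference, the standard textbook definitions of the two criteria differ only in their penalty term. This is a general sketch, not Spark's implementation; how Spark counts parameters for each GLM family is not stated here. Here $k$ is the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood:

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
$$

Since $\ln n > 2$ once $n \ge 8$, BIC penalizes additional parameters more heavily than AIC for all but very small samples.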

[jira] [Created] (SPARK-22449) Add BIC for GLM

2017-11-04 Thread Teng Peng (JIRA)
Teng Peng created SPARK-22449: - Summary: Add BIC for GLM Key: SPARK-22449 URL: https://issues.apache.org/jira/browse/SPARK-22449 Project: Spark Issue Type: Improvement Components: ML

[jira] [Commented] (SPARK-18755) Add Randomized Grid Search to Spark ML

2017-11-04 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239389#comment-16239389 ] Teng Peng commented on SPARK-18755: --- [~yuhaoyan] Is this what you are looking for: after we build the

[jira] [Commented] (SPARK-22433) Linear regression R^2 train/test terminology related

2017-11-03 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237884#comment-16237884 ] Teng Peng commented on SPARK-22433: --- Thanks for the quick response, Sean. I am glad this issue is

[jira] [Commented] (SPARK-22433) Linear regression R^2 train/test terminology related

2017-11-03 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237774#comment-16237774 ] Teng Peng commented on SPARK-22433: --- What I agree with you: be coherent, and we prefer ML-oriented

[jira] [Updated] (SPARK-22433) Linear regression R^2 train/test terminology related

2017-11-02 Thread Teng Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teng Peng updated SPARK-22433: -- Description: Traditional statistics is traditional statistics. Their goal, framework, and

[jira] [Created] (SPARK-22433) Linear regression R^2 train/test terminology related

2017-11-02 Thread Teng Peng (JIRA)
Teng Peng created SPARK-22433: - Summary: Linear regression R^2 train/test terminology related Key: SPARK-22433 URL: https://issues.apache.org/jira/browse/SPARK-22433 Project: Spark Issue Type: