[jira] [Commented] (SPARK-22284) Code of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB

2017-10-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207183#comment-16207183 ] Liang-Chi Hsieh commented on SPARK-22284: - Btw, we have used {{UnsafeProjection}} in many places

[jira] [Comment Edited] (SPARK-8515) Improve ML attribute API

2017-10-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206770#comment-16206770 ] Liang-Chi Hsieh edited comment on SPARK-8515 at 10/16/17 11:28 PM: --- I'm

[jira] [Commented] (SPARK-8515) Improve ML attribute API

2017-10-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206770#comment-16206770 ] Liang-Chi Hsieh commented on SPARK-8515: I'm not sure if SPARK-2008 is related to metadata in ML?

[jira] [Commented] (SPARK-22276) Unnecessary repartitioning

2017-10-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206745#comment-16206745 ] Liang-Chi Hsieh commented on SPARK-22276: - I think this issue is already resolved by a recent fix

[jira] [Commented] (SPARK-22276) Unnecessary repartitioning

2017-10-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205400#comment-16205400 ] Liang-Chi Hsieh commented on SPARK-22276: - Can you provide a simple example to reproduce this

[jira] [Comment Edited] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199789#comment-16199789 ] Liang-Chi Hsieh edited comment on SPARK-22231 at 10/15/17 4:02 AM: ---

[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199987#comment-16199987 ] Liang-Chi Hsieh commented on SPARK-22229: - Are there any common components among the RDMA shuffle engine

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199789#comment-16199789 ] Liang-Chi Hsieh commented on SPARK-22231: - [~Jeremy Smith] Thanks for the context. Regarding the

[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198716#comment-16198716 ] Liang-Chi Hsieh commented on SPARK-22229: - As we have the pluggable mechanism to set up external

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198146#comment-16198146 ] Liang-Chi Hsieh commented on SPARK-22231: - Btw, the capacity to work on nested data types looks

[jira] [Updated] (SPARK-8515) Improve ML attribute API

2017-10-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-8515: --- Attachment: (was: SPARK-8515.pdf) > Improve ML attribute API > >

[jira] [Updated] (SPARK-8515) Improve ML attribute API

2017-10-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-8515: --- Attachment: SPARK-8515.pdf > Improve ML attribute API > > >

[jira] [Comment Edited] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198096#comment-16198096 ] Liang-Chi Hsieh edited comment on SPARK-22231 at 10/10/17 3:28 AM: ---

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198096#comment-16198096 ] Liang-Chi Hsieh commented on SPARK-22231: - Looks like `mapItems` is an API that can work on any Array

[jira] [Comment Edited] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198082#comment-16198082 ] Liang-Chi Hsieh edited comment on SPARK-22231 at 10/10/17 3:18 AM: --- I

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198082#comment-16198082 ] Liang-Chi Hsieh commented on SPARK-22231: - I think there is a typo in the second example to add a

[jira] [Comment Edited] (SPARK-8515) Improve ML attribute API

2017-10-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194074#comment-16194074 ] Liang-Chi Hsieh edited comment on SPARK-8515 at 10/6/17 4:52 AM: - I'm

[jira] [Updated] (SPARK-8515) Improve ML attribute API

2017-10-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-8515: --- Attachment: SPARK-8515.pdf > Improve ML attribute API > > >

[jira] [Commented] (SPARK-19141) VectorAssembler metadata causing memory issues

2017-10-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194075#comment-16194075 ] Liang-Chi Hsieh commented on SPARK-19141: - I'm working on a new ML attribute API (SPARK-8515)

[jira] [Commented] (SPARK-8515) Improve ML attribute API

2017-10-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194074#comment-16194074 ] Liang-Chi Hsieh commented on SPARK-8515: I'm working on a new ML attribute API which is supposed

[jira] [Created] (SPARK-22206) gapply in R can't work on empty grouping columns

2017-10-04 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22206: --- Summary: gapply in R can't work on empty grouping columns Key: SPARK-22206 URL: https://issues.apache.org/jira/browse/SPARK-22206 Project: Spark Issue

[jira] [Commented] (SPARK-22137) Failed to insert VectorUDT to hive table with DataFrameWriter.insertInto(tableName: String)

2017-10-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192414#comment-16192414 ] Liang-Chi Hsieh commented on SPARK-22137: - Ideally I think a UDT should be able to be cast to/from

[jira] [Comment Edited] (SPARK-22137) Failed to insert VectorUDT to hive table with DataFrameWriter.insertInto(tableName: String)

2017-09-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185937#comment-16185937 ] Liang-Chi Hsieh edited comment on SPARK-22137 at 9/29/17 3:04 PM: --

[jira] [Comment Edited] (SPARK-22137) Failed to insert VectorUDT to hive table with DataFrameWriter.insertInto(tableName: String)

2017-09-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185937#comment-16185937 ] Liang-Chi Hsieh edited comment on SPARK-22137 at 9/29/17 2:56 PM: --

[jira] [Commented] (SPARK-22137) Failed to insert VectorUDT to hive table with DataFrameWriter.insertInto(tableName: String)

2017-09-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185937#comment-16185937 ] Liang-Chi Hsieh commented on SPARK-22137: - Actually that is because we only allow casting between

[jira] [Commented] (SPARK-22113) Dataset shows in Hive is inconsistent with JDBC

2017-09-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180273#comment-16180273 ] Liang-Chi Hsieh commented on SPARK-22113: - I can reproduce it. Currently Spark can't work with

[jira] [Commented] (SPARK-22113) Dataset shows in Hive is inconsistent with JDBC

2017-09-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180117#comment-16180117 ] Liang-Chi Hsieh commented on SPARK-22113: - Simply said, we don't have custom JDBC dialect for

[jira] [Created] (SPARK-22124) Sample and Limit should also defer input evaluation under codegen

2017-09-25 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22124: --- Summary: Sample and Limit should also defer input evaluation under codegen Key: SPARK-22124 URL: https://issues.apache.org/jira/browse/SPARK-22124 Project:

[jira] [Commented] (SPARK-13030) Change OneHotEncoder to Estimator

2017-09-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180090#comment-16180090 ] Liang-Chi Hsieh commented on SPARK-13030: - Yes. I can do it. > Change OneHotEncoder to

[jira] [Comment Edited] (SPARK-13030) Change OneHotEncoder to Estimator

2017-09-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180090#comment-16180090 ] Liang-Chi Hsieh edited comment on SPARK-13030 at 9/26/17 2:00 AM: -- Yes.

[jira] [Commented] (SPARK-22113) Dataset shows in Hive is inconsistent with JDBC

2017-09-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178698#comment-16178698 ] Liang-Chi Hsieh commented on SPARK-22113: - Hmm, actually we have the API {{def select(col:

[jira] [Comment Edited] (SPARK-22112) Add missing method to pyspark api: spark.read.csv(Dataset)

2017-09-24 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178458#comment-16178458 ] Liang-Chi Hsieh edited comment on SPARK-22112 at 9/25/17 2:36 AM: -- cc

[jira] [Commented] (SPARK-22112) Add missing method to pyspark api: spark.read.csv(Dataset)

2017-09-24 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178458#comment-16178458 ] Liang-Chi Hsieh commented on SPARK-22112: - cc [~jmchung] or [~goldmedal] Maybe any of you will

[jira] [Commented] (SPARK-22081) Generalized Reduced Error Logistic Regression

2017-09-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176090#comment-16176090 ] Liang-Chi Hsieh commented on SPARK-22081: - Btw, looks like RELR is patented:

[jira] [Resolved] (SPARK-21653) Complement SQL expression document

2017-09-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-21653. - Resolution: Fixed > Complement SQL expression document >

[jira] [Created] (SPARK-22088) Incorrect scalastyle comment causes wrong styles in stringExpressions

2017-09-21 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22088: --- Summary: Incorrect scalastyle comment causes wrong styles in stringExpressions Key: SPARK-22088 URL: https://issues.apache.org/jira/browse/SPARK-22088 Project:

[jira] [Comment Edited] (SPARK-21653) Complement SQL expression document

2017-09-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174514#comment-16174514 ] Liang-Chi Hsieh edited comment on SPARK-21653 at 9/21/17 9:55 AM: -- I

[jira] [Commented] (SPARK-21653) Complement SQL expression document

2017-09-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174514#comment-16174514 ] Liang-Chi Hsieh commented on SPARK-21653: - I just went through all SQL expressions. Looks like we

[jira] [Updated] (SPARK-22086) Add expression description for CASE WHEN

2017-09-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22086: Issue Type: Sub-task (was: Documentation) Parent: SPARK-21653 > Add expression

[jira] [Created] (SPARK-22086) Add expression description for CASE WHEN

2017-09-21 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22086: --- Summary: Add expression description for CASE WHEN Key: SPARK-22086 URL: https://issues.apache.org/jira/browse/SPARK-22086 Project: Spark Issue Type:

[jira] [Created] (SPARK-22001) ImputerModel can do withColumn for all input columns at one pass

2017-09-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22001: --- Summary: ImputerModel can do withColumn for all input columns at one pass Key: SPARK-22001 URL: https://issues.apache.org/jira/browse/SPARK-22001 Project:

[jira] [Commented] (SPARK-21990) QueryPlanConstraints misses some constraints that can be recursively inferred

2017-09-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165639#comment-16165639 ] Liang-Chi Hsieh commented on SPARK-21990: - After inspecting it, because

[jira] [Updated] (SPARK-21990) QueryPlanConstraints misses some constraints that can be recursively inferred

2017-09-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21990: Description: When I inspected the latest change of SPARK-21979, I found we could miss few

[jira] [Resolved] (SPARK-21990) QueryPlanConstraints misses some constraints that can be recursively inferred

2017-09-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-21990. - Resolution: Not A Problem > QueryPlanConstraints misses some constraints that can be

[jira] [Updated] (SPARK-21990) QueryPlanConstraints misses some constraints that can be recursively inferred

2017-09-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21990: Summary: QueryPlanConstraints misses some constraints that can be recursively inferred

[jira] [Created] (SPARK-21990) QueryPlanConstraints misses some constraints can be recursively inferred

2017-09-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-21990: --- Summary: QueryPlanConstraints misses some constraints can be recursively inferred Key: SPARK-21990 URL: https://issues.apache.org/jira/browse/SPARK-21990

[jira] [Created] (SPARK-21954) JacksonUtils should verify MapType's value type instead of key type

2017-09-08 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-21954: --- Summary: JacksonUtils should verify MapType's value type instead of key type Key: SPARK-21954 URL: https://issues.apache.org/jira/browse/SPARK-21954 Project:

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2017-09-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150091#comment-16150091 ] Liang-Chi Hsieh commented on SPARK-21658: - Thanks [~hyukjin.kwon] and [~jerryshao]. > Adds the

[jira] [Commented] (SPARK-21885) HiveMetastoreCatalog.InferIfNeeded too slow when caseSensitiveInference enabled

2017-08-31 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149973#comment-16149973 ] Liang-Chi Hsieh commented on SPARK-21885: - I tend to agree that when we don't actually need the

[jira] [Comment Edited] (SPARK-21847) Where is the lit() function in pyspark 2.2.0?

2017-08-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143343#comment-16143343 ] Liang-Chi Hsieh edited comment on SPARK-21847 at 8/28/17 4:09 AM: --

[jira] [Commented] (SPARK-21847) Where is the lit() function in pyspark 2.2.0?

2017-08-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143343#comment-16143343 ] Liang-Chi Hsieh commented on SPARK-21847: - {code} >>> from pyspark.sql.functions import lit,nanvl

[jira] [Issue Comment Deleted] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21835: Comment: was deleted (was: Submitted PR at https://github.com/apache/spark/pull/19050) >

[jira] [Commented] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-24 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141233#comment-16141233 ] Liang-Chi Hsieh commented on SPARK-21835: - Submitted PR at

[jira] [Created] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-24 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-21835: --- Summary: RewritePredicateSubquery should not produce unresolved query plans Key: SPARK-21835 URL: https://issues.apache.org/jira/browse/SPARK-21835 Project:

[jira] [Commented] (SPARK-21807) The getAliasedConstraints function in LogicalPlan will take a long time when number of expressions is greater than 100

2017-08-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137819#comment-16137819 ] Liang-Chi Hsieh commented on SPARK-21807: - This is a known issue. Currently we provide a SQL conf

[jira] [Commented] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-08-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136175#comment-16136175 ] Liang-Chi Hsieh commented on SPARK-21799: - So I think the problem is you shouldn't do

[jira] [Issue Comment Deleted] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-08-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21799: Comment: was deleted (was: Yeah, that looks like the right direction. {{df.storageLevel}} is not

[jira] [Commented] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-08-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136164#comment-16136164 ] Liang-Chi Hsieh commented on SPARK-21799: - Hmm, I go to check ML KMeans codes where I don't find

[jira] [Commented] (SPARK-21799) KMeans performance regression (5-6x slowdown) in Spark 2.2

2017-08-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136152#comment-16136152 ] Liang-Chi Hsieh commented on SPARK-21799: - Yeah, that looks like the right direction. {{df.storageLevel}}

[jira] [Commented] (SPARK-21763) InferSchema option does not infer the correct schema (timestamp) from xlsx file.

2017-08-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131890#comment-16131890 ] Liang-Chi Hsieh commented on SPARK-21763: - I'm afraid that the inferSchema is done by the library

[jira] [Commented] (SPARK-21759) In.checkInputDataTypes should not wrongly report unresolved plans for IN correlated subquery

2017-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131493#comment-16131493 ] Liang-Chi Hsieh commented on SPARK-21759: - Submitted PR at

[jira] [Updated] (SPARK-21759) In.checkInputDataTypes should not wrongly report unresolved plans for IN correlated subquery

2017-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21759: Description: With the check for structural integrity proposed in SPARK-21726, I found that

[jira] [Updated] (SPARK-21759) In.checkInputDataTypes should not wrongly report unresolved plans for IN correlated subquery

2017-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21759: Summary: In.checkInputDataTypes should not wrongly report unresolved plans for IN

[jira] [Updated] (SPARK-21759) PullupCorrelatedPredicates should not produce unresolved plan

2017-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21759: Summary: PullupCorrelatedPredicates should not produce unresolved plan (was:

[jira] [Created] (SPARK-21759) PullupCorrelatedPredicates can produce unresolved plan

2017-08-16 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-21759: --- Summary: PullupCorrelatedPredicates can produce unresolved plan Key: SPARK-21759 URL: https://issues.apache.org/jira/browse/SPARK-21759 Project: Spark

[jira] [Commented] (SPARK-21726) Check for structural integrity of the plan in QO in test mode

2017-08-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128327#comment-16128327 ] Liang-Chi Hsieh commented on SPARK-21726: - Submitted a PR at

[jira] [Commented] (SPARK-21657) Spark has exponential time complexity to explode(array of structs)

2017-08-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128242#comment-16128242 ] Liang-Chi Hsieh commented on SPARK-21657: - [~maropu] I've noticed that change. There is a hotfix

[jira] [Commented] (SPARK-21726) Check for structural integrity of the plan in QO in test mode

2017-08-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127288#comment-16127288 ] Liang-Chi Hsieh commented on SPARK-21726: - [~rxin] Thanks for pinging me! Yes, I'm interested in

[jira] [Commented] (SPARK-21721) Memory leak in org.apache.spark.sql.hive.execution.InsertIntoHiveTable

2017-08-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125217#comment-16125217 ] Liang-Chi Hsieh commented on SPARK-21721: - Submitted a PR at

[jira] [Created] (SPARK-21717) Decouple the generated codes of consuming rows in operators under whole-stage codegen

2017-08-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-21717: --- Summary: Decouple the generated codes of consuming rows in operators under whole-stage codegen Key: SPARK-21717 URL: https://issues.apache.org/jira/browse/SPARK-21717

[jira] [Comment Edited] (SPARK-21657) Spark has exponential time complexity to explode(array of structs)

2017-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123050#comment-16123050 ] Liang-Chi Hsieh edited comment on SPARK-21657 at 8/11/17 8:55 AM: -- Maybe

[jira] [Commented] (SPARK-21657) Spark has exponential time complexity to explode(array of structs)

2017-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123050#comment-16123050 ] Liang-Chi Hsieh commented on SPARK-21657: - Maybe not very related to this issue. But I'm

[jira] [Commented] (SPARK-21677) json_tuple throws NullPointException when column is null as string type.

2017-08-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121740#comment-16121740 ] Liang-Chi Hsieh commented on SPARK-21677: - As a given field name {{null}} can't be matched with

[jira] [Commented] (SPARK-21684) df.write double escaping all the already escaped characters except the first one

2017-08-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121244#comment-16121244 ] Liang-Chi Hsieh commented on SPARK-21684: - Would you mind providing a small code snippet to reproduce it?

[jira] [Updated] (SPARK-21679) KMeans Clustering is Not Deterministic

2017-08-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21679: Issue Type: Improvement (was: Bug) > KMeans Clustering is Not Deterministic >

[jira] [Commented] (SPARK-21679) KMeans Clustering is Not Deterministic

2017-08-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121213#comment-16121213 ] Liang-Chi Hsieh commented on SPARK-21679: - Old MLlib {{org.apache.spark.mllib.clustering.KMeans}}

[jira] [Commented] (SPARK-21686) spark.sql.hive.convertMetastoreOrc is causing NullPointerException while reading ORC tables

2017-08-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121187#comment-16121187 ] Liang-Chi Hsieh commented on SPARK-21686: - I saw the affected version is 1.6.1. So the more recent

[jira] [Commented] (SPARK-21677) json_tuple throws NullPointException when column is null as string type.

2017-08-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119928#comment-16119928 ] Liang-Chi Hsieh commented on SPARK-21677: - [~hyukjin.kwon] Thanks! Definitely we are interested

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2017-08-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118271#comment-16118271 ] Liang-Chi Hsieh commented on SPARK-21658: - [~mgaido] Appreciate your understanding! Thanks. >

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2017-08-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118044#comment-16118044 ] Liang-Chi Hsieh commented on SPARK-21658: - [~mgaido] If you don't mind, I'm mentoring a beginner

[jira] [Comment Edited] (SPARK-21631) Building Spark with SBT unsuccessful when source code in Mllib is modified, But with MVN is ok

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117780#comment-16117780 ] Liang-Chi Hsieh edited comment on SPARK-21631 at 8/8/17 3:10 AM: -

[jira] [Commented] (SPARK-21631) Building Spark with SBT unsuccessful when source code in Mllib is modified, But with MVN is ok

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117780#comment-16117780 ] Liang-Chi Hsieh commented on SPARK-21631: - [~ibingoogle] I saw you did {{export

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117718#comment-16117718 ] Liang-Chi Hsieh commented on SPARK-21658: - I will mentor a beginner to work on this. Thanks

[jira] [Commented] (SPARK-21653) Complement SQL expression document

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116282#comment-16116282 ] Liang-Chi Hsieh commented on SPARK-21653: - [~hyukjin.kwon] oh, yeah, looks like it's. As

[jira] [Commented] (SPARK-21653) Complement SQL expression document

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116241#comment-16116241 ] Liang-Chi Hsieh commented on SPARK-21653: - [~sowen] I made a detailed description now for this.

[jira] [Updated] (SPARK-21653) Complement SQL expression document

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21653: Description: We have {{ExpressionDescription}} for SQL expressions. The expression

[jira] [Commented] (SPARK-21653) Complement SQL expression document

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116232#comment-16116232 ] Liang-Chi Hsieh commented on SPARK-21653: - We have {{ExpressionDescription}} for SQL expressions.

[jira] [Updated] (SPARK-21654) Complement predicates expression description

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21654: Issue Type: Sub-task (was: Improvement) Parent: SPARK-21653 > Complement

[jira] [Created] (SPARK-21654) Complement predicates expression description

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-21654: --- Summary: Complement predicates expression description Key: SPARK-21654 URL: https://issues.apache.org/jira/browse/SPARK-21654 Project: Spark Issue

[jira] [Created] (SPARK-21653) Complement SQL expression document

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-21653: --- Summary: Complement SQL expression document Key: SPARK-21653 URL: https://issues.apache.org/jira/browse/SPARK-21653 Project: Spark Issue Type:

[jira] [Updated] (SPARK-21653) Complement SQL expression document

2017-08-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21653: Issue Type: Umbrella (was: Improvement) > Complement SQL expression document >

[jira] [Commented] (SPARK-21629) OR nullability is incorrect

2017-08-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115702#comment-16115702 ] Liang-Chi Hsieh commented on SPARK-21629: - No problem. Mentees still learn something. > OR

[jira] [Commented] (SPARK-20025) Driver fail over will not work, if SPARK_LOCAL* env is set.

2017-08-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115696#comment-16115696 ] Liang-Chi Hsieh commented on SPARK-20025: - I think there is an unsolved issue SPARK-12963 which

[jira] [Commented] (SPARK-21629) OR nullability is incorrect

2017-08-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115626#comment-16115626 ] Liang-Chi Hsieh commented on SPARK-21629: - [~hvanhovell] As this is not a problem, but I can't

[jira] [Commented] (SPARK-21610) Corrupt records are not handled properly when creating a dataframe from a file

2017-08-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115622#comment-16115622 ] Liang-Chi Hsieh commented on SPARK-21610: - I'm mentoring one beginner working on this. > Corrupt

[jira] [Commented] (SPARK-21629) OR nullability is incorrect

2017-08-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115405#comment-16115405 ] Liang-Chi Hsieh commented on SPARK-21629: - Yap, Thanks a lot [~hvanhovell]. > OR nullability is

[jira] [Updated] (SPARK-21629) OR nullability is incorrect

2017-08-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-21629: Labels: Starter (was: ) > OR nullability is incorrect > --- > >

[jira] [Commented] (SPARK-21629) OR nullability is incorrect

2017-08-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115395#comment-16115395 ] Liang-Chi Hsieh commented on SPARK-21629: - [~hvanhovell] Sorry, I'm mentoring the few local

[jira] [Comment Edited] (SPARK-21629) OR nullability is incorrect

2017-08-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115395#comment-16115395 ] Liang-Chi Hsieh edited comment on SPARK-21629 at 8/5/17 1:12 PM: -

[jira] [Commented] (SPARK-21631) Building Spark with SBT unsuccessful when source code in Mllib is modified, But with MVN is ok

2017-08-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115186#comment-16115186 ] Liang-Chi Hsieh commented on SPARK-21631: - I've tried. {{NOLINT_ON_COMPILE=1 build/sbt "testOnly
