[jira] [Commented] (SPARK-23291) SparkR : substr : In SparkR dataframe , starting and ending position arguments in "substr" is giving wrong result when the position is greater than 1

2018-03-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418599#comment-16418599 ] Liang-Chi Hsieh commented on SPARK-23291: - Because it is related to behavior change, I'm hesitant

[jira] [Commented] (SPARK-23784) Cannot use custom Aggregator with groupBy/agg

2018-03-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418558#comment-16418558 ] Liang-Chi Hsieh commented on SPARK-23784: - I think your question is already replied on

[jira] [Resolved] (SPARK-23784) Cannot use custom Aggregator with groupBy/agg

2018-03-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-23784. - Resolution: Not A Problem > Cannot use custom Aggregator with groupBy/agg >

[jira] [Commented] (SPARK-23734) InvalidSchemaException While Saving ALSModel

2018-03-23 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410914#comment-16410914 ] Liang-Chi Hsieh commented on SPARK-23734: - I use the latest master branch and can't reproduce the

[jira] [Updated] (SPARK-23614) Union produces incorrect results when caching is used

2018-03-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23614: Component/s: (was: Spark Core) SQL > Union produces incorrect results

[jira] [Created] (SPARK-23661) Implement treeAggregate on Dataset API

2018-03-12 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-23661: --- Summary: Implement treeAggregate on Dataset API Key: SPARK-23661 URL: https://issues.apache.org/jira/browse/SPARK-23661 Project: Spark Issue Type: New

[jira] [Commented] (SPARK-22446) Optimizer causing StringIndexerModel's indexer UDF to throw "Unseen label" exception incorrectly for filtered data.

2018-03-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387191#comment-16387191 ] Liang-Chi Hsieh commented on SPARK-22446: - Yeah, sounds good. > Optimizer causing

[jira] [Commented] (SPARK-22446) Optimizer causing StringIndexerModel's indexer UDF to throw "Unseen label" exception incorrectly for filtered data.

2018-03-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384618#comment-16384618 ] Liang-Chi Hsieh commented on SPARK-22446: - This fix uses a new API {{asNondeterministic}} of

[jira] [Commented] (SPARK-23471) RandomForestClassificationModel save() - incorrect metadata

2018-02-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379516#comment-16379516 ] Liang-Chi Hsieh commented on SPARK-23471: - I can't reproduce this. With `fit`, the params are

[jira] [Commented] (SPARK-23448) Dataframe returns wrong result when column don't respect datatype

2018-02-24 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375439#comment-16375439 ] Liang-Chi Hsieh commented on SPARK-23448: - In fact this is exactly the JSON parser's behavior,

[jira] [Commented] (SPARK-23390) Flaky Test Suite: FileBasedDataSourceSuite in Spark 2.3/hadoop 2.7

2018-02-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372360#comment-16372360 ] Liang-Chi Hsieh commented on SPARK-23390: - {{FileBasedDataSourceSuite}} still seems flaky.

[jira] [Updated] (SPARK-23448) Dataframe returns wrong result when column don't respect datatype

2018-02-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23448: Component/s: (was: Spark Core) SQL > Dataframe returns wrong result

[jira] [Commented] (SPARK-23455) Default Params in ML should be saved separately

2018-02-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368116#comment-16368116 ] Liang-Chi Hsieh commented on SPARK-23455: - Currently, {{DefaultParamsWriter}} saves the following

[jira] [Created] (SPARK-23455) Default Params in ML should be saved separately

2018-02-16 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-23455: --- Summary: Default Params in ML should be saved separately Key: SPARK-23455 URL: https://issues.apache.org/jira/browse/SPARK-23455 Project: Spark Issue

[jira] [Commented] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363367#comment-16363367 ] Liang-Chi Hsieh commented on SPARK-23377: - I agree with what [~mlnick] said. > Bucketizer with

[jira] [Commented] (SPARK-23403) java.lang.ArrayIndexOutOfBoundsException: 10

2018-02-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362030#comment-16362030 ] Liang-Chi Hsieh commented on SPARK-23403: - Have you checked the content of the csv file? Is there

[jira] [Commented] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361724#comment-16361724 ] Liang-Chi Hsieh commented on SPARK-23377: - For now, I think neither the 3rd option nor my current

[jira] [Commented] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361718#comment-16361718 ] Liang-Chi Hsieh commented on SPARK-23377: - I have no objection to [~josephkb]'s proposal (first

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358048#comment-16358048 ] Liang-Chi Hsieh commented on SPARK-23333: - Currently I think we don't have API in Dataset to just

[jira] [Commented] (SPARK-22446) Optimizer causing StringIndexerModel's indexer UDF to throw "Unseen label" exception incorrectly for filtered data.

2018-02-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356525#comment-16356525 ] Liang-Chi Hsieh commented on SPARK-22446: - 2.0 and 2.1 also have this issue. > Optimizer causing

[jira] [Commented] (SPARK-22446) Optimizer causing StringIndexerModel's indexer UDF to throw "Unseen label" exception incorrectly for filtered data.

2018-02-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356506#comment-16356506 ] Liang-Chi Hsieh commented on SPARK-22446: - Yes, this is an issue in Spark 2.2. For earlier

[jira] [Created] (SPARK-23284) Document several get API of ColumnVector's behavior when accessing null slot

2018-01-31 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-23284: --- Summary: Document several get API of ColumnVector's behavior when accessing null slot Key: SPARK-23284 URL: https://issues.apache.org/jira/browse/SPARK-23284

[jira] [Commented] (SPARK-23273) Spark Dataset withColumn - schema column order isn't the same as case class paramether order

2018-01-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346148#comment-16346148 ] Liang-Chi Hsieh commented on SPARK-23273: - The {{name}} column will be added after {{age}} in

[jira] [Commented] (SPARK-23224) union all will throw gramma exception

2018-01-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340746#comment-16340746 ] Liang-Chi Hsieh commented on SPARK-23224: - I'd close this for now. You can reopen it if you find

[jira] [Resolved] (SPARK-23224) union all will throw gramma exception

2018-01-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-23224. - Resolution: Not A Problem > union all will throw gramma exception >

[jira] [Commented] (SPARK-23224) union all will throw gramma exception

2018-01-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340741#comment-16340741 ] Liang-Chi Hsieh commented on SPARK-23224: -

[jira] [Commented] (SPARK-23220) broadcast hint not applied in a streaming left anti join

2018-01-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340495#comment-16340495 ] Liang-Chi Hsieh commented on SPARK-23220: - I can't re-produce it locally. I join a stream with a

[jira] [Commented] (SPARK-23173) from_json can produce nulls for fields which are marked as non-nullable

2018-01-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334005#comment-16334005 ] Liang-Chi Hsieh commented on SPARK-23173: - +1 for 1 too. > from_json can produce nulls for

[jira] [Comment Edited] (SPARK-22935) Dataset with Java Beans for java.sql.Date throws CompileException

2018-01-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16330207#comment-16330207 ] Liang-Chi Hsieh edited comment on SPARK-22935 at 1/18/18 7:53 AM: -- Can

[jira] [Comment Edited] (SPARK-22935) Dataset with Java Beans for java.sql.Date throws CompileException

2018-01-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16330207#comment-16330207 ] Liang-Chi Hsieh edited comment on SPARK-22935 at 1/18/18 7:52 AM: -- Can

[jira] [Commented] (SPARK-22935) Dataset with Java Beans for java.sql.Date throws CompileException

2018-01-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16330207#comment-16330207 ] Liang-Chi Hsieh commented on SPARK-22935: - Can > Dataset with Java Beans for java.sql.Date

[jira] [Commented] (SPARK-23021) AnalysisBarrier should not cut off the explain output for Parsed Logical Plan

2018-01-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323910#comment-16323910 ] Liang-Chi Hsieh commented on SPARK-23021: - To override {{innerChildren}} sounds good to me.

[jira] [Created] (SPARK-23042) Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier

2018-01-11 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-23042: --- Summary: Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier Key: SPARK-23042 URL: https://issues.apache.org/jira/browse/SPARK-23042

[jira] [Resolved] (SPARK-22898) collect_set aggregation on bucketed table causes an exchange stage

2018-01-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-22898. - Resolution: Duplicate > collect_set aggregation on bucketed table causes an exchange

[jira] [Commented] (SPARK-22898) collect_set aggregation on bucketed table causes an exchange stage

2018-01-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308970#comment-16308970 ] Liang-Chi Hsieh commented on SPARK-22898: - If no problem I will resolve this as duplicate. You

[jira] [Comment Edited] (SPARK-22898) collect_set aggregation on bucketed table causes an exchange stage

2018-01-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307591#comment-16307591 ] Liang-Chi Hsieh edited comment on SPARK-22898 at 1/2/18 1:46 AM: - I think

[jira] [Commented] (SPARK-22898) collect_set aggregation on bucketed table causes an exchange stage

2018-01-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307591#comment-16307591 ] Liang-Chi Hsieh commented on SPARK-22898: - I think this should already be fixed by SPARK-3.

[jira] [Updated] (SPARK-22856) Add wrapper for codegen output and nullability

2017-12-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22856: Affects Version/s: (was: 2.2.1) 2.3.0 > Add wrapper for codegen

[jira] [Updated] (SPARK-22856) Add wrapper for codegen output and nullability

2017-12-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22856: Description: The codegen output of {{Expression}}, {{ExprCode}}, now encapsulates only

[jira] [Created] (SPARK-22856) Add wrapper for codegen output and nullability

2017-12-20 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22856: --- Summary: Add wrapper for codegen output and nullability Key: SPARK-22856 URL: https://issues.apache.org/jira/browse/SPARK-22856 Project: Spark Issue

[jira] [Comment Edited] (SPARK-22600) Fix 64kb limit for deeply nested expressions under wholestage codegen

2017-12-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290423#comment-16290423 ] Liang-Chi Hsieh edited comment on SPARK-22600 at 12/14/17 6:59 AM: --- The

[jira] [Commented] (SPARK-22600) Fix 64kb limit for deeply nested expressions under wholestage codegen

2017-12-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290423#comment-16290423 ] Liang-Chi Hsieh commented on SPARK-22600: - The current approach proposes a new contract that

[jira] [Created] (SPARK-22772) elt should use splitExpressionsWithCurrentInputs to split expression codes

2017-12-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22772: --- Summary: elt should use splitExpressionsWithCurrentInputs to split expression codes Key: SPARK-22772 URL: https://issues.apache.org/jira/browse/SPARK-22772

[jira] [Resolved] (SPARK-22715) Reuse array in CreateNamedStruct

2017-12-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-22715. - Resolution: Not A Problem > Reuse array in CreateNamedStruct >

[jira] [Commented] (SPARK-22660) Use position() and limit() to fix ambiguity issue in scala-2.12 and JDK9

2017-11-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272283#comment-16272283 ] Liang-Chi Hsieh commented on SPARK-22660: - For the error you ping me, from the error message,

[jira] [Created] (SPARK-22600) Fix 64kb limit for deeply nested expressions under wholestage codegen

2017-11-24 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22600: --- Summary: Fix 64kb limit for deeply nested expressions under wholestage codegen Key: SPARK-22600 URL: https://issues.apache.org/jira/browse/SPARK-22600 Project:

[jira] [Created] (SPARK-22591) GenerateOrdering shouldn't change ctx.INPUT_ROW

2017-11-23 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22591: --- Summary: GenerateOrdering shouldn't change ctx.INPUT_ROW Key: SPARK-22591 URL: https://issues.apache.org/jira/browse/SPARK-22591 Project: Spark Issue

[jira] [Resolved] (SPARK-22551) Fix 64kb compile error for common expression types

2017-11-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-22551. - Resolution: Not A Problem > Fix 64kb compile error for common expression types >

[jira] [Commented] (SPARK-22551) Fix 64kb compile error for common expression types

2017-11-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263696#comment-16263696 ] Liang-Chi Hsieh commented on SPARK-22551: - After SPARK-22543 is merged, I can't reproduce this

[jira] [Updated] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22541: Component/s: (was: Documentation) > Dataframes: applying multiple filters one after

[jira] [Commented] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260240#comment-16260240 ] Liang-Chi Hsieh commented on SPARK-22541: - Since this is known behavior, I will change this from

[jira] [Updated] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22541: Issue Type: Documentation (was: Bug) > Dataframes: applying multiple filters one after

[jira] [Updated] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22541: Component/s: Documentation > Dataframes: applying multiple filters one after another using

[jira] [Comment Edited] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258873#comment-16258873 ] Liang-Chi Hsieh edited comment on SPARK-22541 at 11/20/17 7:14 AM: ---

[jira] [Commented] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258873#comment-16258873 ] Liang-Chi Hsieh commented on SPARK-22541: - Similar to the case of using python udfs with

[jira] [Comment Edited] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258868#comment-16258868 ] Liang-Chi Hsieh edited comment on SPARK-22541 at 11/20/17 7:01 AM: ---

[jira] [Commented] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258868#comment-16258868 ] Liang-Chi Hsieh commented on SPARK-22541: - Sorry, my previous reply is not completely correct.

[jira] [Updated] (SPARK-22551) Fix 64kb compile error for common expression types

2017-11-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22551: Issue Type: Sub-task (was: Bug) Parent: SPARK-22510 > Fix 64kb compile error for

[jira] [Created] (SPARK-22551) Fix 64kb compile error for common expression types

2017-11-18 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22551: --- Summary: Fix 64kb compile error for common expression types Key: SPARK-22551 URL: https://issues.apache.org/jira/browse/SPARK-22551 Project: Spark

[jira] [Commented] (SPARK-20295) when spark.sql.adaptive.enabled is enabled, have conflict with Exchange Resue

2017-11-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256570#comment-16256570 ] Liang-Chi Hsieh commented on SPARK-20295: - Btw, from the partial query plan you posted, looks

[jira] [Commented] (SPARK-20295) when spark.sql.adaptive.enabled is enabled, have conflict with Exchange Resue

2017-11-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256546#comment-16256546 ] Liang-Chi Hsieh commented on SPARK-20295: - Is this bug also in 2.2? When adaptive execution is

[jira] [Commented] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256441#comment-16256441 ] Liang-Chi Hsieh commented on SPARK-22541: - Due to query optimization, two filters are combined

[jira] [Commented] (SPARK-22491) union all can't execute parallel with group by

2017-11-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254431#comment-16254431 ] Liang-Chi Hsieh commented on SPARK-22491: - For the query without aggregation, the exchanges are

[jira] [Created] (SPARK-22527) Reuse coordinated ShuffleExchange if possible

2017-11-15 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22527: --- Summary: Reuse coordinated ShuffleExchange if possible Key: SPARK-22527 URL: https://issues.apache.org/jira/browse/SPARK-22527 Project: Spark Issue

[jira] [Commented] (SPARK-22491) union all can't execute parallel with group by

2017-11-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251194#comment-16251194 ] Liang-Chi Hsieh commented on SPARK-22491: - If the aggregation is removed, there is no shuffle

[jira] [Updated] (SPARK-22442) Schema generated by Product Encoder doesn't match case class field name when using non-standard characters

2017-11-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22442: Fix Version/s: 2.2.1 > Schema generated by Product Encoder doesn't match case class field

[jira] [Commented] (SPARK-22442) Schema generated by Product Encoder doesn't match case class field name when using non-standard characters

2017-11-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249086#comment-16249086 ] Liang-Chi Hsieh commented on SPARK-22442: - cc [~felixcheung] This will be backported to 2.2. >

[jira] [Commented] (SPARK-22442) Schema generated by Product Encoder doesn't match case class field name when using non-standard characters

2017-11-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249043#comment-16249043 ] Liang-Chi Hsieh commented on SPARK-22442: - I think it is easy. Let me prepare a backport PR for

[jira] [Commented] (SPARK-22460) Spark De-serialization of Timestamp field is Incorrect

2017-11-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245344#comment-16245344 ] Liang-Chi Hsieh commented on SPARK-22460: - FYI, there is already a reported issue in spark-avro

[jira] [Comment Edited] (SPARK-22460) Spark De-serialization of Timestamp field is Incorrect

2017-11-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245158#comment-16245158 ] Liang-Chi Hsieh edited comment on SPARK-22460 at 11/9/17 3:33 AM: -- From

[jira] [Comment Edited] (SPARK-22460) Spark De-serialization of Timestamp field is Incorrect

2017-11-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245158#comment-16245158 ] Liang-Chi Hsieh edited comment on SPARK-22460 at 11/9/17 3:32 AM: -- From

[jira] [Commented] (SPARK-22460) Spark De-serialization of Timestamp field is Incorrect

2017-11-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245158#comment-16245158 ] Liang-Chi Hsieh commented on SPARK-22460: - From the output of

[jira] [Commented] (SPARK-22460) Spark De-serialization of Timestamp field is Incorrect

2017-11-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243834#comment-16243834 ] Liang-Chi Hsieh commented on SPARK-22460: - In Spark SQL, to cast a long field to timestamp field,

[jira] [Commented] (SPARK-22446) Optimizer causing StringIndexerModel's indexer UDF to throw "Unseen label" exception incorrectly for filtered data.

2017-11-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239897#comment-16239897 ] Liang-Chi Hsieh commented on SPARK-22446: - For this special case, the simplest workaround is to

[jira] [Commented] (SPARK-22427) StackOverFlowError when using FPGrowth

2017-11-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239824#comment-16239824 ] Liang-Chi Hsieh commented on SPARK-22427: - From a rough glance, looks like the error didn't be

[jira] [Commented] (SPARK-22442) Schema generated by Product Encoder doesn't match case class field name when using non-standard characters

2017-11-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239815#comment-16239815 ] Liang-Chi Hsieh commented on SPARK-22442: - I tried on latest master branch. It can work with

[jira] [Commented] (SPARK-22398) Partition directories with leading 0s cause wrong results

2017-11-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239810#comment-16239810 ] Liang-Chi Hsieh commented on SPARK-22398: - As we can control it with the config

[jira] [Commented] (SPARK-22398) Partition directories with leading 0s cause wrong results

2017-11-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235083#comment-16235083 ] Liang-Chi Hsieh commented on SPARK-22398: - [~mgaido], I'd prefer to treat them as integer by

[jira] [Commented] (SPARK-22406) pyspark version tag is wrong on PyPi

2017-11-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233700#comment-16233700 ] Liang-Chi Hsieh commented on SPARK-22406: - cc [~holdenk] > pyspark version tag is wrong on PyPi

[jira] [Updated] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2017-10-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-22347: Issue Type: Documentation (was: Bug) > UDF is evaluated when 'F.when' condition is false

[jira] [Commented] (SPARK-11215) Add multiple columns support to StringIndexer

2017-10-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224975#comment-16224975 ] Liang-Chi Hsieh commented on SPARK-11215: - Hi [~WeichenXu123], I'd like to know if you are busy

[jira] [Commented] (SPARK-22291) Postgresql UUID[] to Cassandra: Conversion Error

2017-10-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224247#comment-16224247 ] Liang-Chi Hsieh commented on SPARK-22291: - Thanks [~hyukjin.kwon]. > Postgresql UUID[] to

[jira] [Commented] (SPARK-22291) Postgresql UUID[] to Cassandra: Conversion Error

2017-10-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224245#comment-16224245 ] Liang-Chi Hsieh commented on SPARK-22291: - [~cloud_fan] The Assignee should be [~jmchung].

[jira] [Comment Edited] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2017-10-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221612#comment-16221612 ] Liang-Chi Hsieh edited comment on SPARK-22347 at 10/27/17 2:30 PM: ---

[jira] [Comment Edited] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2017-10-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221612#comment-16221612 ] Liang-Chi Hsieh edited comment on SPARK-22347 at 10/27/17 5:06 AM: ---

[jira] [Comment Edited] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2017-10-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221612#comment-16221612 ] Liang-Chi Hsieh edited comment on SPARK-22347 at 10/27/17 5:05 AM: ---

[jira] [Comment Edited] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2017-10-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221612#comment-16221612 ] Liang-Chi Hsieh edited comment on SPARK-22347 at 10/27/17 2:44 AM: ---

[jira] [Commented] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2017-10-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221612#comment-16221612 ] Liang-Chi Hsieh commented on SPARK-22347: - Under the current execution mode of Python UDFs, I

[jira] [Comment Edited] (SPARK-22335) Union for DataSet uses column order instead of types for union

2017-10-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218547#comment-16218547 ] Liang-Chi Hsieh edited comment on SPARK-22335 at 10/25/17 1:18 PM: ---

[jira] [Commented] (SPARK-22335) Union for DataSet uses column order instead of types for union

2017-10-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218547#comment-16218547 ] Liang-Chi Hsieh commented on SPARK-22335: - IMHO, the concept of {{union}} API in Dataset is tied

[jira] [Commented] (SPARK-22335) Union for DataSet uses column order instead of types for union

2017-10-24 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218027#comment-16218027 ] Liang-Chi Hsieh commented on SPARK-22335: - [~CBribiescas] The column position in the schema of a

[jira] [Created] (SPARK-22348) The table cache providing ColumnarBatch should also do partition batch pruning

2017-10-24 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22348: --- Summary: The table cache providing ColumnarBatch should also do partition batch pruning Key: SPARK-22348 URL: https://issues.apache.org/jira/browse/SPARK-22348

[jira] [Comment Edited] (SPARK-22335) Union for DataSet uses column order instead of types for union

2017-10-23 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216248#comment-16216248 ] Liang-Chi Hsieh edited comment on SPARK-22335 at 10/24/17 3:36 AM: ---

[jira] [Commented] (SPARK-22335) Union for DataSet uses column order instead of types for union

2017-10-23 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216248#comment-16216248 ] Liang-Chi Hsieh commented on SPARK-22335: - Can't {{unionByName}} solve it? {code} scala>

[jira] [Commented] (SPARK-22291) Postgresql UUID[] to Cassandra: Conversion Error

2017-10-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212154#comment-16212154 ] Liang-Chi Hsieh commented on SPARK-22291: - Could you send a PR for this? > Postgresql UUID[] to

[jira] [Commented] (SPARK-22296) CodeGenerator - failed to compile when constructor has scala.collection.mutable.Seq vs. scala.collection.Seq

2017-10-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212123#comment-16212123 ] Liang-Chi Hsieh commented on SPARK-22296: - Seems no problem with 2.2? {code} scala> case class

[jira] [Commented] (SPARK-13030) Change OneHotEncoder to Estimator

2017-10-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209203#comment-16209203 ] Liang-Chi Hsieh commented on SPARK-13030: - [~josephkb] I think as we add a new class, it is

[jira] [Comment Edited] (SPARK-22283) withColumn should replace multiple instances with a single one

2017-10-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208585#comment-16208585 ] Liang-Chi Hsieh edited comment on SPARK-22283 at 10/17/17 11:43 PM:

[jira] [Commented] (SPARK-22283) withColumn should replace multiple instances with a single one

2017-10-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208585#comment-16208585 ] Liang-Chi Hsieh commented on SPARK-22283: - [~kitbellew] I didn't mean you're doing select. I

[jira] [Commented] (SPARK-22283) withColumn should replace multiple instances with a single one

2017-10-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207780#comment-16207780 ] Liang-Chi Hsieh commented on SPARK-22283: - When joined result has duplicate column name, you
