[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-12-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719844#comment-16719844 ] Liang-Chi Hsieh commented on SPARK-24152: - Just got reply from CRAN admin. It should be fixed

[jira] [Created] (SPARK-26337) Add benchmark for LongToUnsafeRowMap

2018-12-11 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26337: --- Summary: Add benchmark for LongToUnsafeRowMap Key: SPARK-26337 URL: https://issues.apache.org/jira/browse/SPARK-26337 Project: Spark Issue Type: Test

[jira] [Commented] (SPARK-26306) Flaky test: org.apache.spark.util.collection.SorterSuite

2018-12-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714220#comment-16714220 ] Liang-Chi Hsieh commented on SPARK-26306: - I have not noticed the evidence that this test is

[jira] [Commented] (SPARK-26305) Breakthrough the memory limitation of broadcast join

2018-12-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713917#comment-16713917 ] Liang-Chi Hsieh commented on SPARK-26305: - I'm working on SPARK-25549, which provides an API to

[jira] [Commented] (SPARK-26224) Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.

2018-12-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713494#comment-16713494 ] Liang-Chi Hsieh commented on SPARK-26224: - I think it is not specified to withColumn. withColumn

[jira] [Commented] (SPARK-26306) Flaky test: org.apache.spark.util.collection.SorterSuite

2018-12-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712970#comment-16712970 ] Liang-Chi Hsieh commented on SPARK-26306: - Besides above build, is there any build that this

[jira] [Resolved] (SPARK-26273) Add OneHotEncoderEstimator as alias to OneHotEncoder

2018-12-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-26273. - Resolution: Won't Fix > Add OneHotEncoderEstimator as alias to OneHotEncoder >

[jira] [Commented] (SPARK-26273) Add OneHotEncoderEstimator as alias to OneHotEncoder

2018-12-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710192#comment-16710192 ] Liang-Chi Hsieh commented on SPARK-26273: - For now the idea collected from the PR is we don't

[jira] [Created] (SPARK-26273) Add OneHotEncoderEstimator as alias to OneHotEncoder

2018-12-05 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26273: --- Summary: Add OneHotEncoderEstimator as alias to OneHotEncoder Key: SPARK-26273 URL: https://issues.apache.org/jira/browse/SPARK-26273 Project: Spark

[jira] [Commented] (SPARK-26215) define reserved keywords after SQL standard

2018-11-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703035#comment-16703035 ] Liang-Chi Hsieh commented on SPARK-26215: - Thanks for pinging me. Is "In Spark SQL, we are too

[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale

2018-11-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701633#comment-16701633 ] Liang-Chi Hsieh commented on SPARK-26155: - Except for Q19 of TPC-DS, can we observe performance

[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale

2018-11-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700431#comment-16700431 ] Liang-Chi Hsieh commented on SPARK-26155: - Btw, have you observed the same thing on Spark 2.4?

[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale

2018-11-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700414#comment-16700414 ] Liang-Chi Hsieh commented on SPARK-26155: - I'd also like to see more details about what is the

[jira] [Comment Edited] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale

2018-11-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700213#comment-16700213 ] Liang-Chi Hsieh edited comment on SPARK-26155 at 11/27/18 10:48 AM:

[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale

2018-11-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700228#comment-16700228 ] Liang-Chi Hsieh commented on SPARK-26155: - Is there possibly any other thing affecting the time?

[jira] [Commented] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale

2018-11-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700213#comment-16700213 ] Liang-Chi Hsieh commented on SPARK-26155: - "Q19 analysis in Spark2.3 without L486 & 487.pdf"

[jira] [Created] (SPARK-26133) Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to OneHotEncoder

2018-11-20 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26133: --- Summary: Remove deprecated OneHotEncoder and rename OneHotEncoderEstimator to OneHotEncoder Key: SPARK-26133 URL: https://issues.apache.org/jira/browse/SPARK-26133

[jira] [Commented] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()

2018-11-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692722#comment-16692722 ] Liang-Chi Hsieh commented on SPARK-26019: - {{TCPServer}} begins to process requests only after

[jira] [Commented] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()

2018-11-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692504#comment-16692504 ] Liang-Chi Hsieh commented on SPARK-26019: - Isn't the server beginning to handle requests after

[jira] [Commented] (SPARK-25549) High level API to collect RDD statistics

2018-11-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690841#comment-16690841 ] Liang-Chi Hsieh commented on SPARK-25549: - I have code patch based on the design doc in local.

[jira] [Commented] (SPARK-26078) WHERE .. IN fails to filter rows when used in combination with UNION

2018-11-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689268#comment-16689268 ] Liang-Chi Hsieh commented on SPARK-26078: - I simply have a look for it, but don't have great fix

[jira] [Created] (SPARK-26085) Key attribute of primitive type under typed aggregation should be named as "key" too

2018-11-15 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26085: --- Summary: Key attribute of primitive type under typed aggregation should be named as "key" too Key: SPARK-26085 URL: https://issues.apache.org/jira/browse/SPARK-26085

[jira] [Updated] (SPARK-25942) Aggregate expressions shouldn't be resolved on AppendColumns

2018-11-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-25942: Description: Dataset.groupByKey will bring in new attributes from serializer. If key type

[jira] [Updated] (SPARK-25942) Aggregate expressions shouldn't be resolved on AppendColumns

2018-11-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-25942: Summary: Aggregate expressions shouldn't be resolved on AppendColumns (was:

[jira] [Created] (SPARK-25942) Dataset.groupByKey can't work on primitive data

2018-11-05 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-25942: --- Summary: Dataset.groupByKey can't work on primitive data Key: SPARK-25942 URL: https://issues.apache.org/jira/browse/SPARK-25942 Project: Spark Issue

[jira] [Commented] (SPARK-25923) SparkR UT Failure (checking CRAN incoming feasibility)

2018-11-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672885#comment-16672885 ] Liang-Chi Hsieh commented on SPARK-25923: - CRAN sysadmin replied me it should be fixed soon. >

[jira] [Comment Edited] (SPARK-25923) SparkR UT Failure (checking CRAN incoming feasibility)

2018-11-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672885#comment-16672885 ] Liang-Chi Hsieh edited comment on SPARK-25923 at 11/2/18 10:27 AM: ---

[jira] [Commented] (SPARK-25923) SparkR UT Failure (checking CRAN incoming feasibility)

2018-11-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672538#comment-16672538 ] Liang-Chi Hsieh commented on SPARK-25923: - I found the cause of this error and sent an email to

[jira] [Commented] (SPARK-25879) Schema pruning fails when a nested field and top level field are selected

2018-10-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668332#comment-16668332 ] Liang-Chi Hsieh commented on SPARK-25879: - I agreed with [~hyukjin.kwon]. > Schema pruning

[jira] [Commented] (SPARK-25879) Schema pruning fails when a nested field and top level field are selected

2018-10-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668276#comment-16668276 ] Liang-Chi Hsieh commented on SPARK-25879: - According to [~michael] 's comment, this seems to be

[jira] [Commented] (SPARK-25829) Duplicated map keys are not handled consistently

2018-10-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663539#comment-16663539 ] Liang-Chi Hsieh commented on SPARK-25829: - Although I think the inconsistent handling exists for

[jira] [Comment Edited] (SPARK-25829) Duplicated map keys are not handled consistently

2018-10-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663484#comment-16663484 ] Liang-Chi Hsieh edited comment on SPARK-25829 at 10/25/18 10:02 AM:

[jira] [Comment Edited] (SPARK-25829) Duplicated map keys are not handled consistently

2018-10-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663484#comment-16663484 ] Liang-Chi Hsieh edited comment on SPARK-25829 at 10/25/18 9:52 AM: ---

[jira] [Comment Edited] (SPARK-25829) Duplicated map keys are not handled consistently

2018-10-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663484#comment-16663484 ] Liang-Chi Hsieh edited comment on SPARK-25829 at 10/25/18 9:50 AM: ---

[jira] [Commented] (SPARK-25829) Duplicated map keys are not handled consistently

2018-10-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663484#comment-16663484 ] Liang-Chi Hsieh commented on SPARK-25829: - Besides Java/Scala, is there any related definition

[jira] [Created] (SPARK-25811) Support PyArrow's feature to raise an error for unsafe cast

2018-10-23 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-25811: --- Summary: Support PyArrow's feature to raise an error for unsafe cast Key: SPARK-25811 URL: https://issues.apache.org/jira/browse/SPARK-25811 Project: Spark

[jira] [Updated] (SPARK-25040) Empty string should be disallowed for data types other than string and binary in JSON

2018-10-23 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-25040: Summary: Empty string should be disallowed for data types other than string and binary in

[jira] [Commented] (SPARK-25040) Empty string should be disallowed for data types except for string and binary types in JSON

2018-10-23 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660143#comment-16660143 ] Liang-Chi Hsieh commented on SPARK-25040: - The JIRA title is not correct now. I changed it. >

[jira] [Updated] (SPARK-25040) Empty string should be disallowed for data types except for string and binary types in JSON

2018-10-23 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-25040: Summary: Empty string should be disallowed for data types except for string and binary

[jira] [Commented] (SPARK-25783) Spark shell fails because of jline incompatibility

2018-10-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658452#comment-16658452 ] Liang-Chi Hsieh commented on SPARK-25783: - Looks like it is 2.14.3. > Spark shell fails because

[jira] [Commented] (SPARK-25783) Spark shell fails because of jline incompatibility

2018-10-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658440#comment-16658440 ] Liang-Chi Hsieh commented on SPARK-25783: - [~koert] Yes, from the verbose message, it looks so.

[jira] [Commented] (SPARK-25783) Spark shell fails because of jline incompatibility

2018-10-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658114#comment-16658114 ] Liang-Chi Hsieh commented on SPARK-25783: - Actually I can't reproduce this issue. From java's

[jira] [Created] (SPARK-25791) Datatype of serializers in RowEncoder should be accessible

2018-10-20 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-25791: --- Summary: Datatype of serializers in RowEncoder should be accessible Key: SPARK-25791 URL: https://issues.apache.org/jira/browse/SPARK-25791 Project: Spark

[jira] [Created] (SPARK-25746) Refactoring ExpressionEncoder

2018-10-16 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-25746: --- Summary: Refactoring ExpressionEncoder Key: SPARK-25746 URL: https://issues.apache.org/jira/browse/SPARK-25746 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-25643) Performance issues querying wide rows

2018-10-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641554#comment-16641554 ] Liang-Chi Hsieh commented on SPARK-25643: - {quote}predicate push down is not helping: either

[jira] [Commented] (SPARK-25587) NPE in Dataset when reading from Parquet as Product

2018-10-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637963#comment-16637963 ] Liang-Chi Hsieh commented on SPARK-25587: - Just ran few experiments. It seems caused by

[jira] [Comment Edited] (SPARK-25587) NPE in Dataset when reading from Parquet as Product

2018-10-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637963#comment-16637963 ] Liang-Chi Hsieh edited comment on SPARK-25587 at 10/4/18 9:21 AM: -- Just

[jira] [Commented] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-10-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635216#comment-16635216 ] Liang-Chi Hsieh commented on SPARK-25461: - [~hyukjin.kwon] Thanks and no problem at all! You can

[jira] [Comment Edited] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-10-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635048#comment-16635048 ] Liang-Chi Hsieh edited comment on SPARK-25461 at 10/2/18 5:27 AM: -- I've

[jira] [Commented] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-10-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635048#comment-16635048 ] Liang-Chi Hsieh commented on SPARK-25461: - I've looked more at this. We don't really check if

[jira] [Commented] (SPARK-25554) Avro logical types get ignored in SchemaConverters.toSqlType

2018-09-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631505#comment-16631505 ] Liang-Chi Hsieh commented on SPARK-25554: - hmm, I think Spark 2.4 should have comprehensive

[jira] [Commented] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-09-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629831#comment-16629831 ] Liang-Chi Hsieh commented on SPARK-25461: - That's said I'm not sure if this can be called a bug

[jira] [Commented] (SPARK-25549) High level API to collect RDD statistics

2018-09-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629702#comment-16629702 ] Liang-Chi Hsieh commented on SPARK-25549: - cc [~cloud_fan] > High level API to collect RDD

[jira] [Commented] (SPARK-25549) High level API to collect RDD statistics

2018-09-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629700#comment-16629700 ] Liang-Chi Hsieh commented on SPARK-25549: - The design doc is at:

[jira] [Created] (SPARK-25549) High level API to collect RDD statistics

2018-09-26 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-25549: --- Summary: High level API to collect RDD statistics Key: SPARK-25549 URL: https://issues.apache.org/jira/browse/SPARK-25549 Project: Spark Issue Type:

[jira] [Comment Edited] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-09-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628428#comment-16628428 ] Liang-Chi Hsieh edited comment on SPARK-25461 at 9/26/18 8:45 AM: --

[jira] [Comment Edited] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-09-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628428#comment-16628428 ] Liang-Chi Hsieh edited comment on SPARK-25461 at 9/26/18 8:42 AM: -- When

[jira] [Commented] (SPARK-25461) PySpark Pandas UDF outputs incorrect results when input columns contain None

2018-09-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628428#comment-16628428 ] Liang-Chi Hsieh commented on SPARK-25461: - When your data has not None, do you still have

[jira] [Commented] (SPARK-25378) ArrayData.toArray(StringType) assume UTF8String in 2.4

2018-09-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628225#comment-16628225 ] Liang-Chi Hsieh commented on SPARK-25378: - Don't we have any decision on this yet? >

[jira] [Commented] (SPARK-25497) limit operation within whole stage codegen should not consume all the inputs

2018-09-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622996#comment-16622996 ] Liang-Chi Hsieh commented on SPARK-25497: - Yes. Thanks for pinging me. I will look into this. >

[jira] [Commented] (SPARK-19355) Use map output statistices to improve global limit's parallelism

2018-09-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621965#comment-16621965 ] Liang-Chi Hsieh commented on SPARK-19355: - [~cloud_fan] For this, I think we should first have

[jira] [Commented] (SPARK-25378) ArrayData.toArray(StringType) assume UTF8String in 2.4

2018-09-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618588#comment-16618588 ] Liang-Chi Hsieh commented on SPARK-25378: - Hmm.. have we decided to include a fixing into 2.4?

[jira] [Commented] (SPARK-25374) SafeProjection supports fallback to an interpreted mode

2018-09-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617453#comment-16617453 ] Liang-Chi Hsieh commented on SPARK-25374: - I do think so. > SafeProjection supports fallback to

[jira] [Commented] (SPARK-25431) Fix function examples and unify the format of the example results.

2018-09-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614865#comment-16614865 ] Liang-Chi Hsieh commented on SPARK-25431: - Don't know why the PR link is not attached

[jira] [Comment Edited] (SPARK-25374) SafeProjection supports fallback to an interpreted mode

2018-09-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614458#comment-16614458 ] Liang-Chi Hsieh edited comment on SPARK-25374 at 9/14/18 8:03 AM: --

[jira] [Commented] (SPARK-25374) SafeProjection supports fallback to an interpreted mode

2018-09-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614458#comment-16614458 ] Liang-Chi Hsieh commented on SPARK-25374: - Though this is not a bug fix, will we consider to put

[jira] [Commented] (SPARK-25378) ArrayData.toArray(StringType) assume UTF8String in 2.4

2018-09-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614200#comment-16614200 ] Liang-Chi Hsieh commented on SPARK-25378: - The fix looks like:

[jira] [Comment Edited] (SPARK-25378) ArrayData.toArray(StringType) assume UTF8String in 2.4

2018-09-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613157#comment-16613157 ] Liang-Chi Hsieh edited comment on SPARK-25378 at 9/13/18 8:33 AM: -- I

[jira] [Comment Edited] (SPARK-25378) ArrayData.toArray(StringType) assume UTF8String in 2.4

2018-09-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613157#comment-16613157 ] Liang-Chi Hsieh edited comment on SPARK-25378 at 9/13/18 7:59 AM: -- I

[jira] [Commented] (SPARK-25378) ArrayData.toArray(StringType) assume UTF8String in 2.4

2018-09-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613157#comment-16613157 ] Liang-Chi Hsieh commented on SPARK-25378: - I think a quick fix is to use general `get` method

[jira] [Commented] (SPARK-25271) Creating parquet table with all the column null throws exception

2018-09-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611535#comment-16611535 ] Liang-Chi Hsieh commented on SPARK-25271: - Yeah, looks like after some changes, this kind of

[jira] [Commented] (SPARK-23597) Audit Spark SQL code base for non-interpreted expressions

2018-09-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609918#comment-16609918 ] Liang-Chi Hsieh commented on SPARK-23597: - At least, I didn't find expressions that do not

[jira] [Commented] (SPARK-25378) ArrayData.toArray assume UTF8String

2018-09-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608256#comment-16608256 ] Liang-Chi Hsieh commented on SPARK-25378: - Thanks for pinging me. I agreed with [~hvanhovell].

[jira] [Commented] (SPARK-25091) UNCACHE TABLE, CLEAR CACHE, rdd.unpersist() does not clean up executor memory

2018-09-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607970#comment-16607970 ] Liang-Chi Hsieh commented on SPARK-25091: - I think this is duplicate to SPARK-24889. > UNCACHE

[jira] [Created] (SPARK-25363) Schema pruning doesn't work if nested column is used in where clause

2018-09-06 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-25363: --- Summary: Schema pruning doesn't work if nested column is used in where clause Key: SPARK-25363 URL: https://issues.apache.org/jira/browse/SPARK-25363 Project:

[jira] [Created] (SPARK-25352) Perform ordered global limit when limit number is bigger than topKSortFallbackThreshold

2018-09-05 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-25352: --- Summary: Perform ordered global limit when limit number is bigger than topKSortFallbackThreshold Key: SPARK-25352 URL: https://issues.apache.org/jira/browse/SPARK-25352

[jira] [Comment Edited] (SPARK-25279) Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc

2018-09-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604254#comment-16604254 ] Liang-Chi Hsieh edited comment on SPARK-25279 at 9/5/18 10:34 AM: -- The

[jira] [Commented] (SPARK-25279) Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc

2018-09-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604254#comment-16604254 ] Liang-Chi Hsieh commented on SPARK-25279: - The paste mode in REPL wraps pasted code as a single

[jira] [Created] (SPARK-25290) BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError

2018-08-30 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-25290: --- Summary: BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError Key: SPARK-25290 URL: https://issues.apache.org/jira/browse/SPARK-25290

[jira] [Commented] (SPARK-25217) Error thrown when creating BlockMatrix

2018-08-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596992#comment-16596992 ] Liang-Chi Hsieh commented on SPARK-25217: - If no further question, I think we can close this

[jira] [Comment Edited] (SPARK-25271) Creating parquet table with all the column null throws exception

2018-08-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596949#comment-16596949 ] Liang-Chi Hsieh edited comment on SPARK-25271 at 8/30/18 12:11 AM: --- I

[jira] [Commented] (SPARK-25271) Creating parquet table with all the column null throws exception

2018-08-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596949#comment-16596949 ] Liang-Chi Hsieh commented on SPARK-25271: - I think this is a known issue on Hive and Parquet,

[jira] [Commented] (SPARK-25217) Error thrown when creating BlockMatrix

2018-08-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596250#comment-16596250 ] Liang-Chi Hsieh commented on SPARK-25217: - I think you are mixing {{pyspark.ml.linalg.Matrix}},

[jira] [Commented] (SPARK-25236) Investigate using a logging library inside of PySpark on the workers instead of print

2018-08-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593097#comment-16593097 ] Liang-Chi Hsieh commented on SPARK-25236: - hmm, maybe dumb question, can't we use {{logging}} to

[jira] [Commented] (SPARK-25232) Support Full-Text Search in Spark SQL

2018-08-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592802#comment-16592802 ] Liang-Chi Hsieh commented on SPARK-25232: - This looks to me more like a specific datasource that

[jira] [Commented] (SPARK-25202) SQL Function Split Should Respect Limit Argument

2018-08-23 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590874#comment-16590874 ] Liang-Chi Hsieh commented on SPARK-25202: - [~phegstrom] No problem. Please submit a PR for this.

[jira] [Commented] (SPARK-25202) SQL Function Split Should Respect Limit Argument

2018-08-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589601#comment-16589601 ] Liang-Chi Hsieh commented on SPARK-25202: - Let me see if I have time to do this today later. >

[jira] [Commented] (SPARK-25202) SQL Function Split Should Respect Limit Argument

2018-08-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589558#comment-16589558 ] Liang-Chi Hsieh commented on SPARK-25202: - I saw Presto has this support. Is it worth adding

[jira] [Commented] (SPARK-25198) org.apache.spark.sql.catalyst.parser.ParseException: DataType json is not supported.

2018-08-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589547#comment-16589547 ] Liang-Chi Hsieh commented on SPARK-25198: - I think the {{customSchema}} here refers to Spark's

[jira] [Comment Edited] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-08-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588159#comment-16588159 ] Liang-Chi Hsieh edited comment on SPARK-25164 at 8/21/18 11:30 PM: ---

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-08-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588159#comment-16588159 ] Liang-Chi Hsieh commented on SPARK-25164: - This is easy and looks good to have. [~bersprockets]

[jira] [Comment Edited] (SPARK-24961) sort operation causes out of memory

2018-08-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584961#comment-16584961 ] Liang-Chi Hsieh edited comment on SPARK-24961 at 8/18/18 11:29 PM: ---

[jira] [Commented] (SPARK-24961) sort operation causes out of memory

2018-08-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584961#comment-16584961 ] Liang-Chi Hsieh commented on SPARK-24961: - When you run global sort, Spark will do data sampling

[jira] [Commented] (SPARK-24961) sort operation causes out of memory

2018-08-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584944#comment-16584944 ] Liang-Chi Hsieh commented on SPARK-24961: - I think it seems not common or practical to use local

[jira] [Commented] (SPARK-24961) sort operation causes out of memory

2018-08-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584943#comment-16584943 ] Liang-Chi Hsieh commented on SPARK-24961: - I saw you described {{Spark 2.3.1 in local mode}} in

[jira] [Commented] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584075#comment-16584075 ] Liang-Chi Hsieh commented on SPARK-25144: - I'm not sure if there is, can you build it? >

[jira] [Commented] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584037#comment-16584037 ] Liang-Chi Hsieh commented on SPARK-25144: - Have you tried on master branch? Tried with

[jira] [Updated] (SPARK-25117) Add EXEPT ALL and INTERSECT ALL support in R.

2018-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-25117: Issue Type: Improvement (was: Bug) > Add EXEPT ALL and INTERSECT ALL support in R. >

[jira] [Commented] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583966#comment-16583966 ] Liang-Chi Hsieh commented on SPARK-25144: - Hmm, can't reproduce this on master branch, so it is
