[jira] [Created] (SPARK-28031) Improve or remove doctest on over function of Column

2019-06-12 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28031: --- Summary: Improve or remove doctest on over function of Column Key: SPARK-28031 URL: https://issues.apache.org/jira/browse/SPARK-28031 Project: Spark

[jira] [Commented] (SPARK-28009) PipedRDD: Block not locked for reading failure

2019-06-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862180#comment-16862180 ] Liang-Chi Hsieh commented on SPARK-28009: - I think this looks like a duplicate of SPARK-27666. >

[jira] [Updated] (SPARK-27984) Jenkins job spark-master-package fails due to invalid gpg option

2019-06-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-27984: Description: I noticed the failures on Jenkins job {{spark-master-package}}:

[jira] [Created] (SPARK-27984) Jenkins job spark-master-package fails due to invalid gpg option

2019-06-09 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27984: --- Summary: Jenkins job spark-master-package fails due to invalid gpg option Key: SPARK-27984 URL: https://issues.apache.org/jira/browse/SPARK-27984 Project:

[jira] [Commented] (SPARK-27966) input_file_name empty when listing files in parallel

2019-06-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859451#comment-16859451 ] Liang-Chi Hsieh commented on SPARK-27966: - Can you show the output of explaining the query? >

[jira] [Commented] (SPARK-27913) Spark SQL's native ORC reader implements its own schema evolution

2019-06-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859450#comment-16859450 ] Liang-Chi Hsieh commented on SPARK-27913: - But seems the above reproducible example also doesn't

[jira] [Commented] (SPARK-27798) from_avro can modify variables in other rows in local mode

2019-06-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856362#comment-16856362 ] Liang-Chi Hsieh commented on SPARK-27798: - Is anyone working on this? If not, I will probably

[jira] [Commented] (SPARK-27873) Csv reader, adding a corrupt record column causes error if enforceSchema=false

2019-05-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851952#comment-16851952 ] Liang-Chi Hsieh commented on SPARK-27873: - I can prepare a PR if Marcin or Hyukjin Kwon don't

[jira] [Commented] (SPARK-27873) Csv reader, adding a corrupt record column causes error if enforceSchema=false

2019-05-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851948#comment-16851948 ] Liang-Chi Hsieh commented on SPARK-27873: - I guess what Marcin meant is: {code} val schema =

[jira] [Resolved] (SPARK-27832) Don't decompress and create column batch when the task is completed

2019-05-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-27832. - Resolution: Won't Fix > Don't decompress and create column batch when the task is

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849871#comment-16849871 ] Liang-Chi Hsieh commented on SPARK-27837: - Btw, I think this is not a bug but like an

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-28 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849836#comment-16849836 ] Liang-Chi Hsieh commented on SPARK-27837: - Ah, I see. MySQL disallows nonconstant argument in

[jira] [Commented] (SPARK-27855) Union failed between 2 datasets of the same type converted from different dataframes

2019-05-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849016#comment-16849016 ] Liang-Chi Hsieh commented on SPARK-27855: - If you notice, the printed schema of two Datasets is

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-27 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848752#comment-16848752 ] Liang-Chi Hsieh commented on SPARK-27837: - I don't think it makes sense. I checked a few DBs, and

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848125#comment-16848125 ] Liang-Chi Hsieh commented on SPARK-27837: - Please see the analysis exception: Input argument to

[jira] [Commented] (SPARK-27836) Issue with seeded rand() function in Spark SQL

2019-05-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848124#comment-16848124 ] Liang-Chi Hsieh commented on SPARK-27836: - rand function initializes only once with given seed

[jira] [Commented] (SPARK-27837) Running rand() in SQL with seed of column results in error (rand(col1))

2019-05-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848123#comment-16848123 ] Liang-Chi Hsieh commented on SPARK-27837: - The problem isn't that val1 isn't an int, but it

[jira] [Created] (SPARK-27832) Don't decompress and create column batch when the task is completed

2019-05-24 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27832: --- Summary: Don't decompress and create column batch when the task is completed Key: SPARK-27832 URL: https://issues.apache.org/jira/browse/SPARK-27832 Project:

[jira] [Updated] (SPARK-27779) Regression when explode on map in Generate

2019-05-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-27779: Description: When I ran MiscBenchmark for SPARK-27707, I found a regression regarding

[jira] [Resolved] (SPARK-21484) Wrong query plans of Dataset after persist/unpersist

2019-05-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-21484. - Resolution: Won't Fix > Wrong query plans of Dataset after persist/unpersist >

[jira] [Commented] (SPARK-21484) Wrong query plans of Dataset after persist/unpersist

2019-05-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844487#comment-16844487 ] Liang-Chi Hsieh commented on SPARK-21484: - I think this issue isn't going to be fixed, at least,

[jira] [Updated] (SPARK-27779) Regression when explode on map in Generate

2019-05-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-27779: Description: When I ran MiscBenchmark for SPARK-27707, I found a regression regarding

[jira] [Created] (SPARK-27779) Regression when explode on map in Generate

2019-05-20 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27779: --- Summary: Regression when explode on map in Generate Key: SPARK-27779 URL: https://issues.apache.org/jira/browse/SPARK-27779 Project: Spark Issue Type:

[jira] [Commented] (SPARK-27716) Complete the transactions support for part of jdbc datasource operations.

2019-05-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843649#comment-16843649 ] Liang-Chi Hsieh commented on SPARK-27716: - If the added support doesn't cover all cases, doesn't

[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12

2019-05-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840220#comment-16840220 ] Liang-Chi Hsieh commented on SPARK-27714: - In case how many joined tables there are, the

[jira] [Commented] (SPARK-27722) Remove UnsafeKeyValueSorter

2019-05-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840197#comment-16840197 ] Liang-Chi Hsieh commented on SPARK-27722: - cc [~cloud_fan] Can it be removed? Or I miss

[jira] [Created] (SPARK-27722) Remove UnsafeKeyValueSorter

2019-05-15 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27722: --- Summary: Remove UnsafeKeyValueSorter Key: SPARK-27722 URL: https://issues.apache.org/jira/browse/SPARK-27722 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-27701) Extend NestedColumnAliasing to more nested field cases

2019-05-14 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27701: --- Summary: Extend NestedColumnAliasing to more nested field cases Key: SPARK-27701 URL: https://issues.apache.org/jira/browse/SPARK-27701 Project: Spark

[jira] [Commented] (SPARK-27671) Fix error when casting from a nested null in a struct

2019-05-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839070#comment-16839070 ] Liang-Chi Hsieh commented on SPARK-27671: - [~dongjoon] Thanks for testing and updating `Affects

[jira] [Created] (SPARK-27671) Analysis exception thrown when casting from a nested null in a struct

2019-05-10 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27671: --- Summary: Analysis exception thrown when casting from a nested null in a struct Key: SPARK-27671 URL: https://issues.apache.org/jira/browse/SPARK-27671 Project:

[jira] [Created] (SPARK-27633) Remove redundant aliases in NestedColumnAliasing

2019-05-04 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27633: --- Summary: Remove redundant aliases in NestedColumnAliasing Key: SPARK-27633 URL: https://issues.apache.org/jira/browse/SPARK-27633 Project: Spark Issue

[jira] [Updated] (SPARK-27629) Prevent Unpickler from intervening each unpickling

2019-05-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-27629: Description: In SPARK-27612, one correctness issue was reported. When protocol 4 is used

[jira] [Updated] (SPARK-27629) Prevent Unpickler from intervening each unpickling

2019-05-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-27629: Component/s: (was: SQL) > Prevent Unpickler from intervening each unpickling >

[jira] [Created] (SPARK-27629) Prevent Unpickler from intervening each unpickling

2019-05-03 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27629: --- Summary: Prevent Unpickler from intervening each unpickling Key: SPARK-27629 URL: https://issues.apache.org/jira/browse/SPARK-27629 Project: Spark

[jira] [Commented] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-05-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831680#comment-16831680 ] Liang-Chi Hsieh commented on SPARK-27612: - Yeah, it seems the issue happens when python object

[jira] [Commented] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

2019-05-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831600#comment-16831600 ] Liang-Chi Hsieh commented on SPARK-27612: - Yup, I can reproduce it too. No worry [~bryanc]. :)

[jira] [Commented] (SPARK-27597) RuntimeConfig should be serializable

2019-05-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830911#comment-16830911 ] Liang-Chi Hsieh commented on SPARK-27597: - I see. Please follow [~hyukjin.kwon]'s suggestion if

[jira] [Commented] (SPARK-27597) RuntimeConfig should be serializable

2019-04-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830179#comment-16830179 ] Liang-Chi Hsieh commented on SPARK-27597: - Do you want to access {{SparkSession}} in UDF?

[jira] [Commented] (SPARK-27595) Spark couldn't read partitioned(string type) Orc column correctly if the value contains Float/Double value

2019-04-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830165#comment-16830165 ] Liang-Chi Hsieh commented on SPARK-27595: - Is turning off

[jira] [Commented] (SPARK-27594) spark.sql.orc.enableVectorizedReader causes milliseconds in Timestamp to be read incorrectly

2019-04-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830160#comment-16830160 ] Liang-Chi Hsieh commented on SPARK-27594: - I can't reproduce it. Is it possibly specific to your

[jira] [Commented] (SPARK-27591) A bug in UnivocityParser prevents using UDT

2019-04-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829370#comment-16829370 ] Liang-Chi Hsieh commented on SPARK-27591: - oh, you're right. I've misread the description. Want

[jira] [Commented] (SPARK-27591) A bug in UnivocityParser prevents using UDT

2019-04-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829336#comment-16829336 ] Liang-Chi Hsieh commented on SPARK-27591: - Are you returning string in {{serialize}} in your

[jira] [Commented] (SPARK-27567) Spark Streaming consumers (from Kafka) intermittently die with 'SparkException: Couldn't find leaders for Set'

2019-04-26 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826831#comment-16826831 ] Liang-Chi Hsieh commented on SPARK-27567: - Sounds like a kafka problem, instead of Spark:

[jira] [Commented] (SPARK-27439) Use analyzed plan when explaining Dataset

2019-04-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826551#comment-16826551 ] Liang-Chi Hsieh commented on SPARK-27439: - I will look into it. Thanks [~huonw] > Use analyzed

[jira] [Commented] (SPARK-27367) Faster RoaringBitmap Serialization with v0.8.0

2019-04-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823178#comment-16823178 ] Liang-Chi Hsieh commented on SPARK-27367: - So I think the new serde API has performance

[jira] [Commented] (SPARK-27367) Faster RoaringBitmap Serialization with v0.8.0

2019-04-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822724#comment-16822724 ] Liang-Chi Hsieh commented on SPARK-27367: - I changed spark code to use the new API when

[jira] [Commented] (SPARK-27439) createOrReplaceTempView cannot update old dataset

2019-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821977#comment-16821977 ] Liang-Chi Hsieh commented on SPARK-27439: - One possible issue I'm aware of is, {{df.explain}}

[jira] [Comment Edited] (SPARK-27439) createOrReplaceTempView cannot update old dataset

2019-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821824#comment-16821824 ] Liang-Chi Hsieh edited comment on SPARK-27439 at 4/19/19 10:36 AM: ---

[jira] [Comment Edited] (SPARK-27439) createOrReplaceTempView cannot update old dataset

2019-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821824#comment-16821824 ] Liang-Chi Hsieh edited comment on SPARK-27439 at 4/19/19 10:35 AM: ---

[jira] [Comment Edited] (SPARK-27439) createOrReplaceTempView cannot update old dataset

2019-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821824#comment-16821824 ] Liang-Chi Hsieh edited comment on SPARK-27439 at 4/19/19 10:28 AM: ---

[jira] [Commented] (SPARK-27439) createOrReplaceTempView cannot update old dataset

2019-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821824#comment-16821824 ] Liang-Chi Hsieh commented on SPARK-27439: - The review is resolved during analysis stage when we

[jira] [Commented] (SPARK-27429) [SQL] to_timestamp function with additional argument flag that will allow exception if value could not be cast

2019-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821763#comment-16821763 ] Liang-Chi Hsieh commented on SPARK-27429: - Generally I think you can always know which are the

[jira] [Comment Edited] (SPARK-27367) Faster RoaringBitmap Serialization with v0.8.0

2019-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821653#comment-16821653 ] Liang-Chi Hsieh edited comment on SPARK-27367 at 4/19/19 8:01 AM: -- I do

[jira] [Comment Edited] (SPARK-27367) Faster RoaringBitmap Serialization with v0.8.0

2019-04-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821653#comment-16821653 ] Liang-Chi Hsieh edited comment on SPARK-27367 at 4/19/19 4:32 AM: -- I do

[jira] [Commented] (SPARK-27367) Faster RoaringBitmap Serialization with v0.8.0

2019-04-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821653#comment-16821653 ] Liang-Chi Hsieh commented on SPARK-27367: - I do upgrade it in local. But seems the performance

[jira] [Created] (SPARK-27502) Update nested schema benchmark result for Orc V2

2019-04-18 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27502: --- Summary: Update nested schema benchmark result for Orc V2 Key: SPARK-27502 URL: https://issues.apache.org/jira/browse/SPARK-27502 Project: Spark Issue

[jira] [Created] (SPARK-27476) Refactoring SchemaPruning rule to remove duplicate code

2019-04-16 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27476: --- Summary: Refactoring SchemaPruning rule to remove duplicate code Key: SPARK-27476 URL: https://issues.apache.org/jira/browse/SPARK-27476 Project: Spark

[jira] [Commented] (SPARK-27332) Filter Pushdown duplicates expensive ScalarSubquery (discarding result)

2019-04-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809720#comment-16809720 ] Liang-Chi Hsieh commented on SPARK-27332: - I think this should be already fixed by SPARK-25482.

[jira] [Comment Edited] (SPARK-27385) use mapPartitionsWIthIndex in Dataframe

2019-04-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809706#comment-16809706 ] Liang-Chi Hsieh edited comment on SPARK-27385 at 4/4/19 10:39 AM: -- I

[jira] [Commented] (SPARK-27385) use mapPartitionsWIthIndex in Dataframe

2019-04-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809706#comment-16809706 ] Liang-Chi Hsieh commented on SPARK-27385: - I don't really see

[jira] [Commented] (SPARK-27375) cache not working after discretizer.fit(df).transform(df)

2019-04-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809704#comment-16809704 ] Liang-Chi Hsieh commented on SPARK-27375: - This is the physical plan when using current master

[jira] [Updated] (SPARK-27329) Pruning nested field in map of map key and value from object serializers

2019-03-31 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-27329: Issue Type: Sub-task (was: Improvement) Parent: SPARK-25603 > Pruning nested

[jira] [Created] (SPARK-27329) Pruning nested field in map of map key and value from object serializers

2019-03-31 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27329: --- Summary: Pruning nested field in map of map key and value from object serializers Key: SPARK-27329 URL: https://issues.apache.org/jira/browse/SPARK-27329

[jira] [Created] (SPARK-27288) Pruning nested field in complex map key from object serializers

2019-03-26 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27288: --- Summary: Pruning nested field in complex map key from object serializers Key: SPARK-27288 URL: https://issues.apache.org/jira/browse/SPARK-27288 Project: Spark

[jira] [Created] (SPARK-27268) Add map_keys and map_values support in nested schema pruning

2019-03-25 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27268: --- Summary: Add map_keys and map_values support in nested schema pruning Key: SPARK-27268 URL: https://issues.apache.org/jira/browse/SPARK-27268 Project: Spark

[jira] [Updated] (SPARK-27241) Add map_keys and map_values support to SelectedField in nested schema pruning

2019-03-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-27241: Summary: Add map_keys and map_values support to SelectedField in nested schema pruning

[jira] [Updated] (SPARK-27241) Add map_keys and map_values support to SelectedField in nested schema pruning

2019-03-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-27241: Description: {{SelectedField}} in nested schema pruning doesn't support map_keys and

[jira] [Created] (SPARK-27241) Add map keys support to SelectedField in nested schema pruning

2019-03-22 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27241: --- Summary: Add map keys support to SelectedField in nested schema pruning Key: SPARK-27241 URL: https://issues.apache.org/jira/browse/SPARK-27241 Project: Spark

[jira] [Commented] (SPARK-27191) union of dataframes depends on order of the columns in 2.4.0

2019-03-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795154#comment-16795154 ] Liang-Chi Hsieh commented on SPARK-27191: - Thanks for pinging me and giving the answer

[jira] [Created] (SPARK-27126) Consolidate Scala and Java type deserializerFor

2019-03-11 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27126: --- Summary: Consolidate Scala and Java type deserializerFor Key: SPARK-27126 URL: https://issues.apache.org/jira/browse/SPARK-27126 Project: Spark Issue

[jira] [Commented] (SPARK-27039) toPandas with Avro swallows maxResultSize errors

2019-03-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783247#comment-16783247 ] Liang-Chi Hsieh commented on SPARK-27039: - Not sure if I miss it, but I don't see avro usage.

[jira] [Created] (SPARK-27043) Nested schema pruning benchmark for ORC

2019-03-04 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27043: --- Summary: Nested schema pruning benchmark for ORC Key: SPARK-27043 URL: https://issues.apache.org/jira/browse/SPARK-27043 Project: Spark Issue Type:

[jira] [Created] (SPARK-27034) Nested schema pruning for ORC

2019-03-02 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-27034: --- Summary: Nested schema pruning for ORC Key: SPARK-27034 URL: https://issues.apache.org/jira/browse/SPARK-27034 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-26847) Pruning nested serializers from object serializers: MapType support

2019-02-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-26847: Summary: Pruning nested serializers from object serializers: MapType support (was: Prune

[jira] [Commented] (SPARK-26847) Prune nested serializers from object serializers: MapType support

2019-02-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763211#comment-16763211 ] Liang-Chi Hsieh commented on SPARK-26847: - I will finish this once SPARK-26837 is merged. >

[jira] [Created] (SPARK-26847) Prune nested serializers from object serializers: MapType support

2019-02-07 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26847: --- Summary: Prune nested serializers from object serializers: MapType support Key: SPARK-26847 URL: https://issues.apache.org/jira/browse/SPARK-26847 Project:

[jira] [Created] (SPARK-26837) Pruning nested fields from object serializers

2019-02-06 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26837: --- Summary: Pruning nested fields from object serializers Key: SPARK-26837 URL: https://issues.apache.org/jira/browse/SPARK-26837 Project: Spark Issue

[jira] [Resolved] (SPARK-26808) Pruned schema should not change nullability

2019-02-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-26808. - Resolution: Won't Fix > Pruned schema should not change nullability >

[jira] [Created] (SPARK-26808) Pruned schema should not change nullability

2019-01-31 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26808: --- Summary: Pruned schema should not change nullability Key: SPARK-26808 URL: https://issues.apache.org/jira/browse/SPARK-26808 Project: Spark Issue

[jira] [Commented] (SPARK-26727) CREATE OR REPLACE VIEW query fails with TableAlreadyExistsException

2019-01-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752822#comment-16752822 ] Liang-Chi Hsieh commented on SPARK-26727: - When it asks if a table exists, doesn't it only check

[jira] [Commented] (SPARK-26727) CREATE OR REPLACE VIEW query fails with TableAlreadyExistsException

2019-01-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752277#comment-16752277 ] Liang-Chi Hsieh commented on SPARK-26727: - I can't reproduce it even continuing running the SQL

[jira] [Created] (SPARK-26702) Create a test trait for Parquet and Orc test

2019-01-23 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26702: --- Summary: Create a test trait for Parquet and Orc test Key: SPARK-26702 URL: https://issues.apache.org/jira/browse/SPARK-26702 Project: Spark Issue

[jira] [Commented] (SPARK-26646) Flaky test: pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction

2019-01-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745685#comment-16745685 ] Liang-Chi Hsieh commented on SPARK-26646: - Thanks for pinging me [~hyukjin.kwon]! Sure, let me

[jira] [Created] (SPARK-26604) Register channel for stream request

2019-01-11 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26604: --- Summary: Register channel for stream request Key: SPARK-26604 URL: https://issues.apache.org/jira/browse/SPARK-26604 Project: Spark Issue Type:

[jira] [Created] (SPARK-26559) ML image can't work with numpy versions prior to 1.9

2019-01-07 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26559: --- Summary: ML image can't work with numpy versions prior to 1.9 Key: SPARK-26559 URL: https://issues.apache.org/jira/browse/SPARK-26559 Project: Spark

[jira] [Commented] (SPARK-26558) java.util.NoSuchElementException while saving data into HDFS using Spark

2019-01-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735605#comment-16735605 ] Liang-Chi Hsieh commented on SPARK-26558: - This looks like an issue at spark-greenplum connector

[jira] [Commented] (SPARK-26534) Closure Cleaner Bug

2019-01-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735214#comment-16735214 ] Liang-Chi Hsieh commented on SPARK-26534: - I think the only difference is using Dataset or RDD.

[jira] [Commented] (SPARK-26534) Closure Cleaner Bug

2019-01-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735130#comment-16735130 ] Liang-Chi Hsieh commented on SPARK-26534: - I do a test as below: {code:java} object Test { val

[jira] [Created] (SPARK-26551) Selecting one complex field and having is null predicate on another complex field can cause error

2019-01-05 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26551: --- Summary: Selecting one complex field and having is null predicate on another complex field can cause error Key: SPARK-26551 URL:

[jira] [Commented] (SPARK-26436) Dataframe resulting from a GroupByKey and flatMapGroups operation throws java.lang.UnsupportedException when groupByKey is applied on it.

2019-01-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734282#comment-16734282 ] Liang-Chi Hsieh commented on SPARK-26436: - Sorry I don't know what you mean "should groupByKey 

[jira] [Commented] (SPARK-26436) Dataframe resulting from a GroupByKey and flatMapGroups operation throws java.lang.UnsupportedException when groupByKey is applied on it.

2019-01-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734178#comment-16734178 ] Liang-Chi Hsieh commented on SPARK-26436: - This issue is caused by how you create the row:

[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2019-01-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732569#comment-16732569 ] Liang-Chi Hsieh commented on SPARK-25591: - I can make backport PRs if you need. [~dongjoon] >

[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2019-01-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732567#comment-16732567 ] Liang-Chi Hsieh commented on SPARK-25591: - This is bug fixing, so I think it makes sense to

[jira] [Created] (SPARK-26517) Avoid duplicate test in ParquetSchemaPruningSuite

2019-01-02 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26517: --- Summary: Avoid duplicate test in ParquetSchemaPruningSuite Key: SPARK-26517 URL: https://issues.apache.org/jira/browse/SPARK-26517 Project: Spark

[jira] [Commented] (SPARK-26511) java.lang.ClassCastException error when loading Spark MLlib model from parquet file

2019-01-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731592#comment-16731592 ] Liang-Chi Hsieh commented on SPARK-26511: - The model you provided has a different column order

[jira] [Created] (SPARK-26435) Support creating partitioned table using Hive CTAS by specifying partition column names

2018-12-24 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26435: --- Summary: Support creating partitioned table using Hive CTAS by specifying partition column names Key: SPARK-26435 URL: https://issues.apache.org/jira/browse/SPARK-26435

[jira] [Commented] (SPARK-26405) OOM

2018-12-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725691#comment-16725691 ] Liang-Chi Hsieh commented on SPARK-26405: - I think the exception message is somehow clear:

[jira] [Commented] (SPARK-26408) java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:347)

2018-12-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725683#comment-16725683 ] Liang-Chi Hsieh commented on SPARK-26408: - Looks like an issue at
