[jira] [Commented] (SPARK-29239) Subquery should not cause NPE when eliminating subexpression

2019-09-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937495#comment-16937495 ] Liang-Chi Hsieh commented on SPARK-29239: - I added SPARK-29221 to the title of the PR. >

[jira] [Commented] (SPARK-29239) Subquery should not cause NPE when eliminating subexpression

2019-09-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937486#comment-16937486 ] Liang-Chi Hsieh commented on SPARK-29239: - Yes. > Subquery should not cause NPE when

[jira] [Created] (SPARK-29239) Subquery should not cause NPE when eliminating subexpression

2019-09-25 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29239: --- Summary: Subquery should not cause NPE when eliminating subexpression Key: SPARK-29239 URL: https://issues.apache.org/jira/browse/SPARK-29239 Project: Spark

[jira] [Resolved] (SPARK-29181) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-29181. - Resolution: Duplicate > Cache preferred locations of checkpointed RDD >

[jira] [Assigned] (SPARK-29182) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-29182: --- Assignee: Liang-Chi Hsieh > Cache preferred locations of checkpointed RDD >

[jira] [Commented] (SPARK-29181) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933955#comment-16933955 ] Liang-Chi Hsieh commented on SPARK-29181: - [~dongjoon] Thanks. Not aware of creating duplicate

[jira] [Created] (SPARK-29182) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29182: --- Summary: Cache preferred locations of checkpointed RDD Key: SPARK-29182 URL: https://issues.apache.org/jira/browse/SPARK-29182 Project: Spark Issue

[jira] [Created] (SPARK-29181) Cache preferred locations of checkpointed RDD

2019-09-19 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29181: --- Summary: Cache preferred locations of checkpointed RDD Key: SPARK-29181 URL: https://issues.apache.org/jira/browse/SPARK-29181 Project: Spark Issue

[jira] [Commented] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932784#comment-16932784 ] Liang-Chi Hsieh commented on SPARK-29042: - [~hyukjin.kwon] Am I setting the fix versions and

[jira] [Updated] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-29042: Fix Version/s: 2.4.5 > Sampling-based RDD with unordered input should be INDETERMINATE >

[jira] [Resolved] (SPARK-22796) Add multiple column support to PySpark QuantileDiscretizer

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-22796. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25812

[jira] [Assigned] (SPARK-22796) Add multiple column support to PySpark QuantileDiscretizer

2019-09-18 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-22796: --- Assignee: Huaxin Gao > Add multiple column support to PySpark QuantileDiscretizer

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930827#comment-16930827 ] Liang-Chi Hsieh commented on SPARK-28927: - Regarding the unstable AUC issue, the nondeterministic

[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930787#comment-16930787 ] Liang-Chi Hsieh commented on SPARK-26205: - [~cloud_fan]. I see now. Created SPARK-29100. >

[jira] [Assigned] (SPARK-29100) Codegen with switch in InSet expression causes compilation error

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-29100: --- Assignee: Liang-Chi Hsieh > Codegen with switch in InSet expression causes

[jira] [Updated] (SPARK-29100) Codegen with switch in InSet expression causes compilation error

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-29100: Description: SPARK-26205 adds an optimization to InSet that generates Java switch
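The description above says SPARK-26205 optimizes InSet by generating a Java switch for bytes, shorts, ints and dates. As a rough illustration of that idea only (a hand-written sketch, not Spark's generated code; the class and method names are hypothetical, and it does not reproduce the compilation error reported in SPARK-29100), a switch-based membership test over a constant set can be compared with a hash-set lookup:

```java
import java.util.HashSet;
import java.util.Set;

public class InSetSwitchSketch {
    // General approach: membership via a hash set (boxes the int and hashes it).
    static final Set<Integer> HSET = new HashSet<>(Set.of(1, 3, 5, 42));

    static boolean inHashSet(int v) {
        return HSET.contains(v);
    }

    // Switch-based approach over the same constant values. javac compiles a
    // switch on an int to a tableswitch or lookupswitch instruction, so no
    // boxing or hashing is needed.
    static boolean inSwitch(int v) {
        switch (v) {
            case 1:
            case 3:
            case 5:
            case 42:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Both lookups must agree on every probed value.
        for (int v = -5; v <= 50; v++) {
            if (inHashSet(v) != inSwitch(v)) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        System.out.println("switch and hash-set lookups agree");
    }
}
```

Running `main` checks that the two lookups agree for every value from -5 to 50.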

[jira] [Created] (SPARK-29100) Codegen with switch in InSet expression causes compilation error

2019-09-16 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29100: --- Summary: Codegen with switch in InSet expression causes compilation error Key: SPARK-29100 URL: https://issues.apache.org/jira/browse/SPARK-29100 Project:

[jira] [Comment Edited] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930658#comment-16930658 ] Liang-Chi Hsieh edited comment on SPARK-28927 at 9/16/19 3:36 PM: --

[jira] [Comment Edited] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930658#comment-16930658 ] Liang-Chi Hsieh edited comment on SPARK-28927 at 9/16/19 3:35 PM: --

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-16 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930658#comment-16930658 ] Liang-Chi Hsieh commented on SPARK-28927: - Because you are using 2.2.1,

[jira] [Assigned] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-14 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-28927: --- Assignee: Liang-Chi Hsieh > ArrayIndexOutOfBoundsException and Not-stable AUC

[jira] [Assigned] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-13 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-29042: --- Assignee: Liang-Chi Hsieh > Sampling-based RDD with unordered input should be

[jira] [Resolved] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-13 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-29042. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25751

[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates

2019-09-13 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929334#comment-16929334 ] Liang-Chi Hsieh commented on SPARK-26205: - [~cloud_fan] I ran a simple test and there seems to be no failure

[jira] [Commented] (SPARK-26205) Optimize InSet expression for bytes, shorts, ints, dates

2019-09-12 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928649#comment-16928649 ] Liang-Chi Hsieh commented on SPARK-26205: - Yeah, I will look at it. > Optimize InSet expression

[jira] [Created] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-10 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29042: --- Summary: Sampling-based RDD with unordered input should be INDETERMINATE Key: SPARK-29042 URL: https://issues.apache.org/jira/browse/SPARK-29042 Project: Spark

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-10 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926814#comment-16926814 ] Liang-Chi Hsieh commented on SPARK-28927: - Hi [~JerryHouse], do you use any non-deterministic

[jira] [Assigned] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2019-09-09 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-23265: --- Assignee: Huaxin Gao > Update multi-column error handling logic in

[jira] [Resolved] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2019-09-09 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-23265. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 20442

[jira] [Created] (SPARK-29013) Structurally equivalent subexpression elimination

2019-09-06 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-29013: --- Summary: Structurally equivalent subexpression elimination Key: SPARK-29013 URL: https://issues.apache.org/jira/browse/SPARK-29013 Project: Spark

[jira] [Updated] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28933: Fix Version/s: 3.0.0 > Reduce unnecessary shuffle in ALS when initializing factors >

[jira] [Commented] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920584#comment-16920584 ] Liang-Chi Hsieh commented on SPARK-28933: - This issue was resolved by 

[jira] [Resolved] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-28933. - Resolution: Resolved > Reduce unnecessary shuffle in ALS when initializing factors >

[jira] [Commented] (SPARK-28935) Document SQL metrics for Details for Query Plan

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920550#comment-16920550 ] Liang-Chi Hsieh commented on SPARK-28935: - Thanks! [~smilegator] It should be helpful. >

[jira] [Commented] (SPARK-28927) ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-09-01 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920480#comment-16920480 ] Liang-Chi Hsieh commented on SPARK-28927: - Does this only happen on 2.2.1? How about current

[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23519: Component/s: (was: Spark Core) > Create View Commands Fails with The view output

[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920302#comment-16920302 ] Liang-Chi Hsieh commented on SPARK-23519: - This was closed and then reopened and fixed. The

[jira] [Updated] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-31 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23519: Labels: (was: bulk-closed) > Create View Commands Fails with The view output

[jira] [Commented] (SPARK-28935) Document SQL metrics for Details for Query Plan

2019-08-30 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919975#comment-16919975 ] Liang-Chi Hsieh commented on SPARK-28935: - Thanks for pinging me! I will look into this. >

[jira] [Resolved] (SPARK-28926) CLONE - ArrayIndexOutOfBoundsException and Not-stable AUC metrics in ALS for datasets with 12 billion instances

2019-08-30 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-28926. - Resolution: Duplicate I think this is a duplicate of SPARK-28927. > CLONE -

[jira] [Assigned] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-08-30 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh reassigned SPARK-28933: --- Assignee: Liang-Chi Hsieh > Reduce unnecessary shuffle in ALS when initializing

[jira] [Created] (SPARK-28933) Reduce unnecessary shuffle in ALS when initializing factors

2019-08-30 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-28933: --- Summary: Reduce unnecessary shuffle in ALS when initializing factors Key: SPARK-28933 URL: https://issues.apache.org/jira/browse/SPARK-28933 Project: Spark

[jira] [Created] (SPARK-28920) Set up java version for github workflow

2019-08-29 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-28920: --- Summary: Set up java version for github workflow Key: SPARK-28920 URL: https://issues.apache.org/jira/browse/SPARK-28920 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-26 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915809#comment-16915809 ] Liang-Chi Hsieh commented on SPARK-23519: - I tested with Hive 2.1. It doesn't support duplicate

[jira] [Resolved] (SPARK-25549) High level API to collect RDD statistics

2019-08-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-25549. - Resolution: Won't Fix > High level API to collect RDD statistics >

[jira] [Commented] (SPARK-25549) High level API to collect RDD statistics

2019-08-25 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915362#comment-16915362 ] Liang-Chi Hsieh commented on SPARK-25549: - Closing this as it is not needed now. > High level API

[jira] [Created] (SPARK-28866) Persist item factors RDD when checkpointing in ALS

2019-08-25 Thread Liang-Chi Hsieh (Jira)
Liang-Chi Hsieh created SPARK-28866: --- Summary: Persist item factors RDD when checkpointing in ALS Key: SPARK-28866 URL: https://issues.apache.org/jira/browse/SPARK-28866 Project: Spark

[jira] [Commented] (SPARK-24666) Word2Vec generate infinity vectors when numIterations are large

2019-08-24 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914950#comment-16914950 ] Liang-Chi Hsieh commented on SPARK-24666: - I tried to run word2vec with Quora Question Pairs

[jira] [Commented] (SPARK-23519) Create View Commands Fails with The view output (col1,col1) contains duplicate column name

2019-08-22 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913915#comment-16913915 ] Liang-Chi Hsieh commented on SPARK-23519: - Thanks for pinging me. I am going on a flight soon.

[jira] [Commented] (SPARK-28672) [UDF] Duplicate function creation should not allow

2019-08-19 Thread Liang-Chi Hsieh (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911007#comment-16911007 ] Liang-Chi Hsieh commented on SPARK-28672: - Is there any rule in Hive regarding this? like

[jira] [Commented] (SPARK-28761) spark.driver.maxResultSize only applies to compressed data

2019-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909420#comment-16909420 ] Liang-Chi Hsieh commented on SPARK-28761: - If you do it at SparkPlan.scala#L344, isn't it just

[jira] [Commented] (SPARK-28732) org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java' when storing

2019-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909409#comment-16909409 ] Liang-Chi Hsieh commented on SPARK-28732: - As the return type of {{count}} is LongType, I think it is

[jira] [Comment Edited] (SPARK-28732) org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java' when st

2019-08-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909409#comment-16909409 ] Liang-Chi Hsieh edited comment on SPARK-28732 at 8/16/19 9:19 PM: -- As

[jira] [Created] (SPARK-28722) Change sequential label sorting in StringIndexer fit to parallel

2019-08-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28722: --- Summary: Change sequential label sorting in StringIndexer fit to parallel Key: SPARK-28722 URL: https://issues.apache.org/jira/browse/SPARK-28722 Project:

[jira] [Updated] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28652: Priority: Minor (was: Major) > spark.kubernetes.pyspark.pythonVersion is never passed to

[jira] [Updated] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28652: Issue Type: Test (was: Bug) > spark.kubernetes.pyspark.pythonVersion is never passed to

[jira] [Commented] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904743#comment-16904743 ] Liang-Chi Hsieh commented on SPARK-28652: - As existing tests don't explicitly check the Python

[jira] [Commented] (SPARK-28652) spark.kubernetes.pyspark.pythonVersion is never passed to executors

2019-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904729#comment-16904729 ] Liang-Chi Hsieh commented on SPARK-28652: - This looks interesting to me. I tried to look into

[jira] [Commented] (SPARK-28422) GROUPED_AGG pandas_udf doesn't with spark.sql() without group by clause

2019-08-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899788#comment-16899788 ] Liang-Chi Hsieh commented on SPARK-28422: - Thanks [~dongjoon]! > GROUPED_AGG pandas_udf

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889742#comment-16889742 ] Liang-Chi Hsieh commented on SPARK-24152: - Ok. I think it was fixed. > SparkR CRAN feasibility

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889671#comment-16889671 ] Liang-Chi Hsieh commented on SPARK-24152: - This CRAN issue is happening now, again. Emailed to

[jira] [Updated] (SPARK-28441) PythonUDF used in correlated scalar subquery causes

2019-07-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28441: Summary: PythonUDF used in correlated scalar subquery causes (was:

[jira] [Updated] (SPARK-28441) PythonUDF used in correlated scalar subquery causes UnsupportedOperationException

2019-07-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28441: Summary: PythonUDF used in correlated scalar subquery causes

[jira] [Updated] (SPARK-28441) PythonUDF used in correlated scalar subquery causes UnsupportedOperationException

2019-07-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28441: Priority: Major (was: Minor) > PythonUDF used in correlated scalar subquery causes >

[jira] [Commented] (SPARK-28288) Convert and port 'window.sql' into UDF test base

2019-07-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887707#comment-16887707 ] Liang-Chi Hsieh commented on SPARK-28288: - Those errors can be found in the original window.sql.

[jira] [Updated] (SPARK-28365) Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Summary: Fallback locale to en_US in StopWordsRemover if system default locale isn't in
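The summary above describes falling back to en_US when the system default locale isn't among the locales available in the JVM. A minimal Java sketch of that check, assuming only the standard `java.util.Locale` API (this is not Spark's StopWordsRemover code, which is written in Scala; `LocaleFallbackSketch` and `resolveLocale` are hypothetical names):

```java
import java.util.Arrays;
import java.util.Locale;

public class LocaleFallbackSketch {
    // Returns the candidate locale if the JVM reports it as available,
    // otherwise falls back to en_US (Locale.US).
    static Locale resolveLocale(Locale candidate) {
        boolean available = Arrays.asList(Locale.getAvailableLocales()).contains(candidate);
        return available ? candidate : Locale.US;
    }

    public static void main(String[] args) {
        // Typical use: validate the system default before using it.
        Locale chosen = resolveLocale(Locale.getDefault());
        System.out.println("Using locale: " + chosen);
    }
}
```

Checking availability up front avoids handing an unsupported locale to later processing, which is the kind of invalid locale error the issue's later summaries mention.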

[jira] [Updated] (SPARK-28365) Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Summary: Set default locale param for StopWordsRemover to en_US if system default locale

[jira] [Updated] (SPARK-28365) Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Priority: Major (was: Minor) > Set default locale param for StopWordsRemover to en_US if

[jira] [Updated] (SPARK-28365) Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Component/s: (was: PySpark) > Set default locale param for StopWordsRemover to en_US

[jira] [Updated] (SPARK-28365) Set default locale param for StopWordsRemover to en_US if system default locale isn't in available locales in JVM

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Issue Type: Bug (was: Test) > Set default locale param for StopWordsRemover to en_US if

[jira] [Updated] (SPARK-28365) Set default locale for StopWordsRemover tests to prevent invalid locale error during test

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Component/s: (was: Tests) ML > Set default locale for

[jira] [Updated] (SPARK-28365) Set default locale for StopWordsRemover tests to prevent invalid locale error during test

2019-07-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-28365: Description: Because the local default locale isn't among the available locales in {{Locale}},

[jira] [Created] (SPARK-28381) Upgraded version of Pyrolite to 4.30

2019-07-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28381: --- Summary: Upgraded version of Pyrolite to 4.30 Key: SPARK-28381 URL: https://issues.apache.org/jira/browse/SPARK-28381 Project: Spark Issue Type:

[jira] [Created] (SPARK-28378) Remove usage of cgi.escape

2019-07-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28378: --- Summary: Remove usage of cgi.escape Key: SPARK-28378 URL: https://issues.apache.org/jira/browse/SPARK-28378 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-28365) Set default locale for StopWordsRemover tests to prevent invalid locale error during test

2019-07-12 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28365: --- Summary: Set default locale for StopWordsRemover tests to prevent invalid locale error during test Key: SPARK-28365 URL: https://issues.apache.org/jira/browse/SPARK-28365

[jira] [Commented] (SPARK-28345) PythonUDF predicate should be able to pushdown to join

2019-07-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882712#comment-16882712 ] Liang-Chi Hsieh commented on SPARK-28345: - I found this when doing SPARK-28276. > PythonUDF

[jira] [Created] (SPARK-28345) PythonUDF predicate should be able to pushdown to join

2019-07-10 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28345: --- Summary: PythonUDF predicate should be able to pushdown to join Key: SPARK-28345 URL: https://issues.apache.org/jira/browse/SPARK-28345 Project: Spark

[jira] [Commented] (SPARK-28323) PythonUDF should be able to use in join condition

2019-07-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881717#comment-16881717 ] Liang-Chi Hsieh commented on SPARK-28323: - I found this bug when doing SPARK-28276. > PythonUDF

[jira] [Created] (SPARK-28323) PythonUDF should be able to use in join condition

2019-07-09 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28323: --- Summary: PythonUDF should be able to use in join condition Key: SPARK-28323 URL: https://issues.apache.org/jira/browse/SPARK-28323 Project: Spark

[jira] [Commented] (SPARK-28276) Convert and port 'cross-join.sql' into UDF test base

2019-07-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881033#comment-16881033 ] Liang-Chi Hsieh commented on SPARK-28276: - Will look into this. > Convert and port

[jira] [Comment Edited] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877497#comment-16877497 ] Liang-Chi Hsieh edited comment on SPARK-24152 at 7/3/19 5:19 AM: -

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877497#comment-16877497 ] Liang-Chi Hsieh commented on SPARK-24152: - Received a reply that it has been cleaned up. > SparkR CRAN

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877401#comment-16877401 ] Liang-Chi Hsieh commented on SPARK-24152: - I noticed that this issue is happening again now.

[jira] [Created] (SPARK-28215) as_tibble was removed from Arrow R API

2019-06-29 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28215: --- Summary: as_tibble was removed from Arrow R API Key: SPARK-28215 URL: https://issues.apache.org/jira/browse/SPARK-28215 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-22340) pyspark setJobGroup doesn't match java threads

2019-06-25 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872852#comment-16872852 ] Liang-Chi Hsieh commented on SPARK-22340: - [~hyukjin.kwon] Should we reopen this as you are open

[jira] [Commented] (SPARK-28079) CSV fails to detect corrupt record unless "columnNameOfCorruptRecord" is manually added to the schema

2019-06-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868577#comment-16868577 ] Liang-Chi Hsieh commented on SPARK-28079: - {{columnNameOfCorruptRecord}} is currently applied only

[jira] [Commented] (SPARK-27946) Hive DDL to Spark DDL conversion USING "show create table"

2019-06-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867771#comment-16867771 ] Liang-Chi Hsieh commented on SPARK-27946: - [~smilegator] Thanks for pinging me. I'd like to do,

[jira] [Commented] (SPARK-28079) CSV fails to detect corrupt record unless "columnNameOfCorruptRecord" is manually added to the schema

2019-06-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866777#comment-16866777 ] Liang-Chi Hsieh commented on SPARK-28079: - Isn't it the expected behavior as documented in

[jira] [Comment Edited] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865714#comment-16865714 ] Liang-Chi Hsieh edited comment on SPARK-28058 at 6/17/19 3:59 PM: --

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865714#comment-16865714 ] Liang-Chi Hsieh commented on SPARK-28058: - [~hyukjin.kwon] Do you mean this is suspected to be a

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865695#comment-16865695 ] Liang-Chi Hsieh commented on SPARK-28058: - [~stwhit] Thanks for letting us know that! Although

[jira] [Created] (SPARK-28082) Add a note to DROPMALFORMED mode of CSV for column pruning

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28082: --- Summary: Add a note to DROPMALFORMED mode of CSV for column pruning Key: SPARK-28082 URL: https://issues.apache.org/jira/browse/SPARK-28082 Project: Spark

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865664#comment-16865664 ] Liang-Chi Hsieh commented on SPARK-28058: - Although this isn't a bug, I think it might be worth

[jira] [Commented] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865650#comment-16865650 ] Liang-Chi Hsieh commented on SPARK-28058: - This is due to CSV parser column pruning. You can

[jira] [Commented] (SPARK-28054) Unable to insert partitioned table dynamically when partition name is upper case

2019-06-16 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865006#comment-16865006 ] Liang-Chi Hsieh commented on SPARK-28054: - I tested on Hive, the query works. Btw, the issue is

[jira] [Commented] (SPARK-28054) Unable to insert partitioned table dynamically when partition name is upper case

2019-06-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864189#comment-16864189 ] Liang-Chi Hsieh commented on SPARK-28054: - Is this query working on Hive? > Unable to insert

[jira] [Commented] (SPARK-28043) Reading json with duplicate columns drops the first column value

2019-06-14 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864107#comment-16864107 ] Liang-Chi Hsieh commented on SPARK-28043: - To make duplicate JSON keys work, I think about it

[jira] [Commented] (SPARK-28043) Reading json with duplicate columns drops the first column value

2019-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863680#comment-16863680 ] Liang-Chi Hsieh commented on SPARK-28043: - I tried to look around that, like

[jira] [Commented] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863180#comment-16863180 ] Liang-Chi Hsieh commented on SPARK-28006: - I'm curious about two questions: Can we use pandas

[jira] [Commented] (SPARK-27966) input_file_name empty when listing files in parallel

2019-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863030#comment-16863030 ] Liang-Chi Hsieh commented on SPARK-27966: - I can't see where input_file_name is, from the
