[jira] [Resolved] (SPARK-27539) Fix inaccurate aggregate outputRows estimation with column containing null values

2019-04-22 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27539. --- Resolution: Fixed Assignee: peng bo Fix Version/s: 2.4.3

[jira] [Updated] (SPARK-27539) Fix inaccurate aggregate outputRows estimation with column containing null values

2019-04-22 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27539: -- Summary: Fix inaccurate aggregate outputRows estimation with column containing null values

[jira] [Commented] (SPARK-27505) autoBroadcastJoinThreshold including bigger table

2019-04-22 Thread Mike Chan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823637#comment-16823637 ] Mike Chan commented on SPARK-27505: --- You mind sharing any info on self-reproducer? Tried to google

[jira] [Commented] (SPARK-18673) Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823631#comment-16823631 ] Hyukjin Kwon commented on SPARK-18673: -- This is blocked by SPARK-23710

[jira] [Assigned] (SPARK-27535) Date and timestamp JSON benchmarks

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27535: Assignee: Maxim Gekk

subscribe

2019-04-22 Thread Bowen Li

[jira] [Resolved] (SPARK-27535) Date and timestamp JSON benchmarks

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27535. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24430

[jira] [Resolved] (SPARK-27533) Date and timestamp CSV benchmarks

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27533. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24429

[jira] [Assigned] (SPARK-27533) Date and timestamp CSV benchmarks

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27533: Assignee: Maxim Gekk

[jira] [Assigned] (SPARK-27528) Use Parquet logical type TIMESTAMP_MICROS by default

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27528: Assignee: Maxim Gekk

[jira] [Resolved] (SPARK-27528) Use Parquet logical type TIMESTAMP_MICROS by default

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27528. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24425

[jira] [Commented] (SPARK-25299) Use remote storage for persisting shuffle data

2019-04-22 Thread zhoukang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823613#comment-16823613 ] zhoukang commented on SPARK-25299: -- nice work! Really looking forward thanks [~yifeih]

[jira] [Updated] (SPARK-27543) Support getRequiredJars and getRequiredFiles APIs for Hive UDFs

2019-04-22 Thread Sergey (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey updated SPARK-27543: --- Issue Type: Improvement (was: Bug)

[jira] [Created] (SPARK-27543) Support getRequiredJars and getRequiredFiles APIs for Hive UDFs

2019-04-22 Thread Sergey (JIRA)
Sergey created SPARK-27543: -- Summary: Support getRequiredJars and getRequiredFiles APIs for Hive UDFs Key: SPARK-27543 URL: https://issues.apache.org/jira/browse/SPARK-27543 Project: Spark Issue

[jira] [Updated] (SPARK-23773) JacksonGenerator does not include keys that have null value for StructTypes

2019-04-22 Thread Sergey (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey updated SPARK-23773: --- Issue Type: Bug (was: Improvement)

[jira] [Updated] (SPARK-23773) JacksonGenerator does not include keys that have null value for StructTypes

2019-04-22 Thread Sergey (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey updated SPARK-23773: --- Issue Type: Improvement (was: Bug)

[jira] [Created] (SPARK-27542) SparkHadoopWriter doesn't set call setWorkOutputPath, causing NPEs for some legacy OutputFormats

2019-04-22 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27542: -- Summary: SparkHadoopWriter doesn't set call setWorkOutputPath, causing NPEs for some legacy OutputFormats Key: SPARK-27542 URL: https://issues.apache.org/jira/browse/SPARK-27542

[jira] [Updated] (SPARK-27542) SparkHadoopWriter doesn't set call setWorkOutputPath, causing NPEs when using certain legacy OutputFormats

2019-04-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27542: --- Summary: SparkHadoopWriter doesn't set call setWorkOutputPath, causing NPEs when using certain

[jira] [Commented] (SPARK-18673) Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

2019-04-22 Thread KaiXu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823462#comment-16823462 ] KaiXu commented on SPARK-18673: --- I'm OOO, please expect slow email response, sorry for the inconvenience.

[jira] [Commented] (SPARK-18673) Dataframes doesn't work on Hadoop 3.x; Hive rejects Hadoop version

2019-04-22 Thread shanyu zhao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823458#comment-16823458 ] shanyu zhao commented on SPARK-18673: - Ping. What is the verdict here for users want to use Spark

[jira] [Assigned] (SPARK-27534) Do not load `content` column in binary data source if it is not selected

2019-04-22 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng reassigned SPARK-27534: - Assignee: Weichen Xu

[jira] [Resolved] (SPARK-27531) Improve explain output of describe table command to show the inputs to the command.

2019-04-22 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27531. --- Resolution: Fixed Assignee: Dilip Biswal Fix Version/s: 3.0.0 This is

[jira] [Commented] (SPARK-25299) Use remote storage for persisting shuffle data

2019-04-22 Thread Yifei Huang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823317#comment-16823317 ] Yifei Huang commented on SPARK-25299: - You can follow the API refactor work here: 

[jira] [Resolved] (SPARK-27392) TestHive test tables should be placed in shared test state, not per session

2019-04-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-27392. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24302

[jira] [Assigned] (SPARK-27392) TestHive test tables should be placed in shared test state, not per session

2019-04-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-27392: - Assignee: Eric Liang

[jira] [Closed] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.6

2019-04-22 Thread shane knapp (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp closed SPARK-25079. --- i kept an eye on things over the weekend, and everything seemed to be working great!

[jira] [Updated] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.6

2019-04-22 Thread shane knapp (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp updated SPARK-25079: Description: for the impending arrow upgrade (https://issues.apache.org/jira/browse/SPARK-23874) 

[jira] [Created] (SPARK-27541) Refresh class definitions for jars added via addJar()

2019-04-22 Thread Naved Alam (JIRA)
Naved Alam created SPARK-27541: -- Summary: Refresh class definitions for jars added via addJar() Key: SPARK-27541 URL: https://issues.apache.org/jira/browse/SPARK-27541 Project: Spark Issue

[jira] [Commented] (SPARK-27367) Faster RoaringBitmap Serialization with v0.8.0

2019-04-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823178#comment-16823178 ] Liang-Chi Hsieh commented on SPARK-27367: - So I think the new serde API has performance

[jira] [Commented] (SPARK-27540) Add 'meanAveragePrecision_at_k' metric to RankingMetrics

2019-04-22 Thread Tarush Grover (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823163#comment-16823163 ] Tarush Grover commented on SPARK-27540: --- [~tuananh238] I am working on this issue. Please assign

[jira] [Commented] (SPARK-27337) QueryExecutionListener never cleans up listeners from the bus after SparkSession is cleared

2019-04-22 Thread Vinoo Ganesh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823120#comment-16823120 ] Vinoo Ganesh commented on SPARK-27337: -- Hey [~cltlfcjin] - the thread is called Closing a

[jira] [Commented] (SPARK-27512) Decimal parsing leads to unexpected type inference

2019-04-22 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823115#comment-16823115 ] koert kuipers commented on SPARK-27512: --- i agree it is better than having two different decimal

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823097#comment-16823097 ] Thomas Graves commented on SPARK-27396: --- thanks for the questions and commenting, please also vote

[jira] [Resolved] (SPARK-27438) Increase precision of to_timestamp

2019-04-22 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-27438. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24420

[jira] [Assigned] (SPARK-27438) Increase precision of to_timestamp

2019-04-22 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-27438: --- Assignee: Maxim Gekk

[jira] [Comment Edited] (SPARK-10925) Exception when joining DataFrames

2019-04-22 Thread Rafik (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823045#comment-16823045 ] Rafik edited comment on SPARK-10925 at 4/22/19 11:50 AM: - I managed to solve

[jira] [Commented] (SPARK-10925) Exception when joining DataFrames

2019-04-22 Thread Rafik (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823045#comment-16823045 ] Rafik commented on SPARK-10925: --- I managed to solve this by renaming the column after group by to

[jira] [Commented] (SPARK-27512) Decimal parsing leads to unexpected type inference

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823028#comment-16823028 ] Hyukjin Kwon commented on SPARK-27512: -- I see a behaviour change. Yes, looks for schema inference

[jira] [Reopened] (SPARK-27512) Decimal parsing leads to unexpected type inference

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-27512.

[jira] [Commented] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2019-04-22 Thread Mahima Khatri (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823026#comment-16823026 ] Mahima Khatri commented on SPARK-27298: --- Yes,I can test this .Will surely let you know the

[jira] [Resolved] (SPARK-26703) Hive record writer will always depends on parquet-1.6 writer should fix it

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26703. -- Resolution: Duplicate

[jira] [Updated] (SPARK-27337) QueryExecutionListener never cleans up listeners from the bus after SparkSession is cleared

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27337: - Priority: Major (was: Critical)

[jira] [Commented] (SPARK-27433) Spark Structured Streaming left outer joins returns outer nulls for already matched rows

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823005#comment-16823005 ] Hyukjin Kwon commented on SPARK-27433: -- See SPARK-26154.

[jira] [Updated] (SPARK-27337) QueryExecutionListener never cleans up listeners from the bus after SparkSession is cleared

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27337: - Component/s: (was: Spark Core) SQL

[jira] [Commented] (SPARK-25299) Use remote storage for persisting shuffle data

2019-04-22 Thread zhoukang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822997#comment-16822997 ] zhoukang commented on SPARK-25299: -- is there any progress of this task? [~yifeih] [~mcheah]

[jira] [Commented] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822990#comment-16822990 ] Hyukjin Kwon commented on SPARK-27298: -- Will you be able to test it against Spark 2.4.1 too?

[jira] [Created] (SPARK-27540) Add 'meanAveragePrecision_at_k' metric to RankingMetrics

2019-04-22 Thread Pham Nguyen Tuan Anh (JIRA)
Pham Nguyen Tuan Anh created SPARK-27540: Summary: Add 'meanAveragePrecision_at_k' metric to RankingMetrics Key: SPARK-27540 URL: https://issues.apache.org/jira/browse/SPARK-27540 Project:

[jira] [Resolved] (SPARK-19860) DataFrame join get conflict error if two frames has a same name column.

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-19860. -- Resolution: Incomplete I am leaving this resolved for the lack of information

[jira] [Commented] (SPARK-19860) DataFrame join get conflict error if two frames has a same name column.

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822984#comment-16822984 ] Hyukjin Kwon commented on SPARK-19860: -- Does the size of data matter to reproduce this issue, or

[jira] [Commented] (SPARK-27335) cannot collect() from Correlation.corr

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822982#comment-16822982 ] Hyukjin Kwon commented on SPARK-27335: -- Can you post some steps as code block? Otherwise, looks no

[jira] [Updated] (SPARK-27539) Inaccurate aggregate outputRows estimation with column contains null value

2019-04-22 Thread peng bo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] peng bo updated SPARK-27539: Summary: Inaccurate aggregate outputRows estimation with column contains null value (was: Inaccurate

[jira] [Updated] (SPARK-27539) Inaccurate aggregate outputRows estimation with null value column

2019-04-22 Thread peng bo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] peng bo updated SPARK-27539: Description: This issue is follow up of [https://github.com/apache/spark/pull/24286]. As [~smilegator]

[jira] [Created] (SPARK-27539) Inaccurate aggregate outputRows estimation with null value column

2019-04-22 Thread peng bo (JIRA)
peng bo created SPARK-27539: --- Summary: Inaccurate aggregate outputRows estimation with null value column Key: SPARK-27539 URL: https://issues.apache.org/jira/browse/SPARK-27539 Project: Spark

[jira] [Assigned] (SPARK-27522) Test migration from INT96 to TIMESTAMP_MICROS in parquet

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27522: Assignee: Maxim Gekk

[jira] [Resolved] (SPARK-27522) Test migration from INT96 to TIMESTAMP_MICROS in parquet

2019-04-22 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27522. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24417

[jira] [Commented] (SPARK-13263) SQL generation support for tablesample

2019-04-22 Thread angerszhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822891#comment-16822891 ] angerszhu commented on SPARK-13263: --- [~Tagar]  I make some change in Spark SQL's ASTBuild, can