[jira] [Assigned] (SPARK-23315) failed to get output from canonicalized data source v2 related plans

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23315: Assignee: Wenchen Fan (was: Apache Spark) > failed to get output from canonicalized data

[jira] [Assigned] (SPARK-23315) failed to get output from canonicalized data source v2 related plans

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23315: Assignee: Apache Spark (was: Wenchen Fan) > failed to get output from canonicalized data

[jira] [Commented] (SPARK-23315) failed to get output from canonicalized data source v2 related plans

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349927#comment-16349927 ] Apache Spark commented on SPARK-23315: -- User 'cloud-fan' has created a pull request for this issue:

[jira] [Created] (SPARK-23315) failed to get output from canonicalized data source v2 related plans

2018-02-01 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-23315: --- Summary: failed to get output from canonicalized data source v2 related plans Key: SPARK-23315 URL: https://issues.apache.org/jira/browse/SPARK-23315 Project: Spark

[jira] [Updated] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-23314: - Description: Under SPARK-22216 When testing pandas_udf on group bys, I saw this error with the

[jira] [Updated] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-23314: - Description: Under SPARK-22216 When testing pandas_udf on group bys, I saw this error with the

[jira] [Updated] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-23314: - Description: Under SPARK-22216 When testing pandas_udf on group bys, I saw this error with the

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349899#comment-16349899 ] Felix Cheung commented on SPARK-23314: -- [~icexelloss] [~bryanc] > Pandas grouped udf on dataset

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349898#comment-16349898 ] Felix Cheung commented on SPARK-23314: -- log [Stage

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349897#comment-16349897 ] Felix Cheung commented on SPARK-23314: -- code >>> flights = spark.read.option("inferSchema",

[jira] [Updated] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-23314: - Environment: (was: data sample

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349896#comment-16349896 ] Felix Cheung commented on SPARK-23314: -- data sample

[jira] [Updated] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-23314: - Description: Under SPARK-22216 When testing pandas_udf on group bys, I saw this error with the

[jira] [Created] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-01 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-23314: Summary: Pandas grouped udf on dataset with timestamp column error Key: SPARK-23314 URL: https://issues.apache.org/jira/browse/SPARK-23314 Project: Spark

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349885#comment-16349885 ] Dongjoon Hyun commented on SPARK-23309: --- We are still investigating this, but is this a regression

[jira] [Assigned] (SPARK-23313) Add a migration guide for ORC

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23313: Assignee: Apache Spark > Add a migration guide for ORC > - >

[jira] [Assigned] (SPARK-23313) Add a migration guide for ORC

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23313: Assignee: (was: Apache Spark) > Add a migration guide for ORC >

[jira] [Commented] (SPARK-23313) Add a migration guide for ORC

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349855#comment-16349855 ] Apache Spark commented on SPARK-23313: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Created] (SPARK-23313) Add a migration guide for ORC

2018-02-01 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-23313: - Summary: Add a migration guide for ORC Key: SPARK-23313 URL: https://issues.apache.org/jira/browse/SPARK-23313 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-23313) Add a migration guide for ORC

2018-02-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-23313: -- Issue Type: Documentation (was: Bug) > Add a migration guide for ORC >

[jira] [Commented] (SPARK-23312) add a config to turn off vectorized cache reader

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349826#comment-16349826 ] Apache Spark commented on SPARK-23312: -- User 'cloud-fan' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23312) add a config to turn off vectorized cache reader

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23312: Assignee: Wenchen Fan (was: Apache Spark) > add a config to turn off vectorized cache

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349827#comment-16349827 ] Wenchen Fan commented on SPARK-23309: - I propose to add a config to turn off vectorized cache reader:

[jira] [Assigned] (SPARK-23312) add a config to turn off vectorized cache reader

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23312: Assignee: Apache Spark (was: Wenchen Fan) > add a config to turn off vectorized cache

[jira] [Created] (SPARK-23312) add a config to turn off vectorized cache reader

2018-02-01 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-23312: --- Summary: add a config to turn off vectorized cache reader Key: SPARK-23312 URL: https://issues.apache.org/jira/browse/SPARK-23312 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23290) inadvertent change in handling of DateType when converting to pandas dataframe

2018-02-01 Thread Takuya Ueshin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349796#comment-16349796 ] Takuya Ueshin commented on SPARK-23290: --- Thanks for the report! I'm afraid I couldn't figure out

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349770#comment-16349770 ] Wenchen Fan commented on SPARK-23309: - We need to know the schema of your cached data to figure out

[jira] [Assigned] (SPARK-23306) Race condition in TaskMemoryManager

2018-02-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-23306: --- Assignee: Zhan Zhang > Race condition in TaskMemoryManager >

[jira] [Resolved] (SPARK-23306) Race condition in TaskMemoryManager

2018-02-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-23306. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20480

[jira] [Resolved] (SPARK-23020) Re-enable Flaky Test: org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher

2018-02-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-23020. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20462

[jira] [Commented] (SPARK-23311) add FilterFunction test case for test CombineTypedFilters

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349701#comment-16349701 ] Apache Spark commented on SPARK-23311: -- User 'heary-cao' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23311) add FilterFunction test case for test CombineTypedFilters

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23311: Assignee: Apache Spark > add FilterFunction test case for test CombineTypedFilters >

[jira] [Assigned] (SPARK-23311) add FilterFunction test case for test CombineTypedFilters

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23311: Assignee: (was: Apache Spark) > add FilterFunction test case for test

[jira] [Created] (SPARK-23311) add FilterFunction test case for test CombineTypedFilters

2018-02-01 Thread caoxuewen (JIRA)
caoxuewen created SPARK-23311: - Summary: add FilterFunction test case for test CombineTypedFilters Key: SPARK-23311 URL: https://issues.apache.org/jira/browse/SPARK-23311 Project: Spark Issue

[jira] [Resolved] (SPARK-23284) Document several get API of ColumnVector's behavior when accessing null slot

2018-02-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-23284. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20455

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-01 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349679#comment-16349679 ] Yin Huai commented on SPARK-23310: -- [~sitalke...@gmail.com] We found that the commit for SPARK-21113

[jira] [Assigned] (SPARK-23284) Document several get API of ColumnVector's behavior when accessing null slot

2018-02-01 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-23284: --- Assignee: Liang-Chi Hsieh > Document several get API of ColumnVector's behavior when

[jira] [Updated] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-01 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-23310: - Description: While running all TPC-DS queries with SF set to 1000, we noticed that Q95

[jira] [Created] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-01 Thread Yin Huai (JIRA)
Yin Huai created SPARK-23310: Summary: Perf regression introduced by SPARK-21113 Key: SPARK-23310 URL: https://issues.apache.org/jira/browse/SPARK-23310 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-23297) Spark job is finished but the stage process is error

2018-02-01 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23297: -- Affects Version/s: 2.1.2 > Spark job is finished but the stage process is error >

[jira] [Commented] (SPARK-23297) Spark job is finished but the stage process is error

2018-02-01 Thread shining (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349655#comment-16349655 ] shining commented on SPARK-23297: - I also encounter this problem. It looks the job still running at

[jira] [Commented] (SPARK-23271) Parquet output contains only "_SUCCESS" file after empty DataFrame saving

2018-02-01 Thread Dilip Biswal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349649#comment-16349649 ] Dilip Biswal commented on SPARK-23271: -- [~hyukjin.kwon] I took a look at this. To the best of my

[jira] [Commented] (SPARK-23297) Spark job is finished but the stage process is error

2018-02-01 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349638#comment-16349638 ] KaiXinXIaoLei commented on SPARK-23297: --- Job is finished. and the number of  

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349618#comment-16349618 ] Dongjoon Hyun commented on SPARK-23309: --- Yep. I'll make a PR for migration guide. For the conf

[jira] [Resolved] (SPARK-23293) data source v2 self join fails

2018-02-01 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sameer Agarwal resolved SPARK-23293. Resolution: Fixed Fix Version/s: 2.3.0 > data source v2 self join fails >

[jira] [Updated] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sameer Agarwal updated SPARK-23309: --- Target Version/s: 2.3.0 > Spark 2.3 cached query performance 20-30% worse then spark 2.2 >

[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349606#comment-16349606 ] Xiao Li edited comment on SPARK-23309 at 2/2/18 12:54 AM: -- [~dongjoon] We need

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349606#comment-16349606 ] Xiao Li commented on SPARK-23309: - [~dongjoon] We need to document it in the migration guide. Basically,

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349605#comment-16349605 ] Xiao Li commented on SPARK-23309: - What is the data type of `something`? > Spark 2.3 cached query

[jira] [Commented] (SPARK-23235) Add executor Threaddump to api

2018-02-01 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349604#comment-16349604 ] Saisai Shao commented on SPARK-23235: - There's a new similar Jira (SPARK-23206) about adding and

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349588#comment-16349588 ] Dongjoon Hyun commented on SPARK-23309: --- Thank you for confirming for the non-cache case,

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349576#comment-16349576 ] Xiao Li commented on SPARK-23309: - Just to confirm it. The cached data only has one column whose type is 

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349555#comment-16349555 ] Thomas Graves commented on SPARK-23304: --- I don't have any hive tables backed by parquet to compare

[jira] [Comment Edited] (SPARK-10063) Remove DirectParquetOutputCommitter

2018-02-01 Thread Henrique dos Santos Goulart (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349550#comment-16349550 ] Henrique dos Santos Goulart edited comment on SPARK-10063 at 2/2/18 12:11 AM:

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349553#comment-16349553 ] Thomas Graves commented on SPARK-23309: --- [~dongjoon] is there any native way with the native hive

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349551#comment-16349551 ] Thomas Graves commented on SPARK-23309: --- seeing the same time difference after adding in the   

[jira] [Commented] (SPARK-10063) Remove DirectParquetOutputCommitter

2018-02-01 Thread Henrique dos Santos Goulart (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349550#comment-16349550 ] Henrique dos Santos Goulart commented on SPARK-10063: - There is any alternative right

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349543#comment-16349543 ] Sean Owen commented on SPARK-23308: --- Sure, but what could you meaningfully do with

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349532#comment-16349532 ] Dongjoon Hyun commented on SPARK-23309: --- According to the issue title, there is no regression

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349516#comment-16349516 ] Xiao Li commented on SPARK-23309: - [~tgraves] We are just hoping the new Hive reader does not introduce a

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349511#comment-16349511 ] Xiao Li commented on SPARK-23304: - Based on the new plan, it sounds like the plan is not changed. Could

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349510#comment-16349510 ] Dongjoon Hyun commented on SPARK-23309: --- Thank you for reporting this, [~tgraves]. In addition to

[jira] [Comment Edited] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349498#comment-16349498 ] Dongjoon Hyun edited comment on SPARK-23304 at 2/1/18 11:33 PM: I updated

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349498#comment-16349498 ] Dongjoon Hyun commented on SPARK-23304: --- I updated the affected version according to the latest

[jira] [Updated] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-23304: -- Affects Version/s: 2.2.1 > Spark SQL coalesce() against hive not working >

[jira] [Assigned] (SPARK-23296) Diagnostics message for user code exceptions should include the stacktrace

2018-02-01 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-23296: -- Assignee: Gera Shegalov > Diagnostics message for user code exceptions should include

[jira] [Resolved] (SPARK-23296) Diagnostics message for user code exceptions should include the stacktrace

2018-02-01 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-23296. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20470

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349483#comment-16349483 ] Thomas Graves commented on SPARK-23309: --- sure, I can also run with the  --conf

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349480#comment-16349480 ] Thomas Graves commented on SPARK-23304: --- I just ran the query (show()) and saw the # of partitions. 

[jira] [Updated] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-23304: -- Attachment: spark23_oldorc_explain_convermetastoreorcfalse.txt > Spark SQL coalesce() against

[jira] [Updated] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-23309: Priority: Blocker (was: Major) > Spark 2.3 cached query performance 20-30% worse then spark 2.2 >

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349470#comment-16349470 ] Xiao Li commented on SPARK-23309: - [~tgraves] Could you first run count before you run the show?

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349461#comment-16349461 ] Xiao Li commented on SPARK-23304: - That is fine. Obviously, at least, we need to submit a PR to document

[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors

2018-02-01 Thread Dan Meany (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349440#comment-16349440 ] Dan Meany commented on SPARK-19371: --- We have had this issue on many occasions and nothing I tried

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349446#comment-16349446 ] Thomas Graves commented on SPARK-23304: --- It still seems like a bug to me since the coalesce isn't

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349444#comment-16349444 ] Thomas Graves commented on SPARK-23304: --- [~smilegator] just to make sure you saw my comment above,

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349436#comment-16349436 ] Xiao Li commented on SPARK-23304: - In this release, we also made a change in the default of another

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349431#comment-16349431 ] Thomas Graves commented on SPARK-23309: --- I'm curious if anyone else is seeing the same behavior? 

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349430#comment-16349430 ] Thomas Graves commented on SPARK-23304: ---   I filed Jira

[jira] [Created] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-01 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-23309: - Summary: Spark 2.3 cached query performance 20-30% worse then spark 2.2 Key: SPARK-23309 URL: https://issues.apache.org/jira/browse/SPARK-23309 Project: Spark

[jira] [Commented] (SPARK-23294) Spark Streaming + Rate source + Console Sink : Receiver MaxRate is violated

2018-02-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349411#comment-16349411 ] Shixiong Zhu commented on SPARK-23294: -- [~rmatte] the configurations you posted in the ticket is for

[jira] [Resolved] (SPARK-23294) Spark Streaming + Rate source + Console Sink : Receiver MaxRate is violated

2018-02-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-23294. -- Resolution: Not A Problem > Spark Streaming + Rate source + Console Sink : Receiver MaxRate is

[jira] [Assigned] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23307: Assignee: Shixiong Zhu (was: Apache Spark) > Spark UI should sort jobs/stages with the

[jira] [Created] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-01 Thread Márcio Furlani Carmona (JIRA)
Márcio Furlani Carmona created SPARK-23308: -- Summary: ignoreCorruptFiles should not ignore retryable IOException Key: SPARK-23308 URL: https://issues.apache.org/jira/browse/SPARK-23308

[jira] [Commented] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349382#comment-16349382 ] Apache Spark commented on SPARK-23307: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23307: Assignee: Apache Spark (was: Shixiong Zhu) > Spark UI should sort jobs/stages with the

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349369#comment-16349369 ] Thomas Graves commented on SPARK-23304: --- Note I've removed some of the columns from the output, if

[jira] [Updated] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-23304: -- Attachment: spark23_oldorc_explain.txt spark22_oldorc_explain.txt > Spark SQL

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349367#comment-16349367 ] Thomas Graves commented on SPARK-23304: --- ok I've attached 2 files one with spark 2.3 and one with

[jira] [Commented] (SPARK-23284) Document several get API of ColumnVector's behavior when accessing null slot

2018-02-01 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349353#comment-16349353 ] Sameer Agarwal commented on SPARK-23284: Thanks, given the other open blockers, we should have

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349346#comment-16349346 ] Sameer Agarwal commented on SPARK-23304: Also, is there a JIRA/repro for the caching issue you

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349340#comment-16349340 ] Xiao Li commented on SPARK-23304: - I do not think our native ORC reader respects

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349339#comment-16349339 ] Xiao Li commented on SPARK-23304: - Hi, [~tgraves], could you change the two SQLConf `spark.sql.orc.impl`

[jira] [Commented] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2018-02-01 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349325#comment-16349325 ] Yin Huai commented on SPARK-12297: -- [~zi] has this issue got resolved in Hive? I see HIVE-12767 is still

[jira] [Commented] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349323#comment-16349323 ] Sameer Agarwal commented on SPARK-23307: Bumping this to a blocker for 2.3 > Spark UI should

[jira] [Updated] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sameer Agarwal updated SPARK-23307: --- Priority: Blocker (was: Major) > Spark UI should sort jobs/stages with the completed

[jira] [Updated] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sameer Agarwal updated SPARK-23307: --- Target Version/s: 2.3.0 > Spark UI should sort jobs/stages with the completed timestamp

[jira] [Updated] (SPARK-23292) python tests related to pandas are skipped

2018-02-01 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-23292: - Priority: Critical (was: Blocker) > python tests related to pandas are skipped >

[jira] [Assigned] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-23307: Assignee: Shixiong Zhu > Spark UI should sort jobs/stages with the completed timestamp

[jira] [Updated] (SPARK-23307) Spark UI should sort jobs/stages with the completed timestamp before cleaning up them

2018-02-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-23307: - Description: When you have a long running job, it may be deleted from UI quickly when it
