[jira] [Commented] (SPARK-16196) Optimize in-memory scan performance using ColumnarBatches

2018-09-11 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611633#comment-16611633 ] Kazuaki Ishizaki commented on SPARK-16196: -- [~cloud_fan] This PR in the Jira entry proposes two

[jira] [Commented] (SPARK-25412) FeatureHasher would change the value of output feature

2018-09-11 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611628#comment-16611628 ] Vincent commented on SPARK-25412: - [~nick.pentre...@gmail.com] thanks. > FeatureHasher would change the

[jira] [Created] (SPARK-25412) FeatureHasher would change the value of output feature

2018-09-11 Thread Vincent (JIRA)
Vincent created SPARK-25412: --- Summary: FeatureHasher would change the value of output feature Key: SPARK-25412 URL: https://issues.apache.org/jira/browse/SPARK-25412 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-25380) Generated plans occupy over 50% of Spark driver memory

2018-09-11 Thread Michael Spector (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611606#comment-16611606 ] Michael Spector commented on SPARK-25380: - [~vanzin] Here's the breakdown: !Screen Shot

[jira] [Updated] (SPARK-25380) Generated plans occupy over 50% of Spark driver memory

2018-09-11 Thread Michael Spector (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Spector updated SPARK-25380: Attachment: Screen Shot 2018-09-12 at 8.20.05.png > Generated plans occupy over 50% of

[jira] [Resolved] (SPARK-25385) Upgrade jackson version to 2.7.8

2018-09-11 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-25385. - Resolution: Won't Fix > Upgrade jackson version to 2.7.8

[jira] [Resolved] (SPARK-25410) Spark executor on YARN does not include memoryOverhead when starting an ExecutorRunnable

2018-09-11 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-25410. Resolution: Not A Bug bq. This means that the amount of memoryOverhead will not be used

[jira] [Commented] (SPARK-8000) SQLContext.read.load() should be able to auto-detect input data

2018-09-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611541#comment-16611541 ] Hyukjin Kwon commented on SPARK-8000: - Yup, it's still a good to do. > SQLContext.read.load() should

[jira] [Commented] (SPARK-25378) ArrayData.toArray(StringType) assume UTF8String in 2.4

2018-09-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611538#comment-16611538 ] Hyukjin Kwon commented on SPARK-25378: -- If there's a simple way to fix, it might be okay but still

[jira] [Commented] (SPARK-25396) Read array of JSON objects via an Iterator

2018-09-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611537#comment-16611537 ] Hyukjin Kwon commented on SPARK-25396: -- Yea, I postponed the closing thing for the try I made at

[jira] [Commented] (SPARK-25271) Creating parquet table with all the column null throws exception

2018-09-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611535#comment-16611535 ] Liang-Chi Hsieh commented on SPARK-25271: - Yeah, looks like after some changes, this kind of

[jira] [Assigned] (SPARK-23483) Feature parity for Python vs Scala APIs

2018-09-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-23483: Assignee: Huaxin Gao > Feature parity for Python vs Scala APIs

[jira] [Commented] (SPARK-23483) Feature parity for Python vs Scala APIs

2018-09-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611528#comment-16611528 ] Hyukjin Kwon commented on SPARK-23483: -- I am also assigning this to [~huaxingao] since she's been

[jira] [Resolved] (SPARK-23483) Feature parity for Python vs Scala APIs

2018-09-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23483. -- Resolution: Done > Feature parity for Python vs Scala APIs

[jira] [Reopened] (SPARK-23483) Feature parity for Python vs Scala APIs

2018-09-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-23483: -- > Feature parity for Python vs Scala APIs

[jira] [Resolved] (SPARK-23483) Feature parity for Python vs Scala APIs

2018-09-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23483. -- Resolution: Fixed Let me leave this resolved. > Feature parity for Python vs Scala APIs

[jira] [Commented] (SPARK-25238) Lint-Python: Upgrading to the current version of pycodestyle fails

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611523#comment-16611523 ] Apache Spark commented on SPARK-25238: -- User 'srowen' has created a pull request for this issue:

[jira] [Commented] (SPARK-25238) Lint-Python: Upgrading to the current version of pycodestyle fails

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611524#comment-16611524 ] Apache Spark commented on SPARK-25238: -- User 'srowen' has created a pull request for this issue:

[jira] [Created] (SPARK-25411) Implement range partition in Spark

2018-09-11 Thread Wang, Gang (JIRA)
Wang, Gang created SPARK-25411: -- Summary: Implement range partition in Spark Key: SPARK-25411 URL: https://issues.apache.org/jira/browse/SPARK-25411 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen

2018-09-11 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611502#comment-16611502 ] Kazuaki Ishizaki commented on SPARK-20184: -- Although I created another JIRA

[jira] [Commented] (SPARK-16196) Optimize in-memory scan performance using ColumnarBatches

2018-09-11 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611494#comment-16611494 ] Kazuaki Ishizaki commented on SPARK-16196: -- I see. I will check this. > Optimize in-memory

[jira] [Resolved] (SPARK-25354) Parquet vectorized record reader has unneeded operation in several methods

2018-09-11 Thread SongYadong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SongYadong resolved SPARK-25354. Resolution: Won't Fix The benefit is not obvious. > Parquet vectorized record reader has

[jira] [Commented] (SPARK-25354) Parquet vectorized record reader has unneeded operation in several methods

2018-09-11 Thread SongYadong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611432#comment-16611432 ] SongYadong commented on SPARK-25354: The benefit is not obvious. PR is closed. > Parquet vectorized

[jira] [Updated] (SPARK-25410) Spark executor on YARN does not include memoryOverhead when starting an ExecutorRunnable

2018-09-11 Thread Anbang Hu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anbang Hu updated SPARK-25410: -- Description: When deploying on YARN, only {{executorMemory}} is used to launch executors in

[jira] [Created] (SPARK-25410) Spark executor on YARN does not include memoryOverhead when starting an ExecutorRunnable

2018-09-11 Thread Anbang Hu (JIRA)
Anbang Hu created SPARK-25410: - Summary: Spark executor on YARN does not include memoryOverhead when starting an ExecutorRunnable Key: SPARK-25410 URL: https://issues.apache.org/jira/browse/SPARK-25410

[jira] [Updated] (SPARK-19489) Stable serialization format for external & native code integration

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-19489: Target Version/s: (was: 2.4.0) > Stable serialization format for external & native code

[jira] [Resolved] (SPARK-19489) Stable serialization format for external & native code integration

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-19489. - Resolution: Fixed Fix Version/s: 2.3.0 > Stable serialization format for external &

[jira] [Commented] (SPARK-25409) Speed up Spark History at start if there are tens of thousands of applications.

2018-09-11 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611409#comment-16611409 ] Yuming Wang commented on SPARK-25409: - Please create pull request from Github:

[jira] [Updated] (SPARK-25395) Replace Spark Optional class with Java Optional

2018-09-11 Thread Mario Molina (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mario Molina updated SPARK-25395: - Summary: Replace Spark Optional class with Java Optional (was: Remove Spark Optional Java API)

[jira] [Commented] (SPARK-24572) "eager execution" for R shell, IDE

2018-09-11 Thread Weiqiang Zhuang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611357#comment-16611357 ] Weiqiang Zhuang commented on SPARK-24572: - [~felixcheung] are you thinking of using the same

[jira] [Resolved] (SPARK-25399) Reusing execution threads from continuous processing for microbatch streaming can result in correctness issues

2018-09-11 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-25399. --- Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Issue resolved by

[jira] [Assigned] (SPARK-25399) Reusing execution threads from continuous processing for microbatch streaming can result in correctness issues

2018-09-11 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das reassigned SPARK-25399: - Assignee: Mukul Murthy > Reusing execution threads from continuous processing for

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-09-11 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611331#comment-16611331 ] Bruce Robbins commented on SPARK-25164: --- Thanks [~Tagar] for the feedback. I assume the 44%

[jira] [Updated] (SPARK-25409) Speed up Spark History at start if there are tens of thousands of applications.

2018-09-11 Thread Rong Tang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rong Tang updated SPARK-25409: -- Attachment: SPARK-25409.0001.patch > Speed up Spark History at start if there are tens of thousands

[jira] [Created] (SPARK-25409) Speed up Spark History at start if there are tens of thousands of applications.

2018-09-11 Thread Rong Tang (JIRA)
Rong Tang created SPARK-25409: - Summary: Speed up Spark History at start if there are tens of thousands of applications. Key: SPARK-25409 URL: https://issues.apache.org/jira/browse/SPARK-25409 Project:

[jira] [Commented] (SPARK-25331) Structured Streaming File Sink duplicates records in case of driver failure

2018-09-11 Thread Mihaly Toth (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611244#comment-16611244 ] Mihaly Toth commented on SPARK-25331: - I will try to make it idempotent then. > Structured

[jira] [Commented] (SPARK-25408) Move to idiomatic Java 8

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611227#comment-16611227 ] Apache Spark commented on SPARK-25408: -- User 'Fokko' has created a pull request for this issue:

[jira] [Commented] (SPARK-25408) Move to idiomatic Java 8

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611223#comment-16611223 ] Apache Spark commented on SPARK-25408: -- User 'Fokko' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25408) Move to idiomatic Java 8

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25408: Assignee: Apache Spark > Move to idiomatic Java 8

[jira] [Assigned] (SPARK-25408) Move to idiomatic Java 8

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25408: Assignee: (was: Apache Spark) > Move to idiomatic Java 8

[jira] [Created] (SPARK-25408) Move to idiomatic Java 8

2018-09-11 Thread Fokko Driesprong (JIRA)
Fokko Driesprong created SPARK-25408: Summary: Move to idiomatic Java 8 Key: SPARK-25408 URL: https://issues.apache.org/jira/browse/SPARK-25408 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-23820) Allow the long form of call sites to be recorded in the log

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611175#comment-16611175 ] Apache Spark commented on SPARK-23820: -- User 'michaelmior' has created a pull request for this

[jira] [Assigned] (SPARK-23820) Allow the long form of call sites to be recorded in the log

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23820: Assignee: Michael Mior (was: Apache Spark) > Allow the long form of call sites to be

[jira] [Assigned] (SPARK-23820) Allow the long form of call sites to be recorded in the log

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23820: Assignee: Apache Spark (was: Michael Mior) > Allow the long form of call sites to be

[jira] [Commented] (SPARK-25170) Add Task Metrics description to the documentation

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611159#comment-16611159 ] Apache Spark commented on SPARK-25170: -- User 'LucaCanali' has created a pull request for this

[jira] [Updated] (SPARK-23820) Allow the long form of call sites to be recorded in the log

2018-09-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-23820: -- Fix Version/s: (was: 2.4.0) > Allow the long form of call sites to be recorded in the log

[jira] [Reopened] (SPARK-23820) Allow the long form of call sites to be recorded in the log

2018-09-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-23820: --- Reopened pending some further discussion about the implementation > Allow the long form of call sites

[jira] [Resolved] (SPARK-25398) Minor bugs from comparing unrelated types

2018-09-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25398. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22384

[jira] [Assigned] (SPARK-15815) Hang while enable blacklistExecutor and DynamicExecutorAllocator

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15815: Assignee: (was: Apache Spark) > Hang while enable blacklistExecutor and

[jira] [Assigned] (SPARK-15815) Hang while enable blacklistExecutor and DynamicExecutorAllocator

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15815: Assignee: Apache Spark > Hang while enable blacklistExecutor and

[jira] [Commented] (SPARK-15815) Hang while enable blacklistExecutor and DynamicExecutorAllocator

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611133#comment-16611133 ] Apache Spark commented on SPARK-15815: -- User 'dhruve' has created a pull request for this issue:

[jira] [Comment Edited] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-09-11 Thread Ruslan Dautkhanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611089#comment-16611089 ] Ruslan Dautkhanov edited comment on SPARK-25164 at 9/11/18 7:00 PM:

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-09-11 Thread Ruslan Dautkhanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611089#comment-16611089 ] Ruslan Dautkhanov commented on SPARK-25164: --- Thanks [~bersprockets] Very good find ! Thanks.

[jira] [Commented] (SPARK-21542) Helper functions for custom Python Persistence

2018-09-11 Thread John Bauer (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611085#comment-16611085 ] John Bauer commented on SPARK-21542: You don't show your code for __init__ or setParams. I recall

[jira] [Commented] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611079#comment-16611079 ] Apache Spark commented on SPARK-23425: -- User 'sujith71955' has created a pull request for this

[jira] [Updated] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly

2018-09-11 Thread Sujith (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-23425: --- Docs Text: Release notes: Wildcard symbols {{*}} and {{?}} can now be used in SQL paths when loading

[jira] [Commented] (SPARK-25380) Generated plans occupy over 50% of Spark driver memory

2018-09-11 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611012#comment-16611012 ] Marcelo Vanzin commented on SPARK-25380: Another bit of information that would be useful is the

[jira] [Resolved] (SPARK-24889) dataset.unpersist() doesn't update storage memory stats

2018-09-11 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-24889. Resolution: Fixed Fix Version/s: 2.3.2 2.4.0 Issue resolved by

[jira] [Assigned] (SPARK-24889) dataset.unpersist() doesn't update storage memory stats

2018-09-11 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-24889: -- Assignee: Liang-Chi Hsieh > dataset.unpersist() doesn't update storage memory stats

[jira] [Updated] (SPARK-19903) Watermark metadata is lost when using resolved attributes

2018-09-11 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-19903: - Target Version/s: (was: 2.4.0) > Watermark metadata is lost when using resolved attributes

[jira] [Commented] (SPARK-19903) Watermark metadata is lost when using resolved attributes

2018-09-11 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610931#comment-16610931 ] Shixiong Zhu commented on SPARK-19903: -- Yes. I removed the target version. > Watermark metadata is

[jira] [Assigned] (SPARK-25221) [DEPLOY] Consistent trailing whitespace treatment of conf values

2018-09-11 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-25221: -- Assignee: Gera Shegalov > [DEPLOY] Consistent trailing whitespace treatment of conf

[jira] [Resolved] (SPARK-25221) [DEPLOY] Consistent trailing whitespace treatment of conf values

2018-09-11 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-25221. Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22213

[jira] [Commented] (SPARK-25407) Spark throws a `ParquetDecodingException` when attempting to read a field from a complex type in certain cases of schema merging

2018-09-11 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610841#comment-16610841 ] Michael Allman commented on SPARK-25407: I have a code-complete patch for this bug, but I want

[jira] [Created] (SPARK-25407) Spark throws a `ParquetDecodingException` when attempting to read a field from a complex type in certain cases of schema merging

2018-09-11 Thread Michael Allman (JIRA)
Michael Allman created SPARK-25407: -- Summary: Spark throws a `ParquetDecodingException` when attempting to read a field from a complex type in certain cases of schema merging Key: SPARK-25407 URL:

[jira] [Commented] (SPARK-16323) Avoid unnecessary cast when doing integral divide

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610837#comment-16610837 ] Apache Spark commented on SPARK-16323: -- User 'mgaido91' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25389) INSERT OVERWRITE DIRECTORY STORED AS should prevent duplicate fields

2018-09-11 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-25389: - Assignee: Dongjoon Hyun > INSERT OVERWRITE DIRECTORY STORED AS should prevent

[jira] [Commented] (SPARK-16323) Avoid unnecessary cast when doing integral divide

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610836#comment-16610836 ] Apache Spark commented on SPARK-16323: -- User 'mgaido91' has created a pull request for this issue:

[jira] [Resolved] (SPARK-25389) INSERT OVERWRITE DIRECTORY STORED AS should prevent duplicate fields

2018-09-11 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-25389. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22378

[jira] [Updated] (SPARK-25406) Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests

2018-09-11 Thread Michael Allman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Allman updated SPARK-25406: --- Priority: Major (was: Critical) > Incorrect usage of withSQLConf method in Parquet schema

[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2018-09-11 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610810#comment-16610810 ] Reynold Xin commented on SPARK-19489: - We can close this now. > Stable serialization format for

[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2018-09-11 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610798#comment-16610798 ] Wes McKinney commented on SPARK-19489: -- Since there's a native Rust library for Arrow in

[jira] [Assigned] (SPARK-25406) Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25406: Assignee: (was: Apache Spark) > Incorrect usage of withSQLConf method in Parquet

[jira] [Commented] (SPARK-25406) Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610788#comment-16610788 ] Apache Spark commented on SPARK-25406: -- User 'mallman' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25406) Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests

2018-09-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25406: Assignee: Apache Spark > Incorrect usage of withSQLConf method in Parquet schema pruning

[jira] [Created] (SPARK-25406) Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests

2018-09-11 Thread Michael Allman (JIRA)
Michael Allman created SPARK-25406: -- Summary: Incorrect usage of withSQLConf method in Parquet schema pruning test suite masks failing tests Key: SPARK-25406 URL:

[jira] [Comment Edited] (SPARK-7768) Make user-defined type (UDT) API public

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610733#comment-16610733 ] Wenchen Fan edited comment on SPARK-7768 at 9/11/18 2:47 PM: - I'm retargeting

[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610733#comment-16610733 ] Wenchen Fan commented on SPARK-7768: I'm retargeting to 3.0. BTW is there any concerns about making

[jira] [Updated] (SPARK-9576) DataFrame API improvement umbrella ticket (in Spark 2.x)

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-9576: --- Target Version/s: 3.0.0 (was: 2.4.0) > DataFrame API improvement umbrella ticket (in Spark 2.x)

[jira] [Updated] (SPARK-7768) Make user-defined type (UDT) API public

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-7768: --- Target Version/s: 3.0.0 (was: 2.4.0) > Make user-defined type (UDT) API public

[jira] [Commented] (SPARK-8000) SQLContext.read.load() should be able to auto-detect input data

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610731#comment-16610731 ] Wenchen Fan commented on SPARK-8000: do we still want to do it? > SQLContext.read.load() should be

[jira] [Commented] (SPARK-12978) Skip unnecessary final group-by when input data already clustered with group-by keys

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610729#comment-16610729 ] Wenchen Fan commented on SPARK-12978: - Sorry we missed this issue. I think this is a very

[jira] [Commented] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page

2018-09-11 Thread sandeep katta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610728#comment-16610728 ] sandeep katta commented on SPARK-25392: --- Yes the link is active,I will analyze more with code

[jira] [Commented] (SPARK-13682) Finalize the public API for FileFormat

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610723#comment-16610723 ] Wenchen Fan commented on SPARK-13682: - I'm closing it. We will keep `FileFormat` API private, and

[jira] [Resolved] (SPARK-13682) Finalize the public API for FileFormat

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-13682. - Resolution: Won't Fix > Finalize the public API for FileFormat

[jira] [Updated] (SPARK-14098) Generate Java code to build CachedColumnarBatch and get values from CachedColumnarBatch when DataFrame.cache() is called

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-14098: Target Version/s: 3.0.0 (was: 2.4.0) > Generate Java code to build CachedColumnarBatch and get

[jira] [Updated] (SPARK-14922) Alter Table Drop Partition Using Predicate-based Partition Spec

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-14922: Target Version/s: 3.0.0 (was: 2.4.0) > Alter Table Drop Partition Using Predicate-based

[jira] [Resolved] (SPARK-15380) Generate code that stores a float/double value in each column from ColumnarBatch when DataFrame.cache() is used

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-15380. - Resolution: Won't Fix > Generate code that stores a float/double value in each column from >

[jira] [Commented] (SPARK-15420) Repartition and sort before Parquet writes

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610719#comment-16610719 ] Wenchen Fan commented on SPARK-15420: - Have we fixed it? > Repartition and sort before Parquet

[jira] [Updated] (SPARK-15690) Fast single-node (single-process) in-memory shuffle

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15690: Target Version/s: (was: 2.4.0) > Fast single-node (single-process) in-memory shuffle >

[jira] [Commented] (SPARK-15690) Fast single-node (single-process) in-memory shuffle

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610715#comment-16610715 ] Wenchen Fan commented on SPARK-15690: - I'm removing the target version, since no one is working on

[jira] [Commented] (SPARK-15693) Write schema definition out for file-based data sources to avoid schema inference

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610713#comment-16610713 ] Wenchen Fan commented on SPARK-15693: - Do we still want to do it? > Write schema definition out for

[jira] [Updated] (SPARK-15691) Refactor and improve Hive support

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15691: Target Version/s: 3.0.0 (was: 2.4.0) > Refactor and improve Hive support >

[jira] [Updated] (SPARK-15694) Implement ScriptTransformation in sql/core

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15694: Target Version/s: 3.0.0 (was: 2.4.0) > Implement ScriptTransformation in sql/core >

[jira] [Commented] (SPARK-16011) SQL metrics include duplicated attempts

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610705#comment-16610705 ] Wenchen Fan commented on SPARK-16011: - Since the behavior is intentional, I'm closing this ticket.

[jira] [Updated] (SPARK-15867) Use bucket files for TABLESAMPLE BUCKET

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-15867: Target Version/s: (was: 2.4.0) > Use bucket files for TABLESAMPLE BUCKET >

[jira] [Commented] (SPARK-15867) Use bucket files for TABLESAMPLE BUCKET

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610709#comment-16610709 ] Wenchen Fan commented on SPARK-15867: - I'm removing the target version, since no one is working on

[jira] [Commented] (SPARK-16323) Avoid unnecessary cast when doing integral divide

2018-09-11 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610703#comment-16610703 ] Marco Gaido commented on SPARK-16323: - Sure [~cloud_fan], will do. Thanks. > Avoid unnecessary cast

[jira] [Resolved] (SPARK-16011) SQL metrics include duplicated attempts

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-16011. - Resolution: Won't Fix > SQL metrics include duplicated attempts >

[jira] [Updated] (SPARK-16011) SQL metrics include duplicated attempts

2018-09-11 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-16011: Target Version/s: (was: 2.4.0) > SQL metrics include duplicated attempts >
