[jira] [Commented] (SPARK-25106) A new Kafka consumer gets created for every batch

2018-08-23 Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591170#comment-16591170 ] Tathagata Das commented on SPARK-25106: This is interesting! I don't know how this could be

[jira] [Commented] (SPARK-24630) SPIP: Support SQLStreaming in Spark

2018-08-23 Genmao Yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591151#comment-16591151 ] Genmao Yu commented on SPARK-24630: [~Jackey Lee] I am glad to participate in code review.

[jira] [Resolved] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2018-08-23 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-4502. Resolution: Fixed; Fix Version/s: 2.4.0

[jira] [Assigned] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2018-08-23 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-4502: Assignee: Michael Allman

[jira] [Closed] (SPARK-25210) spark driver apply task success info cost much time

2018-08-23 wangminfeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangminfeng closed SPARK-25210.

[jira] [Assigned] (SPARK-25221) [DEPLOY] Consistent trailing whitespace treatment of conf values

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25221: Assignee: (was: Apache Spark)

[jira] [Commented] (SPARK-25221) [DEPLOY] Consistent trailing whitespace treatment of conf values

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591097#comment-16591097 ] Apache Spark commented on SPARK-25221: User 'gerashegalov' has created a pull request for this

[jira] [Assigned] (SPARK-25221) [DEPLOY] Consistent trailing whitespace treatment of conf values

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25221: Assignee: Apache Spark

[jira] [Created] (SPARK-25221) [DEPLOY] Consistent trailing whitespace treatment of conf values

2018-08-23 Gera Shegalov (JIRA)
Gera Shegalov created SPARK-25221: Summary: [DEPLOY] Consistent trailing whitespace treatment of conf values; Key: SPARK-25221; URL: https://issues.apache.org/jira/browse/SPARK-25221; Project: Spark

[jira] [Commented] (SPARK-25220) [K8S] Split out node selector config between driver and executors.

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591058#comment-16591058 ] Apache Spark commented on SPARK-25220: User 'jweaver-personal' has created a pull request for this

[jira] [Assigned] (SPARK-25220) [K8S] Split out node selector config between driver and executors.

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25220: Assignee: Apache Spark

[jira] [Assigned] (SPARK-25220) [K8S] Split out node selector config between driver and executors.

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25220: Assignee: (was: Apache Spark)

[jira] [Created] (SPARK-25220) [K8S] Split out node selector config between driver and executors.

2018-08-23 Jonathan A Weaver (JIRA)
Jonathan A Weaver created SPARK-25220: Summary: [K8S] Split out node selector config between driver and executors. Key: SPARK-25220; URL: https://issues.apache.org/jira/browse/SPARK-25220

[jira] [Resolved] (SPARK-25209) Optimization in Dataset.apply for DataFrames

2018-08-23 Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-25209. Resolution: Fixed; Assignee: Bogdan Raducanu; Fix Version/s: 2.4.0

[jira] [Commented] (SPARK-25210) spark driver apply task success info cost much time

2018-08-23 Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591050#comment-16591050 ] Hyukjin Kwon commented on SPARK-25210: If it's unclear whether it is an issue or not, let's ask it

[jira] [Resolved] (SPARK-25210) spark driver apply task success info cost much time

2018-08-23 Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-25210. Resolution: Invalid. Let's reopen when it's clear if that's an issue.

[jira] [Resolved] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly

2018-08-23 Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23425. Resolution: Fixed; Fix Version/s: 2.4.0. Issue resolved by pull request 20611

[jira] [Assigned] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly

2018-08-23 Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-23425: Assignee: Sujith

[jira] [Resolved] (SPARK-25205) typo in spark.network.crypto.keyFactoryIteration

2018-08-23 Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-25205. Resolution: Fixed; Fix Version/s: 2.3.2, 2.4.0. Issue resolved by pull

[jira] [Assigned] (SPARK-25205) typo in spark.network.crypto.keyFactoryIteration

2018-08-23 Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-25205: Assignee: Imran Rashid

[jira] [Commented] (SPARK-24814) Relationship between catalog and datasources

2018-08-23 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591029#comment-16591029 ] Xiao Li commented on SPARK-24814: Also cc [~cloud_fan]

[jira] [Created] (SPARK-25219) KMeans Clustering - Text Data - Results are incorrect

2018-08-23 Vasanthkumar Velayudham (JIRA)
Vasanthkumar Velayudham created SPARK-25219: Summary: KMeans Clustering - Text Data - Results are incorrect; Key: SPARK-25219; URL: https://issues.apache.org/jira/browse/SPARK-25219; Project:

[jira] [Commented] (SPARK-24564) Add test suite for RecordBinaryComparator

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590923#comment-16590923 ] Apache Spark commented on SPARK-24564: User 'henryr' has created a pull request for this issue:

[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590921#comment-16590921 ] Apache Spark commented on SPARK-23207: User 'henryr' has created a pull request for this issue:

[jira] [Commented] (SPARK-25114) RecordBinaryComparator may return wrong result when subtraction between two words is divisible by Integer.MAX_VALUE

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590924#comment-16590924 ] Apache Spark commented on SPARK-25114: User 'henryr' has created a pull request for this issue:
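The SPARK-25114 summary describes a comparator that can return a wrong result depending on the difference between two words. The sketch below illustrates that general hazard under an assumption the snippet does not confirm: that two 64-bit words are compared by subtracting them and narrowing the difference to a 32-bit int, as a Java `(int) (v1 - v2)` cast would. It is plain Python emulating Java's narrowing, not Spark's `RecordBinaryComparator`, and the helper names are hypothetical.

```python
# Hypothetical illustration of a subtraction-based comparator bug.
# This is NOT Spark's RecordBinaryComparator; it emulates Java's
# (int) (v1 - v2) cast on two 64-bit words in plain Python.


def to_java_int(x: int) -> int:
    """Keep the low 32 bits of x as a signed value, like Java's (int) cast."""
    return ((x + 2**31) % 2**32) - 2**31


def compare_by_subtraction(v1: int, v2: int) -> int:
    """Fragile: subtract, then narrow to 32 bits (sign and magnitude may be lost)."""
    # Narrowing keeps only the low 32 bits, so a 64-bit wrap-around of the
    # subtraction itself would not change this result.
    return to_java_int(v1 - v2)


def compare_explicitly(v1: int, v2: int) -> int:
    """Robust: compare the words directly and return -1, 0 or 1."""
    return (v1 > v2) - (v1 < v2)


if __name__ == "__main__":
    a, b = 1 << 32, 0
    print(compare_by_subtraction(a, b))  # 0: two different words look equal
    print(compare_explicitly(a, b))      # 1
    c, d = 1 << 31, 0
    print(compare_by_subtraction(c, d))  # negative, although c > d
```

In this emulation the narrowed difference is 0 whenever v1 - v2 is a multiple of 2^32, and it is negative whenever bit 31 of the difference is set, so a subtraction-based comparator can report unequal words as equal or order them backwards; comparing the words directly avoids both cases.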

[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel, GaussianMixtureModel save implementation for Row order issues

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590922#comment-16590922 ] Apache Spark commented on SPARK-22905: User 'henryr' has created a pull request for this issue:

[jira] [Updated] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-23 Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-25124: Target Version/s: 2.3.2

[jira] [Commented] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-23 Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590919#comment-16590919 ] Joseph K. Bradley commented on SPARK-25124: I merged

[jira] [Updated] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-23 Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-25124: Target Version/s: 2.3.2, 2.4.0 (was: 2.3.2)

[jira] [Updated] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-23 Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-25124: Fix Version/s: 2.4.0

[jira] [Updated] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-23 Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-25124: Shepherd: Joseph K. Bradley

[jira] [Assigned] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-23 Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley reassigned SPARK-25124: Assignee: Huaxin Gao

[jira] [Commented] (SPARK-25202) SQL Function Split Should Respect Limit Argument

2018-08-23 Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590874#comment-16590874 ] Liang-Chi Hsieh commented on SPARK-25202: [~phegstrom] No problem. Please submit a PR for this.

[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-23 Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590847#comment-16590847 ] Bruce Robbins commented on SPARK-23207: Will we be back-porting this to 2.1, or does the 18 month

[jira] [Comment Edited] (SPARK-8582) Optimize checkpointing to avoid computing an RDD twice

2018-08-23 Baris ERGUN (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590733#comment-16590733 ] Baris ERGUN edited comment on SPARK-8582 at 8/23/18 9:46 PM: +1 When is this

[jira] [Commented] (SPARK-24539) HistoryServer does not display metrics from tasks that complete after stage failure

2018-08-23 Ankur Gupta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590837#comment-16590837 ] Ankur Gupta commented on SPARK-24539: Verified this issue is the same as SPARK-24415. The PR for

[jira] [Assigned] (SPARK-25218) Potential resource leaks in TransportServer and SocketAuthHelper

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25218: Assignee: Apache Spark (was: Shixiong Zhu)

[jira] [Assigned] (SPARK-25218) Potential resource leaks in TransportServer and SocketAuthHelper

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25218: Assignee: Shixiong Zhu (was: Apache Spark)

[jira] [Commented] (SPARK-25218) Potential resource leaks in TransportServer and SocketAuthHelper

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590766#comment-16590766 ] Apache Spark commented on SPARK-25218: User 'zsxwing' has created a pull request for this issue:

[jira] [Created] (SPARK-25218) Potential resource leaks in TransportServer and SocketAuthHelper

2018-08-23 Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-25218: Summary: Potential resource leaks in TransportServer and SocketAuthHelper; Key: SPARK-25218; URL: https://issues.apache.org/jira/browse/SPARK-25218; Project: Spark

[jira] [Created] (SPARK-25217) Error thrown when creating BlockMatrix

2018-08-23 cs5090237 (JIRA)
cs5090237 created SPARK-25217: Summary: Error thrown when creating BlockMatrix; Key: SPARK-25217; URL: https://issues.apache.org/jira/browse/SPARK-25217; Project: Spark; Issue Type: Bug

[jira] [Commented] (SPARK-24415) Stage page aggregated executor metrics wrong when failures

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590748#comment-16590748 ] Apache Spark commented on SPARK-24415: User 'ankuriitg' has created a pull request for this issue:

[jira] [Assigned] (SPARK-24415) Stage page aggregated executor metrics wrong when failures

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24415: Assignee: (was: Apache Spark)

[jira] [Assigned] (SPARK-24415) Stage page aggregated executor metrics wrong when failures

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24415: Assignee: Apache Spark

[jira] [Commented] (SPARK-8582) Optimize checkpointing to avoid computing an RDD twice

2018-08-23 Baris ERGUN (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590733#comment-16590733 ] Baris ERGUN commented on SPARK-8582: +1 when this issue is planned to be resolved. I am facing it on

[jira] [Resolved] (SPARK-25204) rate source test is flaky

2018-08-23 Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-25204. Resolution: Fixed; Fix Version/s: 3.0.0. Issue resolved by pull request 22191

[jira] [Assigned] (SPARK-25204) rate source test is flaky

2018-08-23 Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das reassigned SPARK-25204: Assignee: Jose Torres

[jira] [Updated] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-23 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-23207: Fix Version/s: 2.2.3

[jira] [Updated] (SPARK-24564) Add test suite for RecordBinaryComparator

2018-08-23 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24564: Fix Version/s: 2.3.2, 2.2.3

[jira] [Updated] (SPARK-25114) RecordBinaryComparator may return wrong result when subtraction between two words is divisible by Integer.MAX_VALUE

2018-08-23 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-25114: Fix Version/s: 2.2.3

[jira] [Assigned] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25216: Assignee: Apache Spark

[jira] [Assigned] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25216: Assignee: (was: Apache Spark)

[jira] [Commented] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590681#comment-16590681 ] Apache Spark commented on SPARK-25216: User 'icexelloss' has created a pull request for this

[jira] [Updated] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-25216: Description: The current error message is often confusing to a new Spark user that a column containing

[jira] [Updated] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-25216: Description: The current error message is often confusing to a new Spark user that a column containing

[jira] [Created] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Li Jin (JIRA)
Li Jin created SPARK-25216: Summary: Provide better error message when a column contains dot and needs backticks quote; Key: SPARK-25216; URL: https://issues.apache.org/jira/browse/SPARK-25216; Project: Spark

[jira] [Updated] (SPARK-25214) Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`

2018-08-23 Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25214: Description: When there are missing offsets, Kafka v2 source may return duplicated records

[jira] [Commented] (SPARK-25200) Allow setting HADOOP_CONF_DIR as a spark property

2018-08-23 Adam Balogh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590642#comment-16590642 ] Adam Balogh commented on SPARK-25200: Great! I can work on it; I was thinking of adding something

[jira] [Assigned] (SPARK-25214) Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25214: Assignee: Apache Spark (was: Shixiong Zhu)

[jira] [Assigned] (SPARK-25214) Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25214: Assignee: Shixiong Zhu (was: Apache Spark)

[jira] [Assigned] (SPARK-25214) Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25214: Assignee: Apache Spark (was: Shixiong Zhu)

[jira] [Assigned] (SPARK-25214) Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25214: Assignee: Shixiong Zhu (was: Apache Spark)

[jira] [Commented] (SPARK-25214) Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590629#comment-16590629 ] Apache Spark commented on SPARK-25214: User 'zsxwing' has created a pull request for this issue:

[jira] [Created] (SPARK-25215) Make PipelineModel public

2018-08-23 Nicholas Resnick (JIRA)
Nicholas Resnick created SPARK-25215: Summary: Make PipelineModel public; Key: SPARK-25215; URL: https://issues.apache.org/jira/browse/SPARK-25215; Project: Spark; Issue Type: Wish

[jira] [Commented] (SPARK-25200) Allow setting HADOOP_CONF_DIR as a spark property

2018-08-23 Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590621#comment-16590621 ] Marcelo Vanzin commented on SPARK-25200: Sounds like a good idea, just need someone to work on

[jira] [Commented] (SPARK-25200) Allow setting HADOOP_CONF_DIR as a spark property

2018-08-23 Adam Balogh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590618#comment-16590618 ] Adam Balogh commented on SPARK-25200: cc [~vanzin]

[jira] [Assigned] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25213: Assignee: Apache Spark

[jira] [Assigned] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25213: Assignee: (was: Apache Spark)

[jira] [Commented] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-23 Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590585#comment-16590585 ] Apache Spark commented on SPARK-25213: User 'rdblue' has created a pull request for this issue:

[jira] [Created] (SPARK-25214) Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`

2018-08-23 Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-25214: Summary: Kafka v2 source may return duplicated records when `failOnDataLoss` is `false`; Key: SPARK-25214; URL: https://issues.apache.org/jira/browse/SPARK-25214

[jira] [Commented] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-23 Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590531#comment-16590531 ] Ryan Blue commented on SPARK-25213: Sorry, I just realized the point is that the filter could have a

[jira] [Updated] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-23 Jiang Xingbo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiang Xingbo updated SPARK-23207: Affects Version/s: 1.6.0, 2.0.0, 2.1.0

[jira] [Commented] (SPARK-25202) SQL Function Split Should Respect Limit Argument

2018-08-23 Parker Hegstrom (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590500#comment-16590500 ] Parker Hegstrom commented on SPARK-25202: [~viirya] I do have time, so I can take it. Let me know

[jira] [Updated] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-23 Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-25206: Target Version/s: 2.3.2

[jira] [Commented] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-23 Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590480#comment-16590480 ] Marcelo Vanzin commented on SPARK-25206: Updating to blocker given recent discussions about

[jira] [Updated] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-23 Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-25206: Labels: correctness (was: )

[jira] [Updated] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-23 Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-25206: Priority: Blocker (was: Major)

[jira] [Commented] (SPARK-24768) Have a built-in AVRO data source implementation

2018-08-23 Antonio Murgia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590441#comment-16590441 ] Antonio Murgia commented on SPARK-24768: Will this support UDT to the extent the parquet

[jira] [Issue Comment Deleted] (SPARK-24772) support reading AVRO logical types - Date

2018-08-23 Antonio Murgia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antonio Murgia updated SPARK-24772: Comment: was deleted (was: Will this support UDT to the extent the parquet reader/writer

[jira] [Commented] (SPARK-24772) support reading AVRO logical types - Date

2018-08-23 Antonio Murgia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590439#comment-16590439 ] Antonio Murgia commented on SPARK-24772: Will this support UDT to the extent the parquet

[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-23 Daniel Darabos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590435#comment-16590435 ] Daniel Darabos commented on SPARK-23207: Sorry, could you clarify the fix version please?

[jira] [Commented] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-23 Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590406#comment-16590406 ] Ryan Blue commented on SPARK-25213: [~cloud_fan], that PR ensures that there is a Project node on top

[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro

2018-08-23 Sujith (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith edited comment on SPARK-25073 at 8/23/18 3:48 PM: Yes, in the executor

[jira] [Commented] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-23 Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590397#comment-16590397 ] Wenchen Fan commented on SPARK-25213: [~rdblue] I think we have a problem here. In

[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro

2018-08-23 Sujith (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith edited comment on SPARK-25073 at 8/23/18 3:43 PM: Yes, in the executor

[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro

2018-08-23 Sujith (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith edited comment on SPARK-25073 at 8/23/18 3:42 PM: Yes, in the executor

[jira] [Commented] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error req

2018-08-23 Thread Sujith (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith commented on SPARK-25073:  Yes, in the executor memory validation check we are displaying the

[jira] [Assigned] (SPARK-25212) Support Filter in ConvertToLocalRelation

2018-08-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25212: Assignee: Apache Spark > Support Filter in ConvertToLocalRelation >

[jira] [Assigned] (SPARK-25212) Support Filter in ConvertToLocalRelation

2018-08-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25212: Assignee: (was: Apache Spark) > Support Filter in ConvertToLocalRelation >

[jira] [Created] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-23 Thread Li Jin (JIRA)
Li Jin created SPARK-25213: -- Summary: DataSourceV2 doesn't seem to produce unsafe rows Key: SPARK-25213 URL: https://issues.apache.org/jira/browse/SPARK-25213 Project: Spark Issue Type: Task

[jira] [Commented] (SPARK-25212) Support Filter in ConvertToLocalRelation

2018-08-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590364#comment-16590364 ] Apache Spark commented on SPARK-25212: -- User 'bogdanrdc' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25208) Loosen Cast.forceNullable for DecimalType.

2018-08-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-25208: --- Assignee: Takuya Ueshin > Loosen Cast.forceNullable for DecimalType. >

[jira] [Resolved] (SPARK-25208) Loosen Cast.forceNullable for DecimalType.

2018-08-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-25208. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22200

[jira] [Assigned] (SPARK-25196) Analyze column statistics in cached query

2018-08-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25196: Assignee: Apache Spark > Analyze column statistics in cached query >

[jira] [Commented] (SPARK-25196) Analyze column statistics in cached query

2018-08-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590323#comment-16590323 ] Apache Spark commented on SPARK-25196: -- User 'maropu' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25196) Analyze column statistics in cached query

2018-08-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25196: Assignee: (was: Apache Spark) > Analyze column statistics in cached query >

[jira] [Comment Edited] (SPARK-25196) Analyze column statistics in cached query

2018-08-23 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589604#comment-16589604 ] Takeshi Yamamuro edited comment on SPARK-25196 at 8/23/18 2:34 PM: ---

[jira] [Commented] (SPARK-25211) speculation and fetch failed result in hang of job

2018-08-23 Thread Lijia Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590267#comment-16590267 ] Lijia Liu commented on SPARK-25211: --- In https://issues.apache.org/jira/browse/SPARK-23948, This issue

[jira] [Resolved] (SPARK-25126) avoid creating OrcFile.Reader for all orc files

2018-08-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-25126. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22157

[jira] [Assigned] (SPARK-25126) avoid creating OrcFile.Reader for all orc files

2018-08-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-25126: Assignee: Rao Fu > avoid creating OrcFile.Reader for all orc files >
