[jira] [Updated] (SPARK-17129) Support statistics collection and cardinality estimation for partitioned tables

2017-11-17 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17129: - Description: Support statistics collection and cardinality estimation for partitioned tables.

[jira] [Assigned] (SPARK-22550) 64KB JVM bytecode limit problem with elt

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22550: Assignee: Apache Spark > 64KB JVM bytecode limit problem with elt >

[jira] [Commented] (SPARK-22550) 64KB JVM bytecode limit problem with elt

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257901#comment-16257901 ] Apache Spark commented on SPARK-22550: -- User 'kiszk' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22550) 64KB JVM bytecode limit problem with elt

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22550: Assignee: (was: Apache Spark) > 64KB JVM bytecode limit problem with elt >

[jira] [Commented] (SPARK-22549) 64KB JVM bytecode limit problem with concat_ws

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257898#comment-16257898 ] Apache Spark commented on SPARK-22549: -- User 'kiszk' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22549) 64KB JVM bytecode limit problem with concat_ws

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22549: Assignee: (was: Apache Spark) > 64KB JVM bytecode limit problem with concat_ws >

[jira] [Assigned] (SPARK-22549) 64KB JVM bytecode limit problem with concat_ws

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22549: Assignee: Apache Spark > 64KB JVM bytecode limit problem with concat_ws >

[jira] [Updated] (SPARK-22550) 64KB JVM bytecode limit problem with elt

2017-11-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-22550: - Issue Type: Sub-task (was: Bug) Parent: SPARK-22510 > 64KB JVM bytecode limit

[jira] [Created] (SPARK-22550) 64KB JVM bytecode limit problem with elt

2017-11-17 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-22550: Summary: 64KB JVM bytecode limit problem with elt Key: SPARK-22550 URL: https://issues.apache.org/jira/browse/SPARK-22550 Project: Spark Issue Type:

[jira] [Updated] (SPARK-22549) 64KB JVM bytecode limit problem with concat_ws

2017-11-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-22549: - Issue Type: Sub-task (was: Bug) Parent: SPARK-22510 > 64KB JVM bytecode limit

[jira] [Created] (SPARK-22549) 64KB JVM bytecode limit problem with concat_ws

2017-11-17 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-22549: Summary: 64KB JVM bytecode limit problem with concat_ws Key: SPARK-22549 URL: https://issues.apache.org/jira/browse/SPARK-22549 Project: Spark Issue

[jira] [Updated] (SPARK-22498) 64KB JVM bytecode limit problem with concat

2017-11-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-22498: - Summary: 64KB JVM bytecode limit problem with concat (was: 64KB JVM bytecode limit

[jira] [Updated] (SPARK-22498) 64KB JVM bytecode limit problem with concat

2017-11-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-22498: - Description: {{concat}} can throw an exception due to the 64KB JVM bytecode limit when

[jira] [Assigned] (SPARK-22548) Incorrect nested AND expression pushed down to JDBC data source

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22548: Assignee: (was: Apache Spark) > Incorrect nested AND expression pushed down to JDBC

[jira] [Commented] (SPARK-22548) Incorrect nested AND expression pushed down to JDBC data source

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257839#comment-16257839 ] Apache Spark commented on SPARK-22548: -- User 'jliwork' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22548) Incorrect nested AND expression pushed down to JDBC data source

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22548: Assignee: Apache Spark > Incorrect nested AND expression pushed down to JDBC data source

[jira] [Created] (SPARK-22548) Incorrect nested AND expression pushed down to JDBC data source

2017-11-17 Thread Jia Li (JIRA)
Jia Li created SPARK-22548: -- Summary: Incorrect nested AND expression pushed down to JDBC data source Key: SPARK-22548 URL: https://issues.apache.org/jira/browse/SPARK-22548 Project: Spark Issue

[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors

2017-11-17 Thread Sergei Lebedev (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257809#comment-16257809 ] Sergei Lebedev commented on SPARK-19371: > Usually the answer is to force a shuffle [...]

[jira] [Resolved] (SPARK-22544) FileStreamSource should use its own hadoop conf to call globPathIfNecessary

2017-11-17 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-22544. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.2.2

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-11-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257742#comment-16257742 ] Bryan Cutler commented on SPARK-21187: -- [~icexelloss] It looks like there is a bug in older Arrow

[jira] [Commented] (SPARK-22517) NullPointerException in ShuffleExternalSorter.spill()

2017-11-17 Thread Andreas Maier (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257660#comment-16257660 ] Andreas Maier commented on SPARK-22517: --- Unfortunately I don't have some minimal code to reproduce

[jira] [Created] (SPARK-22547) Don't include executor ID in metrics name

2017-11-17 Thread Li Haoyi (JIRA)
Li Haoyi created SPARK-22547: Summary: Don't include executor ID in metrics name Key: SPARK-22547 URL: https://issues.apache.org/jira/browse/SPARK-22547 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread mohamed imran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257247#comment-16257247 ] mohamed imran commented on SPARK-22526: --- [~srowen] Yes I agree that it is an issue with the http

[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257230#comment-16257230 ] Sean Owen commented on SPARK-22526: --- It could be a problem with S3, or the S3 API. You haven't shown

[jira] [Comment Edited] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread mohamed imran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257224#comment-16257224 ] mohamed imran edited comment on SPARK-22526 at 11/17/17 5:04 PM: -

[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread mohamed imran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257224#comment-16257224 ] mohamed imran commented on SPARK-22526: --- [~srowen] I am processing inside the foreach loop. like

[jira] [Comment Edited] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread mohamed imran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257224#comment-16257224 ] mohamed imran edited comment on SPARK-22526 at 11/17/17 5:03 PM: -

[jira] [Assigned] (SPARK-22538) SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame

2017-11-17 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-22538: --- Assignee: Liang-Chi Hsieh > SQLTransformer.transform(inputDataFrame) uncaches

[jira] [Resolved] (SPARK-22538) SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame

2017-11-17 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22538. - Resolution: Fixed Fix Version/s: 2.3.0 2.2.2 Issue resolved by pull

[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-11-17 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257142#comment-16257142 ] Li Jin commented on SPARK-21187: [~bryanc], the only type left is Decimal and that depends on Arrow 0.8

[jira] [Assigned] (SPARK-22409) Add function type argument to pandas_udf

2017-11-17 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-22409: --- Assignee: Li Jin > Add function type argument to pandas_udf >

[jira] [Updated] (SPARK-22409) Add function type argument to pandas_udf

2017-11-17 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-22409: Affects Version/s: (was: 2.2.0) 2.3.0 > Add function type argument to

[jira] [Resolved] (SPARK-22409) Add function type argument to pandas_udf

2017-11-17 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22409. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19630

[jira] [Commented] (SPARK-22274) User-defined aggregation functions with pandas udf

2017-11-17 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257120#comment-16257120 ] Li Jin commented on SPARK-22274: Absolutely. > User-defined aggregation functions with pandas udf >

[jira] [Commented] (SPARK-22516) CSV Read breaks: When "multiLine" = "true", if "comment" option is set as last line's first character

2017-11-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257088#comment-16257088 ] Marco Gaido commented on SPARK-22516: - not sure why but this is caused by the fact that your file

[jira] [Commented] (SPARK-22532) Spark SQL function 'drop_duplicates' throws error when passing in a column that is an element of a struct

2017-11-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257020#comment-16257020 ] Marco Gaido commented on SPARK-22532: - the reason is that `header.eventId.lo` is not a column name,

[jira] [Commented] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-17 Thread Eric Maynard (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257004#comment-16257004 ] Eric Maynard commented on SPARK-22541: -- Yeah, this is a common problem when you have side effects in

[jira] [Commented] (SPARK-22343) Add support for publishing Spark metrics into Prometheus

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256994#comment-16256994 ] Apache Spark commented on SPARK-22343: -- User 'matyix' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22343) Add support for publishing Spark metrics into Prometheus

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22343: Assignee: (was: Apache Spark) > Add support for publishing Spark metrics into

[jira] [Assigned] (SPARK-22343) Add support for publishing Spark metrics into Prometheus

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22343: Assignee: Apache Spark > Add support for publishing Spark metrics into Prometheus >

[jira] [Resolved] (SPARK-22540) HighlyCompressedMapStatus's avgSize is incorrect

2017-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22540. --- Resolution: Fixed Fix Version/s: 2.2.2 2.3.0 > HighlyCompressedMapStatus's

[jira] [Assigned] (SPARK-22540) HighlyCompressedMapStatus's avgSize is incorrect

2017-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-22540: - Assignee: yucai > HighlyCompressedMapStatus's avgSize is incorrect >

[jira] [Resolved] (SPARK-22528) History service and non-HDFS filesystems

2017-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22528. --- Resolution: Invalid Move to the mailing list for now; not obviously something due to Spark. >

[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256986#comment-16256986 ] Sean Owen commented on SPARK-22526: --- You're only showing a read of one file, a .zip file. I don't see

[jira] [Resolved] (SPARK-22518) Make default cache storage level configurable

2017-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22518. --- Resolution: Won't Fix > Make default cache storage level configurable >

[jira] [Resolved] (SPARK-22513) Provide build profile for hadoop 2.8

2017-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22513. --- Resolution: Won't Fix > Provide build profile for hadoop 2.8 >

[jira] [Assigned] (SPARK-22475) show histogram in DESC COLUMN command

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22475: Assignee: Apache Spark > show histogram in DESC COLUMN command >

[jira] [Assigned] (SPARK-22475) show histogram in DESC COLUMN command

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22475: Assignee: (was: Apache Spark) > show histogram in DESC COLUMN command >

[jira] [Commented] (SPARK-22475) show histogram in DESC COLUMN command

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256924#comment-16256924 ] Apache Spark commented on SPARK-22475: -- User 'mgaido91' has created a pull request for this issue:

[jira] [Commented] (SPARK-22274) User-defined aggregation functions with pandas udf

2017-11-17 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256919#comment-16256919 ] holdenk commented on SPARK-22274: - Wonderful, do ping me on the PR then :) > User-defined aggregation

[jira] [Updated] (SPARK-22042) ReorderJoinPredicates can break when child's partitioning is not decided

2017-11-17 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-22042: Affects Version/s: (was: 2.2.0) (was: 2.1.0)

[jira] [Commented] (SPARK-22546) Allow users to update the dataType of a column

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256901#comment-16256901 ] Apache Spark commented on SPARK-22546: -- User 'xuanyuanking' has created a pull request for this

[jira] [Assigned] (SPARK-22546) Allow users to update the dataType of a column

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22546: Assignee: (was: Apache Spark) > Allow users to update the dataType of a column >

[jira] [Assigned] (SPARK-22546) Allow users to update the dataType of a column

2017-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22546: Assignee: Apache Spark > Allow users to update the dataType of a column >

[jira] [Created] (SPARK-22546) Allow users to update the dataType of a column

2017-11-17 Thread Li Yuanjian (JIRA)
Li Yuanjian created SPARK-22546: --- Summary: Allow users to update the dataType of a column Key: SPARK-22546 URL: https://issues.apache.org/jira/browse/SPARK-22546 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-14959) Problem Reading partitioned ORC or Parquet files

2017-11-17 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256789#comment-16256789 ] Steve Loughran commented on SPARK-14959: Came across a reference to this while scanning for

[jira] [Commented] (SPARK-22541) Dataframes: applying multiple filters one after another using udfs and accumulators results in faulty accumulators

2017-11-17 Thread Janne K. Olesen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256676#comment-16256676 ] Janne K. Olesen commented on SPARK-22541: - I agree, the filtered results are correct, but that is

[jira] [Assigned] (SPARK-22495) Fix setup of SPARK_HOME variable on Windows

2017-11-17 Thread Jakub Nowacki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Nowacki reassigned SPARK-22495: - Assignee: Jakub Nowacki > Fix setup of SPARK_HOME variable on Windows >