[jira] [Comment Edited] (SPARK-27570) java.io.EOFException Reached the end of stream - Reading Parquet from Swift

2019-07-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887498#comment-16887498 ] Josh Rosen edited comment on SPARK-27570 at 7/18/19 12:28 AM: --

[jira] [Updated] (SPARK-28430) Some stage table rows render wrong number of columns if tasks are missing metrics

2019-07-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28430: --- Attachment: ui-screenshot.png > Some stage table rows render wrong number of columns if tasks are

[jira] [Assigned] (SPARK-28430) Some stage table rows render wrong number of columns if tasks are missing metrics

2019-07-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-28430: -- Assignee: Josh Rosen > Some stage table rows render wrong number of columns if tasks are

[jira] [Updated] (SPARK-28430) Some stage table rows render wrong number of columns if tasks are missing metrics

2019-07-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28430: --- Description: The Spark UI's stages table renders too few columns for some tasks if a subset of the

[jira] [Created] (SPARK-28430) Some stage table rows render wrong number of columns if tasks are missing metrics

2019-07-17 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-28430: -- Summary: Some stage table rows render wrong number of columns if tasks are missing metrics Key: SPARK-28430 URL: https://issues.apache.org/jira/browse/SPARK-28430

[jira] [Commented] (SPARK-27570) java.io.EOFException Reached the end of stream - Reading Parquet from Swift

2019-07-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887498#comment-16887498 ] Josh Rosen commented on SPARK-27570: [~ste...@apache.org], I finally got a chance to test your

[jira] [Created] (SPARK-28427) Support more Postgres JSON functions

2019-07-17 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-28427: -- Summary: Support more Postgres JSON functions Key: SPARK-28427 URL: https://issues.apache.org/jira/browse/SPARK-28427 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-26021) -0.0 and 0.0 not treated consistently, doesn't match Hive

2019-07-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-26021: --- Labels: correctness (was: ) > -0.0 and 0.0 not treated consistently, doesn't match Hive >

[jira] [Updated] (SPARK-26352) join reordering should not change the order of output attributes

2019-07-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-26352: --- Labels: correctness (was: ) > join reordering should not change the order of output attributes >

[jira] [Updated] (SPARK-26864) Query may return incorrect result when python udf is used as a join condition and the udf uses attributes from both legs of left semi join.

2019-07-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-26864: --- Labels: correctness (was: ) > Query may return incorrect result when python udf is used as a join

[jira] [Updated] (SPARK-27134) array_distinct function does not work correctly with columns containing array of array

2019-07-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27134: --- Labels: correctness (was: ) > array_distinct function does not work correctly with columns

[jira] [Comment Edited] (SPARK-27416) UnsafeMapData & UnsafeArrayData Kryo serialization breaks when two machines have different Oops size

2019-07-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886535#comment-16886535 ] Josh Rosen edited comment on SPARK-27416 at 7/16/19 10:56 PM: -- I think that

[jira] [Commented] (SPARK-27416) UnsafeMapData & UnsafeArrayData Kryo serialization breaks when two machines have different Oops size

2019-07-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886535#comment-16886535 ] Josh Rosen commented on SPARK-27416: I think that we should backport this for Spark 2.4.4 because

[jira] [Comment Edited] (SPARK-27406) UnsafeArrayData serialization breaks when two machines have different Oops size

2019-07-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886533#comment-16886533 ] Josh Rosen edited comment on SPARK-27406 at 7/16/19 10:51 PM: -- Adding the

[jira] [Updated] (SPARK-27406) UnsafeArrayData serialization breaks when two machines have different Oops size

2019-07-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27406: --- Labels: correctness (was: ) Adding the 'correctness' label to this fixed issue because the related

[jira] [Updated] (SPARK-10914) UnsafeRow serialization breaks when two machines have different Oops size

2019-07-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10914: --- Labels: correctness (was: ) > UnsafeRow serialization breaks when two machines have different Oops

[jira] [Updated] (SPARK-28049) i want to a first ticket in zira

2019-07-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28049: --- Fix Version/s: (was: 2.4.4) (was: 2.4.3) > i want to a first ticket in

[jira] [Commented] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"

2019-07-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885550#comment-16885550 ] Josh Rosen commented on SPARK-28340: SPARK-23816 is a related issue about fetch failures caused by

[jira] [Commented] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"

2019-07-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885547#comment-16885547 ] Josh Rosen commented on SPARK-28340: Another variant of this issue, this time on the shuffle read

[jira] [Commented] (SPARK-28375) Enforce idempotence on the PullupCorrelatedPredicates optimizer rule

2019-07-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885531#comment-16885531 ] Josh Rosen commented on SPARK-28375: I'm not sure; is it possible to trigger double-optimization

[jira] [Updated] (SPARK-28375) Enforce idempotence on the PullupCorrelatedPredicates optimizer rule

2019-07-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28375: --- Labels: correctness (was: ) Adding the 'correctness' label so we remember to backport this fix to

[jira] [Commented] (SPARK-24079) Update the nullability of Join output based on inferred predicates

2019-07-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884815#comment-16884815 ] Josh Rosen commented on SPARK-24079: I'm marking SPARK-27915 (a newer ticket filed by me) as a

[jira] [Commented] (SPARK-28304) FileFormatWriter introduces an unconditional sort, even when all attributes are constants

2019-07-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884801#comment-16884801 ] Josh Rosen commented on SPARK-28304: I'm linking SPARK-21317 as a related issue: that older ticket

[jira] [Resolved] (SPARK-24786) Executors not being released after all cached data is unpersisted

2019-07-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-24786. Resolution: Duplicate I'm resolving this PR because it's marked as a duplicate of SPARK-20286,

[jira] [Resolved] (SPARK-23992) ShuffleDependency does not need to be deserialized every time

2019-07-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-23992. Resolution: Won't Fix Per my comment on GitHub at

[jira] [Comment Edited] (SPARK-27991) ShuffleBlockFetcherIterator should take Netty constant-factor overheads into account when limiting number of simultaneous block fetches

2019-07-10 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882532#comment-16882532 ] Josh Rosen edited comment on SPARK-27991 at 7/11/19 12:28 AM: -- I've tried

[jira] [Commented] (SPARK-27991) ShuffleBlockFetcherIterator should take Netty constant-factor overheads into account when limiting number of simultaneous block fetches

2019-07-10 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882532#comment-16882532 ] Josh Rosen commented on SPARK-27991: I've tried to come up with a standalone reproduction of this

[jira] [Created] (SPARK-28340) Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException"

2019-07-10 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-28340: -- Summary: Noisy exceptions when tasks are killed: "DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file: java.nio.channels.ClosedByInterruptException" Key: SPARK-28340

[jira] [Comment Edited] (SPARK-27570) java.io.EOFException Reached the end of stream - Reading Parquet from Swift

2019-07-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881623#comment-16881623 ] Josh Rosen edited comment on SPARK-27570 at 7/10/19 12:22 AM: -- I ran into a

[jira] [Commented] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2019-07-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881625#comment-16881625 ] Josh Rosen commented on SPARK-25966: Cross-post: there's discussion of a similar issue at

[jira] [Commented] (SPARK-27570) java.io.EOFException Reached the end of stream - Reading Parquet from Swift

2019-07-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881623#comment-16881623 ] Josh Rosen commented on SPARK-27570: I ran into a very similar issue, except I was reading from S3

[jira] [Reopened] (SPARK-11309) Clean up hacky use of MemoryManager inside of HashedRelation

2019-07-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reopened SPARK-11309: I'm re-opening this issue because this problem is still relevant. There's some discussion of this

[jira] [Updated] (SPARK-28266) data correctness issue: data duplication when `path` serde property is present

2019-07-07 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28266: --- Labels: correctness (was: ) > data correctness issue: data duplication when `path` serde property

[jira] [Commented] (SPARK-28200) Decimal overflow handling in ExpressionEncoder

2019-06-29 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875568#comment-16875568 ] Josh Rosen commented on SPARK-28200: MickJermsurawong and I have a patch for this, including tests

[jira] [Updated] (SPARK-28200) Decimal overflow handling in ExpressionEncoder

2019-06-28 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28200: --- Summary: Decimal overflow handling in ExpressionEncoder (was: Overflow handling in

[jira] [Created] (SPARK-28176) Add Dataset.collect(PartialFunction) method for parity with RDD API

2019-06-26 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-28176: -- Summary: Add Dataset.collect(PartialFunction) method for parity with RDD API Key: SPARK-28176 URL: https://issues.apache.org/jira/browse/SPARK-28176 Project: Spark

[jira] [Updated] (SPARK-28166) Query optimization for symmetric difference / disjunctive union of Datasets

2019-06-25 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28166: --- Description: The *symmetric difference* (a.k.a. *disjunctive union*) of two sets is their set

[jira] [Created] (SPARK-28166) Query optimization for symmetric difference / disjunctive union of Datasets

2019-06-25 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-28166: -- Summary: Query optimization for symmetric difference / disjunctive union of Datasets Key: SPARK-28166 URL: https://issues.apache.org/jira/browse/SPARK-28166 Project:

[jira] [Commented] (SPARK-16474) Global Aggregation doesn't seem to work at all

2019-06-24 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871798#comment-16871798 ] Josh Rosen commented on SPARK-16474: I just ran into this same issue. The problem here is that

[jira] [Updated] (SPARK-26038) Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long

2019-06-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-26038: --- Fix Version/s: 2.4.4 > Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in

[jira] [Commented] (SPARK-26038) Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long

2019-06-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869554#comment-16869554 ] Josh Rosen commented on SPARK-26038: Backported for 2.4.4 in

[jira] [Commented] (SPARK-28024) Incorrect numeric values when out of range

2019-06-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869552#comment-16869552 ] Josh Rosen commented on SPARK-28024: I've linked two existing, related tickets: * SPARK-26218 (fail

[jira] [Updated] (SPARK-28067) Incorrect results in decimal aggregation with whole-stage code gen enabled

2019-06-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28067: --- Labels: correctness (was: ) > Incorrect results in decimal aggregation with whole-stage code gen

[jira] [Updated] (SPARK-26038) Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long

2019-06-20 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-26038: --- Labels: correctness (was: ) We just independently rediscovered this bug. I'm adding the

[jira] [Comment Edited] (SPARK-26038) Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long

2019-06-20 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868926#comment-16868926 ] Josh Rosen edited comment on SPARK-26038 at 6/20/19 8:47 PM: - We just

[jira] [Resolved] (SPARK-28112) Fix Kryo exception perf. bottleneck in tests due to absence of ML/MLlib classes

2019-06-19 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-28112. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24916

[jira] [Commented] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors

2019-06-19 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868178#comment-16868178 ] Josh Rosen commented on SPARK-26555: Backported for 2.4.4 in

[jira] [Updated] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors

2019-06-19 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-26555: --- Fix Version/s: 2.4.4 > Thread safety issue causes createDataset to fail with misleading errors >

[jira] [Resolved] (SPARK-28102) Failed LZ4 JNI initialization is repeatedly re-attempted, causing lock contention issues

2019-06-19 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-28102. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24905

[jira] [Resolved] (SPARK-27839) Improve UTF8String.replace() / StringReplace performance

2019-06-19 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-27839. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24707

[jira] [Updated] (SPARK-28102) Failed LZ4 JNI initialization is repeatedly re-attempted, causing lock contention issues

2019-06-19 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28102: --- Description: Spark's use of {{lz4-java}} ends up calling {{LZ4Factory.fastestInstance}}, which

[jira] [Updated] (SPARK-28102) Failed LZ4 JNI initialization is repeatedly re-attempted, causing lock contention issues

2019-06-19 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28102: --- Summary: Failed LZ4 JNI initialization is repeatedly re-attempted, causing lock contention issues

[jira] [Created] (SPARK-28102) Add configuration for selecting LZ4 implementation (safe, unsafe, JNI)

2019-06-18 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-28102: -- Summary: Add configuration for selecting LZ4 implementation (safe, unsafe, JNI) Key: SPARK-28102 URL: https://issues.apache.org/jira/browse/SPARK-28102 Project: Spark

[jira] [Updated] (SPARK-28007) Caret operator (^) means bitwise XOR in Spark/Hive and exponentiation in Postgres/Redshift

2019-06-12 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28007: --- Description: The expression {{expr1 ^ expr2}} has different meanings in Spark and Postgres: * [In

[jira] [Updated] (SPARK-28007) Caret operator (^) means bitwise XOR in Spark/Hive and exponentiation in Postgres/Redshift

2019-06-11 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-28007: --- Summary: Caret operator (^) means bitwise XOR in Spark/Hive and exponentiation in Postgres/Redshift

[jira] [Created] (SPARK-28007) Caret operator (^) means bitwise XOR in Spark and exponentiation in Postgres/Redshift

2019-06-11 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-28007: -- Summary: Caret operator (^) means bitwise XOR in Spark and exponentiation in Postgres/Redshift Key: SPARK-28007 URL: https://issues.apache.org/jira/browse/SPARK-28007

[jira] [Created] (SPARK-27991) ShuffleBlockFetcherIterator should take Netty constant-factor overheads into account when limiting number of simultaneous block fetches

2019-06-10 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27991: -- Summary: ShuffleBlockFetcherIterator should take Netty constant-factor overheads into account when limiting number of simultaneous block fetches Key: SPARK-27991 URL:

[jira] [Commented] (SPARK-27972) Move SQL migration guide to the top level

2019-06-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859538#comment-16859538 ] Josh Rosen commented on SPARK-27972: This reminds me: we should probably link these guides from the

[jira] [Commented] (SPARK-27969) Non-deterministic expressions in filters or projects can unnecessarily prevent all scan-time column pruning, harming performance

2019-06-06 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858208#comment-16858208 ] Josh Rosen commented on SPARK-27969: It looks like this issue has been reported twice in the past:

[jira] [Updated] (SPARK-27969) Non-deterministic expressions in filters or projects can unnecessarily prevent all scan-time column pruning, harming performance

2019-06-06 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27969: --- Component/s: Optimizer > Non-deterministic expressions in filters or projects can unnecessarily >

[jira] [Commented] (SPARK-27761) Make UDF nondeterministic by default(?)

2019-06-06 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858155#comment-16858155 ] Josh Rosen commented on SPARK-27761: FYI, I'm marking SPARK-27969 as a blocker to this because

[jira] [Created] (SPARK-27969) Non-deterministic expressions in filters or projects can unnecessarily prevent all scan-time column pruning, harming performance

2019-06-06 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27969: -- Summary: Non-deterministic expressions in filters or projects can unnecessarily prevent all scan-time column pruning, harming performance Key: SPARK-27969 URL:

[jira] [Updated] (SPARK-27940) SubtractedRDD is OOM-prone because it does not support spilling

2019-06-03 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27940: --- Description: {{SubtractedRDD}}, which is used to implement {{RDD.subtract()}} and

[jira] [Created] (SPARK-27940) SubtractedRDD is OOM-prone because it does not support spilling

2019-06-03 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27940: -- Summary: SubtractedRDD is OOM-prone because it does not support spilling Key: SPARK-27940 URL: https://issues.apache.org/jira/browse/SPARK-27940 Project: Spark

[jira] [Created] (SPARK-27915) Update logical Filter's output nullability based on IsNotNull conditions

2019-06-01 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27915: -- Summary: Update logical Filter's output nullability based on IsNotNull conditions Key: SPARK-27915 URL: https://issues.apache.org/jira/browse/SPARK-27915 Project: Spark

[jira] [Assigned] (SPARK-27684) Reduce ScalaUDF conversion overheads for primitives

2019-05-30 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-27684: -- Assignee: Marco Gaido > Reduce ScalaUDF conversion overheads for primitives >

[jira] [Resolved] (SPARK-27684) Reduce ScalaUDF conversion overheads for primitives

2019-05-30 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-27684. Resolution: Fixed Fix Version/s: 3.0.0 Fixed for 3.0 in

[jira] [Commented] (SPARK-27785) Introduce .joinWith() overloads for typed inner joins of 3 or more tables

2019-05-30 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852089#comment-16852089 ] Josh Rosen commented on SPARK-27785: I think this might require a little bit more design work before

[jira] [Created] (SPARK-27846) Eagerly compute Configuration.properties in sc.hadoopConfiguration

2019-05-26 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27846: -- Summary: Eagerly compute Configuration.properties in sc.hadoopConfiguration Key: SPARK-27846 URL: https://issues.apache.org/jira/browse/SPARK-27846 Project: Spark

[jira] [Created] (SPARK-27841) Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII

2019-05-25 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27841: -- Summary: Improve UTF8String fromString()/toString()/numChars() performance when strings are ASCII Key: SPARK-27841 URL: https://issues.apache.org/jira/browse/SPARK-27841

[jira] [Created] (SPARK-27839) Improve UTF8String.replace() / StringReplace performance

2019-05-25 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27839: -- Summary: Improve UTF8String.replace() / StringReplace performance Key: SPARK-27839 URL: https://issues.apache.org/jira/browse/SPARK-27839 Project: Spark Issue

[jira] [Created] (SPARK-27829) In Dataset.joinWith inner joins, don't nest data before shuffling

2019-05-23 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27829: -- Summary: In Dataset.joinWith inner joins, don't nest data before shuffling Key: SPARK-27829 URL: https://issues.apache.org/jira/browse/SPARK-27829 Project: Spark

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2019-05-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845603#comment-16845603 ] Josh Rosen commented on SPARK-19468: Chiming in to add a strong +1 here, since this seems like it

[jira] [Updated] (SPARK-27799) Allow SerializerManager.canUseKryo whitelist to be extended via a configuration

2019-05-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27799: --- Issue Type: New Feature (was: Bug) > Allow SerializerManager.canUseKryo whitelist to be extended

[jira] [Updated] (SPARK-27799) Allow SerializerManager.canUseKryo whitelist to be extended via a configuration

2019-05-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27799: --- Description: Kryo serialization can offer a substantial performance boost compared to Java

[jira] [Created] (SPARK-27799) Allow SerializerManager.canUseKryo to be customized via configuration

2019-05-21 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27799: -- Summary: Allow SerializerManager.canUseKryo to be customized via configuration Key: SPARK-27799 URL: https://issues.apache.org/jira/browse/SPARK-27799 Project: Spark

[jira] [Updated] (SPARK-27799) Allow SerializerManager.canUseKryo whitelist to be extended via a configuration

2019-05-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27799: --- Summary: Allow SerializerManager.canUseKryo whitelist to be extended via a configuration (was:

[jira] [Commented] (SPARK-23978) Kryo much slower when mllib jar not on classpath

2019-05-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845336#comment-16845336 ] Josh Rosen commented on SPARK-23978: +1; I've also seen this in unit tests of my own Spark

[jira] [Commented] (SPARK-27676) InMemoryFileIndex should hard-fail on missing files instead of logging and continuing

2019-05-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845249#comment-16845249 ] Josh Rosen commented on SPARK-27676: +1 [~ste...@apache.org]: I agree that this is by no means

[jira] [Assigned] (SPARK-27676) InMemoryFileIndex should hard-fail on missing files instead of logging and continuing

2019-05-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-27676: -- Assignee: Josh Rosen > InMemoryFileIndex should hard-fail on missing files instead of

[jira] [Updated] (SPARK-27786) SHA1, MD5, and Base64 expression codegen doesn't work when commons-codec is shaded

2019-05-20 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27786: --- Description: When running a custom build of Spark which shades {{commons-codec}}, the {{sha1Hex}}

[jira] [Created] (SPARK-27786) SHA1, MD5, and Base64 expression codegen doesn't work when commons-codec is shaded

2019-05-20 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27786: -- Summary: SHA1, MD5, and Base64 expression codegen doesn't work when commons-codec is shaded Key: SPARK-27786 URL: https://issues.apache.org/jira/browse/SPARK-27786

[jira] [Updated] (SPARK-27785) Introduce .joinWith() overloads for typed inner joins of 3 or more tables

2019-05-20 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27785: --- Description: Today it's rather painful to do a typed dataset join of more than two tables:

[jira] [Updated] (SPARK-27785) Introduce .joinWith() overloads for typed inner joins of 3 or more tables

2019-05-20 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27785: --- Description: Today it's rather painful to do a typed dataset join of more than two tables:

[jira] [Updated] (SPARK-27785) Introduce .joinWith() overloads for typed inner joins of 3 or more tables

2019-05-20 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27785: --- Summary: Introduce .joinWith() overloads for typed inner joins of 3 or more tables (was: Introduce

[jira] [Created] (SPARK-27785) Introduce .joinWith() overload for inner join of 3 or more tables

2019-05-20 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27785: -- Summary: Introduce .joinWith() overload for inner join of 3 or more tables Key: SPARK-27785 URL: https://issues.apache.org/jira/browse/SPARK-27785 Project: Spark

[jira] [Commented] (SPARK-27726) Performance of InMemoryStore suffers under load

2019-05-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840948#comment-16840948 ] Josh Rosen commented on SPARK-27726: Thanks for the detailed bug report! I appreciate the performance

[jira] [Updated] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations

2019-05-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27736: --- Description: This ticket describes a fault-tolerance edge-case which can cause Spark jobs to fail

[jira] [Updated] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations

2019-05-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27736: --- Description: I have discovered a fault-tolerance edge-case which can cause Spark jobs to fail if a

[jira] [Updated] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations

2019-05-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27736: --- Description: I have discovered a fault-tolerance edge-case which can cause Spark jobs to fail if a

[jira] [Updated] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations

2019-05-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27736: --- Description: I have discovered a fault-tolerance edge-case which can cause Spark jobs to fail if a

[jira] [Created] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations

2019-05-15 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27736: -- Summary: Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations Key: SPARK-27736 URL:

[jira] [Commented] (SPARK-27684) Reduce ScalaUDF conversion overheads for primitives

2019-05-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839968#comment-16839968 ] Josh Rosen commented on SPARK-27684: I'm not planning to work on this right now, so this ticket is

[jira] [Updated] (SPARK-27712) createDataFrame() reorders row

2019-05-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27712: --- Labels: correctness (was: ) > createDataFrame() reorders row > -- > >

[jira] [Updated] (SPARK-27710) ClassNotFoundException: $line196400984558.$read$ in OuterScopes

2019-05-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-27710: --- Issue Type: Bug (was: Improvement) > ClassNotFoundException: $line196400984558.$read$ in

[jira] [Created] (SPARK-27710) ClassNotFoundException: $line196400984558.$read$ in OuterScopes

2019-05-14 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-27710: -- Summary: ClassNotFoundException: $line196400984558.$read$ in OuterScopes Key: SPARK-27710 URL: https://issues.apache.org/jira/browse/SPARK-27710 Project: Spark
