[jira] [Commented] (SPARK-47063) CAST long to timestamp has different behavior for codegen vs interpreted

2024-02-26 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820877#comment-17820877 ] Robert Joseph Evans commented on SPARK-47063: - [~planga82] I was not planning on putting up

[jira] [Created] (SPARK-47063) CAST long to timestamp has different behavior for codegen vs interpreted

2024-02-15 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-47063: --- Summary: CAST long to timestamp has different behavior for codegen vs interpreted Key: SPARK-47063 URL: https://issues.apache.org/jira/browse/SPARK-47063

[jira] [Created] (SPARK-46778) get_json_object flattens wildcard queries that match a single value

2024-01-19 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-46778: --- Summary: get_json_object flattens wildcard queries that match a single value Key: SPARK-46778 URL: https://issues.apache.org/jira/browse/SPARK-46778

[jira] [Created] (SPARK-46761) quoted strings in a JSON path should support ? characters

2024-01-18 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-46761: --- Summary: quoted strings in a JSON path should support ? characters Key: SPARK-46761 URL: https://issues.apache.org/jira/browse/SPARK-46761 Project:

[jira] [Updated] (SPARK-45879) Number check for InputFileBlockSources is missing for V2 source (BatchScan) ?

2023-11-13 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-45879: Affects Version/s: 3.4.1 3.2.3 > Number check for

[jira] [Updated] (SPARK-45599) Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset

2023-10-18 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-45599: Priority: Blocker (was: Major) > Percentile can produce a wrong answer if -0.0

[jira] [Updated] (SPARK-45599) Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset

2023-10-18 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-45599: Labels: data-corruption (was: ) > Percentile can produce a wrong answer if -0.0

[jira] [Created] (SPARK-45599) Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset

2023-10-18 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-45599: --- Summary: Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset Key: SPARK-45599 URL: https://issues.apache.org/jira/browse/SPARK-45599

[jira] [Created] (SPARK-45243) RADIX sort is not stable and can produce different results for first/collect_list aggs

2023-09-20 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-45243: --- Summary: RADIX sort is not stable and can produce different results for first/collect_list aggs Key: SPARK-45243 URL:

[jira] [Created] (SPARK-44500) parse_url treats key as regular expression

2023-07-20 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-44500: --- Summary: parse_url treats key as regular expression Key: SPARK-44500 URL: https://issues.apache.org/jira/browse/SPARK-44500 Project: Spark

[jira] [Created] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used

2023-03-22 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-42898: --- Summary: Cast from string to date and date to string say timezone is needed, but it is not used Key: SPARK-42898 URL:

[jira] [Created] (SPARK-41218) ParquetTable reports is supports negative scale decimal values

2022-11-21 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-41218: --- Summary: ParquetTable reports is supports negative scale decimal values Key: SPARK-41218 URL: https://issues.apache.org/jira/browse/SPARK-41218

[jira] [Created] (SPARK-40280) Failure to create parquet predicate push down for ints and longs on some valid files

2022-08-30 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-40280: --- Summary: Failure to create parquet predicate push down for ints and longs on some valid files Key: SPARK-40280 URL:

[jira] [Created] (SPARK-40129) Decimal multiply can produce the wrong answer because it rounds twice

2022-08-17 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-40129: --- Summary: Decimal multiply can produce the wrong answer because it rounds twice Key: SPARK-40129 URL: https://issues.apache.org/jira/browse/SPARK-40129

[jira] [Commented] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.

2022-08-16 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580434#comment-17580434 ] Robert Joseph Evans commented on SPARK-40089: - I put up a PR

[jira] [Updated] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.

2022-08-16 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-40089: Summary: Sorting of at least Decimal(20, 2) fails for some values near the max.

[jira] [Commented] (SPARK-40089) Doring of at least Decimal(20, 2) fails for some values near the max.

2022-08-16 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580377#comment-17580377 ] Robert Joseph Evans commented on SPARK-40089: - Never mind I figured out that there is a

[jira] [Commented] (SPARK-40089) Doring of at least Decimal(20, 2) fails for some values near the max.

2022-08-16 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580360#comment-17580360 ] Robert Joseph Evans commented on SPARK-40089: - I have been trying to come up with a patch,

[jira] [Commented] (SPARK-40089) Doring of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579897#comment-17579897 ] Robert Joseph Evans commented on SPARK-40089: - Looking at the code it appears that the

[jira] [Commented] (SPARK-40089) Doring of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579892#comment-17579892 ] Robert Joseph Evans commented on SPARK-40089: - It sure looks like it is related to the

[jira] [Commented] (SPARK-40089) Doring of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579887#comment-17579887 ] Robert Joseph Evans commented on SPARK-40089: - I have been trying to debug this and it does

[jira] [Updated] (SPARK-40089) Doring of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-40089: Attachment: input.parquet > Doring of at least Decimal(20, 2) fails for some

[jira] [Created] (SPARK-40089) Doring of at least Decimal(20, 2) fails for some values near the max.

2022-08-15 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-40089: --- Summary: Doring of at least Decimal(20, 2) fails for some values near the max. Key: SPARK-40089 URL: https://issues.apache.org/jira/browse/SPARK-40089

[jira] [Created] (SPARK-39031) NaN != NaN in pivot

2022-04-26 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-39031: --- Summary: NaN != NaN in pivot Key: SPARK-39031 URL: https://issues.apache.org/jira/browse/SPARK-39031 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-38955) from_csv can corrupt surrounding lines if a lineSep is in the data

2022-04-20 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525310#comment-17525310 ] Robert Joseph Evans commented on SPARK-38955: - Conceptually I am fine if we want to remove

[jira] [Created] (SPARK-38955) from_csv can corrupt surrounding lines if a lineSep is in the data

2022-04-19 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-38955: --- Summary: from_csv can corrupt surrounding lines if a lineSep is in the data Key: SPARK-38955 URL: https://issues.apache.org/jira/browse/SPARK-38955

[jira] [Commented] (SPARK-38604) ceil and floor return different types when called from scala than sql

2022-03-19 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509265#comment-17509265 ] Robert Joseph Evans commented on SPARK-38604: - I marked this as a critical because it

[jira] [Updated] (SPARK-38604) ceil and floor return different types when called from scala than sql

2022-03-19 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-38604: Priority: Blocker (was: Major) > ceil and floor return different types when

[jira] [Updated] (SPARK-38604) ceil and floor return different types when called from scala than sql

2022-03-19 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-38604: Priority: Critical (was: Blocker) > ceil and floor return different types when

[jira] [Created] (SPARK-38604) ceil and floor return different types when called from scala than sql

2022-03-19 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-38604: --- Summary: ceil and floor return different types when called from scala than sql Key: SPARK-38604 URL: https://issues.apache.org/jira/browse/SPARK-38604

[jira] [Commented] (SPARK-38577) Interval types are not truncated to the expected endField when creating a DataFrame via Duration

2022-03-17 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17508269#comment-17508269 ] Robert Joseph Evans commented on SPARK-38577: - This is especially problematic because it is

[jira] [Created] (SPARK-37024) Even more decimal overflow issues in average

2021-10-16 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-37024: --- Summary: Even more decimal overflow issues in average Key: SPARK-37024 URL: https://issues.apache.org/jira/browse/SPARK-37024 Project: Spark

[jira] [Commented] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-06-23 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368098#comment-17368098 ] Robert Joseph Evans commented on SPARK-35563: - Or just do the overflow check on the int. I

[jira] [Commented] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-06-23 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368096#comment-17368096 ] Robert Joseph Evans commented on SPARK-35563: - Yes, technically if we switch it from an int

[jira] [Comment Edited] (SPARK-35089) non consistent results running count for same dataset after filter and lead window function

2021-06-16 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364334#comment-17364334 ] Robert Joseph Evans edited comment on SPARK-35089 at 6/16/21, 2:54 PM:

[jira] [Commented] (SPARK-35089) non consistent results running count for same dataset after filter and lead window function

2021-06-16 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364334#comment-17364334 ] Robert Joseph Evans commented on SPARK-35089: - {quote}I understand ordering data, but I

[jira] [Commented] (SPARK-35089) non consistent results running count for same dataset after filter and lead window function

2021-06-16 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364277#comment-17364277 ] Robert Joseph Evans commented on SPARK-35089: - [~Tonzetic], I don't know what you mean by an

[jira] [Commented] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-06-14 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362910#comment-17362910 ] Robert Joseph Evans commented on SPARK-35563: - [~dc-heros] Thanks for looking into this. I

[jira] [Updated] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-06-01 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-35563: Labels: data-loss (was: ) > [SQL] Window operations with over Int.MaxValue + 1

[jira] [Updated] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-06-01 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-35563: Priority: Blocker (was: Major) > [SQL] Window operations with over Int.MaxValue

[jira] [Commented] (SPARK-35089) non consistent results running count for same dataset after filter and lead window function

2021-06-01 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355088#comment-17355088 ] Robert Joseph Evans commented on SPARK-35089: - I should add that the above "solution" is

[jira] [Commented] (SPARK-35089) non consistent results running count for same dataset after filter and lead window function

2021-06-01 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355083#comment-17355083 ] Robert Joseph Evans commented on SPARK-35089: - [~Tonzetic] to be clear my point was just to

[jira] [Commented] (SPARK-35089) non consistent results running count for same dataset after filter and lead window function

2021-05-29 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353778#comment-17353778 ] Robert Joseph Evans commented on SPARK-35089: - On window functions if the {{order by}}

[jira] [Updated] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-05-29 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-35563: Priority: Blocker (was: Major) > [SQL] Window operations with over Int.MaxValue

[jira] [Created] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows

2021-05-29 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-35563: --- Summary: [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows Key: SPARK-35563 URL: https://issues.apache.org/jira/browse/SPARK-35563

[jira] [Comment Edited] (SPARK-35108) Pickle produces incorrect key labels for GenericRowWithSchema (data corruption)

2021-05-04 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338948#comment-17338948 ] Robert Joseph Evans edited comment on SPARK-35108 at 5/4/21, 12:34 PM:

[jira] [Commented] (SPARK-35108) Pickle produces incorrect key labels for GenericRowWithSchema (data corruption)

2021-05-04 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338948#comment-17338948 ] Robert Joseph Evans commented on SPARK-35108: - Looks good. Thanks for the fix. > Pickle

[jira] [Commented] (SPARK-35108) Pickle produces incorrect key labels for GenericRowWithSchema (data corruption)

2021-04-16 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324088#comment-17324088 ] Robert Joseph Evans commented on SPARK-35108: - If you have SPARK_HOME set when you run

[jira] [Updated] (SPARK-35108) Pickle produces incorrect key labels for GenericRowWithSchema (data corruption)

2021-04-16 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-35108: Attachment: test.sh test.py > Pickle produces incorrect key

[jira] [Created] (SPARK-35108) Pickle produces incorrect key labels for GenericRowWithSchema (data corruption)

2021-04-16 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-35108: --- Summary: Pickle produces incorrect key labels for GenericRowWithSchema (data corruption) Key: SPARK-35108 URL: https://issues.apache.org/jira/browse/SPARK-35108

[jira] [Commented] (SPARK-32110) -0.0 vs 0.0 is inconsistent

2020-12-10 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247362#comment-17247362 ] Robert Joseph Evans commented on SPARK-32110: - Thanks [~tanelk] then resolving this is fine.

[jira] [Commented] (SPARK-32110) -0.0 vs 0.0 is inconsistent

2020-12-10 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247305#comment-17247305 ] Robert Joseph Evans commented on SPARK-32110: - I have not tried this again on the latest

[jira] [Commented] (SPARK-32110) -0.0 vs 0.0 is inconsistent

2020-12-09 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246585#comment-17246585 ] Robert Joseph Evans commented on SPARK-32110: - Are we sure that this issue should be closed?

[jira] [Commented] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-22 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182345#comment-17182345 ] Robert Joseph Evans commented on SPARK-32672: - Honestly, it is not a big deal what happened.

[jira] [Commented] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-21 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181873#comment-17181873 ] Robert Joseph Evans commented on SPARK-32672: - OK reading through the code I understand what

[jira] [Updated] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-21 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-32672: Attachment: small_bad.snappy.parquet > Data corruption in some cached compressed

[jira] [Commented] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-21 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181868#comment-17181868 ] Robert Joseph Evans commented on SPARK-32672: - So I am able to reduce the corruption down to

[jira] [Updated] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-20 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-32672: Affects Version/s: 3.1.0 > Data corruption in some cached compressed boolean

[jira] [Commented] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-20 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181486#comment-17181486 ] Robert Joseph Evans commented on SPARK-32672: - I added some debugging to the compression

[jira] [Commented] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-20 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181478#comment-17181478 ] Robert Joseph Evans commented on SPARK-32672: - I did a little debugging and found that

[jira] [Commented] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-20 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181466#comment-17181466 ] Robert Joseph Evans commented on SPARK-32672: - I verified that this is still happening on

[jira] [Commented] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-20 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181459#comment-17181459 ] Robert Joseph Evans commented on SPARK-32672: - I verified that this is still happening on

[jira] [Updated] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-20 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-32672: Affects Version/s: 2.4.6 > Data corruption in some cached compressed boolean

[jira] [Updated] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-20 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-32672: Summary: Data corruption in some cached compressed boolean columns (was: Daat

[jira] [Updated] (SPARK-32672) Data corruption in some cached compressed boolean columns

2020-08-20 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-32672: Attachment: bad_order.snappy.parquet > Data corruption in some cached compressed

[jira] [Created] (SPARK-32672) Daat corruption in some cached compressed boolean columns

2020-08-20 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-32672: --- Summary: Daat corruption in some cached compressed boolean columns Key: SPARK-32672 URL: https://issues.apache.org/jira/browse/SPARK-32672 Project:

[jira] [Commented] (SPARK-32612) int columns produce inconsistent results on pandas UDFs

2020-08-17 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178927#comment-17178927 ] Robert Joseph Evans commented on SPARK-32612: - This is just one example that shows what can

[jira] [Created] (SPARK-32612) int columns produce inconsistent results on pandas UDFs

2020-08-13 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-32612: --- Summary: int columns produce inconsistent results on pandas UDFs Key: SPARK-32612 URL: https://issues.apache.org/jira/browse/SPARK-32612 Project: Spark

[jira] [Commented] (SPARK-32334) Investigate commonizing Columnar and Row data transformations

2020-07-23 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163576#comment-17163576 ] Robert Joseph Evans commented on SPARK-32334: - Row to columnar and columnar to row is mostly

[jira] [Commented] (SPARK-32334) Investigate commonizing Columnar and Row data transformations

2020-07-21 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162147#comment-17162147 ] Robert Joseph Evans commented on SPARK-32334: - I think I can get the conversation started

[jira] [Commented] (SPARK-32274) Add in the ability for a user to replace the serialization format of the cache

2020-07-10 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155660#comment-17155660 ] Robert Joseph Evans commented on SPARK-32274: - I filed

[jira] [Updated] (SPARK-32274) Add in the ability for a user to replace the serialization format of the cache

2020-07-10 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-32274: Description: Caching a dataset or dataframe can be a very expensive operation,

[jira] [Commented] (SPARK-32274) Add in the ability for a user to replace the serialization format of the cache

2020-07-10 Thread Robert Joseph Evans (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155654#comment-17155654 ] Robert Joseph Evans commented on SPARK-32274: - If someone could assign this to me that would

[jira] [Created] (SPARK-32274) Add in the ability for a user to replace the serialization format of the cache

2020-07-10 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-32274: --- Summary: Add in the ability for a user to replace the serialization format of the cache Key: SPARK-32274 URL: https://issues.apache.org/jira/browse/SPARK-32274

[jira] [Created] (SPARK-32110) -0.0 vs 0.0 is inconsistent

2020-06-26 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-32110: --- Summary: -0.0 vs 0.0 is inconsistent Key: SPARK-32110 URL: https://issues.apache.org/jira/browse/SPARK-32110 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-28774) ReusedExchangeExec cannot be columnar

2019-08-19 Thread Robert Joseph Evans (Jira)
Robert Joseph Evans created SPARK-28774: --- Summary: ReusedExchangeExec cannot be columnar Key: SPARK-28774 URL: https://issues.apache.org/jira/browse/SPARK-28774 Project: Spark Issue

[jira] [Created] (SPARK-28213) Remove duplication between columnar and ColumnarBatchScan

2019-06-28 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created SPARK-28213: --- Summary: Remove duplication between columnar and ColumnarBatchScan Key: SPARK-28213 URL: https://issues.apache.org/jira/browse/SPARK-28213 Project:

[jira] [Updated] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-06-04 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-27396: Epic Name: Public APIs for extended Columnar Processing Support > SPIP: Public

[jira] [Created] (SPARK-27945) Make minimal changes to support columnar processing

2019-06-04 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created SPARK-27945: --- Summary: Make minimal changes to support columnar processing Key: SPARK-27945 URL: https://issues.apache.org/jira/browse/SPARK-27945 Project: Spark

[jira] [Updated] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-06-04 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-27396: Issue Type: Epic (was: Improvement) > SPIP: Public APIs for extended Columnar

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-05-03 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832776#comment-16832776 ] Robert Joseph Evans commented on SPARK-27396: - [~bryanc] The nice to have arrow formatting

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-29 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829631#comment-16829631 ] Robert Joseph Evans commented on SPARK-27396: - I have updated this SPIP to clarify some

[jira] [Updated] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-29 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-27396: Description: *SPIP: Columnar Processing Without Arrow Formatting Guarantees.*

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822411#comment-16822411 ] Robert Joseph Evans commented on SPARK-27396: - [~mengxr], My goal is to provide a framework

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-16 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819592#comment-16819592 ] Robert Joseph Evans commented on SPARK-27396: - This SPIP is to put a framework in place to

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-16 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819125#comment-16819125 ] Robert Joseph Evans commented on SPARK-27396: - [~bryanc], I see your point that if this is

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-15 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817990#comment-16817990 ] Robert Joseph Evans commented on SPARK-27396: - There are actually a few public facing APIs I 

[jira] [Commented] (SPARK-26413) SPIP: RDD Arrow Support in Spark Core and PySpark

2019-04-13 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817081#comment-16817081 ] Robert Joseph Evans commented on SPARK-26413: - SPARK-27396 covers this, but with a slightly

[jira] [Commented] (SPARK-24579) SPIP: Standardize Optimized Data Exchange between Spark and DL/AI frameworks

2019-04-13 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817080#comment-16817080 ] Robert Joseph Evans commented on SPARK-24579: - This SPIP SPARK-27396 covers a superset of

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-11 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815846#comment-16815846 ] Robert Joseph Evans commented on SPARK-27396: - [~kiszk], The exact detail of some of the

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-11 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815426#comment-16815426 ] Robert Joseph Evans commented on SPARK-27396: - Thanks [~tgraves] I updated the JIRA with you

[jira] [Updated] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-11 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated SPARK-27396: Shepherd: Thomas Graves > SPIP: Public APIs for extended Columnar Processing

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-10 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814883#comment-16814883 ] Robert Joseph Evans commented on SPARK-27396: - This SPIP has been up for 5 days and I see 10

[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-05 Thread Robert Joseph Evans (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811183#comment-16811183 ] Robert Joseph Evans commented on SPARK-27396: - I have kept this at a high level just

[jira] [Created] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support

2019-04-05 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created SPARK-27396: --- Summary: SPIP: Public APIs for extended Columnar Processing Support Key: SPARK-27396 URL: https://issues.apache.org/jira/browse/SPARK-27396 Project: