[jira] [Created] (SPARK-30832) SQL function doc headers should link to anchors

2020-02-14 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30832: Summary: SQL function doc headers should link to anchors Key: SPARK-30832 URL: https://issues.apache.org/jira/browse/SPARK-30832 Project: Spark

[jira] [Created] (SPARK-30731) Refine doc-building workflow

2020-02-04 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30731: Summary: Refine doc-building workflow Key: SPARK-30731 URL: https://issues.apache.org/jira/browse/SPARK-30731 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-30510) Publicly document options under spark.sql.*

2020-01-31 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30510: Description: SPARK-20236 added a new option,

[jira] [Updated] (SPARK-30510) Publicly document options under spark.sql.*

2020-01-31 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30510: Summary: Publicly document options under spark.sql.* (was: Document

[jira] [Updated] (SPARK-30665) Eliminate pypandoc dependency

2020-01-29 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30665: Summary: Eliminate pypandoc dependency (was: Remove Pandoc dependency in PySpark

[jira] [Created] (SPARK-30672) numpy is a dependency for building PySpark API docs

2020-01-29 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30672: Summary: numpy is a dependency for building PySpark API docs Key: SPARK-30672 URL: https://issues.apache.org/jira/browse/SPARK-30672 Project: Spark

[jira] [Commented] (SPARK-30665) Remove Pandoc dependency in PySpark setup.py

2020-01-29 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026399#comment-17026399 ] Nicholas Chammas commented on SPARK-30665:

[jira] [Created] (SPARK-30665) Remove Pandoc dependency in PySpark setup.py

2020-01-28 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30665: Summary: Remove Pandoc dependency in PySpark setup.py Key: SPARK-30665 URL: https://issues.apache.org/jira/browse/SPARK-30665 Project: Spark Issue

[jira] [Commented] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2020-01-23 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022576#comment-17022576 ] Nicholas Chammas commented on SPARK-19248: Thanks for getting to the bottom of the issue,

[jira] [Resolved] (SPARK-30557) Add public documentation for SPARK_SUBMIT_OPTS

2020-01-23 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-30557. Resolution: Won't Fix

[jira] [Commented] (SPARK-30557) Add public documentation for SPARK_SUBMIT_OPTS

2020-01-17 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018254#comment-17018254 ] Nicholas Chammas commented on SPARK-30557: [~vanzin] - Do you know if this is something we

[jira] [Created] (SPARK-30557) Add public documentation for SPARK_SUBMIT_OPTS

2020-01-17 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30557: Summary: Add public documentation for SPARK_SUBMIT_OPTS Key: SPARK-30557 URL: https://issues.apache.org/jira/browse/SPARK-30557 Project: Spark Issue

[jira] [Commented] (SPARK-30510) Document spark.sql.sources.partitionOverwriteMode

2020-01-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015473#comment-17015473 ] Nicholas Chammas commented on SPARK-30510: [~hyukjin.kwon] I think I'm missing something here

[jira] [Created] (SPARK-30510) Document spark.sql.sources.partitionOverwriteMode

2020-01-14 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30510: Summary: Document spark.sql.sources.partitionOverwriteMode Key: SPARK-30510 URL: https://issues.apache.org/jira/browse/SPARK-30510 Project: Spark

[jira] [Created] (SPARK-30173) Automatically close stale PRs

2019-12-08 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30173: Summary: Automatically close stale PRs Key: SPARK-30173 URL: https://issues.apache.org/jira/browse/SPARK-30173 Project: Spark Issue Type:

[jira] [Updated] (SPARK-30128) Promote remaining "hidden" PySpark DataFrameReader options to load APIs

2019-12-04 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30128: Description: Following on to SPARK-29903 and similar issues (linked), there are options

[jira] [Created] (SPARK-30128) Promote remaining "hidden" PySpark DataFrameReader options to load APIs

2019-12-04 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30128: Summary: Promote remaining "hidden" PySpark DataFrameReader options to load APIs Key: SPARK-30128 URL: https://issues.apache.org/jira/browse/SPARK-30128

[jira] [Commented] (SPARK-27547) fix DataFrame self-join problems

2019-12-03 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987444#comment-16987444 ] Nicholas Chammas commented on SPARK-27547: Should this be marked as resolved by

[jira] [Updated] (SPARK-30091) Document mergeSchema option directly in the Python Parquet APIs

2019-12-03 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30091: Summary: Document mergeSchema option directly in the Python Parquet APIs (was:

[jira] [Created] (SPARK-30113) Document mergeSchema option in Python Orc APIs

2019-12-03 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30113: Summary: Document mergeSchema option in Python Orc APIs Key: SPARK-30113 URL: https://issues.apache.org/jira/browse/SPARK-30113 Project: Spark Issue

[jira] [Updated] (SPARK-30091) Document mergeSchema option directly in the Python API

2019-12-01 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30091: Affects Version/s: (was: 3.0.0) 2.4.4

[jira] [Created] (SPARK-30091) Document mergeSchema option directly in the Python API

2019-12-01 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30091: Summary: Document mergeSchema option directly in the Python API Key: SPARK-30091 URL: https://issues.apache.org/jira/browse/SPARK-30091 Project: Spark

[jira] [Created] (SPARK-30084) Add docs showing how to automatically rebuild Python API docs

2019-11-29 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30084: Summary: Add docs showing how to automatically rebuild Python API docs Key: SPARK-30084 URL: https://issues.apache.org/jira/browse/SPARK-30084 Project: Spark

[jira] [Commented] (SPARK-29903) Add documentation for recursiveFileLookup

2019-11-17 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976252#comment-16976252 ] Nicholas Chammas commented on SPARK-29903: Happy to do that. Going to wait for [this

[jira] [Commented] (SPARK-29903) Add documentation for recursiveFileLookup

2019-11-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974577#comment-16974577 ] Nicholas Chammas commented on SPARK-29903: cc [~cloud_fan] and [~weichenxu123]

[jira] [Created] (SPARK-29903) Add documentation for recursiveFileLookup

2019-11-14 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-29903: Summary: Add documentation for recursiveFileLookup Key: SPARK-29903 URL: https://issues.apache.org/jira/browse/SPARK-29903 Project: Spark Issue

[jira] [Comment Edited] (SPARK-27990) Provide a way to recursively load data from datasource

2019-11-07 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969462#comment-16969462 ] Nicholas Chammas edited comment on SPARK-27990 at 11/7/19 5:54 PM:

[jira] [Commented] (SPARK-27990) Provide a way to recursively load data from datasource

2019-11-07 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969462#comment-16969462 ] Nicholas Chammas commented on SPARK-27990: Are there any docs for this new option? I can't

[jira] [Reopened] (SPARK-16483) Unifying struct fields and columns

2019-10-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-16483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-16483: Though this has been bulk closed, I still think it's a valuable potential improvement to

[jira] [Updated] (SPARK-29280) DataFrameReader should support a compression option

2019-09-27 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-29280: Description:

[jira] [Comment Edited] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-27 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939856#comment-16939856 ] Nicholas Chammas edited comment on SPARK-29102 at 9/28/19 5:35 AM: I

[jira] [Commented] (SPARK-29280) DataFrameReader should support a compression option

2019-09-27 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939864#comment-16939864 ] Nicholas Chammas commented on SPARK-29280: cc [~hyukjin.kwon], [~cloud_fan]

[jira] [Created] (SPARK-29280) DataFrameReader should support a compression option

2019-09-27 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-29280: Summary: DataFrameReader should support a compression option Key: SPARK-29280 URL: https://issues.apache.org/jira/browse/SPARK-29280 Project: Spark

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-27 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939856#comment-16939856 ] Nicholas Chammas commented on SPARK-29102: I figured it out. Looks like the correct setting is

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-23 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936077#comment-16936077 ] Nicholas Chammas commented on SPARK-29102: I wonder if

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933803#comment-16933803 ] Nicholas Chammas commented on SPARK-29102: [~hyukjin.kwon] - Would you happen to know how to

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-18 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932997#comment-16932997 ] Nicholas Chammas commented on SPARK-29102: {quote}It duplicately decompresses and each map

[jira] [Resolved] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-18 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-29102. Resolution: Won't Fix

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-18 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932953#comment-16932953 ] Nicholas Chammas commented on SPARK-29102: Ah, thanks for the reference! So if I'm just trying

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-16 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930835#comment-16930835 ] Nicholas Chammas commented on SPARK-29102: cc [~cloud_fan] and [~hyukjin.kwon]: I noticed your

[jira] [Created] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-16 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-29102: Summary: Read gzipped file into multiple partitions without full gzip expansion on a single-node Key: SPARK-29102 URL: https://issues.apache.org/jira/browse/SPARK-29102

[jira] [Commented] (SPARK-25603) Generalize Nested Column Pruning

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910738#comment-16910738 ] Nicholas Chammas commented on SPARK-25603: [~dbtsai] - Just watched [your Spark Summit talk on

[jira] [Comment Edited] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910655#comment-16910655 ] Nicholas Chammas edited comment on SPARK-4502 at 8/19/19 7:55 PM:

[jira] [Comment Edited] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910691#comment-16910691 ] Nicholas Chammas edited comment on SPARK-25150 at 8/19/19 7:39 PM: I

[jira] [Updated] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-25150: Affects Version/s: 2.4.3 Labels: correctness (was: ) I haven't been

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19248: Labels: correctness (was: ) Tagging this as a correctness issue since Spark 2+'s

[jira] [Updated] (SPARK-18084) write.partitionBy() does not recognize nested columns that select() can access

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18084: Affects Version/s: 2.4.3 Retested and confirmed that this issue is still present in

[jira] [Updated] (SPARK-10892) Join with Data Frame returns wrong results

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-10892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-10892: Affects Version/s: 2.4.0 Labels: correctness (was: ) Updating affected

[jira] [Commented] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910655#comment-16910655 ] Nicholas Chammas commented on SPARK-4502: Thanks for your notes [~Bartalos]. Just FYI, nested

[jira] [Updated] (SPARK-16824) Add API docs for VectorUDT

2019-05-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-16824: Labels: (was: bulk-closed)

[jira] [Reopened] (SPARK-16824) Add API docs for VectorUDT

2019-05-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-16824: Reviewing the links here, it seems that VectorUDT has been in use since 2016 at the

[jira] [Updated] (SPARK-16824) Add API docs for VectorUDT

2019-05-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-16824: Affects Version/s: 2.4.3

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19248: Labels: (was: bulk-closed)

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19248: Affects Version/s: 2.4.3

[jira] [Reopened] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-19248:

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19248: Component/s: PySpark

[jira] [Commented] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844547#comment-16844547 ] Nicholas Chammas commented on SPARK-19248: Looks like Spark 2.4.3 still exhibits the behavior

[jira] [Commented] (SPARK-18277) na.fill() and friends should work on struct fields

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844530#comment-16844530 ] Nicholas Chammas commented on SPARK-18277: [~hyukjin.kwon] - If I still think this issue is

[jira] [Commented] (SPARK-10892) Join with Data Frame returns wrong results

2018-10-30 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668941#comment-16668941 ] Nicholas Chammas commented on SPARK-10892: Is this issue still present in Spark 2.3.2 or

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632512#comment-16632512 ] Nicholas Chammas commented on SPARK-25150: Correct, this isn't a cross join. It's just a plain

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632381#comment-16632381 ] Nicholas Chammas commented on SPARK-25150: ([~petertoth] - Seeing your comment edit now.) OK,

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632281#comment-16632281 ] Nicholas Chammas commented on SPARK-25150: I've uploaded the expected output. I realize that

[jira] [Updated] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-25150: Attachment: expected-output.txt

[jira] [Updated] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-25150: Description: I have two DataFrames, A and B. From B, I have derived two additional

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632252#comment-16632252 ] Nicholas Chammas commented on SPARK-25150: The attachments on this ticket contain a complete

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1663#comment-1663 ] Nicholas Chammas commented on SPARK-25150: [~cloud_fan] / [~srowen] - Would you consider this

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623788#comment-16623788 ] Nicholas Chammas commented on SPARK-25150: Given that Spark appears to provide incorrect

[jira] [Comment Edited] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584239#comment-16584239 ] Nicholas Chammas edited comment on SPARK-25150 at 8/17/18 6:15 PM: I

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584239#comment-16584239 ] Nicholas Chammas commented on SPARK-25150: I know there are a bunch of pending bug fixes in

[jira] [Updated] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-25150: Attachment: zombie-analysis.py states.csv persons.csv

[jira] [Created] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-17 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-25150: Summary: Joining DataFrames derived from the same source yields confusing/incorrect results Key: SPARK-25150 URL: https://issues.apache.org/jira/browse/SPARK-25150

[jira] [Comment Edited] (SPARK-23945) Column.isin() should accept a single-column DataFrame as input

2018-05-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468045#comment-16468045 ] Nicholas Chammas edited comment on SPARK-23945 at 5/8/18 10:22 PM:

[jira] [Commented] (SPARK-23945) Column.isin() should accept a single-column DataFrame as input

2018-05-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468045#comment-16468045 ] Nicholas Chammas commented on SPARK-23945: > So in the grand scheme of things I'd expect

[jira] [Comment Edited] (SPARK-23945) Column.isin() should accept a single-column DataFrame as input

2018-05-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433316#comment-16433316 ] Nicholas Chammas edited comment on SPARK-23945 at 5/8/18 10:13 PM: I

[jira] [Commented] (SPARK-23945) Column.isin() should accept a single-column DataFrame as input

2018-04-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433316#comment-16433316 ] Nicholas Chammas commented on SPARK-23945: I always looked at DataFrames and SQL as two

[jira] [Updated] (SPARK-23945) Column.isin() should accept a single-column DataFrame as input

2018-04-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-23945: Description: In SQL you can filter rows based on the result of a subquery: {code:java}

[jira] [Created] (SPARK-23945) Column.isin() should accept a single-column DataFrame as input

2018-04-09 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-23945: Summary: Column.isin() should accept a single-column DataFrame as input Key: SPARK-23945 URL: https://issues.apache.org/jira/browse/SPARK-23945 Project:

[jira] [Commented] (SPARK-22513) Provide build profile for hadoop 2.8

2018-03-26 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414190#comment-16414190 ] Nicholas Chammas commented on SPARK-22513: Thanks for the breakdown. This will be handy for

[jira] [Comment Edited] (SPARK-23716) Change SHA512 style in release artifacts to play nicely with shasum utility

2018-03-23 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412423#comment-16412423 ] Nicholas Chammas edited comment on SPARK-23716 at 3/24/18 5:13 AM: For

[jira] [Resolved] (SPARK-23716) Change SHA512 style in release artifacts to play nicely with shasum utility

2018-03-23 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-23716. Resolution: Won't Fix For my use case, there is no value in updating the Spark release

[jira] [Commented] (SPARK-22513) Provide build profile for hadoop 2.8

2018-03-23 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412218#comment-16412218 ] Nicholas Chammas commented on SPARK-22513: Fair enough. Just as an alternate confirmation,

[jira] [Commented] (SPARK-23534) Spark run on Hadoop 3.0.0

2018-03-19 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405615#comment-16405615 ] Nicholas Chammas commented on SPARK-23534: I don't know what it takes to add a Hadoop 3.0 build

[jira] [Commented] (SPARK-23534) Spark run on Hadoop 3.0.0

2018-03-19 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405563#comment-16405563 ] Nicholas Chammas commented on SPARK-23534: I believe this ticket is a duplicate of SPARK-23151,

[jira] [Commented] (SPARK-22513) Provide build profile for hadoop 2.8

2018-03-19 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405561#comment-16405561 ] Nicholas Chammas commented on SPARK-22513: [~srowen] - Just curious: How do you know that Spark

[jira] [Created] (SPARK-23716) Change SHA512 style in release artifacts to play nicely with shasum utility

2018-03-16 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-23716: Summary: Change SHA512 style in release artifacts to play nicely with shasum utility Key: SPARK-23716 URL: https://issues.apache.org/jira/browse/SPARK-23716

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2018-03-07 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389723#comment-16389723 ] Nicholas Chammas commented on SPARK-18492: [~imranshaik] - This is an open source project. You

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2018-03-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383843#comment-16383843 ] Nicholas Chammas commented on SPARK-18492: Are you seeing the same on Spark 2.3.0? Apparently,

[jira] [Commented] (SPARK-13587) Support virtualenv in PySpark

2017-10-24 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217115#comment-16217115 ] Nicholas Chammas commented on SPARK-13587: To follow-up on my [earlier

[jira] [Commented] (SPARK-17025) Cannot persist PySpark ML Pipeline model that includes custom Transformer

2017-09-15 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168038#comment-16168038 ] Nicholas Chammas commented on SPARK-17025: I take that back. I won't be able to test this for

[jira] [Commented] (SPARK-17025) Cannot persist PySpark ML Pipeline model that includes custom Transformer

2017-08-15 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128271#comment-16128271 ] Nicholas Chammas commented on SPARK-17025: -- I'm still interested in this but I won't be able to

[jira] [Created] (SPARK-21712) Clarify PySpark Column.substr() type checking error message

2017-08-11 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-21712: Summary: Clarify PySpark Column.substr() type checking error message Key: SPARK-21712 URL: https://issues.apache.org/jira/browse/SPARK-21712 Project: Spark

[jira] [Commented] (SPARK-21110) Structs should be usable in inequality filters

2017-06-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059536#comment-16059536 ] Nicholas Chammas commented on SPARK-21110: -- cc [~marmbrus] - Assuming this is a valid feature

[jira] [Updated] (SPARK-21110) Structs should be usable in inequality filters

2017-06-15 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-21110: - Summary: Structs should be usable in inequality filters (was: Structs should be

[jira] [Created] (SPARK-21110) Structs should be orderable

2017-06-15 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-21110: Summary: Structs should be orderable Key: SPARK-21110 URL: https://issues.apache.org/jira/browse/SPARK-21110 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-12661) Drop Python 2.6 support in PySpark

2017-06-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035062#comment-16035062 ] Nicholas Chammas commented on SPARK-12661: -- I think we are good to resolve this provided that

[jira] [Commented] (SPARK-9862) Join: Handling data skew

2017-05-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020030#comment-16020030 ] Nicholas Chammas commented on SPARK-9862: - Is this issue meant to be a SQL-equivalent of

[jira] [Comment Edited] (SPARK-19553) Add GroupedData.countApprox()

2017-03-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870780#comment-15870780 ] Nicholas Chammas edited comment on SPARK-19553 at 3/14/17 2:38 PM: --- The

[jira] [Commented] (SPARK-15474) ORC data source fails to write and read back empty dataframe

2017-03-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893703#comment-15893703 ] Nicholas Chammas commented on SPARK-15474: -- cc [~owen.omalley] > ORC data source fails to

[jira] [Commented] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics

2017-03-01 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890930#comment-15890930 ] Nicholas Chammas commented on SPARK-19578: -- Makes sense to me. I suppose the Apache Arrow

[jira] [Commented] (SPARK-15474) ORC data source fails to write and read back empty dataframe

2017-03-01 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890639#comment-15890639 ] Nicholas Chammas commented on SPARK-15474: -- There is a related discussion on ORC-152 which
