[jira] [Resolved] (SPARK-30557) Add public documentation for SPARK_SUBMIT_OPTS

2020-01-23 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-30557. -- Resolution: Won't Fix > Add public documentation for SPARK_SUBMIT_O

[jira] [Commented] (SPARK-30557) Add public documentation for SPARK_SUBMIT_OPTS

2020-01-17 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018254#comment-17018254 ] Nicholas Chammas commented on SPARK-30557: -- [~vanzin] - Do you know if this is something we

[jira] [Created] (SPARK-30557) Add public documentation for SPARK_SUBMIT_OPTS

2020-01-17 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30557: Summary: Add public documentation for SPARK_SUBMIT_OPTS Key: SPARK-30557 URL: https://issues.apache.org/jira/browse/SPARK-30557 Project: Spark Issue

Re: More publicly documenting the options under spark.sql.*

2020-01-15 Thread Nicholas Chammas
e >>> the question of who it's for, if you have to read source to find it.) >>> >>> I don't know if we need to overhaul the conf system, but there may >>> indeed be some confs that could legitimately be documented. I don't >>> know which. >>> >

More publicly documenting the options under spark.sql.*

2020-01-14 Thread Nicholas Chammas
I filed SPARK-30510 thinking that we had forgotten to document an option, but it turns out that there's a whole bunch of stuff under SQLConf.scala

[jira] [Commented] (SPARK-30510) Document spark.sql.sources.partitionOverwriteMode

2020-01-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015473#comment-17015473 ] Nicholas Chammas commented on SPARK-30510: -- [~hyukjin.kwon] I think I'm missing something here

[jira] [Created] (SPARK-30510) Document spark.sql.sources.partitionOverwriteMode

2020-01-14 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30510: Summary: Document spark.sql.sources.partitionOverwriteMode Key: SPARK-30510 URL: https://issues.apache.org/jira/browse/SPARK-30510 Project: Spark

Running Spark through a debugger

2019-12-16 Thread Nicholas Chammas
I normally stick to the Python parts of Spark, but I am interested in walking through the DSv2 code and understanding how it works. I tried following the "IDE Setup" section of the developer tools page, but quickly hit several problems loading the

Re: Closing stale PRs with a GitHub Action

2019-12-15 Thread Nicholas Chammas
time is long and it posts >>> some friendly message about reopening if there is a material change in the >>> proposed PR, the problem, or interest in merging it. >>> >>> On Fri, Dec 6, 2019 at 11:20 AM Nicholas Chammas < >>> nicholas.cham...@gmail.com

R linter is broken

2019-12-13 Thread Nicholas Chammas
The R linter GitHub action seems to be busted. Looks like we need to update some repository references

[jira] [Created] (SPARK-30173) Automatically close stale PRs

2019-12-08 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30173: Summary: Automatically close stale PRs Key: SPARK-30173 URL: https://issues.apache.org/jira/browse/SPARK-30173 Project: Spark Issue Type

Re: Closing stale PRs with a GitHub Action

2019-12-06 Thread Nicholas Chammas
t's standard practice and doesn't mean it can't be > reopened. > Often the related JIRA should be closed as well but we have done that > separately with bulk-close in the past. > > On Thu, Dec 5, 2019 at 3:24 PM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > &g

Closing stale PRs with a GitHub Action

2019-12-05 Thread Nicholas Chammas
It’s that topic again.  We have almost 500 open PRs. A good chunk of them are more than a year old. The oldest open PR dates to summer 2015. https://github.com/apache/spark/pulls?q=is%3Apr+is%3Aopen+sort%3Acreated-asc GitHub has an Action for closing stale PRs.

[jira] [Updated] (SPARK-30128) Promote remaining "hidden" PySpark DataFrameReader options to load APIs

2019-12-04 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30128: - Description: Following on to SPARK-29903 and similar issues (linked), there are options

[jira] [Created] (SPARK-30128) Promote remaining "hidden" PySpark DataFrameReader options to load APIs

2019-12-04 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30128: Summary: Promote remaining "hidden" PySpark DataFrameReader options to load APIs Key: SPARK-30128 URL: https://issues.apache.org/jira/browse/S

[jira] [Commented] (SPARK-27547) fix DataFrame self-join problems

2019-12-03 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987444#comment-16987444 ] Nicholas Chammas commented on SPARK-27547: -- Should this be marked as resolved by [#25107|https

Re: Auto-linking Jira tickets to their PRs

2019-12-03 Thread Nicholas Chammas
Hyukjin Kwon wrote: > I think it's broken .. cc Josh Rosen > > On Wed, Dec 4, 2019 at 10:25 AM, Nicholas Chammas wrote: > >> We used to have a bot or something that automatically linked Jira tickets >> to PRs that mentioned them in their title. I don't see that happening >

[jira] [Updated] (SPARK-30091) Document mergeSchema option directly in the Python Parquet APIs

2019-12-03 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30091: - Summary: Document mergeSchema option directly in the Python Parquet APIs

[jira] [Created] (SPARK-30113) Document mergeSchema option in Python Orc APIs

2019-12-03 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30113: Summary: Document mergeSchema option in Python Orc APIs Key: SPARK-30113 URL: https://issues.apache.org/jira/browse/SPARK-30113 Project: Spark Issue

Auto-linking Jira tickets to their PRs

2019-12-03 Thread Nicholas Chammas
We used to have a bot or something that automatically linked Jira tickets to PRs that mentioned them in their title. I don't see that happening anymore. Did we intentionally remove this functionality, or is it temporarily broken for some reason?

[jira] [Updated] (SPARK-30091) Document mergeSchema option directly in the Python API

2019-12-01 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30091: - Affects Version/s: (was: 3.0.0) 2.4.4 > Document mergeSch

[jira] [Created] (SPARK-30091) Document mergeSchema option directly in the Python API

2019-12-01 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30091: Summary: Document mergeSchema option directly in the Python API Key: SPARK-30091 URL: https://issues.apache.org/jira/browse/SPARK-30091 Project: Spark

[jira] [Created] (SPARK-30084) Add docs showing how to automatically rebuild Python API docs

2019-11-29 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30084: Summary: Add docs showing how to automatically rebuild Python API docs Key: SPARK-30084 URL: https://issues.apache.org/jira/browse/SPARK-30084 Project: Spark

Re: Can't build unidoc

2019-11-29 Thread Nicholas Chammas
at 11:48 AM Nicholas Chammas > wrote: > > > > Howdy folks. Running `./build/sbt unidoc` on the latest master is giving > me this trace: > > > > ``` > > [warn] :: > > [warn] ::

Can't build unidoc

2019-11-29 Thread Nicholas Chammas
Howdy folks. Running `./build/sbt unidoc` on the latest master is giving me this trace: ``` [warn] :: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] :: [warn] ::

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-19 Thread Nicholas Chammas
> I don't think the default Hadoop version matters except for the spark-hadoop-cloud module, which is only meaningful under the hadoop-3.2 profile. What do you mean by "only meaningful under the hadoop-3.2 profile"? On Tue, Nov 19, 2019 at 5:40 PM Cheng Lian wrote: > Hey Steve, > > In terms of

[jira] [Commented] (SPARK-29903) Add documentation for recursiveFileLookup

2019-11-17 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976252#comment-16976252 ] Nicholas Chammas commented on SPARK-29903: -- Happy to do that. Going to wait for [this PR|https

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview

2019-11-16 Thread Nicholas Chammas
> Data Source API with Catalog Supports Where can we read more about this? The linked Nabble thread doesn't mention the word "Catalog". On Thu, Nov 7, 2019 at 5:53 PM Xingbo Jiang wrote: > Hi all, > > To enable wide-scale community testing of the upcoming Spark 3.0 release, > the Apache Spark

[jira] [Commented] (SPARK-29903) Add documentation for recursiveFileLookup

2019-11-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974577#comment-16974577 ] Nicholas Chammas commented on SPARK-29903: -- cc [~cloud_fan] and [~weichenxu123] >

[jira] [Created] (SPARK-29903) Add documentation for recursiveFileLookup

2019-11-14 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-29903: Summary: Add documentation for recursiveFileLookup Key: SPARK-29903 URL: https://issues.apache.org/jira/browse/SPARK-29903 Project: Spark Issue Type

[jira] [Comment Edited] (SPARK-27990) Provide a way to recursively load data from datasource

2019-11-07 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969462#comment-16969462 ] Nicholas Chammas edited comment on SPARK-27990 at 11/7/19 5:54 PM

[jira] [Commented] (SPARK-27990) Provide a way to recursively load data from datasource

2019-11-07 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-27990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969462#comment-16969462 ] Nicholas Chammas commented on SPARK-27990: -- Are there any docs for this new option? I can't

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-03 Thread Nicholas Chammas
On Fri, Nov 1, 2019 at 8:41 AM Steve Loughran wrote: > It would be really good if the spark distributions shipped with later > versions of the hadoop artifacts. > I second this. If we need to keep a Hadoop 2.x profile around, why not make it Hadoop 2.8 or something newer? Koert Kuipers wrote:

Spark 3.0 and S3A

2019-10-28 Thread Nicholas Chammas
Howdy folks, I have a question about what is happening with the 3.0 release in relation to Hadoop and hadoop-aws. Today, among other builds, we release a build of Spark built against Hadoop 2.7 and another one built

[jira] [Reopened] (SPARK-16483) Unifying struct fields and columns

2019-10-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-16483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-16483: -- Though this has been bulk closed, I still think it's a valuable potential improvement

[jira] [Updated] (SPARK-29280) DataFrameReader should support a compression option

2019-09-27 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-29280: - Description: [DataFrameWriter|http://spark.apache.org/docs/latest/api/python

[jira] [Comment Edited] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-27 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939856#comment-16939856 ] Nicholas Chammas edited comment on SPARK-29102 at 9/28/19 5:35 AM: --- I

[jira] [Commented] (SPARK-29280) DataFrameReader should support a compression option

2019-09-27 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939864#comment-16939864 ] Nicholas Chammas commented on SPARK-29280: -- cc [~hyukjin.kwon], [~cloud_fan] > DataFrameRea

[jira] [Created] (SPARK-29280) DataFrameReader should support a compression option

2019-09-27 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-29280: Summary: DataFrameReader should support a compression option Key: SPARK-29280 URL: https://issues.apache.org/jira/browse/SPARK-29280 Project: Spark

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-27 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939856#comment-16939856 ] Nicholas Chammas commented on SPARK-29102: -- I figured it out. Looks like the correct setting

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-23 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936077#comment-16936077 ] Nicholas Chammas commented on SPARK-29102: -- I wonder if [newAPIHadoopFile|http

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933803#comment-16933803 ] Nicholas Chammas commented on SPARK-29102: -- [~hyukjin.kwon] - Would you happen to know how

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-18 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932997#comment-16932997 ] Nicholas Chammas commented on SPARK-29102: -- {quote}It duplicately decompresses and each map

[jira] [Resolved] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-18 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-29102. -- Resolution: Won't Fix > Read gzipped file into multiple partitions without full g

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-18 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932953#comment-16932953 ] Nicholas Chammas commented on SPARK-29102: -- Ah, thanks for the reference! So if I'm just trying

[jira] [Commented] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-16 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930835#comment-16930835 ] Nicholas Chammas commented on SPARK-29102: -- cc [~cloud_fan] and [~hyukjin.kwon]: I noticed your

[jira] [Created] (SPARK-29102) Read gzipped file into multiple partitions without full gzip expansion on a single-node

2019-09-16 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-29102: Summary: Read gzipped file into multiple partitions without full gzip expansion on a single-node Key: SPARK-29102 URL: https://issues.apache.org/jira/browse/SPARK-29102

Re: DSv2 sync - 4 September 2019

2019-09-09 Thread Nicholas Chammas
> > On Mon, Sep 9, 2019 at 12:46 AM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> A quick question about failure modes, as a casual observer of the DSv2 >> effort: >> >> I was considering filing a JIRA ticket about enhancing the >> Dat

Re: DSv2 sync - 4 September 2019

2019-09-08 Thread Nicholas Chammas
A quick question about failure modes, as a casual observer of the DSv2 effort: I was considering filing a JIRA ticket about enhancing the DataFrameReader to include the failure *reason* in addition to the corrupt record when the mode is PERMISSIVE. So if you are loading a CSV, for example, and a

Providing a namespace for third-party configurations

2019-08-30 Thread Nicholas Chammas
I discovered today that EMR provides its own optimizations for Spark. Some of these optimizations are controlled by configuration settings with names like `spark.sql.dynamicPartitionPruning.enabled` or

[jira] [Commented] (SPARK-25603) Generalize Nested Column Pruning

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910738#comment-16910738 ] Nicholas Chammas commented on SPARK-25603: -- [~dbtsai] - Just watched [your Spark Summit talk

[jira] [Comment Edited] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910655#comment-16910655 ] Nicholas Chammas edited comment on SPARK-4502 at 8/19/19 7:55 PM

[jira] [Comment Edited] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910691#comment-16910691 ] Nicholas Chammas edited comment on SPARK-25150 at 8/19/19 7:39 PM: --- I

[jira] [Updated] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-25150: - Affects Version/s: 2.4.3 Labels: correctness (was: ) I haven't been

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19248: - Labels: correctness (was: ) Tagging this as a correctness issue since Spark 2+'s

[jira] [Updated] (SPARK-18084) write.partitionBy() does not recognize nested columns that select() can access

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18084: - Affects Version/s: 2.4.3 Retested and confirmed that this issue is still present

[jira] [Updated] (SPARK-10892) Join with Data Frame returns wrong results

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-10892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-10892: - Affects Version/s: 2.4.0 Labels: correctness (was: ) Updating affected

[jira] [Commented] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2019-08-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910655#comment-16910655 ] Nicholas Chammas commented on SPARK-4502: - Thanks for your notes [~Bartalos]. Just FYI, nested

Re: Recognizing non-code contributions

2019-08-05 Thread Nicholas Chammas
On Mon, Aug 5, 2019 at 9:55 AM Sean Owen wrote: > On Mon, Aug 5, 2019 at 3:50 AM Myrle Krantz wrote: > > So... events coordinators? I'd still make them committers. I guess I'm > still struggling to understand what problem making people VIP's without > giving them committership is trying to

Python API for mapGroupsWithState

2019-08-02 Thread Nicholas Chammas
Can someone succinctly describe the challenge in adding the `mapGroupsWithState()` API to PySpark? I was hoping for some suboptimal but nonetheless working solution to be available in Python, as there is with Python UDFs for example, but that doesn't seem to be the case. The JIRA ticket for

[jira] [Updated] (SPARK-16824) Add API docs for VectorUDT

2019-05-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-16824: - Labels: (was: bulk-closed) > Add API docs for Vector

[jira] [Reopened] (SPARK-16824) Add API docs for VectorUDT

2019-05-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-16824: -- Reviewing the links here, it seems that VectorUDT has been in use since 2016

[jira] [Updated] (SPARK-16824) Add API docs for VectorUDT

2019-05-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-16824: - Affects Version/s: 2.4.3 > Add API docs for Vector

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19248: - Labels: (was: bulk-closed) > Regex_replace works in 1.6 but not in

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19248: - Affects Version/s: 2.4.3 > Regex_replace works in 1.6 but not in

[jira] [Reopened] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-19248: -- > Regex_replace works in 1.6 but not in

[jira] [Updated] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19248: - Component/s: PySpark > Regex_replace works in 1.6 but not in

[jira] [Commented] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844547#comment-16844547 ] Nicholas Chammas commented on SPARK-19248: -- Looks like Spark 2.4.3 still exhibits the behavior

[jira] [Commented] (SPARK-18277) na.fill() and friends should work on struct fields

2019-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844530#comment-16844530 ] Nicholas Chammas commented on SPARK-18277: -- [~hyukjin.kwon] - If I still think this issue

Re: Suggestion on Join Approach with Spark

2019-05-15 Thread Nicholas Chammas
This kind of question is for the User list, or for something like Stack Overflow. It's not on topic here. The dev list (i.e. this list) is for discussions about the development of Spark itself. On Wed, May 15, 2019 at 1:50 PM Chetan Khatri wrote: > Any one help me, I am confused. :( > > On

[issue34713] csvwriter.writerow()'s return type is undocumented

2019-03-13 Thread Nicholas Chammas
Nicholas Chammas added the comment: Nope, go ahead. -- ___ Python tracker <https://bugs.python.org/issue34713> ___ ___ Python-bugs-list mailing list Unsub

Re: [PySpark] Revisiting PySpark type annotations

2019-01-25 Thread Nicholas Chammas
I think the annotations are compatible with Python 2 since Maciej implemented them via stub files, which Python 2 simply ignores. Folks using mypy to check types will get the benefit whether they're on Python 2 or 3,

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Nicholas Chammas
ively mature for core ETL and >> incremental processing purpose. I interact with a lot of users using it >> everyday. We can always expand the use cases and add more, but that also >> adds maintenance burden. In any case, it'd be good to get some activity >> here. >>

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Nicholas Chammas
As an observer, this thread is interesting and concerning. Is there an emerging consensus that Structured Streaming is somehow not relevant anymore? Or is it just that folks consider it "complete enough"? Structured Streaming was billed as the replacement to DStreams. If committers, generally

Re: Noisy spark-website notifications

2018-12-19 Thread Nicholas Chammas
it should only send one email when a PR is merged. > > On Thu, Dec 20, 2018 at 10:58 AM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Can we somehow disable these new email alerts coming through for the >> Spark website repo? >> >> On Wed, D

Noisy spark-website notifications

2018-12-19 Thread Nicholas Chammas
Can we somehow disable these new email alerts coming through for the Spark website repo? On Wed, Dec 19, 2018 at 8:25 PM GitBox wrote: > ueshin commented on a change in pull request #163: Announce the schedule > of 2019 Spark+AI summit at SF > URL: >

[jira] [Commented] (SPARK-10892) Join with Data Frame returns wrong results

2018-10-30 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668941#comment-16668941 ] Nicholas Chammas commented on SPARK-10892: -- Is this issue still present in Spark 2.3.2 or 2.4.0

Re: Documentation of boolean column operators missing?

2018-10-23 Thread Nicholas Chammas
On Tue, 23 Oct 2018 at 21:32, Sean Owen wrote: > >> The comments say that it is not possible to overload 'and' and 'or', >> which would have been more natural. >> > Yes, unfortunately, Python does not allow you to override `and`, `or`, or `not`. They are not implemented as “dunder” methods (e.g.
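
To illustrate the point about operator overloading, here is a minimal sketch in plain Python. The `Predicate` class is hypothetical and is not part of PySpark; it only shows that `&`, `|`, and `~` dispatch to `__and__`, `__or__`, and `__invert__`, while `and`, `or`, and `not` can only go through `bool()`.

```python
class Predicate:
    """Hypothetical stand-in for a column expression; not a PySpark class."""

    def __init__(self, expr):
        self.expr = expr

    def __and__(self, other):
        # Invoked by `a & b`.
        return Predicate(f"({self.expr} AND {other.expr})")

    def __or__(self, other):
        # Invoked by `a | b`.
        return Predicate(f"({self.expr} OR {other.expr})")

    def __invert__(self):
        # Invoked by `~a`.
        return Predicate(f"(NOT {self.expr})")

    def __bool__(self):
        # `and`, `or`, and `not` have no dunder hooks of their own; they only
        # call bool() on their operands, so the best we can do is refuse.
        raise ValueError("use &, |, ~ instead of and, or, not")


a = Predicate("x > 1")
b = Predicate("y < 2")
combined = (a & ~b) | b
print(combined.expr)
```

The explicit parentheses in `(a & ~b) | b` are a deliberate habit: `&` and `|` bind more tightly than comparison operators in Python, so expressions that mix them with `>` or `==` usually need parentheses.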

Re: Documentation of boolean column operators missing?

2018-10-23 Thread Nicholas Chammas
, 2018 at 3:02 PM Nicholas Chammas wrote: > So it appears then that the equivalent operators for PySpark are > completely missing from the docs, right? That’s surprising. And if there > are column function equivalents for |, &, and ~, then I can’t find those > either for PySpark.

Re: Documentation of boolean column operators missing?

2018-10-23 Thread Nicholas Chammas
la/index.html#org.apache.spark.sql.Column > > On Tue, Oct 23, 2018, 12:27 PM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> I can’t seem to find any documentation of the &, |, and ~ operators for >> PySpark DataFrame columns. I assume that should be in our

Re: Documentation of boolean column operators missing?

2018-10-23 Thread Nicholas Chammas
> > https://spark.apache.org/docs/2.3.0/api/sql/index.html > > > > On Tue, Oct 23, 2018 at 10:27 AM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> I can’t seem to find any documentation of the &, |, and ~ operators for >> PySpark DataFram

Documentation of boolean column operators missing?

2018-10-23 Thread Nicholas Chammas
I can’t seem to find any documentation of the &, |, and ~ operators for PySpark DataFrame columns. I assume that should be in our docs somewhere. Was it always missing? Am I just missing something obvious? Nick

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-10 Thread Nicholas Chammas
FYI I believe we have an open correctness issue here: https://issues.apache.org/jira/browse/SPARK-25150 However, it needs review by another person to confirm whether it is indeed a correctness issue (and whether it still impacts this latest RC). Nick On Wed, Oct 10, 2018 at 3:14 PM, Jean Georges

[Distutils] Re: Notes from python core sprint on workflow tooling

2018-09-30 Thread Nicholas Chammas
On Sun, Sep 30, 2018 at 2:17 PM Tzu-ping Chung wrote: > I can’t speak for others (also not really sure what “we” should include > here…), but I > have a couple of interactions with the author on Twitter. I can’t recall > whether I invited > him to join distutils-sig specifically, but I would

[Distutils] Re: Notes from python core sprint on workflow tooling

2018-09-30 Thread Nicholas Chammas
On Sun, Sep 30, 2018 at 6:48 AM Nathaniel Smith wrote: > So I think now might be a time for a bit of top-down design. **I want > a picture of the elephant.** If we had that, maybe we could see how > all these different ideas could be put together into a coherent whole. > So at the Python core

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632512#comment-16632512 ] Nicholas Chammas commented on SPARK-25150: -- Correct, this isn't a cross join. It's just a plain

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632381#comment-16632381 ] Nicholas Chammas commented on SPARK-25150: -- ([~petertoth] - Seeing your comment edit now.) OK

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632281#comment-16632281 ] Nicholas Chammas commented on SPARK-25150: -- I've uploaded the expected output. I realize

[jira] [Updated] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-25150: - Attachment: expected-output.txt > Joining DataFrames derived from the same sou

[jira] [Updated] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-25150: - Description: I have two DataFrames, A and B. From B, I have derived two additional

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632252#comment-16632252 ] Nicholas Chammas commented on SPARK-25150: -- The attachments on this ticket contain a complete

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1663#comment-1663 ] Nicholas Chammas commented on SPARK-25150: -- [~cloud_fan] / [~srowen] - Would you consider

Re: [Python-ideas] PEPs: Theory of operation [was: Moving to another forum system ...]

2018-09-22 Thread Nicholas Chammas
On Sat, Sep 22, 2018 at 8:52 AM Anders Hovmöller wrote: > >>> I think that entire paragraph made it sound even worse than what I > wrote originally. It reads to an outsider as “if you don’t know what’s > wrong I’m not going to tell you”. > > > > More like, if you're not sufficiently familiar

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-09-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623788#comment-16623788 ] Nicholas Chammas commented on SPARK-25150: -- Given that Spark appears to provide incorrect

[issue34713] csvwriter.writerow()'s return type is undocumented

2018-09-17 Thread Nicholas Chammas
Nicholas Chammas added the comment: Looks like it's bytes written, not characters: ``` >>> import csv >>> with open('test.csv', 'w', newline='') as csv_file: ... csv_writer = csv.writer( ... csv_file, ... dialect='unix', ... ) ... csv_writer.wr

[issue34713] csvwriter.writerow()'s return type is undocumented

2018-09-17 Thread Nicholas Chammas
New submission from Nicholas Chammas : It _looks_ like csvwriter.writerow() returns the number of bytes (or is it characters?) written. However, there is no documentation of this: https://docs.python.org/3.7/library/csv.html#csv.csvwriter.writerow Is this behavior part of the method's
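
A small way to inspect the return value in question, assuming CPython 3, where `writerow()` passes back whatever the underlying file object's `write()` returns. The in-memory `io.StringIO` buffer is used here only to keep the sketch self-contained; it is not what the original report used.

```python
import csv
import io

# newline="" keeps the buffer from translating the "\r\n" line terminator.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)  # default "excel" dialect

# Capture the return value instead of discarding it.
result = writer.writerow(["a", "b"])
written = buffer.getvalue()

print(repr(written), result)
```

For a text-mode object such as this one, `write()` reports a count of characters; for ASCII-only rows, that is the same number as the encoded byte count, which may explain the ambiguity raised in the report.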

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Nicholas Chammas
I believe -1 votes are merited only for correctness bugs and regressions since the previous release. Does SPARK-23200 count as either? On Mon, Sep 17, 2018 at 9:40 AM, Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > -1 > > I would like to see:

Re: Should python-2 be supported in Spark 3.0?

2018-09-15 Thread Nicholas Chammas
As Reynold pointed out, we don't have to drop Python 2 support right off the bat. We can just deprecate it with Spark 3.0, which would allow us to actually drop it at a later 3.x release. On Sat, Sep 15, 2018 at 2:09 PM Erik Erlandson wrote: > On a separate dev@spark thread, I raised a question

Re: Python friendly API for Spark 3.0

2018-09-14 Thread Nicholas Chammas
Do we need to ditch Python 2 support to provide type hints? I don’t think so. Python lets you specify typing stubs that provide the same benefit without forcing Python 3. On Fri, Sep 14, 2018 at 8:01 PM, Holden Karau wrote: > > > On Fri, Sep 14, 2018, 3:26 PM Erik Erlandson wrote: > >> To be clear,

Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-29 Thread Nicholas Chammas
Dunno if I made a silly mistake, but I wanted to bring some attention to this issue in case there was something serious going on here that might affect the upcoming release. https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-25150 Nick
