[jira] [Created] (SPARK-46449) Add ability to create databases via Catalog API

2023-12-18 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-46449: Summary: Add ability to create databases via Catalog API Key: SPARK-46449 URL: https://issues.apache.org/jira/browse/SPARK-46449 Project: Spark

[jira] [Created] (SPARK-46437) Remove unnecessary cruft from SQL built-in functions docs

2023-12-17 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-46437: Summary: Remove unnecessary cruft from SQL built-in functions docs Key: SPARK-46437 URL: https://issues.apache.org/jira/browse/SPARK-46437 Project: Spark

Guidance for filling out "Affects Version" on Jira

2023-12-17 Thread Nicholas Chammas
The Contributing guide only mentions what to fill in for “Affects Version” for bugs. How about for improvements? This question once caused some problems when I set “Affects Version” to the last released version, and that was interpreted as a request

[jira] [Created] (SPARK-46395) Automatically generate SQL configuration tables for documentation

2023-12-13 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-46395: Summary: Automatically generate SQL configuration tables for documentation Key: SPARK-46395 URL: https://issues.apache.org/jira/browse/SPARK-46395 Project

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > > On Mon, 11 Dec 2023 at 17:11, Nicholas Chammas <mailto:nicholas.cham...@gmail.com>

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh > wrote: > spark.sql.cbo.strategy: Set to AUTO to use the CBO as the default optimizer, > or NONE to disable it completely. > Hmm, I’ve also never heard of this setting before and can’t seem to find it in the Spark docs or source code.

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh > wrote: > > By default, the CBO is enabled in Spark. Note that this is not correct. AQE is enabled

[jira] [Commented] (SPARK-45599) Percentile can produce a wrong answer if -0.0 and 0.0 are mixed in the dataset

2023-12-10 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795162#comment-17795162 ] Nicholas Chammas commented on SPARK-45599: -- Per the [contributing guide|https

[jira] [Created] (SPARK-46357) Replace use of setConf with conf.set in docs

2023-12-10 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-46357: Summary: Replace use of setConf with conf.set in docs Key: SPARK-46357 URL: https://issues.apache.org/jira/browse/SPARK-46357 Project: Spark Issue

Re: When and how does Spark use metastore statistics?

2023-12-10 Thread Nicholas Chammas
in(mode="cost")) what the cost-based optimizer does and how to enable it Would this be a welcome addition to the project’s documentation? I’m happy to work on this. > On Dec 5, 2023, at 12:12 PM, Nicholas Chammas > wrote: > > I’m interested in improving some of t

Re: Algolia search on website is broken

2023-12-10 Thread Nicholas Chammas
onsole. > On Dec 5, 2023, at 11:28 AM, Nicholas Chammas > wrote: > > Should I report this instead on Jira? Apologies if the dev list is not the > right place. > > Search on the website appears to be broken. For example, here is a search for > “analyze”: >  > >

Re: SSH Tunneling issue with Apache Spark

2023-12-06 Thread Nicholas Chammas
nks for the advice Nicholas. > > As mentioned in the original email, I have tried JDBC + SSH Tunnel using > pymysql and sshtunnel and it worked fine. The problem happens only with Spark. > > Thanks, > Venkat > > > > On Wed, Dec 6, 2023 at 10:21 PM Nicholas Cha

Re: SSH Tunneling issue with Apache Spark

2023-12-06 Thread Nicholas Chammas
This is not a question for the dev list. Moving dev to bcc. One thing I would try is to connect to this database using JDBC + SSH tunnel, but without Spark. That way you can focus on getting the JDBC connection to work without Spark complicating the picture for you. > On Dec 5, 2023, at 8:12 

When and how does Spark use metastore statistics?

2023-12-05 Thread Nicholas Chammas
I’m interested in improving some of the documentation relating to the table and column statistics that get stored in the metastore, and how Spark uses them. But I’m not clear on a few things, so I’m writing to you with some questions. 1. The documentation for 

Algolia search on website is broken

2023-12-05 Thread Nicholas Chammas
Should I report this instead on Jira? Apologies if the dev list is not the right place. Search on the website appears to be broken. For example, here is a search for “analyze”:  And here is the same search using DDG

[jira] [Commented] (SPARK-37571) decouple amplab jenkins from spark website, builds and tests

2023-12-05 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793347#comment-17793347 ] Nicholas Chammas commented on SPARK-37571: -- Since we've [retired|https://lists.apache.org

[jira] [Resolved] (SPARK-37647) Expose percentile function in Scala/Python APIs

2023-12-05 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-37647. -- Resolution: Fixed It looks like this got added as part of Spark 3.5: [https

[jira] [Commented] (SPARK-45390) Remove `distutils` usage

2023-11-17 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787268#comment-17787268 ] Nicholas Chammas commented on SPARK-45390: -- Ah, are you referring to [PySpark's Python

[jira] [Commented] (SPARK-45390) Remove `distutils` usage

2023-11-15 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786401#comment-17786401 ] Nicholas Chammas commented on SPARK-45390: -- {quote}We don't promise to support all future

Are DataFrame rows ordered without an explicit ordering clause?

2023-09-18 Thread Nicholas Chammas
I’ve always considered DataFrames to be logically equivalent to SQL tables or queries. In SQL, the result order of any query is implementation-dependent without an explicit ORDER BY clause. Technically, you could run `SELECT * FROM table;` 10 times in a row and get 10 different orderings. I
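A minimal sketch of the SQL half of that claim, using Python's built-in sqlite3 module rather than Spark. The table `t` and its rows are made up for illustration. The point is only that the query without ORDER BY promises a set of rows, while the query with ORDER BY promises a sequence.

```python
import sqlite3

# In-memory database with a small, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b")],
)

# No ORDER BY: the engine may return these rows in any order,
# so only compare the result as an unordered collection.
unordered = conn.execute("SELECT id, name FROM t").fetchall()

# Explicit ORDER BY: the row order is part of the query's contract.
ordered = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()

conn.close()
```

Code that depends on `unordered` arriving in a particular order relies on an implementation detail. Sorting it, or adding the ORDER BY, makes the dependency explicit.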

[jira] [Commented] (SPARK-31001) Add ability to create a partitioned table via catalog.createTable()

2022-08-31 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598403#comment-17598403 ] Nicholas Chammas commented on SPARK-31001: -- Thanks for sharing these details. This is very

[jira] [Commented] (SPARK-31001) Add ability to create a partitioned table via catalog.createTable()

2022-08-30 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598115#comment-17598115 ] Nicholas Chammas commented on SPARK-31001: -- What's {{{}__partition_columns

Allowing all Reader or Writer settings to be provided as options

2022-08-09 Thread Nicholas Chammas
Hello people, I want to bring some attention to SPARK-39630 and ask if there are any design objections to the idea proposed there. The gist of the proposal is that there are some reader or writer directives that cannot be supplied as

[jira] [Created] (SPARK-39630) Allow all Reader or Writer settings to be provided as options

2022-06-28 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-39630: Summary: Allow all Reader or Writer settings to be provided as options Key: SPARK-39630

[jira] [Created] (SPARK-39582) "Since " docs on array_agg are incorrect

2022-06-24 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-39582: Summary: "Since " docs on array_agg are incorrect Key: SPARK-39582 URL: https://issues.apache.org/jira/browse/SPARK-39582 Project: Spark

[jira] [Commented] (SPARK-37219) support AS OF syntax

2022-05-16 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537589#comment-17537589 ] Nicholas Chammas commented on SPARK-37219: -- This change will enable not just Delta, but also

[jira] [Updated] (SPARK-31001) Add ability to create a partitioned table via catalog.createTable()

2022-05-10 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-31001: - Description: There doesn't appear to be a way to create a partitioned table using

[jira] [Comment Edited] (SPARK-37222) Max iterations reached in Operator Optimization w/left_anti or left_semi join and nested structures

2022-04-26 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528233#comment-17528233 ] Nicholas Chammas edited comment on SPARK-37222 at 4/26/22 3:44 PM

[jira] [Commented] (SPARK-37222) Max iterations reached in Operator Optimization w/left_anti or left_semi join and nested structures

2022-04-26 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528233#comment-17528233 ] Nicholas Chammas commented on SPARK-37222: -- I've found a helpful log setting that causes Spark

[jira] [Updated] (SPARK-37222) Max iterations reached in Operator Optimization w/left_anti or left_semi join and nested structures

2022-04-26 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-37222: - Attachment: plan-log.log > Max iterations reached in Operator Optimization w/left_a

[jira] [Updated] (SPARK-37696) Optimizer exceeds max iterations

2022-04-25 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-37696: - Affects Version/s: 3.2.1 > Optimizer exceeds max iterati

[jira] [Commented] (SPARK-37222) Max iterations reached in Operator Optimization w/left_anti or left_semi join and nested structures

2022-04-25 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527527#comment-17527527 ] Nicholas Chammas commented on SPARK-37222: -- Thanks for the detailed report, [~ssmith]. I am

[jira] [Updated] (SPARK-37222) Max iterations reached in Operator Optimization w/left_anti or left_semi join and nested structures

2022-04-25 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-37222: - Affects Version/s: 3.2.1 > Max iterations reached in Operator Optimization w/left_a

Re: Deluge of GitBox emails

2022-04-04 Thread Nicholas Chammas
rmal Github emails - that is if we turn them off do we have anything? > > On Mon, Apr 4, 2022 at 8:44 AM Nicholas Chammas <mailto:nicholas.cham...@gmail.com>> wrote: > I assume I’m not the only one getting these new emails from GitBox. Is there > a story behind that that I misse

Deluge of GitBox emails

2022-04-04 Thread Nicholas Chammas
I assume I’m not the only one getting these new emails from GitBox. Is there a story behind that that I missed? I’d rather not get these emails on the dev list. I assume most of the list would agree with me. GitHub has a good set of options for following activity on the repo. People who want

Re: [DISCUSS] Rename 'SQL' to 'SQL / DataFrame', and 'Query' to 'Execution' in SQL UI page

2022-03-28 Thread Nicholas Chammas
+1 Understanding the close relationship between SQL and DataFrames in Spark was a key learning moment for me, but I agree that using the terms interchangeably can be confusing. > On Mar 27, 2022, at 9:27 PM, Hyukjin Kwon wrote: > > *for some reason, the image looks broken (to me). I am

[jira] [Commented] (SPARK-5997) Increase partition count without performing a shuffle

2021-12-20 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462805#comment-17462805 ] Nicholas Chammas commented on SPARK-5997: - [~tenstriker] - I believe in your case you should

[jira] (SPARK-5997) Increase partition count without performing a shuffle

2021-12-20 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-5997 ] Nicholas Chammas deleted comment on SPARK-5997: - was (Author: nchammas): [~tenstriker] - I believe in your case you should be able to set {{spark.sql.files.maxRecordsPerFile}} to some

[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-12-20 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462718#comment-17462718 ] Nicholas Chammas commented on SPARK-24853: -- I would expect something like that to yield

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-15 Thread Nicholas Chammas
er way of computing aggregations through > composition of other Expressions. > > Simeon > > > > > > On Thu, Dec 9, 2021 at 9:26 PM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> I'm trying to create a new aggregate function. It's my first time

[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-12-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459601#comment-17459601 ] Nicholas Chammas commented on SPARK-24853: -- Assuming we are talking about the example I

[jira] [Resolved] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2021-12-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-25150. -- Fix Version/s: 3.2.0 Resolution: Fixed It looks like Spark 3.1.2 exhibits

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2021-12-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459494#comment-17459494 ] Nicholas Chammas commented on SPARK-25150: -- I re-ran my test (described in the issue

[jira] [Commented] (HADOOP-18029) Update CompressionCodecFactory to handle uppercase file extensions

2021-12-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/HADOOP-18029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459478#comment-17459478 ] Nicholas Chammas commented on HADOOP-18029: --- I have not contributed code to Hadoop before

[jira] [Updated] (HADOOP-17562) Provide mechanism for explicitly specifying the compression codec for input files

2021-12-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/HADOOP-17562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated HADOOP-17562: -- Component/s: io > Provide mechanism for explicitly specifying the compression co

[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-12-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459467#comment-17459467 ] Nicholas Chammas commented on SPARK-24853: -- [~hyukjin.kwon] - Are you still opposed

[jira] [Resolved] (SPARK-26589) proper `median` method for spark dataframe

2021-12-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-26589. -- Resolution: Won't Fix Marking this as "Won't Fix", but I suppose if some

[jira] [Commented] (SPARK-26589) proper `median` method for spark dataframe

2021-12-14 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17459455#comment-17459455 ] Nicholas Chammas commented on SPARK-26589: -- It looks like making a distributed, memory

[jira] [Created] (SPARK-37647) Expose percentile function in Scala/Python APIs

2021-12-14 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-37647: Summary: Expose percentile function in Scala/Python APIs Key: SPARK-37647 URL: https://issues.apache.org/jira/browse/SPARK-37647 Project: Spark

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-13 Thread Nicholas Chammas
> > > On Mon, Dec 13, 2021 at 6:43 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> No takers here? :) >> >> I can see now why a median function is not available in most data >> processing systems. It's pretty annoying to i

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-13 Thread Nicholas Chammas
No takers here? :) I can see now why a median function is not available in most data processing systems. It's pretty annoying to implement! On Thu, Dec 9, 2021 at 9:25 PM Nicholas Chammas wrote: > I'm trying to create a new aggregate function. It's my first time working > with Cataly

Creating a memory-efficient AggregateFunction to calculate Median

2021-12-09 Thread Nicholas Chammas
I'm trying to create a new aggregate function. It's my first time working with Catalyst, so it's exciting, but I'm also in a bit over my head. My goal is to create a function to calculate the median. As a very simple solution, I could just

[jira] [Commented] (SPARK-26589) proper `median` method for spark dataframe

2021-12-09 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456860#comment-17456860 ] Nicholas Chammas commented on SPARK-26589: -- That makes sense to me. I've been struggling

Re: [Apache Spark Jenkins] build system shutting down Dec 23th, 2021

2021-12-06 Thread Nicholas Chammas
Farewell to Jenkins and its classic weather forecast build status icons: [image: health-80plus.png][image: health-60to79.png][image: health-40to59.png][image: health-20to39.png][image: health-00to19.png] And thank you Shane for all the help over these years. Will you be nuking all the

[jira] [Commented] (SPARK-26589) proper `median` method for spark dataframe

2021-12-01 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452081#comment-17452081 ] Nicholas Chammas commented on SPARK-26589: -- [~srowen] - I'll ask for help on the dev list

[jira] [Commented] (SPARK-26589) proper `median` method for spark dataframe

2021-12-01 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451936#comment-17451936 ] Nicholas Chammas commented on SPARK-26589: -- Just for reference, Stack Overflow provides

[jira] [Comment Edited] (SPARK-26589) proper `median` method for spark dataframe

2021-11-30 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451283#comment-17451283 ] Nicholas Chammas edited comment on SPARK-26589 at 11/30/21, 6:17 PM

[jira] [Commented] (SPARK-26589) proper `median` method for spark dataframe

2021-11-30 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451283#comment-17451283 ] Nicholas Chammas commented on SPARK-26589: -- I'm going to try to implement this using

[jira] [Updated] (SPARK-12185) Add Histogram support to Spark SQL/DataFrames

2021-11-30 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-12185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-12185: - Labels: (was: bulk-closed) > Add Histogram support to Spark SQL/DataFra

[jira] [Reopened] (SPARK-12185) Add Histogram support to Spark SQL/DataFrames

2021-11-30 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-12185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-12185: -- Reopening this because I think it's a valid improvement that mirrors the existing

[jira] [Resolved] (SPARK-37393) Inline annotations for {ml, mllib}/common.py

2021-11-22 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-37393. -- Resolution: Duplicate > Inline annotations for {ml, mllib}/common

[jira] [Updated] (SPARK-37393) Inline annotations for {ml, mllib}/common.py

2021-11-19 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-37393: - Description: This will allow us to run type checks against those files themselves

[jira] [Created] (SPARK-37393) Inline annotations for {ml, mllib}/common.py

2021-11-19 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-37393: Summary: Inline annotations for {ml, mllib}/common.py Key: SPARK-37393 URL: https://issues.apache.org/jira/browse/SPARK-37393 Project: Spark Issue

[jira] [Created] (SPARK-37380) Miscellaneous Python lint infra cleanup

2021-11-18 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-37380: Summary: Miscellaneous Python lint infra cleanup Key: SPARK-37380 URL: https://issues.apache.org/jira/browse/SPARK-37380 Project: Spark Issue Type

Re: Supports Dynamic Table Options for Spark SQL

2021-11-15 Thread Nicholas Chammas
Side note about time travel: There is a PR to add VERSION/TIMESTAMP AS OF syntax to Spark SQL. On Mon, Nov 15, 2021 at 2:23 PM Ryan Blue wrote: > I want to note that I wouldn't recommend time traveling this way by using > the hint for `snapshot-id`.

[jira] [Updated] (SPARK-37336) Migrate _java2py to SparkSession

2021-11-15 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-37336: - Summary: Migrate _java2py to SparkSession (was: Migrate common ML utils

[jira] [Created] (SPARK-37336) Migrate common ML utils to SparkSession

2021-11-15 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-37336: Summary: Migrate common ML utils to SparkSession Key: SPARK-37336 URL: https://issues.apache.org/jira/browse/SPARK-37336 Project: Spark Issue Type

[jira] [Updated] (SPARK-37335) Clarify output of FPGrowth

2021-11-15 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-37335: - Description: The association rules returned by FPGrowth include more columns than

[jira] [Created] (SPARK-37335) Clarify output of FPGrowth

2021-11-15 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-37335: Summary: Clarify output of FPGrowth Key: SPARK-37335 URL: https://issues.apache.org/jira/browse/SPARK-37335 Project: Spark Issue Type: Improvement

Jira components cleanup

2021-11-15 Thread Nicholas Chammas
https://issues.apache.org/jira/projects/SPARK?selectedItem=com.atlassian.jira.jira-projects-plugin:components-page I think the "docs" component should be merged into "Documentation". Likewise, the "k8" component should be merged into "Kubernetes". I think anyone can technically update tags, but

[jira] [Comment Edited] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-11-02 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437394#comment-17437394 ] Nicholas Chammas edited comment on SPARK-24853 at 11/2/21, 2:41 PM

[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-11-02 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437396#comment-17437396 ] Nicholas Chammas commented on SPARK-24853: -- The [contributing guide|https://spark.apache.org

[jira] [Updated] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-11-02 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-24853: - Priority: Minor (was: Major) > Support Column type for withCol

[jira] [Reopened] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-11-02 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-24853: -- > Support Column type for withColumn and withColumnRenamed a

[jira] [Commented] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-11-02 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437394#comment-17437394 ] Nicholas Chammas commented on SPARK-24853: -- [~hyukjin.kwon] - It's not just for consistency

[jira] [Updated] (SPARK-24853) Support Column type for withColumn and withColumnRenamed apis

2021-11-02 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-24853: - Affects Version/s: 3.2.0 > Support Column type for withColumn and withColumnRena

[jira] [Commented] (SPARK-33000) cleanCheckpoints config does not clean all checkpointed RDDs on shutdown

2021-04-15 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322333#comment-17322333 ] Nicholas Chammas commented on SPARK-33000: -- Per the discussion [on the dev list|http://apache

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-17 Thread Nicholas Chammas
On Tue, Mar 16, 2021 at 9:15 PM Hyukjin Kwon wrote: > I am currently thinking we will have to convert the Koalas tests to use > unittests to match with PySpark for now. > Keep in mind that pytest supports unittest-based tests out of the box, so
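A minimal sketch of that point, assuming nothing about the actual Koalas or PySpark test suites: the hypothetical `AdditionTest` below is written purely against the standard unittest API. pytest collects `unittest.TestCase` subclasses, so a file like this should run unchanged under either runner. Here it is run with unittest itself so the example needs no third-party packages.

```python
import unittest


class AdditionTest(unittest.TestCase):
    """Hypothetical test case that uses only the unittest API."""

    def setUp(self):
        # unittest-style fixture, called before each test method.
        self.values = [1, 2, 3]

    def test_sum(self):
        self.assertEqual(sum(self.values), 6)

    def test_length(self):
        self.assertEqual(len(self.values), 3)


# Run the case with the standard unittest runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AdditionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Pointing pytest at the file that contains this class should collect `test_sum` and `test_length` without any changes to the test code.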

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-15 Thread Nicholas Chammas
On Mon, Mar 15, 2021 at 2:12 AM Reynold Xin wrote: > I don't think we should deprecate existing APIs. > +1 I strongly prefer Spark's immutable DataFrame API to the Pandas API. I could be wrong, but I wager most people who have worked with both Spark and Pandas feel the same way. For the large

Re: Shutdown cleanup of disk-based resources that Spark creates

2021-03-11 Thread Nicholas Chammas
te a > reference within a scope which is closed. For example within the body of a > function (without return value) and store it only in a local > variable. After the scope is closed in case of our function when the caller > gets the control back you have chance to see the co

[jira] [Commented] (SPARK-33436) PySpark equivalent of SparkContext.hadoopConfiguration

2021-03-10 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17299283#comment-17299283 ] Nicholas Chammas commented on SPARK-33436: -- [~hyukjin.kwon] - Can you clarify please why

Re: Shutdown cleanup of disk-based resources that Spark creates

2021-03-10 Thread Nicholas Chammas
of an > unexpected error (in this case you should keep the checkpoint data). > > This way even after an unexpected exit the next run of the same app should > be able to pick up the checkpointed data. > > Best Regards, > Attila > > > > > On Wed, Mar 10, 2021 at 8:

[jira] [Updated] (SPARK-33000) cleanCheckpoints config does not clean all checkpointed RDDs on shutdown

2021-03-10 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-33000: - Description: Maybe it's just that the documentation needs to be updated, but I found

Shutdown cleanup of disk-based resources that Spark creates

2021-03-10 Thread Nicholas Chammas
Hello people, I'm working on a fix for SPARK-33000. Spark does not clean up checkpointed RDDs/DataFrames on shutdown, even if the appropriate configs are set. In the course of developing a fix, another contributor pointed out

[jira] [Commented] (SPARK-33000) cleanCheckpoints config does not clean all checkpointed RDDs on shutdown

2021-03-04 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295531#comment-17295531 ] Nicholas Chammas commented on SPARK-33000: -- [~caowang888] - If you're still interested

[jira] [Created] (HADOOP-17562) Provide mechanism for explicitly specifying the compression codec for input files

2021-03-03 Thread Nicholas Chammas (Jira)
Nicholas Chammas created HADOOP-17562: - Summary: Provide mechanism for explicitly specifying the compression codec for input files Key: HADOOP-17562 URL: https://issues.apache.org/jira/browse/HADOOP-17562

Re: Auto-closing PRs or How to get reviewers' attention

2021-02-18 Thread Nicholas Chammas
On Thu, Feb 18, 2021 at 10:34 AM Sean Owen wrote: > There is no way to force people to review or commit something of course. > And keep in mind we get a lot of, shall we say, unuseful pull requests. > There is occasionally some blowback to closing someone's PR, so the path of > least resistance

Re: Auto-closing PRs or How to get reviewers' attention

2021-02-18 Thread Nicholas Chammas
On Thu, Feb 18, 2021 at 9:58 AM Enrico Minack wrote: > *What is the approved way to ...* > > *... prevent it from being auto-closed?* Committing and commenting to the > PR does not prevent it from being closed the next day. > Committing and commenting should prevent the PR from being closed. It

[jira] [Resolved] (SPARK-34194) Queries that only touch partition columns shouldn't scan through all files

2021-02-08 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-34194. -- Resolution: Won't Fix > Queries that only touch partition columns shouldn't s

[jira] [Commented] (SPARK-34194) Queries that only touch partition columns shouldn't scan through all files

2021-02-08 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281269#comment-17281269 ] Nicholas Chammas commented on SPARK-34194: -- It's not clear to me whether SPARK-26709

[jira] [Comment Edited] (SPARK-34194) Queries that only touch partition columns shouldn't scan through all files

2021-02-01 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276869#comment-17276869 ] Nicholas Chammas edited comment on SPARK-34194 at 2/2/21, 5:56 AM

[jira] [Commented] (SPARK-34194) Queries that only touch partition columns shouldn't scan through all files

2021-02-01 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276869#comment-17276869 ] Nicholas Chammas commented on SPARK-34194: -- Interesting reference, [~attilapiros]. It looks

[jira] [Commented] (PARQUET-41) Add bloom filters to parquet statistics

2021-02-01 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276862#comment-17276862 ] Nicholas Chammas commented on PARQUET-41: - Thanks for the link [~yumwang]. That [README|https

[jira] [Commented] (PARQUET-41) Add bloom filters to parquet statistics

2021-02-01 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276842#comment-17276842 ] Nicholas Chammas commented on PARQUET-41: - Where is the user documentation for all the bloom

[issue43094] sqlite3.create_function takes parameter named narg, not num_params

2021-02-01 Thread Nicholas Chammas
New submission from Nicholas Chammas:

The doc for sqlite3.create_function shows the signature as follows:

https://docs.python.org/3.9/library/sqlite3.html#sqlite3.Connection.create_function

```
create_function(name, num_params, func, *, deterministic=False)
```

But it appears
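A minimal sketch of the call in question, for illustration only. It passes the function name, argument count, and callable positionally, so it should behave the same whichever keyword name the second parameter carries. The `double` helper and the in-memory database are assumptions made for this example and do not come from the original report.

```python
import sqlite3


def double(x):
    """Hypothetical user-defined SQL function: returns twice its argument."""
    return x * 2


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Arguments are positional: SQL function name, number of arguments, callable.
# No keyword name is used for the second argument, so the documented
# (num_params) versus actual (narg) spelling does not matter here.
conn.create_function("double", 1, double)

result = conn.execute("SELECT double(21)").fetchone()[0]
conn.close()
```

Positional arguments sidestep the naming mismatch the issue title describes, which is why the sketch avoids keywords for the first three parameters.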

Re: [Spark SQL]: SQL, Python, Scala and R API Consistency

2021-01-28 Thread Nicholas Chammas
On Thu, Jan 28, 2021 at 3:40 PM Sean Owen wrote: > It isn't that regexp_extract_all (for example) is useless outside SQL, > just, where do you draw the line? Supporting 10s of random SQL functions > across 3 other languages has a cost, which has to be weighed against > benefit, which we can

[jira] [Commented] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.

2021-01-21 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269571#comment-17269571 ] Nicholas Chammas commented on SPARK-12890: -- I've created SPARK-34194 and fleshed out

[jira] [Created] (SPARK-34194) Queries that only touch partition columns shouldn't scan through all files

2021-01-21 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-34194: Summary: Queries that only touch partition columns shouldn't scan through all files Key: SPARK-34194 URL: https://issues.apache.org/jira/browse/SPARK-34194

[jira] [Commented] (SPARK-12890) Spark SQL query related to only partition fields should not scan the whole data.

2021-01-18 Thread Nicholas Chammas (Jira)
[ https://issues.apache.org/jira/browse/SPARK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267657#comment-17267657 ] Nicholas Chammas commented on SPARK-12890: -- Sure, will do. > Spark SQL query related to o
