[jira] [Resolved] (SPARK-47978) Decouple Spark Go Connect Library versioning from Spark versioning
[ https://issues.apache.org/jira/browse/SPARK-47978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BoYang resolved SPARK-47978.
    Resolution: Fixed

> Decouple Spark Go Connect Library versioning from Spark versioning
> ------------------------------------------------------------------
>
> Key: SPARK-47978
> URL: https://issues.apache.org/jira/browse/SPARK-47978
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.5.1
> Reporter: BoYang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.1
>
>
> There was a recent discussion in the Spark community about the version
> naming convention for the Spark Operator. People prefer a version that is
> independent of Spark versions. The same applies to the Spark Connect Go
> client, so it is better to start from v1.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45225) XML: XSD file URL support
[ https://issues.apache.org/jira/browse/SPARK-45225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-45225:
-----------------------------------
    Labels: pull-request-available  (was: )

> XML: XSD file URL support
> -------------------------
>
> Key: SPARK-45225
> URL: https://issues.apache.org/jira/browse/SPARK-45225
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Sandip Agarwala
> Assignee: Sandip Agarwala
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-47991) Arrange the test cases for window frames and window functions.
[ https://issues.apache.org/jira/browse/SPARK-47991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47991.
-----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46226
[https://github.com/apache/spark/pull/46226]

> Arrange the test cases for window frames and window functions.
> --------------------------------------------------------------
>
> Key: SPARK-47991
> URL: https://issues.apache.org/jira/browse/SPARK-47991
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Created] (SPARK-48000) Hash join support for strings with collation
Uroš Bojanić created SPARK-48000:
---------------------------------

    Summary: Hash join support for strings with collation
    Key: SPARK-48000
    URL: https://issues.apache.org/jira/browse/SPARK-48000
    Project: Spark
    Issue Type: Improvement
    Components: SQL
    Affects Versions: 4.0.0
    Reporter: Uroš Bojanić
[jira] [Commented] (SPARK-22231) Support of map, filter, withField, dropFields in nested list of structures
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840934#comment-17840934 ]

Dongjoon Hyun commented on SPARK-22231:
---------------------------------------

I removed the outdated target version from this issue.

> Support of map, filter, withField, dropFields in nested list of structures
> --------------------------------------------------------------------------
>
> Key: SPARK-22231
> URL: https://issues.apache.org/jira/browse/SPARK-22231
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: DB Tsai
> Priority: Major
[jira] [Updated] (SPARK-22231) Support of map, filter, withField, dropFields in nested list of structures
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-22231:
----------------------------------
    Target Version/s:   (was: 3.2.0)

> Support of map, filter, withField, dropFields in nested list of structures
> --------------------------------------------------------------------------
>
> Key: SPARK-22231
> URL: https://issues.apache.org/jira/browse/SPARK-22231
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: DB Tsai
> Priority: Major
>
> At Netflix's algorithm team, we work on ranking problems to find great
> content to fulfill the unique tastes of our members. Before building
> recommendation algorithms, we need to prepare the training, testing, and
> validation datasets in Apache Spark. Due to the nature of ranking problems,
> we have a nested list of items to be ranked in one column, and the top level
> is the contexts describing the setting for where a model is to be used (e.g.
> profiles, country, time, device, etc.). Here is a blog post describing the
> details, [Distributed Time Travel for Feature
> Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907].
>
> To be more concrete, for the ranks of videos for a given profile_id in a
> given country, our data schema looks like this,
> {code:java}
> root
>  |-- profile_id: long (nullable = true)
>  |-- country_iso_code: string (nullable = true)
>  |-- items: array (nullable = false)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- title_id: integer (nullable = true)
>  |    |    |-- scores: double (nullable = true)
> ...
> {code}
> We oftentimes need to work on the nested list of structs by applying some
> functions on them. Sometimes, we're dropping or adding new columns in the
> nested list of structs. Currently, there is no easy solution in open source
> Apache Spark to perform those operations using SQL primitives; many people
> just convert the data into an RDD to work on the nested level of data, and
> then reconstruct the new dataframe as a workaround. This is extremely
> inefficient because optimizations like predicate pushdown in SQL cannot be
> performed, we cannot leverage the columnar format, and the serialization
> and deserialization cost becomes really huge even if we just want to add a
> new column at the nested level.
>
> We built a solution internally at Netflix which we're very happy with. We
> plan to make it open source in Spark upstream. We would like to socialize the
> API design to see if we miss any use-case.
>
> The first API we added is *mapItems* on dataframe, which takes a function
> from *Column* to *Column* and applies the function to the nested dataframe.
> Here is an example,
> {code:java}
> case class Data(foo: Int, bar: Double, items: Seq[Double])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)),
>   Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4))
> ))
> val result = df.mapItems("items") {
>   item => item * 2.0
> }
> result.printSchema()
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // |    |-- element: double (containsNull = true)
> result.show()
> // +---+----+--------------------+
> // |foo| bar|               items|
> // +---+----+--------------------+
> // | 10|10.0|[20.2, 20.4, 20.6...|
> // | 20|20.0|[40.2, 40.4, 40.6...|
> // +---+----+--------------------+
> {code}
> Now, with the ability to apply a function in the nested dataframe, we can
> add a new function, *withColumn* in *Column*, to add a column or replace the
> existing column that has the same name in the nested list of structs. Here
> are two examples demonstrating the API together with *mapItems*; the first
> one replaces the existing column,
> {code:java}
> case class Item(a: Int, b: Double)
> case class Data(foo: Int, bar: Double, items: Seq[Item])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))),
>   Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0)))
> ))
> val result = df.mapItems("items") {
>   item => item.withColumn(item("b") + 1 as "b")
> }
> result.printSchema
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // |    |-- element: struct (containsNull = true)
> // |    |    |-- a: integer (nullable = true)
> // |    |    |-- b: double (nullable = true)
> result.show(false)
> // +---+----+----------------------+
> // |foo|bar |items                 |
> // +---+----+----------------------+
> // |10 |10.0|[[10,11.0], [11,12.0]]|
> // |20 |20.0|[[20,21.0], [21,22.0]]|
> // +---+----+----------------------+
> {code}
> and the second
[jira] [Updated] (SPARK-24941) Add RDDBarrier.coalesce() function
[ https://issues.apache.org/jira/browse/SPARK-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24941:
----------------------------------
    Target Version/s:   (was: 3.2.0)

> Add RDDBarrier.coalesce() function
> ----------------------------------
>
> Key: SPARK-24941
> URL: https://issues.apache.org/jira/browse/SPARK-24941
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: Xingbo Jiang
> Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r204917245
> The number of partitions from the input data can be unexpectedly large, e.g.
> if you do
> {code}
> sc.textFile(...).barrier().mapPartitions()
> {code}
> The number of input partitions is based on the HDFS input splits. We shall
> provide a way in RDDBarrier to enable users to specify the number of tasks in
> a barrier stage. Maybe something like
> RDDBarrier.coalesce(numPartitions: Int).
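Note on SPARK-24941: the effect of the proposed RDDBarrier.coalesce(numPartitions) can be pictured with a plain-Python sketch. This is not Spark code and not the proposed implementation; the function name `coalesce_partitions` and the "merge adjacent partitions" policy are assumptions made only for illustration. It shows how many input splits could be folded into a fixed number of barrier tasks without reordering records.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def coalesce_partitions(partitions: Sequence[Sequence[T]],
                        num_partitions: int) -> List[List[T]]:
    """Fold adjacent partitions together until at most num_partitions remain.

    Hypothetical helper: it only models the partition count, not locality
    or scheduling, which a real barrier stage would also have to consider.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    total = len(partitions)
    target = min(num_partitions, total)
    merged: List[List[T]] = []
    for i in range(target):
        # Integer arithmetic spreads the inputs as evenly as possible.
        start = i * total // target
        end = (i + 1) * total // target
        merged.append([rec for part in partitions[start:end] for rec in part])
    return merged


if __name__ == "__main__":
    splits = [[1], [2], [3], [4], [5]]  # five input splits
    print(coalesce_partitions(splits, 2))
```

With five one-record splits and a target of two, the sketch yields two groups, so a barrier stage would need two task slots instead of five.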
[jira] [Updated] (SPARK-25383) Image data source supports sample pushdown
[ https://issues.apache.org/jira/browse/SPARK-25383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25383:
----------------------------------
    Target Version/s:   (was: 3.2.0)

> Image data source supports sample pushdown
> ------------------------------------------
>
> Key: SPARK-25383
> URL: https://issues.apache.org/jira/browse/SPARK-25383
> Project: Spark
> Issue Type: New Feature
> Components: ML, SQL
> Affects Versions: 3.1.0
> Reporter: Xiangrui Meng
> Priority: Major
>
> After SPARK-25349, we should update the image data source to support
> sampling.
[jira] [Updated] (SPARK-25752) Add trait to easily whitelist logical operators that produce named output from CleanupAliases
[ https://issues.apache.org/jira/browse/SPARK-25752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25752:
----------------------------------
    Target Version/s:   (was: 3.2.0)

> Add trait to easily whitelist logical operators that produce named output
> from CleanupAliases
> -------------------------------------------------------------------------
>
> Key: SPARK-25752
> URL: https://issues.apache.org/jira/browse/SPARK-25752
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Minor
>
> The rule `CleanupAliases` cleans up aliases from logical operators that do
> not match a whitelist. This whitelist is hardcoded inside the rule, which is
> cumbersome. This PR cleans that up by adding a trait `HasNamedOutput` that
> `CleanupAliases` will ignore; other operators that require aliases to be
> preserved should extend it.
[jira] [Commented] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder
[ https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840928#comment-17840928 ]

Dongjoon Hyun commented on SPARK-28629:
---------------------------------------

I removed the outdated target version from this issue.

> Capture the missing rules in HiveSessionStateBuilder
> ----------------------------------------------------
>
> Key: SPARK-28629
> URL: https://issues.apache.org/jira/browse/SPARK-28629
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Xiao Li
> Priority: Major
>
> A common mistake for new contributors is forgetting to add the corresponding
> rules to extendedResolutionRules, postHocResolutionRules, and
> extendedCheckRules in HiveSessionStateBuilder. We need to either avoid
> missing the rules or capture them when they are missing.
[jira] [Commented] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade
[ https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840930#comment-17840930 ]

Dongjoon Hyun commented on SPARK-27780:
---------------------------------------

I removed the outdated target version from this issue.

> Shuffle server & client should be versioned to enable smoother upgrade
> ----------------------------------------------------------------------
>
> Key: SPARK-27780
> URL: https://issues.apache.org/jira/browse/SPARK-27780
> Project: Spark
> Issue Type: New Feature
> Components: Shuffle, Spark Core
> Affects Versions: 3.1.0
> Reporter: Imran Rashid
> Priority: Major
>
> The external shuffle service is often upgraded at a different time than Spark
> itself. However, this causes problems when the protocol changes between the
> shuffle service and the Spark runtime -- this forces users to upgrade
> everything simultaneously.
>
> We should add versioning to the shuffle client & server, so they know what
> messages the other will support. This would allow better handling of mixed
> versions, from better error msgs to allowing some mismatched versions (with
> reduced capabilities).
>
> This originally came up in a discussion here:
> https://github.com/apache/spark/pull/24565#issuecomment-493496466
>
> There are a few ways we could do the versioning which we still need to
> discuss:
>
> 1) Version specified by config. This allows for mixed versions across the
> cluster and rolling upgrades. It also will let a Spark 3.0 client talk to a
> 2.4 shuffle service. But it may be a nuisance for users to get this right.
>
> 2) Auto-detection during registration with the local shuffle service. This
> makes the versioning easy for the end user, and can even handle a 2.4 shuffle
> service though it does not support the new versioning. However, it will not
> handle a rolling upgrade correctly -- if the local shuffle service has been
> upgraded, but other nodes in the cluster have not, it will get the version
> wrong.
>
> 3) Exchange versions per-connection. When a connection is opened, the server
> & client could first exchange messages with their versions, so they know how
> to continue communication after that.
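Note on SPARK-27780, option 3: a per-connection version exchange could look roughly like the plain-Python sketch below. Everything here is an assumption made for illustration: the `Hello` message, its `min_version`/`max_version` fields, and the rule "pick the highest version both sides support" are not taken from the issue or from Spark's shuffle protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hello:
    """Hypothetical first message each side sends when a connection opens."""
    min_version: int
    max_version: int


def negotiate(client: Hello, server: Hello) -> int:
    """Return the highest protocol version supported by both sides.

    Raises ValueError with a descriptive message when the ranges do not
    overlap, which corresponds to the "better error msgs" case in the issue.
    """
    low = max(client.min_version, server.min_version)
    high = min(client.max_version, server.max_version)
    if low > high:
        raise ValueError(
            "no common shuffle protocol version: "
            f"client supports {client.min_version}-{client.max_version}, "
            f"server supports {server.min_version}-{server.max_version}"
        )
    return high


if __name__ == "__main__":
    # A newer client talking to an older shuffle service settles on version 2.
    print(negotiate(Hello(1, 3), Hello(2, 2)))
```

Because the exchange happens on every connection, each pair of nodes agrees independently, which is what lets a partially upgraded cluster keep working during a rolling upgrade.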
[jira] [Updated] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder
[ https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-28629:
----------------------------------
    Target Version/s:   (was: 3.2.0)

> Capture the missing rules in HiveSessionStateBuilder
> ----------------------------------------------------
>
> Key: SPARK-28629
> URL: https://issues.apache.org/jira/browse/SPARK-28629
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Xiao Li
> Priority: Major
[jira] [Updated] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade
[ https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27780:
----------------------------------
    Target Version/s:   (was: 3.2.0)

> Shuffle server & client should be versioned to enable smoother upgrade
> ----------------------------------------------------------------------
>
> Key: SPARK-27780
> URL: https://issues.apache.org/jira/browse/SPARK-27780
> Project: Spark
> Issue Type: New Feature
> Components: Shuffle, Spark Core
> Affects Versions: 3.1.0
> Reporter: Imran Rashid
> Priority: Major
[jira] [Commented] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL
[ https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840927#comment-17840927 ]

Dongjoon Hyun commented on SPARK-30324:
---------------------------------------

I removed the outdated target version from this issue.

> Simplify API for JSON access in DataFrames/SQL
> ----------------------------------------------
>
> Key: SPARK-30324
> URL: https://issues.apache.org/jira/browse/SPARK-30324
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Burak Yavuz
> Priority: Major
>
> get_json_object() is a UDF to parse JSON fields. It is verbose and hard to
> use, e.g. I wasn't expecting the path to a field to have to start with "$.".
>
> We can simplify all of this when a column is of StringType and a nested
> field is requested. The query planner will rewrite this API sugar as
> get_json_object.
>
> This nested access can then be extended in the future to other
> semi-structured formats.
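Note on SPARK-30324: the proposed sugar amounts to turning a nested field access on a string column into a "$."-prefixed path. The plain-Python sketch below illustrates that rewrite only; it is not Spark's planner code. `to_json_path` and `get_json_field` are hypothetical names, and `get_json_field` is a tiny stand-in that handles only simple dotted paths and returns Python values rather than strings.

```python
import json
from typing import Any, Optional


def to_json_path(field_access: str) -> str:
    """Rewrite a dotted field access such as 'user.id' into '$.user.id'."""
    return "$." + field_access


def get_json_field(raw: Optional[str], path: str) -> Any:
    """Stand-in lookup supporting only '$.a.b' style paths.

    Returns None for a null input, malformed JSON, or a missing field.
    """
    if raw is None or not path.startswith("$."):
        return None
    try:
        node = json.loads(raw)
    except json.JSONDecodeError:
        return None
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


if __name__ == "__main__":
    raw = '{"user": {"id": 7, "name": "ann"}}'
    # The user writes raw.user.name; the rewrite supplies the "$." prefix.
    print(get_json_field(raw, to_json_path("user.name")))
```

The point of the sugar is that the user never types the "$." prefix; the rewrite step adds it before the lookup runs.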
[jira] [Updated] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL
[ https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30324:
----------------------------------
    Target Version/s:   (was: 3.2.0)

> Simplify API for JSON access in DataFrames/SQL
> ----------------------------------------------
>
> Key: SPARK-30324
> URL: https://issues.apache.org/jira/browse/SPARK-30324
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Burak Yavuz
> Priority: Major
[jira] [Updated] (SPARK-30334) Add metadata around semi-structured columns to Spark
[ https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30334:
----------------------------------
    Target Version/s:   (was: 3.2.0)

> Add metadata around semi-structured columns to Spark
> ----------------------------------------------------
>
> Key: SPARK-30334
> URL: https://issues.apache.org/jira/browse/SPARK-30334
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Burak Yavuz
> Priority: Major
>
> Semi-structured data is used widely in the data industry for reporting events
> in a wide variety of formats. Click events in product analytics can be stored
> as json. Some application logs can be in the form of delimited key=value
> text. Some data may be in xml.
>
> The goal of this project is to be able to signal Spark that such a column
> exists. This will then enable Spark to "auto-parse" these columns on the fly.
> The proposal is to store this information as part of the column metadata, in
> the fields:
> - format: The format of the semi-structured column, e.g. json, xml, avro
> - options: Options for parsing these columns
>
> Then imagine having the following data:
> {code:java}
> +------------+-------+---------------------+
> | ts         | event | raw                 |
> +------------+-------+---------------------+
> | 2019-10-12 | click | {"field":"value"}   |
> +------------+-------+---------------------+ {code}
> SELECT raw.field FROM data
> will return "value"
> or the following data
> {code:java}
> +------------+-------+---------------------+
> | ts         | event | raw                 |
> +------------+-------+---------------------+
> | 2019-10-12 | click | field1=v1|field2=v2 |
> +------------+-------+---------------------+ {code}
> SELECT raw.field1 FROM data
> will return v1.
>
> As a first step, we will introduce the function "as_json", which accomplishes
> this for JSON columns.
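Note on SPARK-30334: the idea of column metadata driving "auto-parse" can be illustrated in plain Python. This is a sketch under stated assumptions, not Spark code: the metadata is modeled as a dict with the `format` and `options` fields named in the issue, while the format name "kv" and the option keys `pairDelimiter` and `keyValueDelimiter` are invented here for the delimited key=value case.

```python
import json
from typing import Any, Dict


def parse_semi_structured(raw: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Parse one raw cell according to the column's metadata."""
    fmt = metadata["format"]
    options = metadata.get("options", {})
    if fmt == "json":
        return json.loads(raw)
    if fmt == "kv":  # hypothetical name for delimited key=value text
        pair_sep = options.get("pairDelimiter", "|")
        kv_sep = options.get("keyValueDelimiter", "=")
        pairs = (item.split(kv_sep, 1) for item in raw.split(pair_sep)
                 if kv_sep in item)
        return {key: value for key, value in pairs}
    raise ValueError(f"unsupported semi-structured format: {fmt!r}")


def select_field(raw: str, metadata: Dict[str, Any], field: str) -> Any:
    """Model of SELECT raw.<field>: parse on the fly, then look up the field."""
    return parse_semi_structured(raw, metadata).get(field)


if __name__ == "__main__":
    print(select_field('{"field":"value"}', {"format": "json"}, "field"))
    print(select_field("field1=v1|field2=v2", {"format": "kv"}, "field1"))
```

The two calls mirror the two tables in the issue: the same field-access syntax works on both columns because the parser is chosen from the metadata rather than from the query.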
[jira] [Commented] (SPARK-30334) Add metadata around semi-structured columns to Spark
[ https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840926#comment-17840926 ]

Dongjoon Hyun commented on SPARK-30334:
---------------------------------------

I removed the outdated target version from this issue.

> Add metadata around semi-structured columns to Spark
> ----------------------------------------------------
>
> Key: SPARK-30334
> URL: https://issues.apache.org/jira/browse/SPARK-30334
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Burak Yavuz
> Priority: Major
[jira] [Commented] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage
[ https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840913#comment-17840913 ]

Dongjoon Hyun commented on SPARK-24942:
---------------------------------------

I removed the outdated target version, `3.2.0`, from this Jira. For now, the
Apache Spark community has no target version for this issue.

> Improve cluster resource management with jobs containing barrier stage
> ----------------------------------------------------------------------
>
> Key: SPARK-24942
> URL: https://issues.apache.org/jira/browse/SPARK-24942
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: Xingbo Jiang
> Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r205652317
> We shall improve cluster resource management to address the following issues:
> - With dynamic resource allocation enabled, it may happen that we acquire
> some executors (but not enough to launch all the tasks in a barrier stage),
> later release them because the executor idle timeout expires, and then
> acquire them again.
> - There can be a deadlock between two concurrent applications. Each
> application may acquire some resources, but not enough to launch all the
> tasks in a barrier stage. After hitting the idle timeout and releasing them,
> they may acquire resources again, but just continually trade resources
> between each other.
[jira] [Created] (SPARK-47998) pandas-on-spark DataFrame.concat will not join a Pandas dataframe and raises a misleading error
Philip Kahn created SPARK-47998:
--------------------------------

    Summary: pandas-on-spark DataFrame.concat will not join a Pandas dataframe and raises a misleading error
    Key: SPARK-47998
    URL: https://issues.apache.org/jira/browse/SPARK-47998
    Project: Spark
    Issue Type: Bug
    Components: Pandas API on Spark
    Affects Versions: 3.4.3
    Reporter: Philip Kahn

The `concat` method has a strict type check that raises a misleading error:

!image-2024-04-25-11-33-29-208.png!

Note that the type reported is that of `objs`, rather than `obj`, so a list
of various objects will say that it cannot concatenate objects of type list,
rather than naming the internal types that failed.

Additionally, this strictly checks for pandas-on-spark Series and DataFrames;
since both objects will happily convert a naive Pandas object, something like

    objs = [DataFrame(x) if isinstance(x, pd.DataFrame) else Series(x) if isinstance(x, pd.Series) else x for x in objs]

would trivially make this work in those cases and prevent a different strange
error reporting that a dataframe wasn't valid in a dataframe concatenation.
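Note on SPARK-47998: the two suggestions in the report (coerce convertible inputs first, and name the type of the element that actually failed) can be sketched in plain Python. This is not pandas-on-Spark code; `coerce_for_concat` is a hypothetical helper, and the `accepted` and `converters` arguments are placeholders. In the setting described by the report, the converters would map `pd.DataFrame` and `pd.Series` to the pandas-on-Spark constructors; built-in types stand in for them below so the sketch runs without pandas or Spark.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def coerce_for_concat(
    objs: Sequence[Any],
    accepted: Tuple[type, ...],
    converters: Dict[type, Callable[[Any], Any]],
) -> List[Any]:
    """Convert what can be converted and report the offending element otherwise."""
    coerced: List[Any] = []
    for position, obj in enumerate(objs):
        if isinstance(obj, accepted):
            coerced.append(obj)
            continue
        for source_type, convert in converters.items():
            if isinstance(obj, source_type):
                coerced.append(convert(obj))
                break
        else:
            # Name the element's own type, not the type of the container.
            raise TypeError(
                f"cannot concatenate object of type {type(obj).__name__!r} "
                f"at position {position}"
            )
    return coerced


if __name__ == "__main__":
    # Stand-ins: tuples play the accepted type, lists the convertible one.
    print(coerce_for_concat([(1, 2), [3, 4]], (tuple,), {list: tuple}))
```

The error message carries both the element's type and its position, which addresses the complaint that the current message reports the type of the enclosing list.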
[jira] [Updated] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage
[ https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24942:
----------------------------------
    Target Version/s:   (was: 3.2.0)

> Improve cluster resource management with jobs containing barrier stage
> ----------------------------------------------------------------------
>
> Key: SPARK-24942
> URL: https://issues.apache.org/jira/browse/SPARK-24942
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: Xingbo Jiang
> Priority: Major
[jira] [Created] (SPARK-47997) Pandas-on-Spark incompletely implements DataFrame.drop
Philip Kahn created SPARK-47997: --- Summary: Pandas-on-Spark incompletely implements DataFrame.drop Key: SPARK-47997 URL: https://issues.apache.org/jira/browse/SPARK-47997 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 3.4.3 Reporter: Philip Kahn For Pandas v1.0+, `drop` supports the `errors` kwarg: [https://pandas.pydata.org/pandas-docs/version/1.0/reference/api/pandas.DataFrame.drop.html] Pandas-on-Spark does not implement it. This is especially glaring since the pyspark drop is a no-op on absent columns, behaving like `errors='ignore'`, so _extra_ work needed to be done to implement the raise behaviour. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
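The `errors` semantics described in this report can be sketched in plain Python. This is an illustration under stated assumptions, not pandas or pandas-on-Spark code: `drop_columns` is a hypothetical function operating on a list of column names. It shows that the `errors='ignore'` path is the no-op behaviour, and that `errors='raise'` only adds a membership check on top of it.

```python
from typing import List, Sequence


def drop_columns(columns: Sequence[str], labels: Sequence[str], errors: str = "raise") -> List[str]:
    """Return the column names left after dropping `labels`.

    Hypothetical helper; the default errors='raise' follows pandas,
    where dropping an absent label raises a KeyError.
    """
    if errors not in ("raise", "ignore"):
        raise ValueError(f"errors must be 'raise' or 'ignore', got {errors!r}")
    missing = [label for label in labels if label not in columns]
    if missing and errors == "raise":
        raise KeyError(f"{missing} not found in axis")
    # From here on, absent labels are silently skipped (the no-op case).
    to_drop = set(labels)
    return [column for column in columns if column not in to_drop]
```

For example, `drop_columns(["a", "b"], ["z"], errors="ignore")` leaves the columns unchanged, while the same call with the default `errors` raises.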
[jira] [Created] (SPARK-47996) Pandas-on-Spark incompletely implements merge methods
Philip Kahn created SPARK-47996: --- Summary: Pandas-on-Spark incompletely implements merge methods Key: SPARK-47996 URL: https://issues.apache.org/jira/browse/SPARK-47996 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 3.4.3 Reporter: Philip Kahn As of pandas 1.2 ( [https://pandas.pydata.org/pandas-docs/version/1.2/reference/api/pandas.DataFrame.merge.html] ; the current release is 2.2), the `how` parameter of `merge` accepts the value "cross". Pandas-on-Spark does not implement it, which breaks API compatibility.
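For readers unfamiliar with `how="cross"`: it produces the Cartesian product of the two frames, with no join key. The sketch below illustrates that semantics on lists of dict rows; `cross_merge` is a hypothetical name, and the `("_x", "_y")` suffixes mirror the pandas default for overlapping column names. It is not how pandas or Spark implement the operation.

```python
from itertools import product
from typing import Dict, List, Tuple

Row = Dict[str, object]


def cross_merge(left: List[Row], right: List[Row], suffixes: Tuple[str, str] = ("_x", "_y")) -> List[Row]:
    """Pair every row of `left` with every row of `right` (hypothetical helper)."""
    merged = []
    for left_row, right_row in product(left, right):
        shared = left_row.keys() & right_row.keys()
        row: Row = {}
        # Column names present on both sides get a suffix so neither is lost.
        for name, value in left_row.items():
            row[(name + suffixes[0]) if name in shared else name] = value
        for name, value in right_row.items():
            row[(name + suffixes[1]) if name in shared else name] = value
        merged.append(row)
    return merged
```

Two rows on each side yield four output rows, one per pairing.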
[jira] [Created] (SPARK-47994) SQLServer does not support 1 and 0 as boolean values
Stefan Bukorovic created SPARK-47994: Summary: SQLServer does not support 1 and 0 as boolean values Key: SPARK-47994 URL: https://issues.apache.org/jira/browse/SPARK-47994 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.3 Reporter: Stefan Bukorovic Sometimes in Spark, when a column generated by a CASE WHEN expression is used in a comparison filter, the optimized plan contains: CASE WHEN expression THEN (1 or 0)... which SQLServer does not support. SQLServer throws an exception saying that a "non-boolean expression is given when boolean was expected". For now, we should not support CASE WHEN pushdown in SQLServer.
[jira] [Commented] (SPARK-44111) Prepare Apache Spark 4.0.0
[ https://issues.apache.org/jira/browse/SPARK-44111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840853#comment-17840853 ] Dongjoon Hyun commented on SPARK-44111: --- Yes, we will provide `4.0.0-preview` in advance, [~fbiville] . Here is the discussion thread on Apache Spark dev mailing list. * [https://lists.apache.org/thread/nxmvz2j7kp96otzlnl3kd277knlb6qgb] [~cloud_fan] is the release manager who is leading Apache Spark 4.0.0 release (including preview). > Prepare Apache Spark 4.0.0 > -- > > Key: SPARK-44111 > URL: https://issues.apache.org/jira/browse/SPARK-44111 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Critical > Labels: pull-request-available > > For now, this issue aims to collect ideas for planning Apache Spark 4.0.0. > We will add more items which will be excluded from Apache Spark 3.5.0 > (Feature Freeze: July 16th, 2023). > {code} > Spark 1: 2014.05 (1.0.0) ~ 2016.11 (1.6.3) > Spark 2: 2016.07 (2.0.0) ~ 2021.05 (2.4.8) > Spark 3: 2020.06 (3.0.0) ~ 2026.xx (3.5.x) > Spark 4: 2024.06 (4.0.0, NEW) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47355) Use wildcard imports in CollationTypeCasts
[ https://issues.apache.org/jira/browse/SPARK-47355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47355: --- Labels: pull-request-available (was: ) > Use wildcard imports in CollationTypeCasts > -- > > Key: SPARK-47355 > URL: https://issues.apache.org/jira/browse/SPARK-47355 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47355) Use wildcard imports in CollationTypeCast
[ https://issues.apache.org/jira/browse/SPARK-47355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47355: - Summary: Use wildcard imports in CollationTypeCast (was: TBD) > Use wildcard imports in CollationTypeCast > - > > Key: SPARK-47355 > URL: https://issues.apache.org/jira/browse/SPARK-47355 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47355) Use wildcard imports in CollationTypeCasts
[ https://issues.apache.org/jira/browse/SPARK-47355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47355: - Summary: Use wildcard imports in CollationTypeCasts (was: Use wildcard imports in CollationTypeCast) > Use wildcard imports in CollationTypeCasts > -- > > Key: SPARK-47355 > URL: https://issues.apache.org/jira/browse/SPARK-47355 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47987) Enable `ArrowParityTests.test_createDataFrame_empty_partition`
[ https://issues.apache.org/jira/browse/SPARK-47987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47987. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46220 [https://github.com/apache/spark/pull/46220] > Enable `ArrowParityTests.test_createDataFrame_empty_partition` > -- > > Key: SPARK-47987 > URL: https://issues.apache.org/jira/browse/SPARK-47987 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47990) Upgrade `zstd-jni` to 1.5.6-3
[ https://issues.apache.org/jira/browse/SPARK-47990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47990. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46225 [https://github.com/apache/spark/pull/46225] > Upgrade `zstd-jni` to 1.5.6-3 > - > > Key: SPARK-47990 > URL: https://issues.apache.org/jira/browse/SPARK-47990 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46830) Introducing collation concept into Spark
[ https://issues.apache.org/jira/browse/SPARK-46830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840792#comment-17840792 ] Gideon P commented on SPARK-46830: -- [~uros-db] what should I work on next? > Introducing collation concept into Spark > > > Key: SPARK-46830 > URL: https://issues.apache.org/jira/browse/SPARK-46830 > Project: Spark > Issue Type: Epic > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Aleksandar Tomic >Priority: Major > Attachments: Collation Support in Spark.docx > > > This feature will introduce collation support to the Spark engine. This means > that: > > # Every StringType will have an associated collation. Default remains UTF8 > Binary, which will behave under the same rules as current UTF8 String > comparison. > # Collation will be respected in all collation sensitive operations - > comparisons, hashing, string operations (contains, startWith, endsWith etc.) > # Collation can be set through following ways: > ## COLLATE expression. e.g. strExpr COLLATE collation_name > ## In CREATE TABLE column definition > ## By setting session collation. > # All the Spark operators need to respect collation settings (filters, > joins, shuffles, aggs etc.) > > This is a high level description of the feature. You can find detailed design > under > [this|https://docs.google.com/document/d/1A9RQiwq-n3R3vuh571yjOLaaIuIYRTyCx7UFr0Qg-eY/edit?usp=sharing] > link (doc is in attachment as well). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
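The idea that comparisons, hashing, and string operations must all respect the same collation can be sketched by modelling a collation as a key function. This is only an illustration: the function names are hypothetical, `str.lower()` is a rough stand-in for a lowercase collation, and real ICU collations are considerably more involved than a key transformation on Python strings.

```python
from typing import Callable

# A collation is modelled as a function mapping a string to its comparison key.
Collation = Callable[[str], str]


def utf8_binary(s: str) -> str:
    """Binary collation: the string is its own key."""
    return s


def utf8_lcase(s: str) -> str:
    """Lowercase collation, approximated here with str.lower()."""
    return s.lower()


def collated_equals(a: str, b: str, collation: Collation) -> bool:
    return collation(a) == collation(b)


def collated_hash(s: str, collation: Collation) -> int:
    # Strings that compare equal under the collation must hash identically,
    # otherwise hash joins and aggregations would split equal values.
    return hash(collation(s))


def collated_startswith(s: str, prefix: str, collation: Collation) -> bool:
    return collation(s).startswith(collation(prefix))
```

The design point is that equality and hashing derive from one key, so operators such as joins, shuffles, and aggregations stay consistent with filters.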
[jira] [Commented] (SPARK-44111) Prepare Apache Spark 4.0.0
[ https://issues.apache.org/jira/browse/SPARK-44111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840788#comment-17840788 ] Florent BIVILLE commented on SPARK-44111: - Are there going to be pre-releases for Spark 4 that library authors can try? Or shall we build from the `master` branch and report back? > Prepare Apache Spark 4.0.0 > -- > > Key: SPARK-44111 > URL: https://issues.apache.org/jira/browse/SPARK-44111 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Critical > Labels: pull-request-available > > For now, this issue aims to collect ideas for planning Apache Spark 4.0.0. > We will add more items which will be excluded from Apache Spark 3.5.0 > (Feature Freeze: July 16th, 2023). > {code} > Spark 1: 2014.05 (1.0.0) ~ 2016.11 (1.6.3) > Spark 2: 2016.07 (2.0.0) ~ 2021.05 (2.4.8) > Spark 3: 2020.06 (3.0.0) ~ 2026.xx (3.5.x) > Spark 4: 2024.06 (4.0.0, NEW) > {code}
[jira] [Resolved] (SPARK-47985) Simplify functions with `lit`
[ https://issues.apache.org/jira/browse/SPARK-47985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-47985. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46219 [https://github.com/apache/spark/pull/46219] > Simplify functions with `lit` > - > > Key: SPARK-47985 > URL: https://issues.apache.org/jira/browse/SPARK-47985 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47985) Simplify functions with `lit`
[ https://issues.apache.org/jira/browse/SPARK-47985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-47985: - Assignee: Ruifeng Zheng > Simplify functions with `lit` > - > > Key: SPARK-47985 > URL: https://issues.apache.org/jira/browse/SPARK-47985 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47993) Drop Python 3.8 support
[ https://issues.apache.org/jira/browse/SPARK-47993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47993: - Labels: release-notes (was: release-note) > Drop Python 3.8 support > --- > > Key: SPARK-47993 > URL: https://issues.apache.org/jira/browse/SPARK-47993 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: release-notes > > Python 3.8 reaches EOL this October. Considering the release schedule, we > should drop it.
[jira] [Updated] (SPARK-47963) Make the external Spark ecosystem can use structured logging mechanisms
[ https://issues.apache.org/jira/browse/SPARK-47963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-47963: Summary: Make the external Spark ecosystem can use structured logging mechanisms (was: Add an external LogKey usage case in UT) > Make the external Spark ecosystem can use structured logging mechanisms > > > Key: SPARK-47963 > URL: https://issues.apache.org/jira/browse/SPARK-47963 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47992) Support recursive descent path in get_json_object function
Qian Sun created SPARK-47992: Summary: Support recursive descent path in get_json_object function Key: SPARK-47992 URL: https://issues.apache.org/jira/browse/SPARK-47992 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Qian Sun JSONPath borrows recursive descent syntax from E4X. We could use it to collect json object from json map string. {code:java} // json data {"key1": {"b": {"c": "c1", "d": "d1", "e": "e1"}}} {"key2": {"b": {"c": "c2", "d": "d2", "e": "e2"}}} select get_json_object(data, '$..c'); -- [c1, c2]{code} ref: https://goessner.net/articles/JsonPath/index.html#e2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
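The recursive descent operator `$..c` requested here can be sketched outside Spark as a depth-first walk that collects every value stored under the key `c`. The code below is an illustration only: `collect_recursive` and `descend` are hypothetical names, not part of `get_json_object`, and the sample input assumes the two entries from the ticket belong to a single JSON map.

```python
import json
from typing import Any, Iterator, List


def descend(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` at any depth, in document order."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Keep descending: matches may also be nested inside this value.
            yield from descend(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from descend(item, key)


def collect_recursive(json_text: str, key: str) -> List[Any]:
    """Hypothetical stand-in for evaluating the path '$..<key>'."""
    return list(descend(json.loads(json_text), key))


data = (
    '{"key1": {"b": {"c": "c1", "d": "d1", "e": "e1"}},'
    ' "key2": {"b": {"c": "c2", "d": "d2", "e": "e2"}}}'
)
print(collect_recursive(data, "c"))  # prints ['c1', 'c2']
```

Scalars terminate the recursion, so the walk visits each node once.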
[jira] [Updated] (SPARK-47991) Arrange the test cases for window frames and window functions.
[ https://issues.apache.org/jira/browse/SPARK-47991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47991: --- Labels: pull-request-available (was: ) > Arrange the test cases for window frames and window functions. > -- > > Key: SPARK-47991 > URL: https://issues.apache.org/jira/browse/SPARK-47991 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47991) Arrange the test cases for window frames and window functions.
Jiaan Geng created SPARK-47991: -- Summary: Arrange the test cases for window frames and window functions. Key: SPARK-47991 URL: https://issues.apache.org/jira/browse/SPARK-47991 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls
[ https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840748#comment-17840748 ] ASF GitHub Bot commented on SPARK-38958:

hadoop-yetus commented on PR #6550: URL: https://github.com/apache/hadoop/pull/6550#issuecomment-2076861454

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 01s | | No case conflicting files found. |
| +0 :ok: | spotbugs | 0m 00s | | spotbugs executables are not available. |
| +0 :ok: | codespell | 0m 01s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 00s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 92m 11s | | trunk passed |
| +1 :green_heart: | compile | 5m 02s | | trunk passed |
| +1 :green_heart: | checkstyle | 4m 36s | | trunk passed |
| +1 :green_heart: | mvnsite | 5m 03s | | trunk passed |
| +1 :green_heart: | javadoc | 4m 45s | | trunk passed |
| +1 :green_heart: | shadedclient | 146m 50s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 55s | | the patch passed |
| +1 :green_heart: | compile | 2m 16s | | the patch passed |
| +1 :green_heart: | javac | 2m 16s | | the patch passed |
| +1 :green_heart: | blanks | 0m 00s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 2m 02s | | the patch passed |
| +1 :green_heart: | mvnsite | 2m 28s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 14s | | the patch passed |
| +1 :green_heart: | shadedclient | 159m 37s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | asflicense | 5m 25s | | The patch does not generate ASF License warnings. |
| | | 421m 41s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| GITHUB PR | https://github.com/apache/hadoop/pull/6550 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | MINGW64_NT-10.0-17763 691d1e3161c7 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys |
| Build tool | maven |
| Personality | /c/hadoop/dev-support/bin/hadoop.sh |
| git revision | trunk / c8168fd0bc45331bd8b55dd53b537bec4b05fba5 |
| Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6550/1/testReport/ |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6550/1/console |
| versions | git=2.44.0.windows.1 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> Override S3 Client in Spark Write/Read calls > > > Key: SPARK-38958 > URL: https://issues.apache.org/jira/browse/SPARK-38958 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Hershal >Priority: Major > Labels: pull-request-available > > Hello, > I have been working to use spark to read and write data to S3. Unfortunately, > there are a few S3 headers that I need to add to my spark read/write calls. > After much looking, I have not found a way to replace the S3 client that > spark uses to make the read/write calls. 
I also have not found a > configuration that allows me to pass in S3 headers. Here is an example of > some common S3 request headers > ([https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html]). > Does there already exist functionality to add S3 headers to spark read/write > calls or pass in a custom client that would pass these headers on every > read/write request? Appreciate the help and feedback > > Thanks,
[jira] [Updated] (SPARK-47297) TBD
[ https://issues.apache.org/jira/browse/SPARK-47297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47297: - Summary: TBD (was: split (binary & lowercase collation only)) > TBD > --- > > Key: SPARK-47297 > URL: https://issues.apache.org/jira/browse/SPARK-47297 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47408) Fix mathExpressions that use StringType
[ https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47408: -- Summary: Fix mathExpressions that use StringType (was: TBD) > Fix mathExpressions that use StringType > --- > > Key: SPARK-47408 > URL: https://issues.apache.org/jira/browse/SPARK-47408 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47353) Mode (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47353: - Description: Enable collation support for the *Mode* expression in Spark. First confirm what is the expected behaviour for this expression when given collated strings, then move on to the implementation that would enable handling strings of all collation types. Implement the corresponding unit tests and E2E SQL tests to reflect how this function should be used with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment with the existing functions to learn more about how they work. In addition, look into the possible use-cases and implementation of similar functions within other open-source DBMS, such as [PostgreSQL|https://www.postgresql.org/docs/]. The goal for this Jira ticket is to implement the *Mode* expression so it supports all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Contains, StartsWith, EndsWith). Examples: With UTF8_BINARY collation, the query SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') AS tab(col); should return 'a'. With UTF8_BINARY_LCASE collation, the query SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') AS tab(col); should return either 'B' or 'b'. Read more about ICU [Collation Concepts|http://example.com/] and [Collator|http://example.com/] class. Also, refer to the Unicode Technical Standard for [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. was: Enable collation support for the *Mode* expression in Spark. 
First confirm what is the expected behaviour for this expression when given collated strings, then move on to the implementation that would enable handling strings of all collation types. Implement the corresponding unit tests and E2E SQL tests to reflect how this function should be used with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment with the existing functions to learn more about how they work. In addition, look into the possible use-cases and implementation of similar functions within other open-source DBMS, such as [PostgreSQL|https://www.postgresql.org/docs/]. The goal for this Jira ticket is to implement the *Mode* expression so it supports all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Contains, StartsWith, EndsWith). Read more about ICU [Collation Concepts|http://example.com/] and [Collator|http://example.com/] class. Also, refer to the Unicode Technical Standard for [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. > Mode (all collations) > - > > Key: SPARK-47353 > URL: https://issues.apache.org/jira/browse/SPARK-47353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *Mode* expression in Spark. First confirm > what is the expected behaviour for this expression when given collated > strings, then move on to the implementation that would enable handling > strings of all collation types. 
Implement the corresponding unit tests and > E2E SQL tests to reflect how this function should be used with collation in > SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *Mode* expression so it > supports all collation types currently supported in Spark. To understand what > changes were introduced in order to enable full collation support for other > existing functions in Spark, take a look at the Spark PRs and Jira tickets > for completed tasks in this parent (for example: Contains, StartsWith, > EndsWith). > Examples: > With UTF8_BINARY collation, the query > SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') > AS tab(col); > should return 'a'. > With UTF8_BINARY_LCASE collation, the query > SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b') > AS
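The two expected results in the Examples can be reproduced with a small sketch that groups values by a collation key before counting. This is an illustration under stated assumptions, not the Spark implementation: `collated_mode` is a hypothetical function, and `str.lower` is used as a rough stand-in for the UTF8_BINARY_LCASE collation.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional


def collated_mode(values: Iterable[str], collation_key: Callable[[str], str] = lambda s: s) -> Optional[str]:
    """Return the most frequent value, treating collation-equal values as one group."""
    counts: Counter = Counter()
    representative: Dict[str, str] = {}
    for value in values:
        key = collation_key(value)
        counts[key] += 1
        # Remember the first original spelling seen for each group; any
        # member of the winning group is an acceptable answer.
        representative.setdefault(key, value)
    if not counts:
        return None
    top_key, _ = counts.most_common(1)[0]
    return representative[top_key]


rows = ["a", "a", "a", "B", "B", "b", "b"]
print(collated_mode(rows))             # binary comparison: prints a
print(collated_mode(rows, str.lower))  # case-insensitive grouping: prints B
```

Under the binary key, 'a' wins with three occurrences; under the lowercase key, 'B' and 'b' merge into a group of four, so either spelling is a valid result and this sketch happens to return the first one seen.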
[jira] [Updated] (SPARK-47353) Mode (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-47353: - Description: Enable collation support for the *Mode* expression in Spark. First confirm what is the expected behaviour for this expression when given collated strings, then move on to the implementation that would enable handling strings of all collation types. Implement the corresponding unit tests and E2E SQL tests to reflect how this function should be used with collation in SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment with the existing functions to learn more about how they work. In addition, look into the possible use-cases and implementation of similar functions within other open-source DBMS, such as [PostgreSQL|https://www.postgresql.org/docs/]. The goal for this Jira ticket is to implement the *Mode* expression so it supports all collation types currently supported in Spark. To understand what changes were introduced in order to enable full collation support for other existing functions in Spark, take a look at the Spark PRs and Jira tickets for completed tasks in this parent (for example: Contains, StartsWith, EndsWith). Read more about ICU [Collation Concepts|http://example.com/] and [Collator|http://example.com/] class. Also, refer to the Unicode Technical Standard for [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. > Mode (all collations) > - > > Key: SPARK-47353 > URL: https://issues.apache.org/jira/browse/SPARK-47353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *Mode* expression in Spark. First confirm > what is the expected behaviour for this expression when given collated > strings, then move on to the implementation that would enable handling > strings of all collation types. 
Implement the corresponding unit tests and > E2E SQL tests to reflect how this function should be used with collation in > SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *Mode* expression so it > supports all collation types currently supported in Spark. To understand what > changes were introduced in order to enable full collation support for other > existing functions in Spark, take a look at the Spark PRs and Jira tickets > for completed tasks in this parent (for example: Contains, StartsWith, > EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Commented] (SPARK-47353) Mode (all collations)
[ https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840728#comment-17840728 ] Uroš Bojanić commented on SPARK-47353: -- [~panbingkun] if you're looking to make some contributions to the collation effort, please check out this ticket and let me know if you want to claim it! > Mode (all collations) > - > > Key: SPARK-47353 > URL: https://issues.apache.org/jira/browse/SPARK-47353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Enable collation support for the *Mode* expression in Spark. First confirm > what is the expected behaviour for this expression when given collated > strings, then move on to the implementation that would enable handling > strings of all collation types. Implement the corresponding unit tests and > E2E SQL tests to reflect how this function should be used with collation in > SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment > with the existing functions to learn more about how they work. In addition, > look into the possible use-cases and implementation of similar functions > within other open-source DBMS, such as > [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *Mode* expression so it > supports all collation types currently supported in Spark. To understand what > changes were introduced in order to enable full collation support for other > existing functions in Spark, take a look at the Spark PRs and Jira tickets > for completed tasks in this parent (for example: Contains, StartsWith, > EndsWith). > > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class. Also, refer to the Unicode Technical > Standard for > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback]. 
[jira] [Assigned] (SPARK-47566) SubstringIndex
[ https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47566: -- Assignee: Apache Spark > SubstringIndex > -- > > Key: SPARK-47566 > URL: https://issues.apache.org/jira/browse/SPARK-47566 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Enable collation support for the *SubstringIndex* built-in string function in > Spark. First confirm what is the expected behaviour for these functions when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *SubstringIndex* functions > so that they support all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith).
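To make the behaviour under discussion concrete, here is a minimal plain-Python sketch of a case-insensitive `substring_index`. It is not the Spark or ICU implementation: the name `substring_index_ci` is hypothetical, lowercasing stands in for a case-insensitive collation, and the input is assumed to be ASCII so that lowercasing never changes string length. Real collations can match text of differing lengths, which is one reason the ticket points to _StringSearch_ instead.

```python
def substring_index_ci(s: str, delim: str, count: int) -> str:
    """Case-insensitive substring_index sketch (assumes ASCII input)."""
    if not delim or count == 0:
        return ""
    hay, needle = s.lower(), delim.lower()
    if count > 0:
        # Walk forward to the count-th delimiter and keep everything before it.
        idx = -1
        for _ in range(count):
            idx = hay.find(needle, idx + 1)
            if idx == -1:
                return s  # fewer than `count` delimiters: return the input unchanged
        return s[:idx]
    # Negative count: walk backward and keep everything after the delimiter.
    end = len(hay)
    idx = -1
    for _ in range(-count):
        idx = hay.rfind(needle, 0, end)
        if idx == -1:
            return s
        end = idx + len(needle) - 1  # next match must start strictly before idx
    return s[idx + len(needle):]


print(substring_index_ci("oneXtwoxthree", "x", 2))
print(substring_index_ci("oneXtwoxthree", "X", -1))
```

The result is sliced from the original string, so the caller gets back the original casing even though matching ignored case.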
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Assigned] (SPARK-47566) SubstringIndex
[ https://issues.apache.org/jira/browse/SPARK-47566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-47566: -- Assignee: (was: Apache Spark) > SubstringIndex > -- > > Key: SPARK-47566 > URL: https://issues.apache.org/jira/browse/SPARK-47566 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Milan Dankovic >Priority: Major > Labels: pull-request-available > > Enable collation support for the *SubstringIndex* built-in string function in > Spark. First confirm what is the expected behaviour for these functions when > given collated strings, and then move on to implementation and testing. One > way to go about this is to consider using {_}StringSearch{_}, an efficient > ICU service for string matching. Implement the corresponding unit tests > (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect > how this function should be used with collation in SparkSQL, and feel free to > use your chosen Spark SQL Editor to experiment with the existing functions to > learn more about how they work. In addition, look into the possible use-cases > and implementation of similar functions within other open-source DBMS, > such as [PostgreSQL|https://www.postgresql.org/docs/]. > > The goal for this Jira ticket is to implement the *SubstringIndex* functions > so that they support all collation types currently supported in Spark. To > understand what changes were introduced in order to enable full collation > support for other existing functions in Spark, take a look at the Spark PRs > and Jira tickets for completed tasks in this parent (for example: Contains, > StartsWith, EndsWith).
> > Read more about ICU [Collation Concepts|http://example.com/] and > [Collator|http://example.com/] class, as well as _StringSearch_ using the > [ICU user > guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] > and [ICU > docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. > Also, refer to the Unicode Technical Standard for string > [searching|https://www.unicode.org/reports/tr10/#Searching] and > [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
[jira] [Updated] (SPARK-47623) Enable `QuietTest` in parity tests
[ https://issues.apache.org/jira/browse/SPARK-47623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-47623: -- Summary: Enable `QuietTest` in parity tests (was: Use `QuietTest` in parity tests) > Enable `QuietTest` in parity tests > -- > > Key: SPARK-47623 > URL: https://issues.apache.org/jira/browse/SPARK-47623 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-47982) Update code style' plugins to latest version
[ https://issues.apache.org/jira/browse/SPARK-47982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47982. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46216 [https://github.com/apache/spark/pull/46216] > Update code style' plugins to latest version > > > Key: SPARK-47982 > URL: https://issues.apache.org/jira/browse/SPARK-47982 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-47986) [CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server
[ https://issues.apache.org/jira/browse/SPARK-47986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47986: --- Labels: pull-request-available (was: ) > [CONNECT][PYTHON] Unable to create a new session when the default session is > closed by the server > - > > Key: SPARK-47986 > URL: https://issues.apache.org/jira/browse/SPARK-47986 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 3.5.0, 3.5.1 >Reporter: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > > When the server closes a session, usually after a cluster restart, the client > is unaware of this until it receives an error. > Once it does so, there is no way for the client to create a new session since > the stale sessions are still recorded as default and active sessions. > The only solution currently is to restart the Python interpreter on the > client, or to reach into the session builder and change the active or default > session.
[jira] [Created] (SPARK-47987) Reenable `ArrowParityTests.test_createDataFrame_empty_partition`
Ruifeng Zheng created SPARK-47987: - Summary: Reenable `ArrowParityTests.test_createDataFrame_empty_partition` Key: SPARK-47987 URL: https://issues.apache.org/jira/browse/SPARK-47987 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng
[jira] [Resolved] (SPARK-47984) Change `MetricsAggregate/V2Aggregator`'s `serialize/deserialize` to call `SparkSerDeUtils`'s `serialize/deserialize` methods.
[ https://issues.apache.org/jira/browse/SPARK-47984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47984. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46218 [https://github.com/apache/spark/pull/46218] > Change `MetricsAggregate/V2Aggregator`'s `serialize/deserialize` to call > `SparkSerDeUtils`'s `serialize/deserialize` methods. > - > > Key: SPARK-47984 > URL: https://issues.apache.org/jira/browse/SPARK-47984 > Project: Spark > Issue Type: Improvement > Components: MLlib, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47984) Change `MetricsAggregate/V2Aggregator`'s `serialize/deserialize` to call `SparkSerDeUtils`'s `serialize/deserialize` methods.
[ https://issues.apache.org/jira/browse/SPARK-47984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-47984: Assignee: Yang Jie > Change `MetricsAggregate/V2Aggregator`'s `serialize/deserialize` to call > `SparkSerDeUtils`'s `serialize/deserialize` methods. > - > > Key: SPARK-47984 > URL: https://issues.apache.org/jira/browse/SPARK-47984 > Project: Spark > Issue Type: Improvement > Components: MLlib, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47970) Revisit skipped parity tests for PySpark Connect
[ https://issues.apache.org/jira/browse/SPARK-47970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-47970: -- Summary: Revisit skipped parity tests for PySpark Connect (was: Revisit skipped parity tests for PySpark) > Revisit skipped parity tests for PySpark Connect > > > Key: SPARK-47970 > URL: https://issues.apache.org/jira/browse/SPARK-47970 > Project: Spark > Issue Type: Umbrella > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major
[jira] [Resolved] (SPARK-47983) Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal
[ https://issues.apache.org/jira/browse/SPARK-47983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47983. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46217 [https://github.com/apache/spark/pull/46217] > Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to > internal > -- > > Key: SPARK-47983 > URL: https://issues.apache.org/jira/browse/SPARK-47983 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-47985) Simplify functions with `lit`
[ https://issues.apache.org/jira/browse/SPARK-47985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47985: --- Labels: pull-request-available (was: ) > Simplify functions with `lit` > - > > Key: SPARK-47985 > URL: https://issues.apache.org/jira/browse/SPARK-47985 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available
[jira] [Created] (SPARK-47985) Simplify functions with `lit`
Ruifeng Zheng created SPARK-47985: - Summary: Simplify functions with `lit` Key: SPARK-47985 URL: https://issues.apache.org/jira/browse/SPARK-47985 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-47984) Change `MetricsAggregate/V2Aggregator`'s `serialize/deserialize` to call `SparkSerDeUtils`'s `serialize/deserialize` methods.
[ https://issues.apache.org/jira/browse/SPARK-47984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47984: --- Labels: pull-request-available (was: ) > Change `MetricsAggregate/V2Aggregator`'s `serialize/deserialize` to call > `SparkSerDeUtils`'s `serialize/deserialize` methods. > - > > Key: SPARK-47984 > URL: https://issues.apache.org/jira/browse/SPARK-47984 > Project: Spark > Issue Type: Improvement > Components: MLlib, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-47984) Change `MetricsAggregate/V2Aggregator`'s `serialize/deserialize` to call `SparkSerDeUtils`'s `serialize/deserialize` methods.
Yang Jie created SPARK-47984: Summary: Change `MetricsAggregate/V2Aggregator`'s `serialize/deserialize` to call `SparkSerDeUtils`'s `serialize/deserialize` methods. Key: SPARK-47984 URL: https://issues.apache.org/jira/browse/SPARK-47984 Project: Spark Issue Type: Improvement Components: MLlib, SQL Affects Versions: 4.0.0 Reporter: Yang Jie
[jira] [Updated] (SPARK-47983) Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal
[ https://issues.apache.org/jira/browse/SPARK-47983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47983: --- Labels: pull-request-available (was: ) > Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to > internal > -- > > Key: SPARK-47983 > URL: https://issues.apache.org/jira/browse/SPARK-47983 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-47983) Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal
Kent Yao created SPARK-47983: Summary: Demote spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled to internal Key: SPARK-47983 URL: https://issues.apache.org/jira/browse/SPARK-47983 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao