[jira] [Commented] (SPARK-9213) Improve regular expression performance (via joni)
[ https://issues.apache.org/jira/browse/SPARK-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614621#comment-17614621 ] Apache Spark commented on SPARK-9213: - User 'lyy-pineapple' has created a pull request for this issue: https://github.com/apache/spark/pull/38171 > Improve regular expression performance (via joni) > - > > Key: SPARK-9213 > URL: https://issues.apache.org/jira/browse/SPARK-9213 > Project: Spark > Issue Type: Umbrella > Components: SQL >Reporter: Reynold Xin >Priority: Major > Labels: bulk-closed > > I'm creating an umbrella ticket to improve regular expression performance for > string expressions. Right now our use of regular expressions is inefficient > for two reasons: > 1. Java regex in general is slow. > 2. We have to convert everything from UTF8 encoded bytes into Java String, > and then run regex on it, and then convert it back. > There are libraries in Java that provide regex support directly on UTF8 > encoded bytes. One prominent example is joni, used in JRuby. > Note: all regex functions are in > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40663) Migrate execution errors onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614596#comment-17614596 ] Apache Spark commented on SPARK-40663: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38170 > Migrate execution errors onto error classes > --- > > Key: SPARK-40663 > URL: https://issues.apache.org/jira/browse/SPARK-40663 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > Use temporary error classes in the execution exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40663) Migrate execution errors onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614594#comment-17614594 ] Apache Spark commented on SPARK-40663: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38169 > Migrate execution errors onto error classes > --- > > Key: SPARK-40663 > URL: https://issues.apache.org/jira/browse/SPARK-40663 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > Use temporary error classes in the execution exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40663) Migrate execution errors onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614593#comment-17614593 ] Apache Spark commented on SPARK-40663: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/38169 > Migrate execution errors onto error classes > --- > > Key: SPARK-40663 > URL: https://issues.apache.org/jira/browse/SPARK-40663 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > Use temporary error classes in the execution exceptions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40659) Schema evolution for protobuf (and Avro too?)
[ https://issues.apache.org/jira/browse/SPARK-40659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614568#comment-17614568 ] Sandish Kumar HN commented on SPARK-40659: -- [~rangadi] it is possible to add these option settings; just an idea. # BACKWARD: Consumers using the latest schema can process data written by producers using the latest or oldest schema, e.g. adding fields or deleting optional fields. # FORWARD: Consumers using the latest or oldest schema can process data written by producers using the latest schema, e.g. adding fields or deleting optional fields. # FULL: Both BACKWARD and FORWARD between the oldest and latest schema. # The default option is FULL. > Schema evolution for protobuf (and Avro too?) > - > > Key: SPARK-40659 > URL: https://issues.apache.org/jira/browse/SPARK-40659 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > Protobuf & Avro should support schema evolution in streaming. We need to > throw a specific error message when we detect a newer version of the schema > in the schema registry. > A couple of options for detecting a version change at runtime: > * How do we detect a newer version from the schema registry? It is contacted only > during planning currently. > * We could detect the version id in incoming messages. > ** What if the id in the incoming message is newer than what our > schema-registry reports after the restart? > *** This indicates delayed syncs between customers' schema-registry servers > (should be rare). We can keep erroring out until it is fixed. > *** Make sure we log the schema id used during planning. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
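The compatibility modes proposed in the comment above can be modeled in a few lines of plain Python. This is a hedged illustration, not Spark or schema-registry code: schemas are reduced to dicts mapping field name to a required flag, and only field addition/removal is considered (real registries also check types, defaults, and transitive history).

```python
def backward_compatible(old, new):
    # BACKWARD: consumers on the new schema can read data written with the
    # old one, so every field added in `new` must be optional.
    added = set(new) - set(old)
    return all(not new[f] for f in added)

def forward_compatible(old, new):
    # FORWARD: consumers on the old schema can read data written with the
    # new one, so every field dropped from `old` must have been optional.
    removed = set(old) - set(new)
    return all(not old[f] for f in removed)

def full_compatible(old, new):
    # FULL: both directions hold.
    return backward_compatible(old, new) and forward_compatible(old, new)
```

Under this model, adding an optional field is FULL-compatible, while adding a required field breaks BACKWARD and dropping a required field breaks FORWARD.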
[jira] [Commented] (SPARK-40658) Protobuf v2 & v3 support
[ https://issues.apache.org/jira/browse/SPARK-40658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614565#comment-17614565 ] Sandish Kumar HN commented on SPARK-40658: -- [~mposdev21] these are the changes I see between proto2 and proto3: # The latest proto3 also supports optional fields; the difference is between optional fields, which have has_foo() methods, and "singular" fields, which do not. I don't see any different treatment needed to handle this. # In contrast to proto3, proto2 allows custom default values and required fields. # Enums: proto3's default value is the enum entry numbered 0. Proto2 uses the first syntactic entry in the enum declaration as the default value if it is not specified otherwise. # Proto2 does not validate that inbound and outbound bytes are encoded in UTF-8. In proto3, all string fields are validated to be properly UTF-8 encoded during parsing. # Proto2 and proto3 are wire compatible; they will have the same binary representation. Should we have an optional setting, something like PROTO_VERSION_SUPPORT=V3, V2, or ANY? The default can be ANY. > Protobuf v2 & v3 support > > > Key: SPARK-40658 > URL: https://issues.apache.org/jira/browse/SPARK-40658 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Priority: Major > > We want to ensure Protobuf functions support both Protobuf version 2 and > version 3 schemas (e.g. descriptor file or compiled classes with v2 and v3). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
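The enum-default difference called out in the comment above can be shown with a toy model (plain Python, not the protobuf runtime; `enum_default` is a hypothetical helper, not a real API):

```python
def enum_default(entries, syntax):
    # entries: (name, number) pairs in declaration order.
    if syntax == "proto3":
        # proto3 defaults to the entry numbered 0.
        return next(name for name, number in entries if number == 0)
    # proto2 defaults to the first syntactic entry, whatever its number.
    return entries[0][0]
```

Note that valid proto3 requires the first entry to be numbered 0, so the two rules only diverge for proto2-style declarations.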
[jira] [Updated] (SPARK-40686) Support data masking built-in functions
[ https://issues.apache.org/jira/browse/SPARK-40686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-40686: - Summary: Support data masking built-in functions (was: Support data Masking built-in Functions) > Support data masking built-in functions > --- > > Key: SPARK-40686 > URL: https://issues.apache.org/jira/browse/SPARK-40686 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vinod KC >Priority: Minor > > Support built-in data masking functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40706) IllegalStateException when querying array values inside a nested struct
[ https://issues.apache.org/jira/browse/SPARK-40706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614550#comment-17614550 ] Bruce Robbins commented on SPARK-40706: --- Same as SPARK-39854? At the very least, the suggested workaround also works for your case: {noformat} spark-sql> set spark.sql.optimizer.nestedSchemaPruning.enabled=false; spark.sql.optimizer.nestedSchemaPruning.enabled false Time taken: 0.224 seconds, Fetched 1 row(s) spark-sql> set spark.sql.optimizer.expression.nestedPruning.enabled=false; spark.sql.optimizer.expression.nestedPruning.enabled false Time taken: 0.016 seconds, Fetched 1 row(s) spark-sql> SELECT response.message as message, response.timestamp as timestamp, score as risk_score, model.value as model_type FROM tbl LATERAL VIEW OUTER explode(response.data.items.attempt) AS Attempt LATERAL VIEW OUTER explode(response.data.items.attempt.risk) AS RiskModels LATERAL VIEW OUTER explode(RiskModels) AS RiskModel LATERAL VIEW OUTER explode(RiskModel.indicator) AS Model LATERAL VIEW OUTER explode(RiskModel.Score) AS Score; > > > > > > > > > > m1 09/07/2022 1 abc m1 09/07/2022 2 abc m1 09/07/2022 3 abc m1 09/07/2022 1 def m1 09/07/2022 2 def m1 09/07/2022 3 def Time taken: 1.213 seconds, Fetched 6 row(s) spark-sql> > {noformat} > IllegalStateException when querying array values inside a nested struct > --- > > Key: SPARK-40706 > URL: https://issues.apache.org/jira/browse/SPARK-40706 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Rohan Barman >Priority: Major > > We are in the process of migrating our PySpark applications from Spark > version 3.1.2 to Spark version 3.2.0. > This bug is present in version 3.2.0. We do not see this issue in version > 3.1.2. > > *Minimal example to reproduce bug* > Below is a minimal example that generates hardcoded data and queries. The > data has several nested structs and arrays.
> Our real use case reads data from avro files and has more complex queries, > but this is sufficient to reproduce the error. > > {code:java} > # Generate data > data = [ > ('1',{ > 'timestamp': '09/07/2022', > 'message': 'm1', > 'data':{ > 'items': { > 'id':1, > 'attempt':[ > {'risk':[ > {'score':[1,2,3]}, > {'indicator':[ > {'code':'c1','value':'abc'}, > {'code':'c2','value':'def'} > ]} > ]} > ] > } > } > }) > ] > from pyspark.sql.types import * > schema = StructType([ > StructField('id', StringType(), True), > StructField('response', StructType([ > StructField('timestamp', StringType(), True), > StructField('message',StringType(), True), > StructField('data', StructType([ > StructField('items', StructType([ > StructField('id', StringType(), True), > StructField("attempt", ArrayType(StructType([ > StructField("risk", ArrayType(StructType([ > StructField('score', ArrayType(StringType()), True), > StructField('indicator', ArrayType(StructType([ > StructField('code', StringType(), True), > StructField('value', StringType(), True), > ]))) > ]))) >]))) > ])) > ])) > ])), > ]) > df = spark.createDataFrame(data=data, schema=schema) > df.printSchema() > df.createOrReplaceTempView("tbl") > # Execute query > query = """ > SELECT > response.message as message, > response.timestamp as timestamp, > score as risk_score, > model.value as model_type > FROM tbl > LATERAL VIEW OUTER explode(response.data.items.attempt) > AS Attempt > LATERAL VIEW OUTER explode(response.data.items.attempt.risk) > AS RiskModels > LATERAL VIEW OUTER explode(RiskModels) > AS RiskModel > LATERAL VIEW OUTER explode(RiskModel.indicator) > AS Model > LATERAL VIEW OUTER explode(RiskModel.Score) > AS Score > """ > result = spark.sql(query) > print(result.coun
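For PySpark users hitting the same error, the workaround shown in the comment above can also be applied as session configuration rather than via spark-sql SET statements. A sketch assuming an active SparkSession named `spark`; these are internal optimizer flags, so availability may vary by Spark version:

```python
# Disable nested schema pruning, per the workaround in the comment above.
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "false")
spark.conf.set("spark.sql.optimizer.expression.nestedPruning.enabled", "false")
```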
[jira] [Commented] (SPARK-40713) Improve SET operation support in the proto and the server
[ https://issues.apache.org/jira/browse/SPARK-40713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614540#comment-17614540 ] Apache Spark commented on SPARK-40713: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38166 > Improve SET operation support in the proto and the server > - > > Key: SPARK-40713 > URL: https://issues.apache.org/jira/browse/SPARK-40713 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40713) Improve SET operation support in the proto and the server
[ https://issues.apache.org/jira/browse/SPARK-40713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40713: Assignee: (was: Apache Spark) > Improve SET operation support in the proto and the server > - > > Key: SPARK-40713 > URL: https://issues.apache.org/jira/browse/SPARK-40713 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40713) Improve SET operation support in the proto and the server
[ https://issues.apache.org/jira/browse/SPARK-40713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614539#comment-17614539 ] Apache Spark commented on SPARK-40713: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38166 > Improve SET operation support in the proto and the server > - > > Key: SPARK-40713 > URL: https://issues.apache.org/jira/browse/SPARK-40713 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40713) Improve SET operation support in the proto and the server
[ https://issues.apache.org/jira/browse/SPARK-40713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40713: Assignee: Apache Spark > Improve SET operation support in the proto and the server > - > > Key: SPARK-40713 > URL: https://issues.apache.org/jira/browse/SPARK-40713 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40713) Improve SET operation support in the proto and the server
Rui Wang created SPARK-40713: Summary: Improve SET operation support in the proto and the server Key: SPARK-40713 URL: https://issues.apache.org/jira/browse/SPARK-40713 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40691) Support data masking built-in function 'mask_show_last_n'
[ https://issues.apache.org/jira/browse/SPARK-40691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614508#comment-17614508 ] Vinod KC commented on SPARK-40691: -- I'm working on this subtask > Support data masking built-in function 'mask_show_last_n' > - > > Key: SPARK-40691 > URL: https://issues.apache.org/jira/browse/SPARK-40691 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vinod KC >Priority: Minor > > Support data masking built-in function '{*}mask_show_last_n{*}' > Return a masked version of str, showing the last n characters unmasked. Upper > case letters should be converted to "X", lower case letters should be > converted to "x" and numbers should be converted to "n". For example, > mask_show_last_n("1234-5678-8765-4321", 4) results in nnnn-nnnn-nnnn-4321. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
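The masking rule described above (upper case to "X", lower case to "x", digits to "n", everything else untouched) is easy to prototype. A minimal pure-Python sketch of the semantics, not the proposed Spark implementation:

```python
def mask_char(c):
    # Per the rule in the ticket: X for upper, x for lower, n for digits.
    if c.isupper():
        return "X"
    if c.islower():
        return "x"
    if c.isdigit():
        return "n"
    return c

def mask_show_last_n(s, n):
    # Mask everything except the last n characters.
    cut = max(len(s) - n, 0)
    return "".join(mask_char(c) for c in s[:cut]) + s[cut:]
```

For example, `mask_show_last_n("1234-5678-8765-4321", 4)` yields `nnnn-nnnn-nnnn-4321`.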
[jira] [Commented] (SPARK-40692) Support data masking built-in function 'mask_hash'
[ https://issues.apache.org/jira/browse/SPARK-40692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614509#comment-17614509 ] Vinod KC commented on SPARK-40692: -- I'm working on this subtask > Support data masking built-in function 'mask_hash' > -- > > Key: SPARK-40692 > URL: https://issues.apache.org/jira/browse/SPARK-40692 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vinod KC >Priority: Minor > > Support data masking built-in function '{*}mask_hash{*}' > Return a hashed value based on str. The hash should be consistent so it can > be used to join masked string values together across tables. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40690) Support data masking built-in function 'mask_show_first_n'
[ https://issues.apache.org/jira/browse/SPARK-40690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614507#comment-17614507 ] Vinod KC commented on SPARK-40690: -- I'm working on this subtask > Support data masking built-in function 'mask_show_first_n' > -- > > Key: SPARK-40690 > URL: https://issues.apache.org/jira/browse/SPARK-40690 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vinod KC >Priority: Minor > > Support data masking built-in function '{*}mask_show_first_n{*}' > Return a masked version of str, showing the first n characters unmasked. > Upper case letters should be converted to "X", lower case letters should be > converted to "x" and numbers should be converted to "n". For example, > mask_show_first_n("1234-5678-8765-4321", 4) results in 1234-nnnn-nnnn-nnnn. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40689) Support data masking built-in function 'mask_last_n'
[ https://issues.apache.org/jira/browse/SPARK-40689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614506#comment-17614506 ] Vinod KC commented on SPARK-40689: -- I'm working on this subtask > Support data masking built-in function 'mask_last_n' > > > Key: SPARK-40689 > URL: https://issues.apache.org/jira/browse/SPARK-40689 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vinod KC >Priority: Minor > > Support data masking built-in function *mask_last_n* > Return a masked version of str with the last n values masked. Upper case > letters should be converted to "X", lower case letters should be converted to > "x" and numbers should be converted to "n". For example, > mask_last_n("1234-5678-8765-4321", 4) results in 1234-5678-8765-nnnn. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40688) Support data masking built-in function 'mask_first_n'
[ https://issues.apache.org/jira/browse/SPARK-40688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614505#comment-17614505 ] Vinod KC commented on SPARK-40688: -- I'm working on this subtask > Support data masking built-in function 'mask_first_n' > -- > > Key: SPARK-40688 > URL: https://issues.apache.org/jira/browse/SPARK-40688 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vinod KC >Priority: Minor > > Support data masking built-in function *mask_first_n* > Return a masked version of str with the first n values masked. Upper case > letters should be converted to "X", lower case letters should be converted to > "x" and numbers should be converted to "n". For example, > mask_first_n("1234-5678-8765-4321", 4) results in nnnn-5678-8765-4321. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
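Per the description above, mask_first_n masks only the first n characters and leaves the rest untouched. A hedged pure-Python sketch of those semantics, not the proposed Spark code:

```python
def mask_char(c):
    # X for upper case, x for lower case, n for digits, others unchanged.
    if c.isupper():
        return "X"
    if c.islower():
        return "x"
    if c.isdigit():
        return "n"
    return c

def mask_first_n(s, n):
    # Mask only the first n characters; the tail passes through.
    return "".join(mask_char(c) for c in s[:n]) + s[n:]
```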
[jira] [Commented] (SPARK-40712) upgrade sbt-assembly plugin to 1.2.0
[ https://issues.apache.org/jira/browse/SPARK-40712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614488#comment-17614488 ] Apache Spark commented on SPARK-40712: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38164 > upgrade sbt-assembly plugin to 1.2.0 > > > Key: SPARK-40712 > URL: https://issues.apache.org/jira/browse/SPARK-40712 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/sbt/sbt-assembly/releases/tag/v1.0.0] > * https://github.com/sbt/sbt-assembly/releases/tag/v1.1.0 > * https://github.com/sbt/sbt-assembly/releases/tag/v1.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40712) upgrade sbt-assembly plugin to 1.2.0
[ https://issues.apache.org/jira/browse/SPARK-40712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40712: Assignee: (was: Apache Spark) > upgrade sbt-assembly plugin to 1.2.0 > > > Key: SPARK-40712 > URL: https://issues.apache.org/jira/browse/SPARK-40712 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/sbt/sbt-assembly/releases/tag/v1.0.0] > * https://github.com/sbt/sbt-assembly/releases/tag/v1.1.0 > * https://github.com/sbt/sbt-assembly/releases/tag/v1.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40712) upgrade sbt-assembly plugin to 1.2.0
[ https://issues.apache.org/jira/browse/SPARK-40712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40712: Assignee: Apache Spark > upgrade sbt-assembly plugin to 1.2.0 > > > Key: SPARK-40712 > URL: https://issues.apache.org/jira/browse/SPARK-40712 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > * [https://github.com/sbt/sbt-assembly/releases/tag/v1.0.0] > * https://github.com/sbt/sbt-assembly/releases/tag/v1.1.0 > * https://github.com/sbt/sbt-assembly/releases/tag/v1.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40712) upgrade sbt-assembly plugin to 1.2.0
[ https://issues.apache.org/jira/browse/SPARK-40712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614486#comment-17614486 ] Apache Spark commented on SPARK-40712: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38164 > upgrade sbt-assembly plugin to 1.2.0 > > > Key: SPARK-40712 > URL: https://issues.apache.org/jira/browse/SPARK-40712 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/sbt/sbt-assembly/releases/tag/v1.0.0] > * https://github.com/sbt/sbt-assembly/releases/tag/v1.1.0 > * https://github.com/sbt/sbt-assembly/releases/tag/v1.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40712) upgrade sbt-assembly plugin to 1.2.0
[ https://issues.apache.org/jira/browse/SPARK-40712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-40712: - Summary: upgrade sbt-assembly plugin to 1.2.0 (was: upgra sbt-assembly plugin to 1.2.0) > upgrade sbt-assembly plugin to 1.2.0 > > > Key: SPARK-40712 > URL: https://issues.apache.org/jira/browse/SPARK-40712 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/sbt/sbt-assembly/releases/tag/v1.0.0] > * https://github.com/sbt/sbt-assembly/releases/tag/v1.1.0 > * https://github.com/sbt/sbt-assembly/releases/tag/v1.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40712) upgra sbt-assembly plugin to 1.2.0
Yang Jie created SPARK-40712: Summary: upgra sbt-assembly plugin to 1.2.0 Key: SPARK-40712 URL: https://issues.apache.org/jira/browse/SPARK-40712 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie * [https://github.com/sbt/sbt-assembly/releases/tag/v1.0.0] * https://github.com/sbt/sbt-assembly/releases/tag/v1.1.0 * https://github.com/sbt/sbt-assembly/releases/tag/v1.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40711) Add spill size metrics for window
[ https://issues.apache.org/jira/browse/SPARK-40711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40711: Assignee: Apache Spark > Add spill size metrics for window > - > > Key: SPARK-40711 > URL: https://issues.apache.org/jira/browse/SPARK-40711 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40711) Add spill size metrics for window
[ https://issues.apache.org/jira/browse/SPARK-40711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614472#comment-17614472 ] Apache Spark commented on SPARK-40711: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/38163 > Add spill size metrics for window > - > > Key: SPARK-40711 > URL: https://issues.apache.org/jira/browse/SPARK-40711 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40711) Add spill size metrics for window
[ https://issues.apache.org/jira/browse/SPARK-40711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40711: Assignee: (was: Apache Spark) > Add spill size metrics for window > - > > Key: SPARK-40711 > URL: https://issues.apache.org/jira/browse/SPARK-40711 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40677) Shade more dependency to be able to run separately
[ https://issues.apache.org/jira/browse/SPARK-40677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614464#comment-17614464 ] Apache Spark commented on SPARK-40677: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38162 > Shade more dependency to be able to run separately > -- > > Key: SPARK-40677 > URL: https://issues.apache.org/jira/browse/SPARK-40677 > Project: Spark > Issue Type: Sub-task > Components: Build, Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > https://github.com/apache/spark/pull/38109 separated the component but found > out that there were several more jars to be shaded. See also > https://github.com/apache/spark/pull/38109#issuecomment-1269836435 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40677) Shade more dependency to be able to run separately
[ https://issues.apache.org/jira/browse/SPARK-40677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614463#comment-17614463 ] Apache Spark commented on SPARK-40677: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38162 > Shade more dependency to be able to run separately > -- > > Key: SPARK-40677 > URL: https://issues.apache.org/jira/browse/SPARK-40677 > Project: Spark > Issue Type: Sub-task > Components: Build, Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > https://github.com/apache/spark/pull/38109 separated the component but found > out that there were several more jars to be shaded. See also > https://github.com/apache/spark/pull/38109#issuecomment-1269836435 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40711) Add spill size metrics for window
XiDuo You created SPARK-40711: - Summary: Add spill size metrics for window Key: SPARK-40711 URL: https://issues.apache.org/jira/browse/SPARK-40711 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: XiDuo You -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40710) Supplement undocumented parquet configurations in documentation
[ https://issues.apache.org/jira/browse/SPARK-40710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40710: Assignee: (was: Apache Spark) > Supplement undocumented parquet configurations in documentation > --- > > Key: SPARK-40710 > URL: https://issues.apache.org/jira/browse/SPARK-40710 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major >
[jira] [Commented] (SPARK-40710) Supplement undocumented parquet configurations in documentation
[ https://issues.apache.org/jira/browse/SPARK-40710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614428#comment-17614428 ] Apache Spark commented on SPARK-40710: -- User 'dcoliversun' has created a pull request for this issue: https://github.com/apache/spark/pull/38160 > Supplement undocumented parquet configurations in documentation > --- > > Key: SPARK-40710 > URL: https://issues.apache.org/jira/browse/SPARK-40710 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major >
[jira] [Assigned] (SPARK-40710) Supplement undocumented parquet configurations in documentation
[ https://issues.apache.org/jira/browse/SPARK-40710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40710: Assignee: Apache Spark > Supplement undocumented parquet configurations in documentation > --- > > Key: SPARK-40710 > URL: https://issues.apache.org/jira/browse/SPARK-40710 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Qian Sun >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-40594) Eagerly release hashed relation in ShuffledHashJoin
[ https://issues.apache.org/jira/browse/SPARK-40594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614425#comment-17614425 ] Apache Spark commented on SPARK-40594: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/38159 > Eagerly release hashed relation in ShuffledHashJoin > --- > > Key: SPARK-40594 > URL: https://issues.apache.org/jira/browse/SPARK-40594 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > ShuffledHashJoin releases the built hashed relation at the end of the task using a taskCompletionListener. This is not always good enough for complex SQL queries. If a sort-merge join (SMJ) or window operator sits on top of the ShuffledHashJoin, the hashed relation is effectively leaked: all rows have already been consumed by the sort below the SMJ or window, yet the sort buffer cannot allocate the memory still held by the hashed relation, which causes unnecessary spilling. This is a common case in multi-joins, since AQE can convert an SMJ into a ShuffledHashJoin at runtime.
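The change described here can be sketched language-neutrally: instead of freeing the build-side hash table in a task-completion callback, free it the moment the join's output iterator is exhausted, so an operator above (sort, window) can allocate that memory instead of spilling. A minimal Python sketch of the pattern (the class and callback names are illustrative, not Spark's actual API):

```python
class EagerReleaseJoinIterator:
    """Wraps a join's output iterator and frees the build-side hashed
    relation as soon as the last row has been produced, rather than
    waiting for a task-completion callback at the end of the task."""

    def __init__(self, rows, hashed_relation, release):
        self._rows = iter(rows)
        self._relation = hashed_relation
        self._release = release  # callback that frees the relation's memory
        self._released = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._rows)
        except StopIteration:
            # Last row consumed: release immediately so a sort/window
            # operator above can allocate this memory instead of spilling.
            if not self._released:
                self._released = True
                self._release(self._relation)
            raise


freed = []
it = EagerReleaseJoinIterator([1, 2, 3], "hash-table", freed.append)
out = list(it)  # exhausting the iterator triggers the release
```

The key design point is that the release fires on exhaustion of the stream, not on task completion, which is exactly the window in which the downstream sort would otherwise be forced to spill.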
[jira] [Assigned] (SPARK-40594) Eagerly release hashed relation in ShuffledHashJoin
[ https://issues.apache.org/jira/browse/SPARK-40594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40594: Assignee: Apache Spark > Eagerly release hashed relation in ShuffledHashJoin > --- > > Key: SPARK-40594 > URL: https://issues.apache.org/jira/browse/SPARK-40594 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > > ShuffledHashJoin releases the built hashed relation at the end of the task using a taskCompletionListener. This is not always good enough for complex SQL queries. If a sort-merge join (SMJ) or window operator sits on top of the ShuffledHashJoin, the hashed relation is effectively leaked: all rows have already been consumed by the sort below the SMJ or window, yet the sort buffer cannot allocate the memory still held by the hashed relation, which causes unnecessary spilling. This is a common case in multi-joins, since AQE can convert an SMJ into a ShuffledHashJoin at runtime.
[jira] [Assigned] (SPARK-40594) Eagerly release hashed relation in ShuffledHashJoin
[ https://issues.apache.org/jira/browse/SPARK-40594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40594: Assignee: (was: Apache Spark) > Eagerly release hashed relation in ShuffledHashJoin > --- > > Key: SPARK-40594 > URL: https://issues.apache.org/jira/browse/SPARK-40594 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > ShuffledHashJoin releases the built hashed relation at the end of the task using a taskCompletionListener. This is not always good enough for complex SQL queries. If a sort-merge join (SMJ) or window operator sits on top of the ShuffledHashJoin, the hashed relation is effectively leaked: all rows have already been consumed by the sort below the SMJ or window, yet the sort buffer cannot allocate the memory still held by the hashed relation, which causes unnecessary spilling. This is a common case in multi-joins, since AQE can convert an SMJ into a ShuffledHashJoin at runtime.
[jira] [Created] (SPARK-40710) Supplement undocumented parquet configurations in documentation
Qian Sun created SPARK-40710: Summary: Supplement undocumented parquet configurations in documentation Key: SPARK-40710 URL: https://issues.apache.org/jira/browse/SPARK-40710 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.3.0 Reporter: Qian Sun