[jira] [Updated] (SPARK-44509) Fine grained interrupt in Python Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-44509:
---------------------------------
Fix Version/s: (was: 3.5.0)
               (was: 4.0.0)

> Fine grained interrupt in Python Spark Connect
> ----------------------------------------------
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> Next to SparkSession.interruptAll, provide mechanism for interrupting
> * individual queries
> * user defined groups of queries in a session (by a tag)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
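The requested mechanism (interrupt one query, a tagged group, or everything, in the spirit of PySpark 3.5's `interruptOperation`/`interruptTag`/`interruptAll`) can be illustrated with a plain-Python sketch that needs no Spark at all. All names below are illustrative, not the Spark Connect implementation.

```python
import threading
import uuid

class InterruptRegistry:
    """Toy session-scoped registry: operations can be cancelled individually
    (by id), by user-defined tag, or all at once. Illustrative only."""

    def __init__(self):
        self._ops = {}  # op_id -> (set of tags, threading.Event)

    def register(self, tags=()):
        # Each running operation gets an id and an interruption flag.
        op_id = uuid.uuid4().hex
        self._ops[op_id] = (set(tags), threading.Event())
        return op_id

    def interrupt_operation(self, op_id):
        """Interrupt a single query; returns the ids actually interrupted."""
        if op_id in self._ops and not self._ops[op_id][1].is_set():
            self._ops[op_id][1].set()
            return [op_id]
        return []

    def interrupt_tag(self, tag):
        """Interrupt every operation carrying the given tag."""
        hit = [op for op, (tags, ev) in self._ops.items()
               if tag in tags and not ev.is_set()]
        for op in hit:
            self._ops[op][1].set()
        return hit

    def interrupt_all(self):
        """Analogue of SparkSession.interruptAll."""
        hit = [op for op, (_, ev) in self._ops.items() if not ev.is_set()]
        for op in hit:
            self._ops[op][1].set()
        return hit

    def is_interrupted(self, op_id):
        return self._ops[op_id][1].is_set()

reg = InterruptRegistry()
a = reg.register(tags={"etl"})
b = reg.register(tags={"etl", "nightly"})
c = reg.register()
reg.interrupt_tag("etl")  # cancels a and b, leaves c running
assert reg.is_interrupted(a) and reg.is_interrupted(b)
assert not reg.is_interrupted(c)
```

Long-running query code would periodically check its `Event` (or wait on it) and abort when set; Spark Connect instead propagates the interrupt server-side, but the id/tag bookkeeping is the same idea.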
[jira] [Updated] (SPARK-44509) Fine grained interrupt in Python Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-44509:
---------------------------------
Reporter: Hyukjin Kwon (was: Juliusz Sompolski)

> Fine grained interrupt in Python Spark Connect
> ----------------------------------------------
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Fix For: 3.5.0, 4.0.0
>
> Next to SparkSession.interruptAll, provide mechanism for interrupting
> * individual queries
> * user defined groups of queries in a session (by a tag)
[jira] [Created] (SPARK-44509) Fine grained interrupt in Python Spark Connect
Hyukjin Kwon created SPARK-44509:
---------------------------------
Summary: Fine grained interrupt in Python Spark Connect
Key: SPARK-44509
URL: https://issues.apache.org/jira/browse/SPARK-44509
Project: Spark
Issue Type: New Feature
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski
Assignee: Juliusz Sompolski
Fix For: 3.5.0, 4.0.0

Next to SparkSession.interruptAll, provide mechanism for interrupting
* individual queries
* user defined groups of queries in a session (by a tag)
[jira] [Updated] (SPARK-44509) Fine grained interrupt in Python Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-44509:
---------------------------------
Description:
Same as https://issues.apache.org/jira/browse/SPARK-44422 but need it for Python

was:
Next to SparkSession.interruptAll, provide mechanism for interrupting
* individual queries
* user defined groups of queries in a session (by a tag)

> Fine grained interrupt in Python Spark Connect
> ----------------------------------------------
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> Same as https://issues.apache.org/jira/browse/SPARK-44422 but need it for Python
[jira] [Updated] (SPARK-44509) Fine grained interrupt in Python Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-44509:
---------------------------------
Component/s: PySpark

> Fine grained interrupt in Python Spark Connect
> ----------------------------------------------
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
> Issue Type: New Feature
> Components: Connect, PySpark
> Affects Versions: 3.5.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> Same as https://issues.apache.org/jira/browse/SPARK-44422 but need it for Python
[jira] [Assigned] (SPARK-44509) Fine grained interrupt in Python Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44509:
------------------------------------
Assignee: (was: Juliusz Sompolski)

> Fine grained interrupt in Python Spark Connect
> ----------------------------------------------
>
> Key: SPARK-44509
> URL: https://issues.apache.org/jira/browse/SPARK-44509
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Priority: Major
> Fix For: 3.5.0, 4.0.0
>
> Next to SparkSession.interruptAll, provide mechanism for interrupting
> * individual queries
> * user defined groups of queries in a session (by a tag)
[jira] [Assigned] (SPARK-44422) Fine grained interrupt in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44422:
------------------------------------
Assignee: Juliusz Sompolski

> Fine grained interrupt in Spark Connect
> ---------------------------------------
>
> Key: SPARK-44422
> URL: https://issues.apache.org/jira/browse/SPARK-44422
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Assignee: Juliusz Sompolski
> Priority: Major
>
> Next to SparkSession.interruptAll, provide mechanism for interrupting
> * individual queries
> * user defined groups of queries in a session (by a tag)
[jira] [Resolved] (SPARK-44422) Fine grained interrupt in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44422.
----------------------------------
Fix Version/s: 3.5.0
               4.0.0
Resolution: Fixed

Issue resolved by pull request 42009
[https://github.com/apache/spark/pull/42009]

> Fine grained interrupt in Spark Connect
> ---------------------------------------
>
> Key: SPARK-44422
> URL: https://issues.apache.org/jira/browse/SPARK-44422
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Assignee: Juliusz Sompolski
> Priority: Major
> Fix For: 3.5.0, 4.0.0
>
> Next to SparkSession.interruptAll, provide mechanism for interrupting
> * individual queries
> * user defined groups of queries in a session (by a tag)
[jira] [Resolved] (SPARK-39634) Allow file splitting in combination with row index generation
[ https://issues.apache.org/jira/browse/SPARK-39634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-39634.
---------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 40728
[https://github.com/apache/spark/pull/40728]

> Allow file splitting in combination with row index generation
> -------------------------------------------------------------
>
> Key: SPARK-39634
> URL: https://issues.apache.org/jira/browse/SPARK-39634
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Ala Luszczak
> Assignee: Ala Luszczak
> Priority: Major
> Fix For: 3.5.0
>
> This issue is a follow-up to SPARK-37980.
> Because of a bug in parquet-mr (https://issues.apache.org/jira/browse/PARQUET-2161) it is currently impossible to generate row indexes for parquet files if they are split into multiple pieces; instead, each file must be read in a single task.
> Once a parquet-mr version with the fix is included in Spark, we should remove the workarounds (marked with this ticket number) from the code, so that parquet files are splittable even when row indexes need to be generated.
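The core requirement behind this ticket is that when one file is read as several splits, each split must know the global row index of its first row. A minimal sketch of that bookkeeping (a hypothetical helper; Spark and parquet-mr derive this from row-group metadata, not like this):

```python
from itertools import accumulate

def split_row_index_offsets(rows_per_split):
    """Given the row count of each consecutive split of one file, return the
    global index of the first row in each split. Illustrative only."""
    # Running totals give the start of each split after the first;
    # drop the grand total and prepend 0 for the first split.
    return [0] + list(accumulate(rows_per_split))[:-1]

# A file split into three pieces of 100, 250 and 50 rows:
offsets = split_row_index_offsets([100, 250, 50])
assert offsets == [0, 100, 350]
```

A split then numbers its rows as `offset + local_index`, which is exactly what breaks if the reader cannot tell where a split starts within the file.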
[jira] [Assigned] (SPARK-39634) Allow file splitting in combination with row index generation
[ https://issues.apache.org/jira/browse/SPARK-39634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-39634:
-----------------------------------
Assignee: Ala Luszczak

> Allow file splitting in combination with row index generation
> -------------------------------------------------------------
>
> Key: SPARK-39634
> URL: https://issues.apache.org/jira/browse/SPARK-39634
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Ala Luszczak
> Assignee: Ala Luszczak
> Priority: Major
>
> This issue is a follow-up to SPARK-37980.
> Because of a bug in parquet-mr (https://issues.apache.org/jira/browse/PARQUET-2161) it is currently impossible to generate row indexes for parquet files if they are split into multiple pieces; instead, each file must be read in a single task.
> Once a parquet-mr version with the fix is included in Spark, we should remove the workarounds (marked with this ticket number) from the code, so that parquet files are splittable even when row indexes need to be generated.
[jira] [Updated] (SPARK-44464) Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value
[ https://issues.apache.org/jira/browse/SPARK-44464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-44464:
---------------------------------
Fix Version/s: 3.4.2

> Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-44464
> URL: https://issues.apache.org/jira/browse/SPARK-44464
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.3.3
> Reporter: Siying Dong
> Assignee: Siying Dong
> Priority: Major
> Fix For: 3.5.0, 3.4.2
>
> The current implementation of {{ApplyInPandasWithStatePythonRunner}} cannot handle outputs where the first column of a row is {{null}}: it cannot distinguish a genuinely null column from a column that is null only because there are fewer data records than state records. This produces incorrect results in the former case.
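The ambiguity described above can be shown with a toy decoder: when data rows are padded up to the state-row count, a leading `None` is either a real null value or padding, and only an explicit data-row count can tell them apart. This is an illustrative sketch, not the actual Arrow-based protocol used by the runner.

```python
def decode_rows(batch, num_data_rows):
    """'batch' is a list of rows padded to the state-row count. Without an
    explicit num_data_rows, a row whose first column is None is ambiguous:
    a real null value, or padding? Carrying the count resolves it.
    (Illustrative only, not the applyInPandasWithState wire format.)"""
    return batch[:num_data_rows]

# One real data row whose first column is legitimately null,
# plus one all-None padding row filling up to the state-row count:
batch = [(None, "real row with null key"), (None, None)]
assert decode_rows(batch, 1) == [(None, "real row with null key")]
```

A decoder that instead used "first column is None" as its stop condition would wrongly drop the real row, which is the shape of the bug being fixed.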
[jira] [Assigned] (SPARK-44484) Add missing json field batchDuration to StreamingQueryProgress
[ https://issues.apache.org/jira/browse/SPARK-44484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-44484:
------------------------------------
Assignee: Wei Liu

> Add missing json field batchDuration to StreamingQueryProgress
> --------------------------------------------------------------
>
> Key: SPARK-44484
> URL: https://issues.apache.org/jira/browse/SPARK-44484
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
[jira] [Resolved] (SPARK-44484) Add missing json field batchDuration to StreamingQueryProgress
[ https://issues.apache.org/jira/browse/SPARK-44484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-44484.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 42077
[https://github.com/apache/spark/pull/42077]

> Add missing json field batchDuration to StreamingQueryProgress
> --------------------------------------------------------------
>
> Key: SPARK-44484
> URL: https://issues.apache.org/jira/browse/SPARK-44484
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Fix For: 4.0.0
[jira] [Updated] (SPARK-44504) Maintenance task should clean up loaded providers on stop error
[ https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-44504:
---------------------------------
Fix Version/s: 3.5.0

> Maintenance task should clean up loaded providers on stop error
> ---------------------------------------------------------------
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Anish Shrigondekar
> Assignee: Anish Shrigondekar
> Priority: Major
> Fix For: 3.5.0, 4.0.0
>
> Maintenance task should clean up loaded providers on stop error
[jira] [Assigned] (SPARK-44504) Maintenance task should clean up loaded providers on stop error
[ https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-44504:
------------------------------------
Assignee: Anish Shrigondekar

> Maintenance task should clean up loaded providers on stop error
> ---------------------------------------------------------------
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Anish Shrigondekar
> Assignee: Anish Shrigondekar
> Priority: Major
>
> Maintenance task should clean up loaded providers on stop error
[jira] [Resolved] (SPARK-44504) Maintenance task should clean up loaded providers on stop error
[ https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-44504.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 42098
[https://github.com/apache/spark/pull/42098]

> Maintenance task should clean up loaded providers on stop error
> ---------------------------------------------------------------
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Anish Shrigondekar
> Assignee: Anish Shrigondekar
> Priority: Major
> Fix For: 4.0.0
>
> Maintenance task should clean up loaded providers on stop error
[jira] [Updated] (SPARK-44504) Maintenance task should clean up loaded providers on stop error
[ https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-44504:
---------------------------------
Issue Type: Bug (was: Task)

> Maintenance task should clean up loaded providers on stop error
> ---------------------------------------------------------------
>
> Key: SPARK-44504
> URL: https://issues.apache.org/jira/browse/SPARK-44504
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Anish Shrigondekar
> Priority: Major
>
> Maintenance task should clean up loaded providers on stop error
[jira] (SPARK-44464) Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value
[ https://issues.apache.org/jira/browse/SPARK-44464 ]

Jungtaek Lim deleted comment on SPARK-44464:
--------------------------------------------
was (Author: JIRAUSER39):
[~kabhwan] should we backport it all the way to 11.3? Or is it OK to only fix newer versions?

> Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-44464
> URL: https://issues.apache.org/jira/browse/SPARK-44464
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.3.3
> Reporter: Siying Dong
> Assignee: Siying Dong
> Priority: Major
> Fix For: 3.5.0
>
> The current implementation of {{ApplyInPandasWithStatePythonRunner}} cannot handle outputs where the first column of a row is {{null}}: it cannot distinguish a genuinely null column from a column that is null only because there are fewer data records than state records. This produces incorrect results in the former case.
[jira] [Assigned] (SPARK-43966) Support non-deterministic Python UDTFs
[ https://issues.apache.org/jira/browse/SPARK-43966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-43966:
-----------------------------------
Assignee: Allison Wang

> Support non-deterministic Python UDTFs
> --------------------------------------
>
> Key: SPARK-43966
> URL: https://issues.apache.org/jira/browse/SPARK-43966
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.5.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
>
> Support Python UDTFs with non-deterministic function body and inputs.
[jira] [Resolved] (SPARK-43966) Support non-deterministic Python UDTFs
[ https://issues.apache.org/jira/browse/SPARK-43966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-43966.
---------------------------------
Fix Version/s: 3.5.0
Resolution: Fixed

Issue resolved by pull request 42075
[https://github.com/apache/spark/pull/42075]

> Support non-deterministic Python UDTFs
> --------------------------------------
>
> Key: SPARK-43966
> URL: https://issues.apache.org/jira/browse/SPARK-43966
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.5.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
> Fix For: 3.5.0
>
> Support Python UDTFs with non-deterministic function body and inputs.
[jira] [Comment Edited] (SPARK-44365) Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec
[ https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741837#comment-17741837 ]

Vinod KC edited comment on SPARK-44365 at 7/21/23 4:15 AM:
-----------------------------------------------------------
Raised PR: https://github.com/apache/spark/pull/42105

was (Author: vinodkc):
I'm working on it

> Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec
> --------------------------------------------------------------------------------------
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Vinod KC
> Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL operators
> FileSourceScanExec
> RowDataSourceScanExec
> MergeRowsExec
[jira] [Created] (SPARK-44508) Add user guide and documentation for Python UDTFs
Allison Wang created SPARK-44508:
---------------------------------
Summary: Add user guide and documentation for Python UDTFs
Key: SPARK-44508
URL: https://issues.apache.org/jira/browse/SPARK-44508
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 3.5.0
Reporter: Allison Wang

Add documentation for Python UDTFs
[jira] [Commented] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used
[ https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745365#comment-17745365 ]

Kent Yao commented on SPARK-42898:
----------------------------------
Issue resolved by https://github.com/apache/spark/pull/42089

> Cast from string to date and date to string say timezone is needed, but it is not used
> --------------------------------------------------------------------------------------
>
> Key: SPARK-42898
> URL: https://issues.apache.org/jira/browse/SPARK-42898
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Priority: Major
>
> This is really minor, but SPARK-35581 removed the need for a timezone when casting from a `StringType` to a `DateType`, yet the patch didn't update the `needsTimeZone` function to indicate that it was no longer required.
> Currently, casting from a DateType to a StringType also says that it needs the timezone, but it only uses the `DateFormatter` with its default parameters, which do not use the time zone at all.
> I think this can be fixed with just a two-line change.
[jira] [Resolved] (SPARK-42898) Cast from string to date and date to string say timezone is needed, but it is not used
[ https://issues.apache.org/jira/browse/SPARK-42898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao resolved SPARK-42898.
------------------------------
Assignee: Robert Joseph Evans
Resolution: Fixed

> Cast from string to date and date to string say timezone is needed, but it is not used
> --------------------------------------------------------------------------------------
>
> Key: SPARK-42898
> URL: https://issues.apache.org/jira/browse/SPARK-42898
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Priority: Major
>
> This is really minor, but SPARK-35581 removed the need for a timezone when casting from a `StringType` to a `DateType`, yet the patch didn't update the `needsTimeZone` function to indicate that it was no longer required.
> Currently, casting from a DateType to a StringType also says that it needs the timezone, but it only uses the `DateFormatter` with its default parameters, which do not use the time zone at all.
> I think this can be fixed with just a two-line change.
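The point of the ticket can be mirrored in plain Python: a calendar date carries no time-of-day, so converting between strings and dates is timezone-independent, which is why requiring a timezone for these casts is unnecessary. A minimal sketch (standard library only, not Spark's Cast code):

```python
from datetime import date, datetime

# String -> date: no timezone is consulted anywhere.
d = datetime.strptime("2023-07-21", "%Y-%m-%d").date()
assert d == date(2023, 7, 21)

# Date -> string: again purely a calendar-field formatting operation.
assert d.isoformat() == "2023-07-21"
assert format(d, "%Y/%m/%d") == "2023/07/21"
```

The same value round-trips identically regardless of the session's time zone, mirroring the claim that `DateFormatter` with default parameters never touches the zone.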
[jira] [Created] (SPARK-44507) SCSC does not depend on AnalysisException
Rui Wang created SPARK-44507:
-----------------------------
Summary: SCSC does not depend on AnalysisException
Key: SPARK-44507
URL: https://issues.apache.org/jira/browse/SPARK-44507
Project: Spark
Issue Type: Sub-task
Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang
[jira] [Resolved] (SPARK-44502) Add missing versionchanged field to docs
[ https://issues.apache.org/jira/browse/SPARK-44502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44502.
----------------------------------
Assignee: Wei Liu
Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/42097

> Add missing versionchanged field to docs
> ----------------------------------------
>
> Key: SPARK-44502
> URL: https://issues.apache.org/jira/browse/SPARK-44502
> Project: Spark
> Issue Type: New Feature
> Components: Connect, Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Fix For: 3.5.0, 4.0.0
[jira] [Updated] (SPARK-44502) Add missing versionchanged field to docs
[ https://issues.apache.org/jira/browse/SPARK-44502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-44502:
---------------------------------
Fix Version/s: 3.5.0
               4.0.0

> Add missing versionchanged field to docs
> ----------------------------------------
>
> Key: SPARK-44502
> URL: https://issues.apache.org/jira/browse/SPARK-44502
> Project: Spark
> Issue Type: New Feature
> Components: Connect, Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Wei Liu
> Priority: Major
> Fix For: 3.5.0, 4.0.0
[jira] [Created] (SPARK-44506) Upgrade mima-core & sbt-mima-plugin from 1.1.2 to 1.1.3
BingKun Pan created SPARK-44506:
--------------------------------
Summary: Upgrade mima-core & sbt-mima-plugin from 1.1.2 to 1.1.3
Key: SPARK-44506
URL: https://issues.apache.org/jira/browse/SPARK-44506
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan
[jira] [Commented] (SPARK-42118) Wrong result when parsing a multiline JSON file with differing types for same column
[ https://issues.apache.org/jira/browse/SPARK-42118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745345#comment-17745345 ]

Jia Fan commented on SPARK-42118:
---------------------------------
Seems like this is already fixed on the master branch.

> Wrong result when parsing a multiline JSON file with differing types for same column
> ------------------------------------------------------------------------------------
>
> Key: SPARK-42118
> URL: https://issues.apache.org/jira/browse/SPARK-42118
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.1
> Reporter: Dilip Biswal
> Priority: Major
>
> Here is a simple reproduction of the problem. We have a JSON file whose content looks like the following and is in multiLine format:
> {code}
> [{"name":""},{"name":123.34}]
> {code}
> Here is the result of the Spark query when we read the above content:
> {code}
> scala> val df = spark.read.format("json").option("multiLine", true).load("/tmp/json")
> df: org.apache.spark.sql.DataFrame = [name: double]
>
> scala> df.show(false)
> +----+
> |name|
> +----+
> |null|
> +----+
>
> scala> df.count
> res5: Long = 2
> {code}
> This is quite a serious problem for us as it is causing us to master corrupt data in the lake. If there is some issue with parsing the input, we expect Spark to set "_corrupt_record" so that we can act on it. Please note that df.count reports 2 rows whereas df.show only reports 1 row with a null value.
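The failure mode in the report (a schema fixed to one type silently nulling out values of another type, while the record count still includes them) can be reproduced in miniature with the standard library. This is an illustrative sketch of the *shape* of the bug, not of Spark's JSON parser:

```python
import json

# The exact file content from the report: one string value, one number.
content = '[{"name":""},{"name":123.34}]'
records = json.loads(content)

def as_double(v):
    """Coerce to the inferred 'double' column type; values that do not fit
    become None (a stricter reader would surface _corrupt_record instead)."""
    try:
        return float(v)
    except (TypeError, ValueError):
        return None

values = [as_double(r["name"]) for r in records]
assert len(values) == 2           # the count reports 2 rows...
assert values == [None, 123.34]   # ...but one value silently became null
```

The mismatch between the count and the visible data is exactly what makes the Spark behavior dangerous: rows are retained but their values are lost without any corrupt-record signal.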
[jira] [Updated] (SPARK-44359) Define the computing logic through PartitionEvaluator API and use it in BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort
[ https://issues.apache.org/jira/browse/SPARK-44359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod KC updated SPARK-44359:
-----------------------------
Summary: Define the computing logic through PartitionEvaluator API and use it in BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort (was: Define the computing logic through PartitionEvaluator API and use it in BaseScriptTransformationExec)

> Define the computing logic through PartitionEvaluator API and use it in BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort
> ------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-44359
> URL: https://issues.apache.org/jira/browse/SPARK-44359
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Vinod KC
> Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators
> BaseScriptTransformationExec
[jira] [Updated] (SPARK-44359) Define the computing logic through PartitionEvaluator API and use it in BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort
[ https://issues.apache.org/jira/browse/SPARK-44359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod KC updated SPARK-44359:
-----------------------------
Description:
Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators
BaseScriptTransformationExec
InMemoryTableScanExec
ReferenceSort

was:
Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators
BaseScriptTransformationExec

> Define the computing logic through PartitionEvaluator API and use it in BaseScriptTransformationExec, InMemoryTableScanExec, ReferenceSort
> ------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-44359
> URL: https://issues.apache.org/jira/browse/SPARK-44359
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Vinod KC
> Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators
> BaseScriptTransformationExec
> InMemoryTableScanExec
> ReferenceSort
[jira] [Updated] (SPARK-44365) Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec
[ https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod KC updated SPARK-44365:
-----------------------------
Description:
Define the computing logic through PartitionEvaluator API and use it in SQL operators
FileSourceScanExec
RowDataSourceScanExec
MergeRowsExec

was:
Define the computing logic through PartitionEvaluator API and use it in SQL operators
InMemoryTableScanExec
DataSourceScanExec
MergeRowsExec
ReferenceSort

> Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec
> --------------------------------------------------------------------------------------
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Vinod KC
> Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL operators
> FileSourceScanExec
> RowDataSourceScanExec
> MergeRowsExec
[jira] [Updated] (SPARK-44365) Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec
[ https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod KC updated SPARK-44365:
-----------------------------
Summary: Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec (was: Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec, ReferenceSort)

> Use PartitionEvaluator API in FileSourceScanExec, RowDataSourceScanExec, MergeRowsExec
> --------------------------------------------------------------------------------------
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Vinod KC
> Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL operators
> InMemoryTableScanExec
> DataSourceScanExec
> MergeRowsExec
> ReferenceSort
[jira] [Assigned] (SPARK-44487) KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
[ https://issues.apache.org/jira/browse/SPARK-44487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44487: Assignee: Jia Fan > KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir > > > Key: SPARK-44487 > URL: https://issues.apache.org/jira/browse/SPARK-44487 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.4.1 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > > KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir > > Exception encountered when invoking run on a nested suite. > java.lang.NullPointerException > at sun.nio.fs.UnixPath.normalizeAndCheck(UnixPath.java:77) > at sun.nio.fs.UnixPath.<init>(UnixPath.java:71) > at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281) > at java.nio.file.Paths.get(Paths.java:84) > at > org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4(KubernetesSuite.scala:164) > at > org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4$adapted(KubernetesSuite.scala:163) > at scala.collection.LinearSeqOptimized.find(LinearSeqOptimized.scala:115) > at scala.collection.LinearSeqOptimized.find$(LinearSeqOptimized.scala:112) > at scala.collection.immutable.List.find(List.scala:91) > at > org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:163) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44487) KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir
[ https://issues.apache.org/jira/browse/SPARK-44487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44487. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42081 [https://github.com/apache/spark/pull/42081] > KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir > > > Key: SPARK-44487 > URL: https://issues.apache.org/jira/browse/SPARK-44487 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.4.1 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > Fix For: 4.0.0 > > > KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir > > Exception encountered when invoking run on a nested suite. > java.lang.NullPointerException > at sun.nio.fs.UnixPath.normalizeAndCheck(UnixPath.java:77) > at sun.nio.fs.UnixPath.<init>(UnixPath.java:71) > at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281) > at java.nio.file.Paths.get(Paths.java:84) > at > org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4(KubernetesSuite.scala:164) > at > org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.$anonfun$beforeAll$4$adapted(KubernetesSuite.scala:163) > at scala.collection.LinearSeqOptimized.find(LinearSeqOptimized.scala:115) > at scala.collection.LinearSeqOptimized.find$(LinearSeqOptimized.scala:112) > at scala.collection.immutable.List.find(List.scala:91) > at > org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:163) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44477) CheckAnalysis uses error subclass as an error class
[ https://issues.apache.org/jira/browse/SPARK-44477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44477: Assignee: Bruce Robbins > CheckAnalysis uses error subclass as an error class > --- > > Key: SPARK-44477 > URL: https://issues.apache.org/jira/browse/SPARK-44477 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Minor > > {{CheckAnalysis}} treats {{TYPE_CHECK_FAILURE_WITH_HINT}} as an error class, > but it is instead an error subclass of {{{}DATATYPE_MISMATCH{}}}. > {noformat} > spark-sql (default)> select bitmap_count(12); > [INTERNAL_ERROR] Cannot find main error class 'TYPE_CHECK_FAILURE_WITH_HINT' > org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot find main error > class 'TYPE_CHECK_FAILURE_WITH_HINT' > at org.apache.spark.SparkException$.internalError(SparkException.scala:83) > at org.apache.spark.SparkException$.internalError(SparkException.scala:87) > at > org.apache.spark.ErrorClassesJsonReader.$anonfun$getMessageTemplate$1(ErrorClassesJSONReader.scala:68) > at scala.collection.immutable.HashMap$HashMap1.getOrElse0(HashMap.scala:361) > at > scala.collection.immutable.HashMap$HashTrieMap.getOrElse0(HashMap.scala:594) > at > scala.collection.immutable.HashMap$HashTrieMap.getOrElse0(HashMap.scala:589) > at scala.collection.immutable.HashMap.getOrElse(HashMap.scala:73) > {noformat} > This issue only occurs when an expression uses > {{TypeCheckResult.TypeCheckFailure}} to indicate input type check failure. > {{TypeCheckResult.TypeCheckFailure}} appears to be deprecated in favor of > {{{}TypeCheckResult.DataTypeMismatch{}}}, but recently two expressions were > added that use {{{}TypeCheckResult.TypeCheckFailure{}}}: {{BitmapCount}} and > {{{}BitmapOrAgg{}}}. > {{BitmapCount}} and {{BitmapOrAgg}} should probably be fixed to use > {{{}TypeCheckResult.DataTypeMismatch{}}}. 
Regardless, the code in > {{CheckAnalysis}} that handles {{TypeCheckResult.TypeCheckFailure}} should be > corrected (or removed). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
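The failure mode described above can be sketched outside Spark: the error-class reader treats the segment before the first dot as the main error class, so a bare subclass name like TYPE_CHECK_FAILURE_WITH_HINT cannot be resolved. A minimal illustration; the map contents and method names below are illustrative stand-ins, not Spark's actual ErrorClassesJsonReader:

```java
import java.util.Map;

public class ErrorClassReaderSketch {
    // Illustrative error-class table: main class -> subclass -> message template.
    static final Map<String, Map<String, String>> ERRORS = Map.of(
        "DATATYPE_MISMATCH", Map.of(
            "TYPE_CHECK_FAILURE_WITH_HINT", "<msg>: <hint>"));

    // Looks up "MAIN.SUB": everything before the first dot must be a main class.
    static String template(String errorClass) {
        String[] parts = errorClass.split("\\.", 2);
        Map<String, String> subs = ERRORS.get(parts[0]);
        if (subs == null) {
            throw new IllegalStateException(
                "[INTERNAL_ERROR] Cannot find main error class '" + parts[0] + "'");
        }
        return parts.length == 2 ? subs.get(parts[1]) : subs.toString();
    }

    public static void main(String[] args) {
        // Correct usage: main class plus subclass resolves.
        System.out.println(template("DATATYPE_MISMATCH.TYPE_CHECK_FAILURE_WITH_HINT"));
        // The bug pattern: a subclass passed as if it were a main class fails.
        try {
            template("TYPE_CHECK_FAILURE_WITH_HINT");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This mirrors why the INTERNAL_ERROR above appears: the lookup never reaches the subclass level when the main-class key is wrong.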
[jira] [Resolved] (SPARK-44477) CheckAnalysis uses error subclass as an error class
[ https://issues.apache.org/jira/browse/SPARK-44477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44477. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42064 [https://github.com/apache/spark/pull/42064] > CheckAnalysis uses error subclass as an error class > --- > > Key: SPARK-44477 > URL: https://issues.apache.org/jira/browse/SPARK-44477 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Minor > Fix For: 3.5.0, 4.0.0 > > > {{CheckAnalysis}} treats {{TYPE_CHECK_FAILURE_WITH_HINT}} as an error class, > but it is instead an error subclass of {{{}DATATYPE_MISMATCH{}}}. > {noformat} > spark-sql (default)> select bitmap_count(12); > [INTERNAL_ERROR] Cannot find main error class 'TYPE_CHECK_FAILURE_WITH_HINT' > org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot find main error > class 'TYPE_CHECK_FAILURE_WITH_HINT' > at org.apache.spark.SparkException$.internalError(SparkException.scala:83) > at org.apache.spark.SparkException$.internalError(SparkException.scala:87) > at > org.apache.spark.ErrorClassesJsonReader.$anonfun$getMessageTemplate$1(ErrorClassesJSONReader.scala:68) > at scala.collection.immutable.HashMap$HashMap1.getOrElse0(HashMap.scala:361) > at > scala.collection.immutable.HashMap$HashTrieMap.getOrElse0(HashMap.scala:594) > at > scala.collection.immutable.HashMap$HashTrieMap.getOrElse0(HashMap.scala:589) > at scala.collection.immutable.HashMap.getOrElse(HashMap.scala:73) > {noformat} > This issue only occurs when an expression uses > {{TypeCheckResult.TypeCheckFailure}} to indicate input type check failure. > {{TypeCheckResult.TypeCheckFailure}} appears to be deprecated in favor of > {{{}TypeCheckResult.DataTypeMismatch{}}}, but recently two expressions were > added that use {{{}TypeCheckResult.TypeCheckFailure{}}}: {{BitmapCount}} and > {{{}BitmapOrAgg{}}}. 
> {{BitmapCount}} and {{BitmapOrAgg}} should probably be fixed to use > {{{}TypeCheckResult.DataTypeMismatch{}}}. Regardless, the code in > {{CheckAnalysis}} that handles {{TypeCheckResult.TypeCheckFailure}} should be > corrected (or removed). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44252) Add error class for the case when loading state from DFS fails
[ https://issues.apache.org/jira/browse/SPARK-44252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-44252. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 41705 [https://github.com/apache/spark/pull/41705] > Add error class for the case when loading state from DFS fails > -- > > Key: SPARK-44252 > URL: https://issues.apache.org/jira/browse/SPARK-44252 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Lucy Yao >Assignee: Lucy Yao >Priority: Major > Fix For: 4.0.0 > > > This is part of [https://github.com/apache/spark/pull/41705]. > Wrap the exception thrown while loading state so that a proper error class is assigned. > By assigning error classes we can classify failures, which helps us determine which errors customers struggle with most. > StateStoreProvider.getStore() and StateStoreProvider.getReadStore() are the entry points. > This ticket also covers failedToReadDeltaFileError and > failedToReadSnapshotFileError from > [https://issues.apache.org/jira/browse/SPARK-36305]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
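The classification idea can be sketched generically: catch the low-level DFS failure and rethrow it wrapped in an exception that carries a stable error class, so failures can be grouped and counted. A minimal sketch; the exception type and error-class string below are illustrative, not the names Spark actually uses:

```java
import java.io.IOException;

// Illustrative wrapper that carries a stable error class for classification.
class StateStoreLoadException extends RuntimeException {
    final String errorClass;

    StateStoreLoadException(String errorClass, String detail, Throwable cause) {
        super("[" + errorClass + "] " + detail, cause);
        this.errorClass = errorClass;
    }
}

public class ErrorClassSketch {
    // Stand-in for reading a state delta file from DFS; always fails here.
    static byte[] readDeltaFile(String path) throws IOException {
        throw new IOException("cannot read " + path);
    }

    static byte[] loadState(String path) {
        try {
            return readDeltaFile(path);
        } catch (IOException e) {
            // Classify instead of letting the raw IOException escape.
            throw new StateStoreLoadException(
                "CANNOT_LOAD_STATE_STORE.CANNOT_READ_DELTA_FILE", path, e);
        }
    }

    public static void main(String[] args) {
        try {
            loadState("/state/0/1/5.delta");
        } catch (StateStoreLoadException e) {
            // The error class survives for monitoring/aggregation.
            System.out.println(e.errorClass);
        }
    }
}
```

The original cause is preserved on the wrapper, so debugging information is not lost while the error class gives a stable key for aggregation.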
[jira] [Assigned] (SPARK-44252) Add error class for the case when loading state from DFS fails
[ https://issues.apache.org/jira/browse/SPARK-44252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-44252: Assignee: Lucy Yao > Add error class for the case when loading state from DFS fails > -- > > Key: SPARK-44252 > URL: https://issues.apache.org/jira/browse/SPARK-44252 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Lucy Yao >Assignee: Lucy Yao >Priority: Major > > > This is part of [https://github.com/apache/spark/pull/41705]. > Wrap the exception thrown while loading state so that a proper error class is assigned. > By assigning error classes we can classify failures, which helps us determine which errors customers struggle with most. > StateStoreProvider.getStore() and StateStoreProvider.getReadStore() are the entry points. > This ticket also covers failedToReadDeltaFileError and > failedToReadSnapshotFileError from > [https://issues.apache.org/jira/browse/SPARK-36305]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44505) DataSource v2 Scans should not require planning the input partitions on explain
Martin Grund created SPARK-44505: Summary: DataSource v2 Scans should not require planning the input partitions on explain Key: SPARK-44505 URL: https://issues.apache.org/jira/browse/SPARK-44505 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Martin Grund Right now we always call `planInputPartitions()` on a DSv2 implementation even when no Spark job is run and the plan is only explained. We should provide a way to avoid scanning all input partitions just to determine whether the input is columnar. The scan should provide an override. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
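A rough sketch of the proposed direction, with hypothetical names (the real DSv2 `Scan`/`Batch` API is richer, and the exact shape of the override is not specified in this ticket): give the scan a cheap columnar hint whose default preserves today's behavior, so explain can skip materializing partitions when an implementation opts in.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the DSv2 Batch contract.
interface Batch {
    String[] planInputPartitions(); // potentially expensive (lists files, etc.)

    // Proposed-style override: a cheap answer for explain. The default keeps
    // today's behavior of planning partitions just to inspect them.
    default boolean supportsColumnar() {
        return planInputPartitions().length > 0;
    }
}

public class ExplainSketch {
    static final AtomicInteger planCalls = new AtomicInteger();

    static Batch expensiveScanWithHint() {
        return new Batch() {
            @Override public String[] planInputPartitions() {
                planCalls.incrementAndGet(); // track the expensive call
                return new String[] {"part-0", "part-1"};
            }
            @Override public boolean supportsColumnar() {
                return true; // static answer: no partition planning needed
            }
        };
    }

    public static void main(String[] args) {
        Batch scan = expensiveScanWithHint();
        // An "explain" only needs the columnar flag...
        System.out.println("columnar: " + scan.supportsColumnar());
        // ...and the expensive planning never ran.
        System.out.println("planInputPartitions calls: " + planCalls.get());
    }
}
```

The default method means existing implementations keep working unchanged; only scans that can answer cheaply need to override the hint.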
[jira] [Commented] (SPARK-44504) Maintenance task should clean up loaded providers on stop error
[ https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745272#comment-17745272 ] Anish Shrigondekar commented on SPARK-44504: Sent PR here: https://github.com/apache/spark/pull/42098 > Maintenance task should clean up loaded providers on stop error > --- > > Key: SPARK-44504 > URL: https://issues.apache.org/jira/browse/SPARK-44504 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Anish Shrigondekar >Priority: Major > > Maintenance task should clean up loaded providers on stop error -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-44504) Maintenance task should clean up loaded providers on stop error
[ https://issues.apache.org/jira/browse/SPARK-44504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745272#comment-17745272 ] Anish Shrigondekar edited comment on SPARK-44504 at 7/20/23 8:57 PM: - Sent PR here: [https://github.com/apache/spark/pull/42098] cc - [~kabhwan] was (Author: JIRAUSER287599): Sent PR here: https://github.com/apache/spark/pull/42098 > Maintenance task should clean up loaded providers on stop error > --- > > Key: SPARK-44504 > URL: https://issues.apache.org/jira/browse/SPARK-44504 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Anish Shrigondekar >Priority: Major > > Maintenance task should clean up loaded providers on stop error -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44504) Maintenance task should clean up loaded providers on stop error
Anish Shrigondekar created SPARK-44504: -- Summary: Maintenance task should clean up loaded providers on stop error Key: SPARK-44504 URL: https://issues.apache.org/jira/browse/SPARK-44504 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 3.5.0 Reporter: Anish Shrigondekar Maintenance task should clean up loaded providers on stop error -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44501) Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents
[ https://issues.apache.org/jira/browse/SPARK-44501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44501. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 42094 [https://github.com/apache/spark/pull/42094] > Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents > - > > Key: SPARK-44501 > URL: https://issues.apache.org/jira/browse/SPARK-44501 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44501) Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents
[ https://issues.apache.org/jira/browse/SPARK-44501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44501: - Assignee: Dongjoon Hyun > Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents > - > > Key: SPARK-44501 > URL: https://issues.apache.org/jira/browse/SPARK-44501 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44503) Support PARTITION BY and ORDER BY clause for table arguments
[ https://issues.apache.org/jira/browse/SPARK-44503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745258#comment-17745258 ] Daniel commented on SPARK-44503: I can work on this part > Support PARTITION BY and ORDER BY clause for table arguments > > > Key: SPARK-44503 > URL: https://issues.apache.org/jira/browse/SPARK-44503 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44503) Support PARTITION BY and ORDER BY clause for table arguments
Daniel created SPARK-44503: -- Summary: Support PARTITION BY and ORDER BY clause for table arguments Key: SPARK-44503 URL: https://issues.apache.org/jira/browse/SPARK-44503 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Daniel -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44502) Add missing versionchanged field to docs
Wei Liu created SPARK-44502: --- Summary: Add missing versionchanged field to docs Key: SPARK-44502 URL: https://issues.apache.org/jira/browse/SPARK-44502 Project: Spark Issue Type: New Feature Components: Connect, Structured Streaming Affects Versions: 3.5.0 Reporter: Wei Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44501) Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents
Dongjoon Hyun created SPARK-44501: - Summary: Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents Key: SPARK-44501 URL: https://issues.apache.org/jira/browse/SPARK-44501 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.5.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44501) Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents
[ https://issues.apache.org/jira/browse/SPARK-44501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44501: -- Issue Type: Improvement (was: Bug) > Ignore checksum files in KubernetesLocalDiskShuffleExecutorComponents > - > > Key: SPARK-44501 > URL: https://issues.apache.org/jira/browse/SPARK-44501 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44464) Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value
[ https://issues.apache.org/jira/browse/SPARK-44464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745191#comment-17745191 ] Siying Dong commented on SPARK-44464: - [~kabhwan] should we backport it all the way to 11.3? Or it's OK to only fix newer versions? > Fix applyInPandasWithStatePythonRunner to output rows that have Null as first > column value > -- > > Key: SPARK-44464 > URL: https://issues.apache.org/jira/browse/SPARK-44464 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.3 >Reporter: Siying Dong >Assignee: Siying Dong >Priority: Major > Fix For: 3.5.0 > > > The current implementation of {{ApplyInPandasWithStatePythonRunner}} cannot > deal with outputs where the first column of the row is {{{}null{}}}, as it > cannot distinguish the case where the column is null, or the field is filled > as the number of data records are smaller than state records. It causes > incorrect results for the former case. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44500) parse_url treats key as regular expression
Robert Joseph Evans created SPARK-44500: --- Summary: parse_url treats key as regular expression Key: SPARK-44500 URL: https://issues.apache.org/jira/browse/SPARK-44500 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1, 3.4.0, 3.3.0, 3.2.0 Reporter: Robert Joseph Evans

To be clear, I am not 100% sure that this is a bug. It might be a feature, but I don't see it used as a feature anywhere. If it is a feature it really should be documented, because there are pitfalls. If it is a bug it should be fixed, because it is really confusing and makes it simple to shoot yourself in the foot.

```scala
> val urls = Seq("http://foo/bar?abc=BAD&a.c=GOOD", "http://foo/bar?a.c=GOOD&abc=BAD").toDF
> urls.selectExpr("parse_url(value, 'QUERY', 'a.c')").show(false)
+----------------------------+
|parse_url(value, QUERY, a.c)|
+----------------------------+
|BAD                         |
|GOOD                        |
+----------------------------+
> urls.selectExpr("parse_url(value, 'QUERY', 'a[c')").show(false)
java.util.regex.PatternSyntaxException: Unclosed character class near index 15
(&|^)a[c=([^&]*)
               ^
  at java.util.regex.Pattern.error(Pattern.java:1969)
  at java.util.regex.Pattern.clazz(Pattern.java:2562)
  at java.util.regex.Pattern.sequence(Pattern.java:2077)
  at java.util.regex.Pattern.expr(Pattern.java:2010)
  at java.util.regex.Pattern.compile(Pattern.java:1702)
  at java.util.regex.Pattern.<init>(Pattern.java:1352)
  at java.util.regex.Pattern.compile(Pattern.java:1028)
```

The simple fix is to quote the key when building the pattern.

```scala
private def getPattern(key: UTF8String): Pattern = {
  Pattern.compile(REGEXPREFIX + Pattern.quote(key.toString) + REGEXSUBFIX)
}
```

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
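The effect of quoting can be demonstrated with plain java.util.regex, outside Spark. The `PREFIX`/`SUFFIX` constants below mirror the `(&|^)` and `=([^&]*)` fragments visible in the stack trace; the helper methods are illustrative, not Spark's actual implementation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseUrlQuote {
    static final String PREFIX = "(&|^)";    // mirrors REGEXPREFIX in the report
    static final String SUFFIX = "=([^&]*)"; // mirrors REGEXSUBFIX in the report

    // Unquoted: the key is interpreted as a regex, so '.' matches any char.
    static String lookupUnquoted(String query, String key) {
        Matcher m = Pattern.compile(PREFIX + key + SUFFIX).matcher(query);
        return m.find() ? m.group(2) : null;
    }

    // Quoted: Pattern.quote() makes the key match only literally.
    static String lookupQuoted(String query, String key) {
        Matcher m = Pattern.compile(PREFIX + Pattern.quote(key) + SUFFIX).matcher(query);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String query = "abc=BAD&a.c=GOOD";
        System.out.println(lookupUnquoted(query, "a.c")); // BAD  ('.' matched 'b' in "abc")
        System.out.println(lookupQuoted(query, "a.c"));   // GOOD (literal "a.c" only)
    }
}
```

Quoting also removes the PatternSyntaxException pitfall: a key like `a[c` becomes an ordinary literal instead of an unclosed character class.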
[jira] [Created] (SPARK-44499) FileSourceScanExec OutputPartitioning for non bucketed scan
Tushar Mahale created SPARK-44499: - Summary: FileSourceScanExec OutputPartitioning for non bucketed scan Key: SPARK-44499 URL: https://issues.apache.org/jira/browse/SPARK-44499 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1 Reporter: Tushar Mahale FileSourceScanExec.outputPartitioning is currently calculated only for bucketed scans; for non-bucketed scans we return UnknownPartitioning(0). This may result in the creation of unnecessary empty tasks, based on the SQLConf defaultParallelism setting, even though the actual file may have a very low number of partitions. We should also calculate and set the number of output partitions correctly for non-bucketed scans. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44466) Exclude configs starting with SPARK_DRIVER_PREFIX and SPARK_EXECUTOR_PREFIX from modifiedConfigs
[ https://issues.apache.org/jira/browse/SPARK-44466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44466: Summary: Exclude configs starting with SPARK_DRIVER_PREFIX and SPARK_EXECUTOR_PREFIX from modifiedConfigs (was: Update initialSessionOptions to the value after supplementation) > Exclude configs starting with SPARK_DRIVER_PREFIX and SPARK_EXECUTOR_PREFIX > from modifiedConfigs > > > Key: SPARK-44466 > URL: https://issues.apache.org/jira/browse/SPARK-44466 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1 >Reporter: Yuming Wang >Priority: Major > Attachments: screenshot-1.png > > > Should not include this value: > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44498) Add support for Micrometer Observation
Marcin Grzejszczak created SPARK-44498: -- Summary: Add support for Micrometer Observation Key: SPARK-44498 URL: https://issues.apache.org/jira/browse/SPARK-44498 Project: Spark Issue Type: New Feature Components: Java API Affects Versions: 3.5.0 Reporter: Marcin Grzejszczak I'm a co-maintainer of the Spring Cloud Sleuth and Micrometer projects (together with Tommy Ludwig and Jonatan Ivanov). [Micrometer Observation|https://micrometer.io/docs/observation] is part of the Micrometer 1.10 release, and [Micrometer Tracing|https://micrometer.io/docs/tracing] is a new project. The idea of Micrometer Observation is that you instrument code once but get multiple benefits out of it (e.g. tracing, metrics, logging, or whatever you see fit). I was curious whether there's interest in adding Micrometer Observation support so that, when it is on the classpath, not only metrics but also spans could be created automatically, and tracing context propagation could happen too. In other words, this project could produce metrics and tracing, and if there are Micrometer Observation-compatible projects, they will join the whole graph (e.g. all of Spring Framework 6, Apache Dubbo, Apache Camel, Resilience4j, etc.). If there's interest in adding this feature, I can provide a PR. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44494) K8s-it test failed
[ https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44494: -- Fix Version/s: 3.5.0 > K8s-it test failed > -- > > Key: SPARK-44494 > URL: https://issues.apache.org/jira/browse/SPARK-44494 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838] > {code:java} > [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 > minutes, 11 seconds) > 3786[info] The code passed to eventually never returned normally. Attempted > 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u > 3787[info] + myuid=185 > 3788[info] ++ id -g > 3789[info] + mygid=0 > 3790[info] + set +e > 3791[info] ++ getent passwd 185 > 3792[info] + uidentry= > 3793[info] + set -e > 3794[info] + '[' -z '' ']' > 3795[info] + '[' -w /etc/passwd ']' > 3796[info] + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > 3797[info] + '[' -z /opt/java/openjdk ']' > 3798[info] + SPARK_CLASSPATH=':/opt/spark/jars/*' > 3799[info] + grep SPARK_JAVA_OPT_ > 3800[info] + sort -t_ -k4 -n > 3801[info] + sed 's/[^=]*=\(.*\)/\1/g' > 3802[info] + env > 3803[info] ++ command -v readarray > 3804[info] + '[' readarray ']' > 3805[info] + readarray -t SPARK_EXECUTOR_JAVA_OPTS > 3806[info] + '[' -n '' ']' > 3807[info] + '[' -z ']' > 3808[info] + '[' -z ']' > 3809[info] + '[' -n '' ']' > 3810[info] + '[' -z ']' > 3811[info] + '[' -z x ']' > 3812[info] + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > 3813[info] + > SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir' > 3814[info] + case "$1" in > 3815[info] + shift 1 > 3816[info] + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf > 
"spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" > --deploy-mode client "$@") > 3817[info] + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=10.244.0.45 --conf > spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client > --properties-file /opt/spark/conf/spark.properties --class > org.apache.spark.examples.MiniReadWriteTest > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3818[info] Files > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from > /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to > /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar > 3819[info] 23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > 3820[info] Performing local word count from > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3821[info] File contents are List(test PVs) > 3822[info] Creating SparkSession > 3823[info] 23/07/20 06:15:15 INFO SparkContext: Running Spark version > 4.0.0-SNAPSHOT > 3824[info] 23/07/20 06:15:15 INFO SparkContext: OS info Linux, > 5.15.0-1041-azure, amd64 > 3825[info] 23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7 > 3826[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3827[info] 23/07/20 06:15:15 INFO ResourceUtils: No custom resources > configured for spark.driver. 
> 3828[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3829[info] 23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini > Read Write Test > 3830[info] 23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile > created, executor resources: Map(cores -> name: cores, amount: 1, script: , > vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap > -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> > name: cpus, amount: 1.0) {code} > The tests in the past two days have failed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44494) K8s-it test failed
[ https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44494: - Assignee: Yang Jie > K8s-it test failed > -- > > Key: SPARK-44494 > URL: https://issues.apache.org/jira/browse/SPARK-44494 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838] > {code:java} > [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 > minutes, 11 seconds) > 3786[info] The code passed to eventually never returned normally. Attempted > 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u > 3787[info] + myuid=185 > 3788[info] ++ id -g > 3789[info] + mygid=0 > 3790[info] + set +e > 3791[info] ++ getent passwd 185 > 3792[info] + uidentry= > 3793[info] + set -e > 3794[info] + '[' -z '' ']' > 3795[info] + '[' -w /etc/passwd ']' > 3796[info] + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > 3797[info] + '[' -z /opt/java/openjdk ']' > 3798[info] + SPARK_CLASSPATH=':/opt/spark/jars/*' > 3799[info] + grep SPARK_JAVA_OPT_ > 3800[info] + sort -t_ -k4 -n > 3801[info] + sed 's/[^=]*=\(.*\)/\1/g' > 3802[info] + env > 3803[info] ++ command -v readarray > 3804[info] + '[' readarray ']' > 3805[info] + readarray -t SPARK_EXECUTOR_JAVA_OPTS > 3806[info] + '[' -n '' ']' > 3807[info] + '[' -z ']' > 3808[info] + '[' -z ']' > 3809[info] + '[' -n '' ']' > 3810[info] + '[' -z ']' > 3811[info] + '[' -z x ']' > 3812[info] + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > 3813[info] + > SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir' > 3814[info] + case "$1" in > 3815[info] + shift 1 > 3816[info] + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf > "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" > 
--deploy-mode client "$@") > 3817[info] + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=10.244.0.45 --conf > spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client > --properties-file /opt/spark/conf/spark.properties --class > org.apache.spark.examples.MiniReadWriteTest > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3818[info] Files > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from > /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to > /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar > 3819[info] 23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > 3820[info] Performing local word count from > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3821[info] File contents are List(test PVs) > 3822[info] Creating SparkSession > 3823[info] 23/07/20 06:15:15 INFO SparkContext: Running Spark version > 4.0.0-SNAPSHOT > 3824[info] 23/07/20 06:15:15 INFO SparkContext: OS info Linux, > 5.15.0-1041-azure, amd64 > 3825[info] 23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7 > 3826[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3827[info] 23/07/20 06:15:15 INFO ResourceUtils: No custom resources > configured for spark.driver. 
> 3828[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3829[info] 23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini > Read Write Test > 3830[info] 23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile > created, executor resources: Map(cores -> name: cores, amount: 1, script: , > vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap > -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> > name: cpus, amount: 1.0) {code} > The tests have failed for the past two days -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44494) K8s-it test failed
[ https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44494. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42091 [https://github.com/apache/spark/pull/42091] > K8s-it test failed > -- > > Key: SPARK-44494 > URL: https://issues.apache.org/jira/browse/SPARK-44494 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 4.0.0 > > > * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838] > {code:java} > [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 > minutes, 11 seconds) > 3786[info] The code passed to eventually never returned normally. Attempted > 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u > 3787[info] + myuid=185 > 3788[info] ++ id -g > 3789[info] + mygid=0 > 3790[info] + set +e > 3791[info] ++ getent passwd 185 > 3792[info] + uidentry= > 3793[info] + set -e > 3794[info] + '[' -z '' ']' > 3795[info] + '[' -w /etc/passwd ']' > 3796[info] + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > 3797[info] + '[' -z /opt/java/openjdk ']' > 3798[info] + SPARK_CLASSPATH=':/opt/spark/jars/*' > 3799[info] + grep SPARK_JAVA_OPT_ > 3800[info] + sort -t_ -k4 -n > 3801[info] + sed 's/[^=]*=\(.*\)/\1/g' > 3802[info] + env > 3803[info] ++ command -v readarray > 3804[info] + '[' readarray ']' > 3805[info] + readarray -t SPARK_EXECUTOR_JAVA_OPTS > 3806[info] + '[' -n '' ']' > 3807[info] + '[' -z ']' > 3808[info] + '[' -z ']' > 3809[info] + '[' -n '' ']' > 3810[info] + '[' -z ']' > 3811[info] + '[' -z x ']' > 3812[info] + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > 3813[info] + > SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir' > 3814[info] + case "$1" in > 3815[info] + shift 1 > 3816[info] + CMD=("$SPARK_HOME/bin/spark-submit" --conf > 
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf > "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" > --deploy-mode client "$@") > 3817[info] + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=10.244.0.45 --conf > spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client > --properties-file /opt/spark/conf/spark.properties --class > org.apache.spark.examples.MiniReadWriteTest > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3818[info] Files > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from > /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to > /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar > 3819[info] 23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > 3820[info] Performing local word count from > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3821[info] File contents are List(test PVs) > 3822[info] Creating SparkSession > 3823[info] 23/07/20 06:15:15 INFO SparkContext: Running Spark version > 4.0.0-SNAPSHOT > 3824[info] 23/07/20 06:15:15 INFO SparkContext: OS info Linux, > 5.15.0-1041-azure, amd64 > 3825[info] 23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7 > 3826[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3827[info] 23/07/20 06:15:15 INFO ResourceUtils: No custom resources > configured for spark.driver. 
> 3828[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3829[info] 23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini > Read Write Test > 3830[info] 23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile > created, executor resources: Map(cores -> name: cores, amount: 1, script: , > vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap > -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> > name: cpus, amount: 1.0) {code} > The tests in the past two days have failed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44497) Show task partition id in Task table
[ https://issues.apache.org/jira/browse/SPARK-44497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744987#comment-17744987 ] ASF GitHub Bot commented on SPARK-44497: User 'cxzl25' has created a pull request for this issue: https://github.com/apache/spark/pull/42093 > Show task partition id in Task table > > > Key: SPARK-44497 > URL: https://issues.apache.org/jira/browse/SPARK-44497 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.4.1 >Reporter: dzcxzl >Priority: Minor > > In SPARK-37831, the partition id is added in taskinfo, and the task partition > id cannot be directly seen in the ui. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44497) Show task partition id in Task table
[ https://issues.apache.org/jira/browse/SPARK-44497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dzcxzl updated SPARK-44497: --- Description: In SPARK-37831, the partition id is added in taskinfo, and the task partition id cannot be directly seen in the ui. (was: In [SPARK-37831|https://issues.apache.org/jira/browse/SPARK-37831], the partition id is added in taskinfo, and the task partition id cannot be directly seen in the ui) > Show task partition id in Task table > > > Key: SPARK-44497 > URL: https://issues.apache.org/jira/browse/SPARK-44497 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.4.1 >Reporter: dzcxzl >Priority: Minor > > In SPARK-37831, the partition id is added in taskinfo, and the task partition > id cannot be directly seen in the ui. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44497) Show task partition id in Task table
dzcxzl created SPARK-44497: -- Summary: Show task partition id in Task table Key: SPARK-44497 URL: https://issues.apache.org/jira/browse/SPARK-44497 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.4.1 Reporter: dzcxzl In [SPARK-37831|https://issues.apache.org/jira/browse/SPARK-37831], the partition id was added to TaskInfo, but the task partition id cannot be seen directly in the UI -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44484) Add missing json field batchDuration to StreamingQueryProgress
[ https://issues.apache.org/jira/browse/SPARK-44484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744978#comment-17744978 ] ASF GitHub Bot commented on SPARK-44484: User 'WweiL' has created a pull request for this issue: https://github.com/apache/spark/pull/42077 > Add missing json field batchDuration to StreamingQueryProgress > -- > > Key: SPARK-44484 > URL: https://issues.apache.org/jira/browse/SPARK-44484 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43001) Spark last window dont flush in append mode
[ https://issues.apache.org/jira/browse/SPARK-43001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744973#comment-17744973 ] padavan commented on SPARK-43001: - [~kabhwan] ? > Spark last window dont flush in append mode > --- > > Key: SPARK-43001 > URL: https://issues.apache.org/jira/browse/SPARK-43001 > Project: Spark > Issue Type: Bug > Components: PySpark, Structured Streaming >Affects Versions: 3.3.2 >Reporter: padavan >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > The problem is very simple: when you use a *TUMBLING* *window* with {*}append > mode{*}, the window is closed +only when the next message arrives+ > ({_}+watermark logic{_}). > In the current implementation, if the *incoming* streaming data *stops*, the > *last* window will *NEVER close* and we LOSE the last window's data. > > Business situation: > The job worked correctly until new messages stopped arriving; when the next message came in 5 > hours later, the client received the result after 5 hours instead of the > window's 10-second delay. > !https://user-images.githubusercontent.com/61819835/226478055-dc4a123c-4397-4eb0-b6ed-1e185b6fab76.png|width=707,height=294! > The current implementation needs to be improved: Spark's internal > mechanisms should close windows automatically. > > *What we propose:* > Add a third parameter, > {{{}DataFrame.{}}}{{{}withWatermark{}}}({_}eventTime{_}, {_}delayThreshold, > *maxDelayClose*{_}). The trigger would then execute > {code:java} > if(now - window.upper_bound > maxDelayClose){ > window.close().flush(); > } > {code} > I assume it can be done in a day. We did not expect that our > customers could fail to receive their notifications (the company is in the medical > field). 
> > simple code reproducing the problem: > {code:java} > kafka_stream_df = spark \ > .readStream \ > .format("kafka") \ > .option("kafka.bootstrap.servers", KAFKA_BROKER) \ > .option("subscribe", KAFKA_TOPIC) \ > .option("includeHeaders", "true") \ > .load() > sel = (kafka_stream_df.selectExpr("CAST(key AS STRING)", "CAST(value AS > STRING)") > .select(from_json(col("value").cast("string"), > json_schema).alias("data")) > .select("data.*") > .withWatermark("dt", "1 seconds") > .groupBy(window("dt", "10 seconds")) > .agg(sum("price")) > ) > > console = sel \ > .writeStream \ > .trigger(processingTime='10 seconds') \ > .format("console") \ > .outputMode("append")\ > .start() > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
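The {{maxDelayClose}} parameter proposed in that comment does not exist in Spark; as a hedged illustration only, the check the reporter sketches in {code}if(now - window.upper_bound > maxDelayClose){...}{code} can be modelled in plain Python. The {{Window}} dataclass and {{maybe_force_close}} are hypothetical names invented for this sketch, not Spark APIs:

```python
from dataclasses import dataclass


@dataclass
class Window:
    upper_bound: float  # window end time, in seconds since epoch
    closed: bool = False


def maybe_force_close(window: Window, now: float, max_delay_close: float) -> bool:
    """Force-close a window once `max_delay_close` seconds have elapsed past
    its upper bound, even if no new event has advanced the watermark."""
    if not window.closed and now - window.upper_bound > max_delay_close:
        window.closed = True
    return window.closed


w = Window(upper_bound=10.0)
# 4 s past the window end: still within the allowed delay, stays open.
print(maybe_force_close(w, now=14.0, max_delay_close=5.0))  # False
# 6 s past the window end: the extra delay has elapsed, force-closed.
print(maybe_force_close(w, now=16.0, max_delay_close=5.0))  # True
```

In a real engine such a check would have to run on every trigger (wall-clock driven), which is exactly what distinguishes it from the current watermark logic, which only advances when new events arrive.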
[jira] [Created] (SPARK-44496) Move Interfaces needed by SCSC to sql/api
Rui Wang created SPARK-44496: Summary: Move Interfaces needed by SCSC to sql/api Key: SPARK-44496 URL: https://issues.apache.org/jira/browse/SPARK-44496 Project: Spark Issue Type: Sub-task Components: Connect, SQL Affects Versions: 3.5.0 Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44495) Resume to use the latest minikube for k8s-it on GitHub Action
Yang Jie created SPARK-44495: Summary: Resume to use the latest minikube for k8s-it on GitHub Action Key: SPARK-44495 URL: https://issues.apache.org/jira/browse/SPARK-44495 Project: Spark Issue Type: Task Components: Kubernetes, Project Infra, Tests Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44494) K8s-it test failed
[ https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744935#comment-17744935 ] Yang Jie commented on SPARK-44494: -- It seems that the tests started to fail after Minikube was upgraded to v1.31.0; the previous version was v1.30.1 > K8s-it test failed > -- > > Key: SPARK-44494 > URL: https://issues.apache.org/jira/browse/SPARK-44494 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838] > {code:java} > [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 > minutes, 11 seconds) > 3786[info] The code passed to eventually never returned normally. Attempted > 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u > 3787[info] + myuid=185 > 3788[info] ++ id -g > 3789[info] + mygid=0 > 3790[info] + set +e > 3791[info] ++ getent passwd 185 > 3792[info] + uidentry= > 3793[info] + set -e > 3794[info] + '[' -z '' ']' > 3795[info] + '[' -w /etc/passwd ']' > 3796[info] + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > 3797[info] + '[' -z /opt/java/openjdk ']' > 3798[info] + SPARK_CLASSPATH=':/opt/spark/jars/*' > 3799[info] + grep SPARK_JAVA_OPT_ > 3800[info] + sort -t_ -k4 -n > 3801[info] + sed 's/[^=]*=\(.*\)/\1/g' > 3802[info] + env > 3803[info] ++ command -v readarray > 3804[info] + '[' readarray ']' > 3805[info] + readarray -t SPARK_EXECUTOR_JAVA_OPTS > 3806[info] + '[' -n '' ']' > 3807[info] + '[' -z ']' > 3808[info] + '[' -z ']' > 3809[info] + '[' -n '' ']' > 3810[info] + '[' -z ']' > 3811[info] + '[' -z x ']' > 3812[info] + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > 3813[info] + > SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir' > 3814[info] + case "$1" in > 3815[info] + shift 1 > 3816[info] + CMD=("$SPARK_HOME/bin/spark-submit" --conf > 
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf > "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" > --deploy-mode client "$@") > 3817[info] + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=10.244.0.45 --conf > spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client > --properties-file /opt/spark/conf/spark.properties --class > org.apache.spark.examples.MiniReadWriteTest > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3818[info] Files > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from > /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to > /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar > 3819[info] 23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > 3820[info] Performing local word count from > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3821[info] File contents are List(test PVs) > 3822[info] Creating SparkSession > 3823[info] 23/07/20 06:15:15 INFO SparkContext: Running Spark version > 4.0.0-SNAPSHOT > 3824[info] 23/07/20 06:15:15 INFO SparkContext: OS info Linux, > 5.15.0-1041-azure, amd64 > 3825[info] 23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7 > 3826[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3827[info] 23/07/20 06:15:15 INFO ResourceUtils: No custom resources > configured for spark.driver. 
> 3828[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3829[info] 23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini > Read Write Test > 3830[info] 23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile > created, executor resources: Map(cores -> name: cores, amount: 1, script: , > vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap > -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> > name: cpus, amount: 1.0) {code} > The tests in the past two days have failed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
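Since the failures line up with the runner's Minikube moving from v1.30.1 to v1.31.0 (and SPARK-44495 later tracks resuming the latest Minikube), one common interim mitigation is pinning CI to the last known-good release instead of installing the latest. A sketch only, not necessarily the fix that was merged; the step name and the choice of v1.30.1 are assumptions, while the download URL follows Minikube's documented release-storage layout:

```yaml
# GitHub Actions step: pin Minikube rather than tracking the latest release.
- name: Install Minikube v1.30.1
  run: |
    curl -LO https://storage.googleapis.com/minikube/releases/v1.30.1/minikube-linux-amd64
    sudo install minikube-linux-amd64 /usr/local/bin/minikube
    minikube version
```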
[jira] [Created] (SPARK-44494) K8s-it test failed
Yang Jie created SPARK-44494: Summary: K8s-it test failed Key: SPARK-44494 URL: https://issues.apache.org/jira/browse/SPARK-44494 Project: Spark Issue Type: Bug Components: Kubernetes, Tests Affects Versions: 4.0.0 Reporter: Yang Jie * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838] {code:java} [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 minutes, 11 seconds) 3786[info] The code passed to eventually never returned normally. Attempted 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u 3787[info] + myuid=185 3788[info] ++ id -g 3789[info] + mygid=0 3790[info] + set +e 3791[info] ++ getent passwd 185 3792[info] + uidentry= 3793[info] + set -e 3794[info] + '[' -z '' ']' 3795[info] + '[' -w /etc/passwd ']' 3796[info] + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' 3797[info] + '[' -z /opt/java/openjdk ']' 3798[info] + SPARK_CLASSPATH=':/opt/spark/jars/*' 3799[info] + grep SPARK_JAVA_OPT_ 3800[info] + sort -t_ -k4 -n 3801[info] + sed 's/[^=]*=\(.*\)/\1/g' 3802[info] + env 3803[info] ++ command -v readarray 3804[info] + '[' readarray ']' 3805[info] + readarray -t SPARK_EXECUTOR_JAVA_OPTS 3806[info] + '[' -n '' ']' 3807[info] + '[' -z ']' 3808[info] + '[' -z ']' 3809[info] + '[' -n '' ']' 3810[info] + '[' -z ']' 3811[info] + '[' -z x ']' 3812[info] + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' 3813[info] + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir' 3814[info] + case "$1" in 3815[info] + shift 1 3816[info] + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@") 3817[info] + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.0.45 --conf spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class 
org.apache.spark.examples.MiniReadWriteTest local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar /opt/spark/pv-tests/tmp3727659354473892032.txt 3818[info] Files local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar 3819[info] 23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 3820[info] Performing local word count from /opt/spark/pv-tests/tmp3727659354473892032.txt 3821[info] File contents are List(test PVs) 3822[info] Creating SparkSession 3823[info] 23/07/20 06:15:15 INFO SparkContext: Running Spark version 4.0.0-SNAPSHOT 3824[info] 23/07/20 06:15:15 INFO SparkContext: OS info Linux, 5.15.0-1041-azure, amd64 3825[info] 23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7 3826[info] 23/07/20 06:15:15 INFO ResourceUtils: == 3827[info] 23/07/20 06:15:15 INFO ResourceUtils: No custom resources configured for spark.driver. 3828[info] 23/07/20 06:15:15 INFO ResourceUtils: == 3829[info] 23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini Read Write Test 3830[info] 23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) {code} The tests in the past two days have failed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44494) K8s-it test failed
[ https://issues.apache.org/jira/browse/SPARK-44494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17744933#comment-17744933 ] Yang Jie commented on SPARK-44494: -- cc [~yikunkero] Do you have any suggestions? > K8s-it test failed > -- > > Key: SPARK-44494 > URL: https://issues.apache.org/jira/browse/SPARK-44494 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/apache/spark/actions/runs/5607397734/jobs/10258527838] > {code:java} > [info] - PVs with local hostpath storage on statefulsets *** FAILED *** (3 > minutes, 11 seconds) > 3786[info] The code passed to eventually never returned normally. Attempted > 7921 times over 3.000105988813 minutes. Last failure message: "++ id -u > 3787[info] + myuid=185 > 3788[info] ++ id -g > 3789[info] + mygid=0 > 3790[info] + set +e > 3791[info] ++ getent passwd 185 > 3792[info] + uidentry= > 3793[info] + set -e > 3794[info] + '[' -z '' ']' > 3795[info] + '[' -w /etc/passwd ']' > 3796[info] + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > 3797[info] + '[' -z /opt/java/openjdk ']' > 3798[info] + SPARK_CLASSPATH=':/opt/spark/jars/*' > 3799[info] + grep SPARK_JAVA_OPT_ > 3800[info] + sort -t_ -k4 -n > 3801[info] + sed 's/[^=]*=\(.*\)/\1/g' > 3802[info] + env > 3803[info] ++ command -v readarray > 3804[info] + '[' readarray ']' > 3805[info] + readarray -t SPARK_EXECUTOR_JAVA_OPTS > 3806[info] + '[' -n '' ']' > 3807[info] + '[' -z ']' > 3808[info] + '[' -z ']' > 3809[info] + '[' -n '' ']' > 3810[info] + '[' -z ']' > 3811[info] + '[' -z x ']' > 3812[info] + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > 3813[info] + > SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*:/opt/spark/work-dir' > 3814[info] + case "$1" in > 3815[info] + shift 1 > 3816[info] + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --conf > 
"spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS" > --deploy-mode client "$@") > 3817[info] + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=10.244.0.45 --conf > spark.executorEnv.SPARK_DRIVER_POD_IP=10.244.0.45 --deploy-mode client > --properties-file /opt/spark/conf/spark.properties --class > org.apache.spark.examples.MiniReadWriteTest > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3818[info] Files > local:///opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar from > /opt/spark/examples/jars/spark-examples_2.12-4.0.0-SNAPSHOT.jar to > /opt/spark/work-dir/spark-examples_2.12-4.0.0-SNAPSHOT.jar > 3819[info] 23/07/20 06:15:15 WARN NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > 3820[info] Performing local word count from > /opt/spark/pv-tests/tmp3727659354473892032.txt > 3821[info] File contents are List(test PVs) > 3822[info] Creating SparkSession > 3823[info] 23/07/20 06:15:15 INFO SparkContext: Running Spark version > 4.0.0-SNAPSHOT > 3824[info] 23/07/20 06:15:15 INFO SparkContext: OS info Linux, > 5.15.0-1041-azure, amd64 > 3825[info] 23/07/20 06:15:15 INFO SparkContext: Java version 17.0.7 > 3826[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3827[info] 23/07/20 06:15:15 INFO ResourceUtils: No custom resources > configured for spark.driver. 
> 3828[info] 23/07/20 06:15:15 INFO ResourceUtils: > == > 3829[info] 23/07/20 06:15:15 INFO SparkContext: Submitted application: Mini > Read Write Test > 3830[info] 23/07/20 06:15:16 INFO ResourceProfile: Default ResourceProfile > created, executor resources: Map(cores -> name: cores, amount: 1, script: , > vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap > -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> > name: cpus, amount: 1.0) {code} > The tests in the past two days have failed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44493) Extract pushable predicates from disjunctive predicates
[ https://issues.apache.org/jira/browse/SPARK-44493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44493: Attachment: before.png > Extract pushable predicates from disjunctive predicates > --- > > Key: SPARK-44493 > URL: https://issues.apache.org/jira/browse/SPARK-44493 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Attachments: after.png, before.png > > > Example: > {code:sql} > select count(*) > from > db.very_large_table > where > session_start_dt between date_sub('2023-07-15', 1) and > date_add('2023-07-16', 1) > and type = 'event' > and date(event_timestamp) between '2023-07-15' and '2023-07-16' > and ( > ( > page_id in (2627, 2835, 2402999) > and -- other predicates > and rdt = 0 > ) or ( > page_id in (2616, 3411350) > and rdt = 0 > ) or ( > page_id = 2403006 > ) or ( > page_id in (2208336, 2356359) > and -- other predicates > and rdt = 0 > ) > ) > {code} > We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, > 2208336, 2356359)}} to datasource. > Before: > After: -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44493) Extract pushable predicates from disjunctive predicates
Yuming Wang created SPARK-44493: --- Summary: Extract pushable predicates from disjunctive predicates Key: SPARK-44493 URL: https://issues.apache.org/jira/browse/SPARK-44493 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Yuming Wang Attachments: after.png, before.png Example: {code:sql} select count(*) from db.very_large_table where session_start_dt between date_sub('2023-07-15', 1) and date_add('2023-07-16', 1) and type = 'event' and date(event_timestamp) between '2023-07-15' and '2023-07-16' and ( ( page_id in (2627, 2835, 2402999) and -- other predicates and rdt = 0 ) or ( page_id in (2616, 3411350) and rdt = 0 ) or ( page_id = 2403006 ) or ( page_id in (2208336, 2356359) and -- other predicates and rdt = 0 ) ) {code} We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, 2208336, 2356359)}} to datasource. Before: After: -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44493) Extract pushable predicates from disjunctive predicates
[ https://issues.apache.org/jira/browse/SPARK-44493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44493: Description: Example: {code:sql} select count(*) from db.very_large_table where session_start_dt between date_sub('2023-07-15', 1) and date_add('2023-07-16', 1) and type = 'event' and date(event_timestamp) between '2023-07-15' and '2023-07-16' and ( ( page_id in (2627, 2835, 2402999) and -- other predicates and rdt = 0 ) or ( page_id in (2616, 3411350) and rdt = 0 ) or ( page_id = 2403006 ) or ( page_id in (2208336, 2356359) and -- other predicates and rdt = 0 ) ) {code} We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, 2208336, 2356359)}} to datasource. Before: !before.png! After: !after.png! was: Example: {code:sql} select count(*) from db.very_large_table where session_start_dt between date_sub('2023-07-15', 1) and date_add('2023-07-16', 1) and type = 'event' and date(event_timestamp) between '2023-07-15' and '2023-07-16' and ( ( page_id in (2627, 2835, 2402999) and -- other predicates and rdt = 0 ) or ( page_id in (2616, 3411350) and rdt = 0 ) or ( page_id = 2403006 ) or ( page_id in (2208336, 2356359) and -- other predicates and rdt = 0 ) ) {code} We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, 2208336, 2356359)}} to datasource. 
Before: After: > Extract pushable predicates from disjunctive predicates > --- > > Key: SPARK-44493 > URL: https://issues.apache.org/jira/browse/SPARK-44493 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Attachments: after.png, before.png > > > Example: > {code:sql} > select count(*) > from > db.very_large_table > where > session_start_dt between date_sub('2023-07-15', 1) and > date_add('2023-07-16', 1) > and type = 'event' > and date(event_timestamp) between '2023-07-15' and '2023-07-16' > and ( > ( > page_id in (2627, 2835, 2402999) > and -- other predicates > and rdt = 0 > ) or ( > page_id in (2616, 3411350) > and rdt = 0 > ) or ( > page_id = 2403006 > ) or ( > page_id in (2208336, 2356359) > and -- other predicates > and rdt = 0 > ) > ) > {code} > We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, > 2208336, 2356359)}} to datasource. > Before: > !before.png! > After: > !after.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
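The rewrite this issue describes can be modelled outside Spark: when every OR branch constrains the same column with equality/IN atoms, the union of those value lists is implied by the whole disjunction, so it can be pushed to the data source conjunctively without changing results. A simplified sketch in plain Python (not Spark's optimizer rule; `extract_pushable_in` is an illustrative name, and branches are modelled as column-to-allowed-values dicts):

```python
def extract_pushable_in(branches):
    """Given OR branches, each a dict mapping column -> set of allowed values
    (only IN / equality atoms are modelled), return, for every column that is
    constrained in *every* branch, the union of its allowed values.  Those
    unions are implied by the disjunction and are therefore pushable."""
    common_cols = set.intersection(*(set(b) for b in branches))
    return {c: set().union(*(b[c] for b in branches)) for c in common_cols}


# The four OR branches from the example query above.
branches = [
    {"page_id": {2627, 2835, 2402999}, "rdt": {0}},
    {"page_id": {2616, 3411350}, "rdt": {0}},
    {"page_id": {2403006}},                       # no rdt constraint here
    {"page_id": {2208336, 2356359}, "rdt": {0}},
]

# Only page_id is constrained in every branch, so only its union is pushable;
# rdt = 0 is not, because the third branch does not require it.
print(sorted(extract_pushable_in(branches)["page_id"]))
# [2616, 2627, 2835, 2208336, 2356359, 2402999, 2403006, 3411350]
```

This reproduces the claim in the description: {{page_id in (2627, 2835, 2402999, 2616, 3411350, 2403006, 2208336, 2356359)}} can be sent to the data source while the full disjunction is still evaluated afterwards.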
[jira] [Updated] (SPARK-44493) Extract pushable predicates from disjunctive predicates
[ https://issues.apache.org/jira/browse/SPARK-44493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44493: Attachment: after.png > Extract pushable predicates from disjunctive predicates > --- > > Key: SPARK-44493 > URL: https://issues.apache.org/jira/browse/SPARK-44493 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Attachments: after.png, before.png > > > Example: > {code:sql} > select count(*) > from > db.very_large_table > where > session_start_dt between date_sub('2023-07-15', 1) and > date_add('2023-07-16', 1) > and type = 'event' > and date(event_timestamp) between '2023-07-15' and '2023-07-16' > and ( > ( > page_id in (2627, 2835, 2402999) > and -- other predicates > and rdt = 0 > ) or ( > page_id in (2616, 3411350) > and rdt = 0 > ) or ( > page_id = 2403006 > ) or ( > page_id in (2208336, 2356359) > and -- other predicates > and rdt = 0 > ) > ) > {code} > We can push down {{page_id in(2627, 2835, 2402999, 2616, 3411350, 2403006, > 2208336, 2356359)}} to datasource. > Before: > After: -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44475) Relocate DataType and Parser to sql/api
[ https://issues.apache.org/jira/browse/SPARK-44475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-44475.
-
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41928
[https://github.com/apache/spark/pull/41928]

> Relocate DataType and Parser to sql/api
> ---
>
> Key: SPARK-44475
> URL: https://issues.apache.org/jira/browse/SPARK-44475
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, SQL
> Affects Versions: 3.5.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Fix For: 3.5.0
>
[jira] [Assigned] (SPARK-44491) Add `branch-3.5` to `publish_snapshot` GitHub Action job
[ https://issues.apache.org/jira/browse/SPARK-44491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44491:
    Assignee: BingKun Pan

> Add `branch-3.5` to `publish_snapshot` GitHub Action job
>
>
> Key: SPARK-44491
> URL: https://issues.apache.org/jira/browse/SPARK-44491
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
>
[jira] [Resolved] (SPARK-44491) Add `branch-3.5` to `publish_snapshot` GitHub Action job
[ https://issues.apache.org/jira/browse/SPARK-44491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44491.
--
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 42088
[https://github.com/apache/spark/pull/42088]

> Add `branch-3.5` to `publish_snapshot` GitHub Action job
>
>
> Key: SPARK-44491
> URL: https://issues.apache.org/jira/browse/SPARK-44491
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Fix For: 4.0.0
>
[jira] [Created] (SPARK-44492) Resolve remaining AnalysisException
Haejoon Lee created SPARK-44492:
---

             Summary: Resolve remaining AnalysisException
                 Key: SPARK-44492
                 URL: https://issues.apache.org/jira/browse/SPARK-44492
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, Pandas API on Spark
    Affects Versions: 3.5.0
            Reporter: Haejoon Lee

We addressed most of the AnalysisException cases from SPARK-43611, but some tests still remain.