[jira] [Updated] (SPARK-49856) Refactor the compileExpression of JdbcDialect to simplify the subclasses
[ https://issues.apache.org/jira/browse/SPARK-49856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-49856:
---
Summary: Refactor the compileExpression of JdbcDialect to simplify the subclasses  (was: Refactor the compileExpression of JdbcDialect)

> Refactor the compileExpression of JdbcDialect to simplify the subclasses
> ---
>
> Key: SPARK-49856
> URL: https://issues.apache.org/jira/browse/SPARK-49856
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
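For context, a minimal sketch of the refactoring idea the summary describes. This is not Spark's actual code: it assumes the base dialect owns the try/catch plumbing of compileExpression while subclasses customize only a SQL-builder hook (the dialectSQLBuilder name is illustrative).

{code:scala}
import scala.util.control.NonFatal
import org.apache.spark.sql.connector.expressions.Expression
import org.apache.spark.sql.connector.util.V2ExpressionSQLBuilder

abstract class BaseDialect {
  // Subclasses override only this hook instead of re-implementing compileExpression.
  protected def dialectSQLBuilder(): V2ExpressionSQLBuilder = new V2ExpressionSQLBuilder

  def compileExpression(expr: Expression): Option[String] = {
    try {
      Some(dialectSQLBuilder().build(expr))
    } catch {
      case NonFatal(_) => None // unsupported expression: skip pushdown for it
    }
  }
}
{code}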
[jira] [Created] (SPARK-49856) Refactor the compileExpression of JdbcDialect
Jiaan Geng created SPARK-49856:
--

Summary: Refactor the compileExpression of JdbcDialect
Key: SPARK-49856
URL: https://issues.apache.org/jira/browse/SPARK-49856
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-49756) Postgres dialect supports pushing down datetime functions.
Jiaan Geng created SPARK-49756:
--

Summary: Postgres dialect supports pushing down datetime functions.
Key: SPARK-49756
URL: https://issues.apache.org/jira/browse/SPARK-49756
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Updated] (SPARK-49488) MySQL dialect supports pushing down datetime functions.
[ https://issues.apache.org/jira/browse/SPARK-49488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-49488:
---
Description: (was: Currently, the DS V2 pushdown framework translates DayOfWeek and WeekDay in a way adapted to the H2 database. However, many databases support the built-in functions DayOfWeek and WeekDay directly, and their behavior differs from H2's. The V2 pushdown framework should translate them in a more neutral way.)

> MySQL dialect supports pushing down datetime functions.
> ---
>
> Key: SPARK-49488
> URL: https://issues.apache.org/jira/browse/SPARK-49488
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
>
[jira] [Updated] (SPARK-49488) MySQL dialect supports pushing down datetime functions.
[ https://issues.apache.org/jira/browse/SPARK-49488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-49488:
---
Summary: MySQL dialect supports pushing down datetime functions.  (was: Improve the DS V2 pushdown framework for DayOfWeek and WeekDay.)

> MySQL dialect supports pushing down datetime functions.
> ---
>
> Key: SPARK-49488
> URL: https://issues.apache.org/jira/browse/SPARK-49488
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
>
> Currently, the DS V2 pushdown framework translates DayOfWeek and WeekDay in a way adapted to the H2 database. However, many databases support the built-in functions DayOfWeek and WeekDay directly, and their behavior differs from H2's.
> The V2 pushdown framework should translate them in a more neutral way.
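To make the H2-vs-MySQL difference concrete: MySQL has native DAYOFWEEK (1 = Sunday … 7 = Saturday) and WEEKDAY (0 = Monday … 6 = Sunday) built-ins, so a dialect can map Spark's generic function names onto them. A hedged sketch, assuming the visitSQLFunction hook of V2ExpressionSQLBuilder keeps its (funcName, inputs) shape; the class name below is illustrative:

{code:scala}
import org.apache.spark.sql.connector.util.V2ExpressionSQLBuilder

class MySQLSQLBuilder extends V2ExpressionSQLBuilder {
  override def visitSQLFunction(funcName: String, inputs: Array[String]): String =
    funcName match {
      case "DAY_OF_WEEK" => s"DAYOFWEEK(${inputs.head})" // 1 = Sunday ... 7 = Saturday
      case "WEEKDAY"     => s"WEEKDAY(${inputs.head})"   // 0 = Monday ... 6 = Sunday
      case _             => super.visitSQLFunction(funcName, inputs)
    }
}
{code}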
[jira] [Updated] (SPARK-49488) Improve the translation of DayOfWeek and WeekDay.
[ https://issues.apache.org/jira/browse/SPARK-49488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-49488:
---
Description: Currently, the DS V2 pushdown framework translates

> Improve the translation of DayOfWeek and WeekDay.
> ---
>
> Key: SPARK-49488
> URL: https://issues.apache.org/jira/browse/SPARK-49488
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> Currently, the DS V2 pushdown framework translates
[jira] [Created] (SPARK-49488) Improve the translation of DayOfWeek and WeekDay.
Jiaan Geng created SPARK-49488:
--

Summary: Improve the translation of DayOfWeek and WeekDay.
Key: SPARK-49488
URL: https://issues.apache.org/jira/browse/SPARK-49488
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-48820) Correct the examples for Collate function
Jiaan Geng created SPARK-48820:
--

Summary: Correct the examples for Collate function
Key: SPARK-48820
URL: https://issues.apache.org/jira/browse/SPARK-48820
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-47991) Arrange the test cases for window frames and window functions.
Jiaan Geng created SPARK-47991:
--

Summary: Arrange the test cases for window frames and window functions.
Key: SPARK-47991
URL: https://issues.apache.org/jira/browse/SPARK-47991
Project: Spark
Issue Type: Test
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-47795) Supplement the documentation of job scheduling for K8S
Jiaan Geng created SPARK-47795:
--

Summary: Supplement the documentation of job scheduling for K8S
Key: SPARK-47795
URL: https://issues.apache.org/jira/browse/SPARK-47795
Project: Spark
Issue Type: Documentation
Components: Documentation
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-47391) Remove the test case workaround for JDK 8
Jiaan Geng created SPARK-47391:
--

Summary: Remove the test case workaround for JDK 8
Key: SPARK-47391
URL: https://issues.apache.org/jira/browse/SPARK-47391
Project: Spark
Issue Type: Test
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng

A Spark SQL test case in ExpressionEncoderSuite fails on the Windows operating system. The Chinese OS message in the log below means "The filename, directory name, or volume label syntax is incorrect."

{code:java}
Internal error (java.io.FileNotFoundException): D:\Users\gja\git-forks\spark\sql\catalyst\target\scala-2.13\test-classes\org\apache\spark\sql\catalyst\encoders\ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$OuterLevelWithVeryVeryVeryLongClassName2$OuterLevelWithVeryVeryVeryLongClassName3$OuterLevelWithVeryVeryVeryLongClassName4$OuterLevelWithVeryVeryVeryLongClassName5$OuterLevelWithVeryVeryVeryLongClassName6$.class (文件名、目录名或卷标语法不正确。)
java.io.FileNotFoundException: D:\Users\gja\git-forks\spark\sql\catalyst\target\scala-2.13\test-classes\org\apache\spark\sql\catalyst\encoders\ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$OuterLevelWithVeryVeryVeryLongClassName2$OuterLevelWithVeryVeryVeryLongClassName3$OuterLevelWithVeryVeryVeryLongClassName4$OuterLevelWithVeryVeryVeryLongClassName5$OuterLevelWithVeryVeryVeryLongClassName6$.class (文件名、目录名或卷标语法不正确。)
	at java.base/java.io.FileInputStream.open0(Native Method)
	at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
	at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
	at com.intellij.openapi.util.io.FileUtil.loadFileBytes(FileUtil.java:211)
	at org.jetbrains.jps.incremental.scala.local.LazyCompiledClass.$anonfun$getContent$1(LazyCompiledClass.scala:18)
	at scala.Option.getOrElse(Option.scala:201)
	at org.jetbrains.jps.incremental.scala.local.LazyCompiledClass.getContent(LazyCompiledClass.scala:17)
	at org.jetbrains.jps.incremental.instrumentation.BaseInstrumentingBuilder.performBuild(BaseInstrumentingBuilder.java:38)
	at org.jetbrains.jps.incremental.instrumentation.ClassProcessingBuilder.build(ClassProcessingBuilder.java:80)
	at org.jetbrains.jps.incremental.IncProjectBuilder.runModuleLevelBuilders(IncProjectBuilder.java:1569)
	at org.jetbrains.jps.incremental.IncProjectBuilder.runBuildersForChunk(IncProjectBuilder.java:1198)
	at org.jetbrains.jps.incremental.IncProjectBuilder.buildTargetsChunk(IncProjectBuilder.java:1349)
	at org.jetbrains.jps.incremental.IncProjectBuilder.buildChunkIfAffected(IncProjectBuilder.java:1163)
	at org.jetbrains.jps.incremental.IncProjectBuilder$BuildParallelizer$1.run(IncProjectBuilder.java:1129)
	at com.intellij.util.concurrency.BoundedTaskExecutor.doRun(BoundedTaskExecutor.java:244)
	at com.intellij.util.concurrency.BoundedTaskExecutor.access$200(BoundedTaskExecutor.java:30)
	at com.intellij.util.concurrency.BoundedTaskExecutor$1.executeFirstTaskAndHelpQueue(BoundedTaskExecutor.java:222)
	at com.intellij.util.ConcurrencyUtil.runUnderThreadName(ConcurrencyUtil.java:218)
	at com.intellij.util.concurrency.BoundedTaskExecutor$1.run(BoundedTaskExecutor.java:210)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:842)
{code}
[jira] [Updated] (SPARK-46929) Use ThreadUtils.shutdown to close thread pools
[ https://issues.apache.org/jira/browse/SPARK-46929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46929:
---
Component/s: Connect, Spark Core, SS  (was: SQL)

> Use ThreadUtils.shutdown to close thread pools
> ---
>
> Key: SPARK-46929
> URL: https://issues.apache.org/jira/browse/SPARK-46929
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Spark Core, SS
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
[jira] [Created] (SPARK-46929) Use ThreadUtils.shutdown to close thread pools
Jiaan Geng created SPARK-46929:
--

Summary: Use ThreadUtils.shutdown to close thread pools
Key: SPARK-46929
URL: https://issues.apache.org/jira/browse/SPARK-46929
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
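A sketch of the cleanup pattern this ticket targets, assuming org.apache.spark.util.ThreadUtils keeps its shutdown(executor, gracePeriod) helper: one call replaces the ad-hoc shutdown()/awaitTermination() boilerplate at each call site.

{code:scala}
import java.util.concurrent.Executors
import scala.concurrent.duration._
import org.apache.spark.util.ThreadUtils

val pool = Executors.newFixedThreadPool(4)
try {
  pool.submit(new Runnable { def run(): Unit = println("work") })
} finally {
  // Shuts down, waits up to the grace period, then interrupts stragglers.
  ThreadUtils.shutdown(pool, gracePeriod = 10.seconds)
}
{code}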
[jira] [Created] (SPARK-46895) Replace Timer with single thread scheduled executor
Jiaan Geng created SPARK-46895:
--

Summary: Replace Timer with single thread scheduled executor
Key: SPARK-46895
URL: https://issues.apache.org/jira/browse/SPARK-46895
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng

Some Timer instances still exist in Spark. We should replace Timer with a single-thread scheduled executor.
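A minimal before/after sketch of the proposed replacement. One motivation (standard JDK behavior, not something the ticket states): an uncaught exception in a java.util.Timer task kills the timer thread and cancels all of its tasks, while a ScheduledExecutorService only stops the failing periodic task.

{code:scala}
import java.util.concurrent.{Executors, TimeUnit}

// Before:
//   val timer = new java.util.Timer("heartbeat", true)
//   timer.schedule(task, 0L, 10000L)

// After: a single-thread scheduled executor with the same periodic behavior.
val scheduler = Executors.newSingleThreadScheduledExecutor()
scheduler.scheduleAtFixedRate(() => println("heartbeat"), 0L, 10L, TimeUnit.SECONDS)
// ... on shutdown:
scheduler.shutdown()
{code}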
[jira] [Updated] (SPARK-46882) Remove unnecessary AtomicInteger
[ https://issues.apache.org/jira/browse/SPARK-46882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46882:
---
Summary: Remove unnecessary AtomicInteger  (was: Remove unnessary AtomicInteger)

> Remove unnecessary AtomicInteger
> ---
>
> Key: SPARK-46882
> URL: https://issues.apache.org/jira/browse/SPARK-46882
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Priority: Major
>
[jira] [Created] (SPARK-46882) Remove unnessary AtomicInteger
Jiaan Geng created SPARK-46882:
--

Summary: Remove unnessary AtomicInteger
Key: SPARK-46882
URL: https://issues.apache.org/jira/browse/SPARK-46882
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Jiaan Geng
[jira] [Created] (SPARK-46760) Make the documentation of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer
Jiaan Geng created SPARK-46760:
--

Summary: Make the documentation of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer
Key: SPARK-46760
URL: https://issues.apache.org/jira/browse/SPARK-46760
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-46611) Remove ThreadLocal by replacing SimpleDateFormat with DateTimeFormatter
Jiaan Geng created SPARK-46611:
--

Summary: Remove ThreadLocal by replacing SimpleDateFormat with DateTimeFormatter
Key: SPARK-46611
URL: https://issues.apache.org/jira/browse/SPARK-46611
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
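Why the ThreadLocal becomes removable (general JDK behavior, not Spark-specific code): SimpleDateFormat is mutable and not thread-safe, so shared instances must be confined per thread, whereas DateTimeFormatter is immutable and can be shared freely.

{code:scala}
import java.text.SimpleDateFormat
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Before: one SimpleDateFormat per thread, because the class is not thread-safe.
val legacy: ThreadLocal[SimpleDateFormat] =
  ThreadLocal.withInitial(() => new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"))

// After: a single shared, immutable, thread-safe formatter.
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val rendered = LocalDateTime.now().format(formatter)
{code}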
[jira] [Resolved] (SPARK-46494) Remove the parse rule of First, Last and Any_value
[ https://issues.apache.org/jira/browse/SPARK-46494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-46494.
Resolution: Won't Fix

> Remove the parse rule of First, Last and Any_value
> ---
>
> Key: SPARK-46494
> URL: https://issues.apache.org/jira/browse/SPARK-46494
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
>
> Spark has separate parse rules for First, Last and Any_value.
> In fact, the parse rule for general function-call support works well.
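To illustrate the point (query, table, and column names are made up; spark is assumed to be a SparkSession): these aggregates resolve fine through the generic function-call rule, so the dedicated grammar productions are redundant.

{code:scala}
spark.sql(
  """SELECT k,
    |       first(v, true) AS first_non_null,
    |       last(v)        AS last_v,
    |       any_value(v)   AS some_v
    |FROM t
    |GROUP BY k""".stripMargin)
{code}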
[jira] [Updated] (SPARK-46494) Remove the parse rule of First, Last and Any_value
[ https://issues.apache.org/jira/browse/SPARK-46494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46494:
---
Description:
Spark has separate parse rules for First, Last and Any_value.
In fact, the parse rule for general function-call support works well.

was: Spark have separate parse rule for Merge the parse rule of PercentileCont and PercentileDisc into functionCall

> Remove the parse rule of First, Last and Any_value
> ---
>
> Key: SPARK-46494
> URL: https://issues.apache.org/jira/browse/SPARK-46494
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> Spark has separate parse rules for First, Last and Any_value.
> In fact, the parse rule for general function-call support works well.
[jira] [Updated] (SPARK-46494) Remove the parse rule of First, Last and Any_value
[ https://issues.apache.org/jira/browse/SPARK-46494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46494:
---
Description: Spark have separate parse rule for Merge the parse rule of PercentileCont and PercentileDisc into functionCall

was: Spark have separate parse rule for

> Remove the parse rule of First, Last and Any_value
> ---
>
> Key: SPARK-46494
> URL: https://issues.apache.org/jira/browse/SPARK-46494
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> Spark have separate parse rule for Merge the parse rule of PercentileCont and
> PercentileDisc into functionCall
[jira] [Updated] (SPARK-46494) Remove the parse rule of First, Last and Any_value
[ https://issues.apache.org/jira/browse/SPARK-46494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46494:
---
Description: Spark have separate parse rule for

> Remove the parse rule of First, Last and Any_value
> ---
>
> Key: SPARK-46494
> URL: https://issues.apache.org/jira/browse/SPARK-46494
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> Spark have separate parse rule for
[jira] [Created] (SPARK-46494) Remove the parse rule of First, Last and Any_value
Jiaan Geng created SPARK-46494:
--

Summary: Remove the parse rule of First, Last and Any_value
Key: SPARK-46494
URL: https://issues.apache.org/jira/browse/SPARK-46494
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-46491) Eliminate the aggregation if the group keys are a subset of the partition keys
Jiaan Geng created SPARK-46491:
--

Summary: Eliminate the aggregation if the group keys are a subset of the partition keys
Key: SPARK-46491
URL: https://issues.apache.org/jira/browse/SPARK-46491
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Assigned] (SPARK-46207) Support MergeInto in DataFrameWriterV2
[ https://issues.apache.org/jira/browse/SPARK-46207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng reassigned SPARK-46207:
--

Assignee: Huaxin Gao

> Support MergeInto in DataFrameWriterV2
> ---
>
> Key: SPARK-46207
> URL: https://issues.apache.org/jira/browse/SPARK-46207
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Huaxin Gao
> Assignee: Huaxin Gao
> Priority: Major
> Labels: pull-request-available
>
[jira] [Resolved] (SPARK-46207) Support MergeInto in DataFrameWriterV2
[ https://issues.apache.org/jira/browse/SPARK-46207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-46207.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44119
[https://github.com/apache/spark/pull/44119]

> Support MergeInto in DataFrameWriterV2
> ---
>
> Key: SPARK-46207
> URL: https://issues.apache.org/jira/browse/SPARK-46207
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Huaxin Gao
> Assignee: Huaxin Gao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
[jira] [Updated] (SPARK-46443) Decimal precision and scale should be decided by the JDBC dialect.
[ https://issues.apache.org/jira/browse/SPARK-46443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46443:
---
Summary: Decimal precision and scale should be decided by the JDBC dialect.  (was: Ensure Decimal precision and scale should decided by JDBC dialect.)

> Decimal precision and scale should be decided by the JDBC dialect.
> ---
>
> Key: SPARK-46443
> URL: https://issues.apache.org/jira/browse/SPARK-46443
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
[jira] [Created] (SPARK-46442) DS V2 supports push down PERCENTILE_CONT and PERCENTILE_DISC
Jiaan Geng created SPARK-46442:
--

Summary: DS V2 supports push down PERCENTILE_CONT and PERCENTILE_DISC
Key: SPARK-46442
URL: https://issues.apache.org/jira/browse/SPARK-46442
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Resolved] (SPARK-46406) Assign a name to the error class _LEGACY_ERROR_TEMP_1023
[ https://issues.apache.org/jira/browse/SPARK-46406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-46406.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44355
[https://github.com/apache/spark/pull/44355]

> Assign a name to the error class _LEGACY_ERROR_TEMP_1023
> ---
>
> Key: SPARK-46406
> URL: https://issues.apache.org/jira/browse/SPARK-46406
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
[jira] [Resolved] (SPARK-45795) DS V2 supports push down Mode
[ https://issues.apache.org/jira/browse/SPARK-45795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-45795.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 43661
[https://github.com/apache/spark/pull/43661]

> DS V2 supports push down Mode
> ---
>
> Key: SPARK-45795
> URL: https://issues.apache.org/jira/browse/SPARK-45795
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Many databases support the aggregate function mode, so DS V2 could push it down.
[jira] [Assigned] (SPARK-46403) Decode parquet binary with getBytesUnsafe method
[ https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng reassigned SPARK-46403:
--

Assignee: Wan Kun

> Decode parquet binary with getBytesUnsafe method
> ---
>
> Key: SPARK-46403
> URL: https://issues.apache.org/jira/browse/SPARK-46403
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Wan Kun
> Assignee: Wan Kun
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2023-12-14-16-30-39-104.png
>
> Currently, Spark gets a Parquet binary object with the getBytes() method.
> The *Binary.getBytes()* method always makes a new copy of the internal bytes.
> We can use the *Binary.getBytesUnsafe()* method to reuse the cached bytes if getBytes() has already been called and the bytes are cached.
> !image-2023-12-14-16-30-39-104.png!
[jira] [Resolved] (SPARK-46403) Decode parquet binary with getBytesUnsafe method
[ https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-46403.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44351
[https://github.com/apache/spark/pull/44351]

> Decode parquet binary with getBytesUnsafe method
> ---
>
> Key: SPARK-46403
> URL: https://issues.apache.org/jira/browse/SPARK-46403
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Wan Kun
> Assignee: Wan Kun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2023-12-14-16-30-39-104.png
>
> Currently, Spark gets a Parquet binary object with the getBytes() method.
> The *Binary.getBytes()* method always makes a new copy of the internal bytes.
> We can use the *Binary.getBytesUnsafe()* method to reuse the cached bytes if getBytes() has already been called and the bytes are cached.
> !image-2023-12-14-16-30-39-104.png!
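The API difference in one snippet, using Parquet's org.apache.parquet.io.api.Binary as described in the ticket:

{code:scala}
import org.apache.parquet.io.api.Binary

val bin: Binary = Binary.fromString("example")
val alwaysCopied: Array[Byte] = bin.getBytes       // defensive copy on every call
val maybeShared: Array[Byte]  = bin.getBytesUnsafe // may return the cached internal
                                                   // array, so callers must not mutate it
{code}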
[jira] [Created] (SPARK-46406) Assign a name to the error class _LEGACY_ERROR_TEMP_1023
Jiaan Geng created SPARK-46406:
--

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_1023
Key: SPARK-46406
URL: https://issues.apache.org/jira/browse/SPARK-46406
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Resolved] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)
[ https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-45796.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 44184
[https://github.com/apache/spark/pull/44184]

> Support MODE() WITHIN GROUP (ORDER BY col)
> ---
>
> Key: SPARK-45796
> URL: https://issues.apache.org/jira/browse/SPARK-45796
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Many mainstream databases support the syntax shown below.
> { MODE() WITHIN GROUP (ORDER BY sortSpecification) }
> [FILTER (WHERE expression)] [OVER windowNameOrSpecification]
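A usage sketch of the syntax quoted above (table and column names are illustrative; spark is assumed to be a SparkSession). Under the feature as described, the ORDER BY direction is what decides which value is reported when several values tie for the highest frequency.

{code:scala}
spark.sql(
  """SELECT mode() WITHIN GROUP (ORDER BY price)      AS mode_ties_low,
    |       mode() WITHIN GROUP (ORDER BY price DESC) AS mode_ties_high
    |FROM sales""".stripMargin)
{code}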
[jira] [Created] (SPARK-46363) Improve Java code with the Java 15 text block feature
Jiaan Geng created SPARK-46363:
--

Summary: Improve Java code with the Java 15 text block feature
Key: SPARK-46363
URL: https://issues.apache.org/jira/browse/SPARK-46363
Project: Spark
Issue Type: Improvement
Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Resolved] (SPARK-45649) Unify the prepare framework for `OffsetWindowFunctionFrame`
[ https://issues.apache.org/jira/browse/SPARK-45649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-45649.
Resolution: Fixed

> Unify the prepare framework for `OffsetWindowFunctionFrame`
> ---
>
> Key: SPARK-45649
> URL: https://issues.apache.org/jira/browse/SPARK-45649
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
> Attachments: test_table.parquet.zip
>
> Currently, the `prepare` implementations of all the `OffsetWindowFunctionFrame` subclasses have the same code logic, shown below.
> ```
> override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
>   if (offset > rows.length) {
>     fillDefaultValue(EmptyRow)
>   } else {
>     resetStates(rows)
>     if (ignoreNulls) {
>       ...
>     } else {
>       ...
>     }
>   }
> }
> ```
[jira] [Created] (SPARK-46270) Use Java 14 instanceof expressions to replace Java 8 instanceof statements
Jiaan Geng created SPARK-46270:
--

Summary: Use Java 14 instanceof expressions to replace Java 8 instanceof statements
Key: SPARK-46270
URL: https://issues.apache.org/jira/browse/SPARK-46270
Project: Spark
Issue Type: Improvement
Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Resolved] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall
[ https://issues.apache.org/jira/browse/SPARK-46009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-46009.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 43910
[https://github.com/apache/spark/pull/43910]

> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> ---
>
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Spark SQL's parser has a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
> We should merge this rule into the functionCall.
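What the merged rule still has to accept (illustrative table and column names; spark is assumed to be a SparkSession): the WITHIN GROUP clause now rides on the generic functionCall production instead of a percentile-only one.

{code:scala}
spark.sql(
  """SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY v) AS median_cont,
    |       percentile_disc(0.5) WITHIN GROUP (ORDER BY v) AS median_disc
    |FROM t""".stripMargin)
{code}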
[jira] [Updated] (SPARK-46101) Replace (string|array).size with (string|array).length in all the modules
[ https://issues.apache.org/jira/browse/SPARK-46101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46101:
---
Summary: Replace (string|array).size with (string|array).length in all the modules  (was: Replace (string|array).size with (string|array).length in module SQL)

> Replace (string|array).size with (string|array).length in all the modules
> ---
>
> Key: SPARK-46101
> URL: https://issues.apache.org/jira/browse/SPARK-46101
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Minor
> Labels: pull-request-available
>
[jira] [Updated] (SPARK-46100) Replace (string|array).size with (string|array).length in all the modules
[ https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46100:
---
Summary: Replace (string|array).size with (string|array).length in all the modules  (was: Replace (string|array).size with (string|array).length in module core)

> Replace (string|array).size with (string|array).length in all the modules
> ---
>
> Key: SPARK-46100
> URL: https://issues.apache.org/jira/browse/SPARK-46100
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
[jira] [Updated] (SPARK-46100) Replace (string|array).size with (string|array).length in module core
[ https://issues.apache.org/jira/browse/SPARK-46100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46100:
---
Summary: Replace (string|array).size with (string|array).length in module core  (was: Replace (string|array).size with (string|array).length in all the modules)

> Replace (string|array).size with (string|array).length in module core
> ---
>
> Key: SPARK-46100
> URL: https://issues.apache.org/jira/browse/SPARK-46100
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
[jira] [Created] (SPARK-46101) Fix these issues in module sql
Jiaan Geng created SPARK-46101:
--

Summary: Fix these issues in module sql
Key: SPARK-46101
URL: https://issues.apache.org/jira/browse/SPARK-46101
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-46100) Fix these issues in module core
Jiaan Geng created SPARK-46100:
--

Summary: Fix these issues in module core
Key: SPARK-46100
URL: https://issues.apache.org/jira/browse/SPARK-46100
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Updated] (SPARK-46098) Reduce stack depth by replacing (string|array).size with (string|array).length
[ https://issues.apache.org/jira/browse/SPARK-46098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46098:
---
Description:
There are a lot of (string|array).size calls.
In fact, size calls the underlying length, and this behavior increases the stack depth.
We should call (string|array).length directly.
We also get the compile warning "Replace .size with .length on arrays and strings".

was:
There are a lot of (string|array).size called.
In fact, the size calls the underlying length, this behavior increase the stack length.
We should call
# Replace .size with .length on arrays and strings

> Reduce stack depth by replacing (string|array).size with (string|array).length
> ---
>
> Key: SPARK-46098
> URL: https://issues.apache.org/jira/browse/SPARK-46098
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> There are a lot of (string|array).size calls.
> In fact, size calls the underlying length, and this behavior increases the stack depth.
> We should call (string|array).length directly.
> We also get the compile warning "Replace .size with .length on arrays and strings".
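The mechanics behind the warning, in plain Scala: size is not a member of Array (or String), so the call goes through an implicit wrapper (ArrayOps / StringOps) whose size simply forwards to length, adding a frame.

{code:scala}
val xs = Array(1, 2, 3)
val viaWrapper = xs.size   // Array -> ArrayOps conversion, then size delegates to length
val direct     = xs.length // reads the JVM array length directly
assert(viaWrapper == direct)
{code}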
[jira] [Updated] (SPARK-46098) Reduce stack depth by replacing (string|array).size with (string|array).length
[ https://issues.apache.org/jira/browse/SPARK-46098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46098:
---
Description:
There are a lot of (string|array).size called.
In fact, the size calls the underlying length, this behavior increase the stack length.
We should call
# Replace .size with .length on arrays and strings

was:
There are a lot of
# Replace .size with .length on arrays and strings
# Replace .size with .length on arrays and strings

> Reduce stack depth by replacing (string|array).size with (string|array).length
> ---
>
> Key: SPARK-46098
> URL: https://issues.apache.org/jira/browse/SPARK-46098
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> There are a lot of (string|array).size called.
> In fact, the size calls the underlying length, this behavior increase the stack length.
> We should call
> # Replace .size with .length on arrays and strings
[jira] [Updated] (SPARK-46098) Reduce stack depth by replacing (string|array).size with (string|array).length
[ https://issues.apache.org/jira/browse/SPARK-46098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46098:
---
Description:
There are a lot of
# Replace .size with .length on arrays and strings
# Replace .size with .length on arrays and strings

was:
There are a lot of
# Replace .size with .length on arrays and strings

> Reduce stack depth by replacing (string|array).size with (string|array).length
> ---
>
> Key: SPARK-46098
> URL: https://issues.apache.org/jira/browse/SPARK-46098
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> There are a lot of
> # Replace .size with .length on arrays and strings
> # Replace .size with .length on arrays and strings
[jira] [Commented] (SPARK-45649) Unify the prepare framework for `OffsetWindowFunctionFrame`
[ https://issues.apache.org/jira/browse/SPARK-45649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788329#comment-17788329 ]

Jiaan Geng commented on SPARK-45649:

[~cloud_fan] I see. I will investigate this bug.

> Unify the prepare framework for `OffsetWindowFunctionFrame`
> ---
>
> Key: SPARK-45649
> URL: https://issues.apache.org/jira/browse/SPARK-45649
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: test_table.parquet.zip
>
> Currently, the `prepare` implementations of all the `OffsetWindowFunctionFrame` subclasses have the same code logic, shown below.
> ```
> override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
>   if (offset > rows.length) {
>     fillDefaultValue(EmptyRow)
>   } else {
>     resetStates(rows)
>     if (ignoreNulls) {
>       ...
>     } else {
>       ...
>     }
>   }
> }
> ```
[jira] [Updated] (SPARK-46029) Escape the single quote, _ and % for DS V2 pushdown
[ https://issues.apache.org/jira/browse/SPARK-46029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46029:
---
Summary: Escape the single quote, _ and % for DS V2 pushdown  (was: Escape the ', _ and % for DS V2 pushdown)

> Escape the single quote, _ and % for DS V2 pushdown
> ---
>
> Key: SPARK-46029
> URL: https://issues.apache.org/jira/browse/SPARK-46029
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0, 3.5.0, 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
>
> Spark supports pushing down startsWith, endsWith and contains to JDBC databases with DS V2 pushdown.
> But the V2ExpressionSQLBuilder didn't escape the single quote, _ and %, which can cause unexpected results.
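A hedged sketch of the escaping the ticket calls for; the helper name is hypothetical, not necessarily Spark's. startsWith/endsWith/contains compile to LIKE patterns, so a literal containing ', _ or % must be escaped before being embedded in the generated SQL.

{code:scala}
// Hypothetical helper, for illustration only.
def escapeForLikePattern(literal: String): String =
  literal.flatMap {
    case '\'' => "''"   // quote doubling inside the SQL string literal
    case '_'  => "\\_"  // LIKE single-character wildcard
    case '%'  => "\\%"  // LIKE multi-character wildcard
    case c    => c.toString
  }

// startsWith("50%_off") would then push down roughly:
//   col LIKE '50\%\_off%' ESCAPE '\'
{code}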
[jira] [Updated] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall
[ https://issues.apache.org/jira/browse/SPARK-46009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46009:
---
Description:
Spark SQL's parser has a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the functionCall.

was:
Spark SQL parse have a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the functionCall.

> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> ---
>
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> Spark SQL's parser has a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
> We should merge this rule into the functionCall.
[jira] [Updated] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall
[ https://issues.apache.org/jira/browse/SPARK-46009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46009:
---
Description:
Spark SQL parse have a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the functionCall.

was:
Spark SQL parse have a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the

> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> ---
>
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> Spark SQL parse have a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
> We should merge this rule into the functionCall.
[jira] [Updated] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall
[ https://issues.apache.org/jira/browse/SPARK-46009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-46009:
---
Description:
Spark SQL parse have a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the

> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> ---
>
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> Spark SQL parse have a special rule to parse [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
> We should merge this rule into the
[jira] [Created] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall
Jiaan Geng created SPARK-46009:
--

Summary: Merge the parse rule of PercentileCont and PercentileDisc into functionCall
Key: SPARK-46009
URL: https://issues.apache.org/jira/browse/SPARK-46009
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-45933) Runtime filter should infer more application sides.
Jiaan Geng created SPARK-45933:
--

Summary: Runtime filter should infer more application sides.
Key: SPARK-45933
URL: https://issues.apache.org/jira/browse/SPARK-45933
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Created] (SPARK-45904) Mode function supports sort direction
Jiaan Geng created SPARK-45904:
--

Summary: Mode function supports sort direction
Key: SPARK-45904
URL: https://issues.apache.org/jira/browse/SPARK-45904
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng

Currently, the mode function doesn't support a sort direction.
[jira] [Updated] (SPARK-45840) Fix these issues in module sql/hive, sql/hive-thriftserver
[ https://issues.apache.org/jira/browse/SPARK-45840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-45840:
---
Summary: Fix these issues in module sql/hive, sql/hive-thriftserver  (was: Fix these issues in module sql/hive)

> Fix these issues in module sql/hive, sql/hive-thriftserver
> ---
>
> Key: SPARK-45840
> URL: https://issues.apache.org/jira/browse/SPARK-45840
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Priority: Major
>
[jira] [Created] (SPARK-45840) Fix these issues in module sql/hive
Jiaan Geng created SPARK-45840:
--

Summary: Fix these issues in module sql/hive
Key: SPARK-45840
URL: https://issues.apache.org/jira/browse/SPARK-45840
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
[jira] [Created] (SPARK-45839) Fix these issues in module sql/api
Jiaan Geng created SPARK-45839:
--

Summary: Fix these issues in module sql/api
Key: SPARK-45839
URL: https://issues.apache.org/jira/browse/SPARK-45839
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
[jira] [Created] (SPARK-45838) Fix these issues in module sql/core
Jiaan Geng created SPARK-45838:
--

Summary: Fix these issues in module sql/core
Key: SPARK-45838
URL: https://issues.apache.org/jira/browse/SPARK-45838
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
[jira] [Updated] (SPARK-45825) Fix these issues in module sql/catalyst
[ https://issues.apache.org/jira/browse/SPARK-45825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-45825:
---
Summary: Fix these issues in module sql/catalyst  (was: Fix these issues in package sql/catalyst)

> Fix these issues in module sql/catalyst
> ---
>
> Key: SPARK-45825
> URL: https://issues.apache.org/jira/browse/SPARK-45825
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
>
[jira] [Resolved] (SPARK-45816) Return null when overflowing during casting from timestamp to integers
[ https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-45816.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 43694
[https://github.com/apache/spark/pull/43694]

> Return null when overflowing during casting from timestamp to integers
> ---
>
> Key: SPARK-45816
> URL: https://issues.apache.org/jira/browse/SPARK-45816
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.3, 3.4.1, 3.5.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Spark cast works in two modes: ansi and non-ansi. When overflowing during casting, the common behavior under non-ansi mode is to return null. However, casting from Timestamp to Int/Short/Byte returns a wrapping value now. The behavior to silently overflow doesn't make sense.
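The reported behavior, sketched (spark is assumed to be a SparkSession; non-ANSI mode assumed). The epoch-second value of this timestamp is far outside Int range, so under the non-ANSI convention described above the cast should yield NULL rather than a wrapped integer.

{code:scala}
spark.sql("SET spark.sql.ansi.enabled=false")
spark.sql("SELECT CAST(TIMESTAMP '9999-12-31 00:00:00' AS INT)").show()
// before the fix: a wrapped, meaningless integer; after the fix: NULL
{code}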
[jira] [Assigned] (SPARK-45816) Return null when overflowing during casting from timestamp to integers
[ https://issues.apache.org/jira/browse/SPARK-45816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng reassigned SPARK-45816:
--

Assignee: L. C. Hsieh

> Return null when overflowing during casting from timestamp to integers
> ---
>
> Key: SPARK-45816
> URL: https://issues.apache.org/jira/browse/SPARK-45816
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.3, 3.4.1, 3.5.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
> Labels: pull-request-available
>
> Spark cast works in two modes: ansi and non-ansi. When overflowing during casting, the common behavior under non-ansi mode is to return null. However, casting from Timestamp to Int/Short/Byte returns a wrapping value now. The behavior to silently overflow doesn't make sense.
[jira] [Resolved] (SPARK-45606) Release restrictions on multi-layer runtime filter
[ https://issues.apache.org/jira/browse/SPARK-45606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng resolved SPARK-45606.
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 43449
[https://github.com/apache/spark/pull/43449]

> Release restrictions on multi-layer runtime filter
> ---
>
> Key: SPARK-45606
> URL: https://issues.apache.org/jira/browse/SPARK-45606
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Before https://issues.apache.org/jira/browse/SPARK-41674, Spark only supported inserting a runtime filter for the application side of a shuffle join on a single layer.
> Considering that it is not worth inserting more runtime filters if one side of the shuffle join already has a runtime filter, Spark restricted it.
> After https://issues.apache.org/jira/browse/SPARK-41674, Spark supports inserting a runtime filter for one side of any shuffle join on multiple layers. But the restrictions on multi-layer runtime filters look outdated.
[jira] [Created] (SPARK-45832) Fix 'Super method + is deprecated.'
Jiaan Geng created SPARK-45832:
--

Summary: Fix 'Super method + is deprecated.'
Key: SPARK-45832
URL: https://issues.apache.org/jira/browse/SPARK-45832
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng

@deprecated("Consider requiring an immutable Map or fall back to Map.concat.", "2.13.0")
def + [V1 >: V](kv: (K, V1)): CC[K, V1] = mapFactory.from(new View.Appended(this, kv))
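One way to silence the warning, sketched: on the generic scala.collection.Map (which doesn't guarantee immutability), + is deprecated in 2.13, and concat, or simply requiring an immutable.Map, is the suggested replacement.

{code:scala}
def withDefault(m: scala.collection.Map[String, Int]): scala.collection.Map[String, Int] =
  m.concat(Map("default" -> 0)) // instead of the deprecated m + ("default" -> 0)
{code}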
[jira] [Created] (SPARK-45825) Fix these issues in package sql/catalyst
Jiaan Geng created SPARK-45825:
--

Summary: Fix these issues in package sql/catalyst
Key: SPARK-45825
URL: https://issues.apache.org/jira/browse/SPARK-45825
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng
[jira] [Updated] (SPARK-45823) Fix some Scala compile warnings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaan Geng updated SPARK-45823:
---
Description:
# Replace .size with .length on arrays and strings
# The enclosing block is redundant
# Replace with .head
# Replace with .nonEmpty
# Replace with .isDefined
# Unnecessary parentheses
# Replace with .isEmpty

was:
# Replace .size with .length on arrays and strings
# The enclosing block is redundant
# Replace with .head
# Replace with .nonEmpty
# Replace with .isDefined
# Unnecessary parentheses

> Fix some Scala compile warnings
> ---
>
> Key: SPARK-45823
> URL: https://issues.apache.org/jira/browse/SPARK-45823
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jiaan Geng
> Assignee: Jiaan Geng
> Priority: Major
>
> # Replace .size with .length on arrays and strings
> # The enclosing block is redundant
> # Replace with .head
> # Replace with .nonEmpty
> # Replace with .isDefined
> # Unnecessary parentheses
> # Replace with .isEmpty
[jira] [Updated] (SPARK-45823) Fix some scala compile warnings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45823: --- Description: # Replace .size with .length on arrays and strings # The enclosing block is redundant # Replace with .head # Replace with .nonEmpty # Replace with .isDefined # Unnecessary parentheses was: # Replace .size with .length on arrays and strings # The enclosing block is redundant # Replace with .head # Replace with .nonEmpty # Replace with .isDefined > Fix some scala compile warnings > --- > > Key: SPARK-45823 > URL: https://issues.apache.org/jira/browse/SPARK-45823 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > # Replace .size with .length on arrays and strings > # The enclosing block is redundant > # Replace with .head > # Replace with .nonEmpty > # Replace with .isDefined > # Unnecessary parentheses -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45823) Fix some scala compile warnings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45823: --- Description: # Replace .size with .length on arrays and strings # The enclosing block is redundant # Replace with .head # Replace with .nonEmpty # Replace with .isDefined was: # Replace .size with .length on arrays and strings # The enclosing block is redundant # Replace with .head # Replace with .nonEmpty > Fix some scala compile warnings > --- > > Key: SPARK-45823 > URL: https://issues.apache.org/jira/browse/SPARK-45823 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > # Replace .size with .length on arrays and strings > # The enclosing block is redundant > # Replace with .head > # Replace with .nonEmpty > # Replace with .isDefined -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45823) Fix some scala compile warnings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45823: --- Description: # Replace .size with .length on arrays and strings # The enclosing block is redundant # Replace with .head # Replace with .nonEmpty was: # Replace .size with .length on arrays and strings # The enclosing block is redundant # Replace with .head # Replace with .contains > Fix some scala compile warnings > --- > > Key: SPARK-45823 > URL: https://issues.apache.org/jira/browse/SPARK-45823 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > # Replace .size with .length on arrays and strings > # The enclosing block is redundant > # Replace with .head > # Replace with .nonEmpty -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45823) Fix some scala compile warnings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45823: --- Description: # Replace .size with .length on arrays and strings # The enclosing block is redundant # Replace with .head # Replace with .contains was: # Replace .size with .length on arrays and strings # The enclosing block is redundant # Replace with .head > Fix some scala compile warnings > --- > > Key: SPARK-45823 > URL: https://issues.apache.org/jira/browse/SPARK-45823 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > # Replace .size with .length on arrays and strings > # The enclosing block is redundant > # Replace with .head > # Replace with .contains -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45823) Fix some scala compile warnings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45823: --- Description: # Replace .size with .length on arrays and strings # The enclosing block is redundant # Replace with .head was: # Replace .size with .length on arrays and strings # The enclosing block is redundant # > Fix some scala compile warnings > --- > > Key: SPARK-45823 > URL: https://issues.apache.org/jira/browse/SPARK-45823 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > # Replace .size with .length on arrays and strings > # The enclosing block is redundant > # Replace with .head -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45823) Fix some scala compile warnings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45823: --- Description: # Replace .size with .length on arrays and strings # The enclosing block is redundant # was: # Replace .size with .length on arrays and strings # > Fix some scala compile warnings > --- > > Key: SPARK-45823 > URL: https://issues.apache.org/jira/browse/SPARK-45823 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > # Replace .size with .length on arrays and strings > # The enclosing block is redundant > # -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45823) Fix some scala compile warnings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45823: --- Summary: Fix some scala compile warnings (was: Fix some compile warnings) > Fix some scala compile warnings > --- > > Key: SPARK-45823 > URL: https://issues.apache.org/jira/browse/SPARK-45823 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > # Replace .size with .length on arrays and strings > # -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45823) Fix some compile warnings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45823: --- Summary: Fix some compile warnings (was: Fix Replace .size with .length on arrays and strings) > Fix some compile warnings > - > > Key: SPARK-45823 > URL: https://issues.apache.org/jira/browse/SPARK-45823 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > # Replace .size with .length on arrays and strings > # -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45823) Fix Replace .size with .length on arrays and strings
[ https://issues.apache.org/jira/browse/SPARK-45823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45823: --- Summary: Fix Replace .size with .length on arrays and strings (was: Fix The enclosing block is redundant) > Fix Replace .size with .length on arrays and strings > > > Key: SPARK-45823 > URL: https://issues.apache.org/jira/browse/SPARK-45823 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45823) Fix The enclosing block is redundant
Jiaan Geng created SPARK-45823: -- Summary: Fix The enclosing block is redundant Key: SPARK-45823 URL: https://issues.apache.org/jira/browse/SPARK-45823 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45793) Improve the built-in compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45793. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43659 [https://github.com/apache/spark/pull/43659] > Improve the built-in compression codecs > --- > > Key: SPARK-45793 > URL: https://issues.apache.org/jira/browse/SPARK-45793 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, Spark supports many built-in compression codecs used for I/O and > storage. > There are a lot of magic strings copied from these built-in compression codecs, > so developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
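A minimal sketch of the kind of mapper these codec tickets introduce, replacing scattered magic strings with one authoritative list (hypothetical names and codec set; see the linked pull request for the real implementation):

{code:scala}
// Hypothetical mapper: one source of truth for codec names.
sealed abstract class CompressionCodec(val name: String)

object CompressionCodec {
  case object LZ4    extends CompressionCodec("lz4")
  case object LZF    extends CompressionCodec("lzf")
  case object SNAPPY extends CompressionCodec("snappy")
  case object ZSTD   extends CompressionCodec("zstd")

  private val byName: Map[String, CompressionCodec] =
    Seq(LZ4, LZF, SNAPPY, ZSTD).map(c => c.name -> c).toMap

  // Fails fast on typos instead of silently comparing magic strings.
  def fromName(name: String): CompressionCodec =
    byName.getOrElse(name.toLowerCase,
      throw new IllegalArgumentException(s"Unknown compression codec: $name"))

  def main(args: Array[String]): Unit = {
    println(fromName("ZSTD").name)  // zstd
  }
}
{code}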
[jira] [Resolved] (SPARK-45758) Introduce a mapper for hadoop compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45758. Resolution: Resolved > Introduce a mapper for hadoop compression codecs > > > Key: SPARK-45758 > URL: https://issues.apache.org/jira/browse/SPARK-45758 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, Spark supports some of the Hadoop compression codecs, but the codecs > Hadoop supports and the codecs Spark supports do not map one-to-one, because > Spark introduces two fake compression codecs, none and uncompress. > There are a lot of magic strings copied from the Hadoop compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45758) Introduce a mapper for hadoop compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783169#comment-17783169 ] Jiaan Geng commented on SPARK-45758: Resolved by https://github.com/apache/spark/pull/43620 > Introduce a mapper for hadoop compression codecs > > > Key: SPARK-45758 > URL: https://issues.apache.org/jira/browse/SPARK-45758 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, Spark supports some of the Hadoop compression codecs, but the codecs > Hadoop supports and the codecs Spark supports do not map one-to-one, because > Spark introduces two fake compression codecs, none and uncompress. > There are a lot of magic strings copied from the Hadoop compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)
[ https://issues.apache.org/jira/browse/SPARK-45796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45796: --- Description: Many mainstream databases support the syntax shown below. { MODE() WITHIN GROUP (ORDER BY sortSpecification) } [FILTER (WHERE expression)] [OVER windowNameOrSpecification] > Support MODE() WITHIN GROUP (ORDER BY col) > --- > > Key: SPARK-45796 > URL: https://issues.apache.org/jira/browse/SPARK-45796 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Many mainstream databases support the syntax shown below. > { MODE() WITHIN GROUP (ORDER BY sortSpecification) } > [FILTER (WHERE expression)] [OVER windowNameOrSpecification] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
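A small usage sketch of the proposed syntax (illustrative data; assumes a Spark build in which this feature has landed):

{code:scala}
import org.apache.spark.sql.SparkSession

object ModeWithinGroupExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]")
      .appName("mode-within-group").getOrCreate()
    import spark.implicits._

    Seq(("eng", 10), ("eng", 10), ("eng", 20), ("ops", 7))
      .toDF("dept", "salary").createOrReplaceTempView("employees")

    // MODE() as an inverse-distribution function: the most frequent
    // salary per department, with the ORDER BY fixing how ties resolve.
    spark.sql(
      """SELECT dept, MODE() WITHIN GROUP (ORDER BY salary) AS common_salary
        |FROM employees
        |GROUP BY dept""".stripMargin).show()

    spark.stop()
  }
}
{code}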
[jira] [Created] (SPARK-45796) Support MODE() WITHIN GROUP (ORDER BY col)
Jiaan Geng created SPARK-45796: -- Summary: Support MODE() WITHIN GROUP (ORDER BY col) Key: SPARK-45796 URL: https://issues.apache.org/jira/browse/SPARK-45796 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45795) DS V2 supports push down Mode
Jiaan Geng created SPARK-45795: -- Summary: DS V2 supports push down Mode Key: SPARK-45795 URL: https://issues.apache.org/jira/browse/SPARK-45795 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45793) Improve the built-in compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45793: --- Description: Currently, Spark supports many built-in compression codecs used for I/O and storage. There are a lot of magic strings copied from these built-in compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. > Improve the built-in compression codecs > --- > > Key: SPARK-45793 > URL: https://issues.apache.org/jira/browse/SPARK-45793 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, Spark supports many built-in compression codecs used for I/O and > storage. > There are a lot of magic strings copied from these built-in compression codecs, > so developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45758) Introduce a mapper for hadoop compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45758: --- Description: Currently, Spark supports some of the Hadoop compression codecs, but the codecs Hadoop supports and the codecs Spark supports do not map one-to-one, because Spark introduces two fake compression codecs, none and uncompress. There are a lot of magic strings copied from the Hadoop compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. > Introduce a mapper for hadoop compression codecs > > > Key: SPARK-45758 > URL: https://issues.apache.org/jira/browse/SPARK-45758 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, Spark supports some of the Hadoop compression codecs, but the codecs > Hadoop supports and the codecs Spark supports do not map one-to-one, because > Spark introduces two fake compression codecs, none and uncompress. > There are a lot of magic strings copied from the Hadoop compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45758) Introduce a mapper for hadoop compression codecs
Jiaan Geng created SPARK-45758: -- Summary: Introduce a mapper for hadoop compression codecs Key: SPARK-45758 URL: https://issues.apache.org/jira/browse/SPARK-45758 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45755) Push down limit through Dataset.isEmpty()
[ https://issues.apache.org/jira/browse/SPARK-45755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng reassigned SPARK-45755: -- Assignee: Yuming Wang > Push down limit through Dataset.isEmpty() > - > > Key: SPARK-45755 > URL: https://issues.apache.org/jira/browse/SPARK-45755 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > > Pushing down LocalLimit cannot optimize the distinct case. > {code:scala} > def isEmpty: Boolean = withAction("isEmpty", > withTypedPlan { LocalLimit(Literal(1), select().logicalPlan) > }.queryExecution) { plan => > plan.executeTake(1).isEmpty > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45755) Push down limit through Dataset.isEmpty()
[ https://issues.apache.org/jira/browse/SPARK-45755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45755. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43617 [https://github.com/apache/spark/pull/43617] > Push down limit through Dataset.isEmpty() > - > > Key: SPARK-45755 > URL: https://issues.apache.org/jira/browse/SPARK-45755 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Pushing down LocalLimit cannot optimize the distinct case. > {code:scala} > def isEmpty: Boolean = withAction("isEmpty", > withTypedPlan { LocalLimit(Literal(1), select().logicalPlan) > }.queryExecution) { plan => > plan.executeTake(1).isEmpty > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
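For intuition, a small usage sketch (illustrative): isEmpty only needs evidence of a single row, and the LocalLimit(1) in the snippet above is what lets execution stop early where the plan allows, instead of materializing the full result.

{code:scala}
import org.apache.spark.sql.SparkSession

object IsEmptyExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]")
      .appName("is-empty-example").getOrCreate()
    import spark.implicits._

    // A distinct over a million rows; isEmpty only has to answer
    // whether any row exists at all.
    val df = spark.range(0, 1000000)
      .select(($"id" % 10).as("bucket"))
      .distinct()

    println(df.isEmpty)  // false

    spark.stop()
  }
}
{code}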
[jira] [Resolved] (SPARK-45711) Introduce a mapper for avro compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45711. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43562 [https://github.com/apache/spark/pull/43562] > Introduce a mapper for avro compression codecs > -- > > Key: SPARK-45711 > URL: https://issues.apache.org/jira/browse/SPARK-45711 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, Spark supports all the Avro compression codecs, but the codecs Avro > supports and the codecs Spark supports do not map one-to-one, because Spark > introduces the extra compression codec UNCOMPRESSED. > There are a lot of magic strings copied from the Avro compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45711) Introduce a mapper for avro compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45711: --- Description: Currently, Spark supports all the Avro compression codecs, but the codecs Avro supports and the codecs Spark supports do not map one-to-one, because Spark introduces the extra compression codec UNCOMPRESSED. There are a lot of magic strings copied from the Avro compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. was: Currently, Spark supports all the Avro compression codecs, but the codecs Avro supports and the codecs Spark supports do not map one-to-one, because Spark introduces the extra compression codec UNCOMPRESSED. There are a lot of magic strings copied from the ORC compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. > Introduce a mapper for avro compression codecs > -- > > Key: SPARK-45711 > URL: https://issues.apache.org/jira/browse/SPARK-45711 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, Spark supports all the Avro compression codecs, but the codecs Avro > supports and the codecs Spark supports do not map one-to-one, because Spark > introduces the extra compression codec UNCOMPRESSED. > There are a lot of magic strings copied from the Avro compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45711) Introduce a mapper for avro compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45711: --- Description: Currently, Spark supports all the Avro compression codecs, but the codecs Avro supports and the codecs Spark supports do not map one-to-one, because Spark introduces the extra compression codec UNCOMPRESSED. There are a lot of magic strings copied from the ORC compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. was: Currently, Spark supports all the ORC compression codecs, but the codecs ORC supports and the codecs Spark supports do not map one-to-one, because Spark introduces two compression codecs, none and UNCOMPRESSED. There are a lot of magic strings copied from the ORC compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. > Introduce a mapper for avro compression codecs > -- > > Key: SPARK-45711 > URL: https://issues.apache.org/jira/browse/SPARK-45711 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, Spark supports all the Avro compression codecs, but the codecs Avro > supports and the codecs Spark supports do not map one-to-one, because Spark > introduces the extra compression codec UNCOMPRESSED. > There are a lot of magic strings copied from the ORC compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45711) Introduce a mapper for avro compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45711: --- Description: Currently, Spark supports all the ORC compression codecs, but the codecs ORC supports and the codecs Spark supports do not map one-to-one, because Spark introduces two compression codecs, none and UNCOMPRESSED. There are a lot of magic strings copied from the ORC compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. > Introduce a mapper for avro compression codecs > -- > > Key: SPARK-45711 > URL: https://issues.apache.org/jira/browse/SPARK-45711 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, Spark supports all the ORC compression codecs, but the codecs ORC > supports and the codecs Spark supports do not map one-to-one, because Spark > introduces two compression codecs, none and UNCOMPRESSED. > There are a lot of magic strings copied from the ORC compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45481. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43308 [https://github.com/apache/spark/pull/43308] > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, Spark supports all the Parquet compression codecs, but the codecs > Parquet supports and the codecs Spark supports do not map one-to-one, because > Spark introduces a fake compression codec, none. > There are a lot of magic strings copied from the Parquet compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45664) Introduce a mapper for orc compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45664: --- Description: Currently, Spark supports all the ORC compression codecs, but the codecs ORC supports and the codecs Spark supports do not map one-to-one, because Spark introduces two compression codecs, none and UNCOMPRESSED. There are a lot of magic strings copied from the ORC compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. was: Currently, Spark supports all the ORC compression codecs, but the codecs ORC supports and the codecs Spark supports do not map one-to-one, because Spark introduces two compression codecs, none and UNCOMPRESSED. There are a lot of magic strings copied from the Parquet compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. > Introduce a mapper for orc compression codecs > - > > Key: SPARK-45664 > URL: https://issues.apache.org/jira/browse/SPARK-45664 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, Spark supports all the ORC compression codecs, but the codecs ORC > supports and the codecs Spark supports do not map one-to-one, because Spark > introduces two compression codecs, none and UNCOMPRESSED. > There are a lot of magic strings copied from the ORC compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45664) Introduce a mapper for orc compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45664: --- Description: Currently, Spark supports all the ORC compression codecs, but the codecs ORC supports and the codecs Spark supports do not map one-to-one, because Spark introduces two compression codecs, none and UNCOMPRESSED. There are a lot of magic strings copied from the Parquet compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. > Introduce a mapper for orc compression codecs > - > > Key: SPARK-45664 > URL: https://issues.apache.org/jira/browse/SPARK-45664 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, Spark supports all the ORC compression codecs, but the codecs ORC > supports and the codecs Spark supports do not map one-to-one, because Spark > introduces two compression codecs, none and UNCOMPRESSED. > There are a lot of magic strings copied from the Parquet compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45481: --- Description: Currently, Spark supports all the Parquet compression codecs, but the codecs Parquet supports and the codecs Spark supports do not map one-to-one, because Spark introduces a fake compression codec, none. There are a lot of magic strings copied from the Parquet compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. was: Currently, Spark supports most of the Parquet compression codecs, but the codecs Parquet supports and the codecs Spark supports do not map one-to-one. There are a lot of magic strings copied from the Parquet compression codecs, so developers need to maintain their consistency manually. It is easy to make mistakes, and it reduces development efficiency. > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, Spark supports all the Parquet compression codecs, but the codecs > Parquet supports and the codecs Spark supports do not map one-to-one, because > Spark introduces a fake compression codec, none. > There are a lot of magic strings copied from the Parquet compression codecs, so > developers need to maintain their consistency manually. It is easy to make > mistakes, and it reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45649) Unify the prepare framework for `OffsetWindowFunctionFrame`
[ https://issues.apache.org/jira/browse/SPARK-45649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45649: --- Summary: Unify the prepare framework for `OffsetWindowFunctionFrame` (was: Unified the prepare framework for `OffsetWindowFunctionFrame`) > Unify the prepare framework for `OffsetWindowFunctionFrame` > --- > > Key: SPARK-45649 > URL: https://issues.apache.org/jira/browse/SPARK-45649 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, the implementations of the `prepare` method in all the > `OffsetWindowFunctionFrame` subclasses share the same code logic, shown below. > ``` > override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { > if (offset > rows.length) { > fillDefaultValue(EmptyRow) > } else { > resetStates(rows) > if (ignoreNulls) { > ... > } else { > ... > } > } > } > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45649) Unified the prepare framework for `OffsetWindowFunctionFrame`
Jiaan Geng created SPARK-45649: -- Summary: Unified the prepare framework for `OffsetWindowFunctionFrame` Key: SPARK-45649 URL: https://issues.apache.org/jira/browse/SPARK-45649 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng Currently, the implementations of the `prepare` method in all the `OffsetWindowFunctionFrame` subclasses share the same code logic, shown below. ``` override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { if (offset > rows.length) { fillDefaultValue(EmptyRow) } else { resetStates(rows) if (ignoreNulls) { ... } else { ... } } } ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
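A hypothetical, simplified sketch of the unification (the real classes operate on ExternalAppendOnlyUnsafeRowArray inside Spark's window execution; names here are illustrative): hoist the shared `prepare` template into the base class and let subclasses override only the parts that differ.

{code:scala}
abstract class OffsetFrameSketch(offset: Int, ignoreNulls: Boolean) {
  protected def fillDefaultValue(): Unit = println("fill default value")
  protected def resetStates(rows: Seq[Int]): Unit =
    println(s"reset states over ${rows.length} rows")
  protected def prepareForIgnoreNulls(): Unit
  protected def prepareForRespectNulls(): Unit

  // Shared template method replacing the copy kept in every subclass.
  final def prepare(rows: Seq[Int]): Unit = {
    if (offset > rows.length) {
      fillDefaultValue()
    } else {
      resetStates(rows)
      if (ignoreNulls) prepareForIgnoreNulls() else prepareForRespectNulls()
    }
  }
}

object OffsetFrameSketch {
  def main(args: Array[String]): Unit = {
    val frame = new OffsetFrameSketch(offset = 1, ignoreNulls = false) {
      protected def prepareForIgnoreNulls(): Unit = println("skip nulls")
      protected def prepareForRespectNulls(): Unit = println("keep nulls")
    }
    frame.prepare(Seq(1, 2, 3))  // reset states over 3 rows / keep nulls
  }
}
{code}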
[jira] [Resolved] (SPARK-45543) InferWindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions
[ https://issues.apache.org/jira/browse/SPARK-45543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45543. Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 43385 [https://github.com/apache/spark/pull/43385] > InferWindowGroupLimit causes bug if the other window functions haven't the > same window frame as the rank-like functions > --- > > Key: SPARK-45543 > URL: https://issues.apache.org/jira/browse/SPARK-45543 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 3.5.0 >Reporter: Ron Serruya >Assignee: Jiaan Geng >Priority: Critical > Labels: correctness, data-loss, pull-request-available > Fix For: 3.5.1, 4.0.0 > > > First, this is my first bug report, so I hope I'm doing it right; also, as I'm > not very knowledgeable about Spark internals, I hope I diagnosed the problem > correctly. > I found the degradation in Spark version 3.5.0: > When using multiple windows that share the same partition and ordering (but > with different "frame boundaries"), where one window is a ranking function, > "WindowGroupLimit" is added to the plan, causing wrong values to be produced > by the other windows. > *This behavior didn't exist in versions 3.3 and 3.4.* > Example: > > {code:python} > import pyspark > from pyspark.sql import functions as F, Window > df = spark.createDataFrame([ > {'row_id': 1, 'name': 'Dave', 'score': 1, 'year': 2020}, > {'row_id': 1, 'name': 'Dave', 'score': 2, 'year': 2022}, > {'row_id': 1, 'name': 'Dave', 'score': 3, 'year': 2023}, > {'row_id': 2, 'name': 'Amy', 'score': 6, 'year': 2021}, > ]) > # Create first window for row number > window_spec = Window.partitionBy('row_id', 'name').orderBy(F.desc('year')) > # Create additional window from the first window with unbounded frame > unbound_spec = window_spec.rowsBetween(Window.unboundedPreceding, > Window.unboundedFollowing) > # Try to keep the first row by year, and also collect all scores into a list > df2 = df.withColumn( > 'rn', > F.row_number().over(window_spec) > ).withColumn( > 'all_scores', > F.collect_list('score').over(unbound_spec) > ){code} > So far everything works, and if we display df2: > > {noformat} > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3, 2, 1] | > |Dave|1 |2|2022|2 |[3, 2, 1] | > |Dave|1 |1|2020|3 |[3, 2, 1] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+{noformat} > > However, once we filter to keep only the first row number: > > {noformat} > df2.filter("rn=1").show(truncate=False) > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+{noformat} > As you can see, just filtering changed the "all_scores" array for Dave.
> (This example uses `collect_list`; however, the same result happens with > other functions, such as max, min, count, etc.) > > Now, if instead of the two windows we used, I use the first window > and a window with different ordering, or create a completely new window with > the same partition but no ordering, it works fine: > {code:python} > new_window = Window.partitionBy('row_id', > 'name').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) > df3 = df.withColumn( > 'rn', > F.row_number().over(window_spec) > ).withColumn( > 'all_scores', > F.collect_list('score').over(new_window) > ) > df3.filter("rn=1").show(truncate=False){code} > {noformat} > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3, 2, 1] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+ > {noformat} > In addition, if we use all 3 windows to create 3 different columns, it will > also work fine. So it seems the issue happens only when all the windows used > share the same partition and ordering. > Here is the final plan for the faulty dataframe: > {noformat} > df2.filter("rn=1").explain() > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Filter (rn#9 = 1) > +- Window [row_number() windowspecdefinition(row_id#1L, name#0, year#3L > DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), > currentrow$())) AS rn#9, collect_list(score#2L, 0, 0) > windowspecdefinition(row_id#1L, name#0, year#3L DESC NULLS LA
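Conceptually, the fix has to apply the group-limit optimization only when it cannot change other window columns; a hypothetical, self-contained sketch of such a guard (the real rule is InferWindowGroupLimit in Catalyst, and its actual conditions live in the pull request linked above):

{code:scala}
object GroupLimitGuardSketch {
  sealed trait Frame
  case object RunningFrame extends Frame    // unboundedPreceding .. currentRow
  case object UnboundedFrame extends Frame  // unboundedPreceding .. unboundedFollowing

  final case class WindowFunc(name: String, rankLike: Boolean, frame: Frame)

  // Only prune rows early when every window function shares the rank-like
  // function's frame; otherwise aggregates such as collect_list would see
  // a truncated partition, as in the bug report above.
  def safeToInferGroupLimit(funcs: Seq[WindowFunc]): Boolean = {
    val rankFrames = funcs.filter(_.rankLike).map(_.frame).distinct
    rankFrames.size == 1 && funcs.forall(_.frame == rankFrames.head)
  }

  def main(args: Array[String]): Unit = {
    val exprs = Seq(
      WindowFunc("row_number", rankLike = true, RunningFrame),
      WindowFunc("collect_list", rankLike = false, UnboundedFrame))
    println(safeToInferGroupLimit(exprs))  // false: not safe to prune
  }
}
{code}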
[jira] [Created] (SPARK-45606) Release restrictions on multi-layer runtime filter
Jiaan Geng created SPARK-45606: -- Summary: Release restrictions on multi-layer runtime filter Key: SPARK-45606 URL: https://issues.apache.org/jira/browse/SPARK-45606 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng Before https://issues.apache.org/jira/browse/SPARK-41674, Spark only supported inserting a runtime filter on the application side of a shuffle join at a single layer. Because it was considered not worthwhile to insert another runtime filter when one side of the shuffle join already has one, Spark restricted it. After https://issues.apache.org/jira/browse/SPARK-41674, Spark supports inserting a runtime filter for one side of any shuffle join across multiple layers, but the restriction on multi-layer runtime filters looks outdated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45543) InferWindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions
[ https://issues.apache.org/jira/browse/SPARK-45543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45543: --- Summary: InferWindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions (was: InferWindowGroupLimit causes bug if the window frame is different between rank-like functions and others) > InferWindowGroupLimit causes bug if the other window functions haven't the > same window frame as the rank-like functions > --- > > Key: SPARK-45543 > URL: https://issues.apache.org/jira/browse/SPARK-45543 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 3.5.0 >Reporter: Ron Serruya >Assignee: Jiaan Geng >Priority: Critical > Labels: correctness, data-loss, pull-request-available > > First, this is my first bug report, so I hope I'm doing it right; also, as I'm > not very knowledgeable about Spark internals, I hope I diagnosed the problem > correctly. > I found the degradation in Spark version 3.5.0: > When using multiple windows that share the same partition and ordering (but > with different "frame boundaries"), where one window is a ranking function, > "WindowGroupLimit" is added to the plan, causing wrong values to be produced > by the other windows. > *This behavior didn't exist in versions 3.3 and 3.4.* > Example: > > {code:python} > import pyspark > from pyspark.sql import functions as F, Window > df = spark.createDataFrame([ > {'row_id': 1, 'name': 'Dave', 'score': 1, 'year': 2020}, > {'row_id': 1, 'name': 'Dave', 'score': 2, 'year': 2022}, > {'row_id': 1, 'name': 'Dave', 'score': 3, 'year': 2023}, > {'row_id': 2, 'name': 'Amy', 'score': 6, 'year': 2021}, > ]) > # Create first window for row number > window_spec = Window.partitionBy('row_id', 'name').orderBy(F.desc('year')) > # Create additional window from the first window with unbounded frame > unbound_spec = window_spec.rowsBetween(Window.unboundedPreceding, > Window.unboundedFollowing) > # Try to keep the first row by year, and also collect all scores into a list > df2 = df.withColumn( > 'rn', > F.row_number().over(window_spec) > ).withColumn( > 'all_scores', > F.collect_list('score').over(unbound_spec) > ){code} > So far everything works, and if we display df2: > > {noformat} > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3, 2, 1] | > |Dave|1 |2|2022|2 |[3, 2, 1] | > |Dave|1 |1|2020|3 |[3, 2, 1] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+{noformat} > > However, once we filter to keep only the first row number: > > {noformat} > df2.filter("rn=1").show(truncate=False) > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+{noformat} > As you can see, just filtering changed the "all_scores" array for Dave.
> (This example uses `collect_list`; however, the same result happens with > other functions, such as max, min, count, etc.) > > Now, if instead of the two windows we used, I use the first window > and a window with different ordering, or create a completely new window with > the same partition but no ordering, it works fine: > {code:python} > new_window = Window.partitionBy('row_id', > 'name').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) > df3 = df.withColumn( > 'rn', > F.row_number().over(window_spec) > ).withColumn( > 'all_scores', > F.collect_list('score').over(new_window) > ) > df3.filter("rn=1").show(truncate=False){code} > {noformat} > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3, 2, 1] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+ > {noformat} > In addition, if we use all 3 windows to create 3 different columns, it will > also work fine. So it seems the issue happens only when all the windows used > share the same partition and ordering. > Here is the final plan for the faulty dataframe: > {noformat} > df2.filter("rn=1").explain() > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Filter (rn#9 = 1) > +- Window [row_number() windowspecdefinition(row_id#1L, name#0, year#3L > DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), > currentrow$())) AS rn#9, collect_list(score#2L, 0, 0) > windowspec
[jira] [Updated] (SPARK-45543) InferWindowGroupLimit causes bug if the window frame is different between rank-like functions and others
[ https://issues.apache.org/jira/browse/SPARK-45543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45543: --- Summary: InferWindowGroupLimit causes bug if the window frame is different between rank-like functions and others (was: InferWindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions) > InferWindowGroupLimit causes bug if the window frame is different between > rank-like functions and others > > > Key: SPARK-45543 > URL: https://issues.apache.org/jira/browse/SPARK-45543 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 3.5.0 >Reporter: Ron Serruya >Assignee: Jiaan Geng >Priority: Critical > Labels: correctness, data-loss, pull-request-available > > First, this is my first bug report, so I hope I'm doing it right; also, as I'm > not very knowledgeable about Spark internals, I hope I diagnosed the problem > correctly. > I found the degradation in Spark version 3.5.0: > When using multiple windows that share the same partition and ordering (but > with different "frame boundaries"), where one window is a ranking function, > "WindowGroupLimit" is added to the plan, causing wrong values to be produced > by the other windows. > *This behavior didn't exist in versions 3.3 and 3.4.* > Example: > > {code:python} > import pyspark > from pyspark.sql import functions as F, Window > df = spark.createDataFrame([ > {'row_id': 1, 'name': 'Dave', 'score': 1, 'year': 2020}, > {'row_id': 1, 'name': 'Dave', 'score': 2, 'year': 2022}, > {'row_id': 1, 'name': 'Dave', 'score': 3, 'year': 2023}, > {'row_id': 2, 'name': 'Amy', 'score': 6, 'year': 2021}, > ]) > # Create first window for row number > window_spec = Window.partitionBy('row_id', 'name').orderBy(F.desc('year')) > # Create additional window from the first window with unbounded frame > unbound_spec = window_spec.rowsBetween(Window.unboundedPreceding, > Window.unboundedFollowing) > # Try to keep the first row by year, and also collect all scores into a list > df2 = df.withColumn( > 'rn', > F.row_number().over(window_spec) > ).withColumn( > 'all_scores', > F.collect_list('score').over(unbound_spec) > ){code} > So far everything works, and if we display df2: > > {noformat} > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3, 2, 1] | > |Dave|1 |2|2022|2 |[3, 2, 1] | > |Dave|1 |1|2020|3 |[3, 2, 1] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+{noformat} > > However, once we filter to keep only the first row number: > > {noformat} > df2.filter("rn=1").show(truncate=False) > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+{noformat} > As you can see, just filtering changed the "all_scores" array for Dave.
> (This example uses `collect_list`; however, the same result happens with > other functions, such as max, min, count, etc.) > > Now, if instead of the two windows we used, I use the first window > and a window with different ordering, or create a completely new window with > the same partition but no ordering, it works fine: > {code:python} > new_window = Window.partitionBy('row_id', > 'name').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) > df3 = df.withColumn( > 'rn', > F.row_number().over(window_spec) > ).withColumn( > 'all_scores', > F.collect_list('score').over(new_window) > ) > df3.filter("rn=1").show(truncate=False){code} > {noformat} > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3, 2, 1] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+ > {noformat} > In addition, if we use all 3 windows to create 3 different columns, it will > also work fine. So it seems the issue happens only when all the windows used > share the same partition and ordering. > Here is the final plan for the faulty dataframe: > {noformat} > df2.filter("rn=1").explain() > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Filter (rn#9 = 1) > +- Window [row_number() windowspecdefinition(row_id#1L, name#0, year#3L > DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), > currentrow$())) AS rn#9, collect_list(score#2L, 0, 0) > windowspecdefinition(row_id#1L, name#0,