Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on PR #43958: URL: https://github.com/apache/spark/pull/43958#issuecomment-1849916583 Merged to Master. @cloud-fan Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer closed pull request #43958: [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame URL: https://github.com/apache/spark/pull/43958 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421875728 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -230,7 +242,9 @@ class FrameLessOffsetWindowFunctionFrame( // 7. current row -> z, next selected row -> empty, output: null; // ... next selected row is empty, all following return null. (current: InternalRow) => - if (nextSelectedRow == EmptyRow) { + if (absOffset > input.length) { Review Comment: you are right. Shall we do it in the `def write` then? ``` if (absOffset > input.length) { // do nothing } else { doWrite(current) } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421871101 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -230,7 +242,9 @@ class FrameLessOffsetWindowFunctionFrame( // 7. current row -> z, next selected row -> empty, output: null; // ... next selected row is empty, all following return null. (current: InternalRow) => - if (nextSelectedRow == EmptyRow) { + if (absOffset > input.length) { Review Comment: We can't do this. The `input` has not been initialized yet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421686370 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -230,7 +242,9 @@ class FrameLessOffsetWindowFunctionFrame( // 7. current row -> z, next selected row -> empty, output: null; // ... next selected row is empty, all following return null. (current: InternalRow) => - if (nextSelectedRow == EmptyRow) { + if (absOffset > input.length) { Review Comment: let's not repeat this code in all branches, we can just have a new branch for this ``` if (absOffset > input.length) { (current: InternalRow) => _ } else if (ignoreNulls && offset > 0) { ... } ... ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421675231 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -176,6 +179,31 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +resetStates(rows) +if (absOffset > rows.length) { Review Comment: should do nothing. Let's optimize them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421674337 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (absOffset > input.length) { Review Comment: Good idea. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421504478 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (absOffset > input.length) { Review Comment: should the entire write path do nothing if `absOffset > input.length`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421504291 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -176,6 +179,31 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +resetStates(rows) +if (absOffset > rows.length) { Review Comment: what does the write path may need to do if `absOffset > rows.length`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421503302 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (absOffset > input.length) { Review Comment: Then let's restore this behavior -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421503302 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (absOffset > input.length) { Review Comment: Then let's restore this behavior -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421502414 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -176,6 +179,31 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +resetStates(rows) +if (absOffset > rows.length) { + fillDefaultValue(EmptyRow) +} else { + if (ignoreNulls) { +prepareForIgnoreNulls() + } else { +prepareForRespectNulls() + } +} + } + + protected def prepareForIgnoreNulls(): Unit = findNextRowWithNonNullInput() + + protected def prepareForRespectNulls(): Unit = { +// drain the first few rows if offset is larger than one +while (inputIndex < offset) { + nextSelectedRow = WindowFunctionFrame.getNextOrNull(inputIterator) + inputIndex += 1 +} +// In order to match the correct index during the write phase even if the offset is negative. Review Comment: ``` // `inputIndex` starts as 0, but the `offset` can be negative and we may not enter the // while loop at all. We need to make sure `inputIndex` ends up as `offset` to meet the // assumption of the write path. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421502414 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -176,6 +179,31 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +resetStates(rows) +if (absOffset > rows.length) { + fillDefaultValue(EmptyRow) +} else { + if (ignoreNulls) { +prepareForIgnoreNulls() + } else { +prepareForRespectNulls() + } +} + } + + protected def prepareForIgnoreNulls(): Unit = findNextRowWithNonNullInput() + + protected def prepareForRespectNulls(): Unit = { +// drain the first few rows if offset is larger than one +while (inputIndex < offset) { + nextSelectedRow = WindowFunctionFrame.getNextOrNull(inputIterator) + inputIndex += 1 +} +// In order to match the correct index during the write phase even if the offset is negative. Review Comment: Without any context, it seems reasonable to do `inputIndex = offset`, so that even if we do not enter the while loop, we still make sure `inputIndex` will end up as `offset ` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405787265 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +resetStates(rows) Review Comment: I improved the behavior. only `FrameLessOffsetWindowFunctionFrame` need `resetStates` here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421411335 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -317,34 +342,24 @@ class UnboundedOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { assert(offset > 0) - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { -if (offset > rows.length) { + override def prepareForIgnoreNulls(): Unit = { +super.prepareForIgnoreNulls() +if (nextSelectedRow == EmptyRow) { + // Use default values since the offset row whose input value is not null does not exist. fillDefaultValue(EmptyRow) } else { - resetStates(rows) - if (ignoreNulls) { -findNextRowWithNonNullInput() -if (nextSelectedRow == EmptyRow) { - // Use default values since the offset row whose input value is not null does not exist. - fillDefaultValue(EmptyRow) -} else { - projection(nextSelectedRow) -} - } else { -var selectedRow: UnsafeRow = null -// drain the first few rows if offset is larger than one -while (inputIndex < offset) { - selectedRow = WindowFunctionFrame.getNextOrNull(inputIterator) - inputIndex += 1 -} -projection(selectedRow) - } + projection(nextSelectedRow) Review Comment: Because this is unbounded frame. So the output for each row is the same. We can get the offset value of unbounded frame in read phase. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421410047 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (absOffset > input.length) { Review Comment: This refactor need the change. Before this refactor, `inputIndex = offset` if `offset > rows.length`. After this refactor, `inputIndex = 0` if `offset > rows.length`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421397347 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { + override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +// During the write phase, always access the current window group. So need reset the states. Review Comment: SGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421258729 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -317,34 +342,24 @@ class UnboundedOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { assert(offset > 0) - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { -if (offset > rows.length) { + override def prepareForIgnoreNulls(): Unit = { +super.prepareForIgnoreNulls() +if (nextSelectedRow == EmptyRow) { + // Use default values since the offset row whose input value is not null does not exist. fillDefaultValue(EmptyRow) } else { - resetStates(rows) - if (ignoreNulls) { -findNextRowWithNonNullInput() -if (nextSelectedRow == EmptyRow) { - // Use default values since the offset row whose input value is not null does not exist. - fillDefaultValue(EmptyRow) -} else { - projection(nextSelectedRow) -} - } else { -var selectedRow: UnsafeRow = null -// drain the first few rows if offset is larger than one -while (inputIndex < offset) { - selectedRow = WindowFunctionFrame.getNextOrNull(inputIterator) - inputIndex += 1 -} -projection(selectedRow) - } + projection(nextSelectedRow) Review Comment: not related to this PR, but let's fully understand the code before refactoring. Why does this window frame need to do `projection` at the read path? What's special about this window frame? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421255750 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (absOffset > input.length) { Review Comment: Let's avoid changing the write path. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421255704 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { + override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +// During the write phase, always access the current window group. So need reset the states. resetStates(rows) -if (ignoreNulls) { - if (Math.abs(offset) > rows.length) { -fillDefaultValue(EmptyRow) - } else { -findNextRowWithNonNullInput() - } -} else { - // drain the first few rows if offset is larger than zero - while (inputIndex < offset) { -if (inputIterator.hasNext) inputIterator.next() -inputIndex += 1 - } - inputIndex = offset -} +super.prepareForDefaultValue(rows) + } + + override def prepareForRespectNulls(): Unit = { +super.prepareForRespectNulls() +// In order to match the correct index during the write phase even if the offset is lager than +// the window group size or is negative. Review Comment: ditto, why can't we always do it? IMO it's more overhead to add these abstractions for such subtle perf difference. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421255497 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { + override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +// During the write phase, always access the current window group. So need reset the states. Review Comment: `resetStates` is very cheap, why not we always call it at the beginning? `if (absOffset > rows.length)` is not a common case and perf doesn't matter there. ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { + override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +// During the write phase, always access the current window group. So need reset the states. Review Comment: `resetStates` is very cheap, why not we always call it at the beginning? `if (absOffset > rows.length)` is not a common case and perf doesn't matter too much there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1413464275 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -317,32 +339,17 @@ class UnboundedOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { assert(offset > 0) - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { -if (offset > rows.length) { - fillDefaultValue(EmptyRow) -} else { - resetStates(rows) - if (ignoreNulls) { -findNextRowWithNonNullInput() -if (nextSelectedRow == EmptyRow) { - // Use default values since the offset row whose input value is not null does not exist. - fillDefaultValue(EmptyRow) Review Comment: After `findNextRowWithNonNullInput`, `nextSelectedRow` must be one of `EmptyRow` and others. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1413423471 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -317,32 +339,17 @@ class UnboundedOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { assert(offset > 0) - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { -if (offset > rows.length) { - fillDefaultValue(EmptyRow) -} else { - resetStates(rows) - if (ignoreNulls) { -findNextRowWithNonNullInput() -if (nextSelectedRow == EmptyRow) { - // Use default values since the offset row whose input value is not null does not exist. - fillDefaultValue(EmptyRow) Review Comment: This code logic is gone after refactor. Is it safe? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1413422198 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,16 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { + override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { resetStates(rows) -if (ignoreNulls) { - if (Math.abs(offset) > rows.length) { -fillDefaultValue(EmptyRow) - } else { -findNextRowWithNonNullInput() - } -} else { - // drain the first few rows if offset is larger than zero - while (inputIndex < offset) { -if (inputIterator.hasNext) inputIterator.next() -inputIndex += 1 - } - inputIndex = offset -} +super.prepareForDefaultValue(rows) + } + + override def prepareForRespectNulls(): Unit = { +super.prepareForRespectNulls() +inputIndex = offset Review Comment: same here, we should add comments to explain the reason of override. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1413421839 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,16 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { + override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { resetStates(rows) Review Comment: can we add comments to explain why this window frame need to reset states in `prepareForDefaultValue`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on PR #43958: URL: https://github.com/apache/spark/pull/43958#issuecomment-1837895892 In fact, not only can it reduce code, but it can also reuse code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on PR #43958: URL: https://github.com/apache/spark/pull/43958#issuecomment-1836985636 > I don't see much value of this refactor. It doesn't reduce the code size much, and doesn't make the code more readable to me. OK. This PR also suppose to add two test cases. We missing the bug at origin refactor. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on PR #43958: URL: https://github.com/apache/spark/pull/43958#issuecomment-1836129316 I don't see much value of this refactor. It doesn't reduce the code size much, and doesn't make the code more readable to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on PR #43958: URL: https://github.com/apache/spark/pull/43958#issuecomment-1828916353 The GA failure is unrelated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405787265 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +resetStates(rows) Review Comment: I improved the behavior. only `FrameLessOffsetWindowFunctionFrame` need `resetStates` here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405543679 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { -resetStates(rows) -if (ignoreNulls) { - if (Math.abs(offset) > rows.length) { -fillDefaultValue(EmptyRow) - } else { -findNextRowWithNonNullInput() - } -} else { - // drain the first few rows if offset is larger than zero - while (inputIndex < offset) { Review Comment: It's true. Before the refactor, we don't call `fillDefaultValue` if `ignoreNulls` is false. Then we increase the `inputIndex`. If offset is negative, so the `inputIndex` is negative too. We will call `fillDefaultValue` at the write phase. If offset is positive and greater than or equals to `rows.length`, `inputIndex` is the same as offset value. We will call `fillDefaultValue` at the write phase too. Otherwise, `inputIndex` is in range [0, offset). We could iterator `inputIterator` safely. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405543679 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { -resetStates(rows) -if (ignoreNulls) { - if (Math.abs(offset) > rows.length) { -fillDefaultValue(EmptyRow) - } else { -findNextRowWithNonNullInput() - } -} else { - // drain the first few rows if offset is larger than zero - while (inputIndex < offset) { Review Comment: It's true. Before the refactor, we don't call `fillDefaultValue` if `ignoreNulls` is false. Then we increase the `inputIndex`. If offset is negative, so the `inputIndex` is negative too. We will call `fillDefaultValue` at the write phase. If offset is positive and greater than or equals to `rows.length`, `inputIndex` is the same as offset value. We will `fillDefaultValue` at the write phase too. Otherwise, `inputIndex` is in range [0, offset). We could iterator inputIterator safely. ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { -resetStates(rows) -if (ignoreNulls) { - if (Math.abs(offset) > rows.length) { -fillDefaultValue(EmptyRow) - } else { -findNextRowWithNonNullInput() - } -} else { - // drain the first few rows if offset is larger than zero - while (inputIndex < offset) { Review Comment: It's true. Before the refactor, we don't call `fillDefaultValue` if `ignoreNulls` is false. Then we increase the `inputIndex`. If offset is negative, so the `inputIndex` is negative too. We will call `fillDefaultValue` at the write phase. If offset is positive and greater than or equals to `rows.length`, `inputIndex` is the same as offset value. We will `fillDefaultValue` at the write phase too. Otherwise, `inputIndex` is in range [0, offset). We could iterator `inputIterator` safely. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405407470 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { -resetStates(rows) -if (ignoreNulls) { - if (Math.abs(offset) > rows.length) { -fillDefaultValue(EmptyRow) - } else { -findNextRowWithNonNullInput() - } -} else { - // drain the first few rows if offset is larger than zero - while (inputIndex < offset) { Review Comment: I'll review super carefully this time. So before the refactor, we don't call `fillDefaultValue` if `ignoreNulls` is false. Can you explain why your refactor is safe to change it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405397072 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +resetStates(rows) Review Comment: Yes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405122383 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +resetStates(rows) Review Comment: so the fix is always call `resetStates`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer commented on PR #43958: URL: https://github.com/apache/spark/pull/43958#issuecomment-1825562137 > Can you explain what gets fixed in this new attempt? I added some description. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
cloud-fan commented on PR #43958: URL: https://github.com/apache/spark/pull/43958#issuecomment-1824445121 Can you explain what gets fixed in this new attempt? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]
beliefer opened a new pull request, #43958: URL: https://github.com/apache/spark/pull/43958 ### What changes were proposed in this pull request? Currently, the implementation of the `prepare` of all the `OffsetWindowFunctionFrame` have the same code logic show below. ``` override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { resetStates(rows) if (offset > rows.length) { fillDefaultValue(EmptyRow) } else { if (ignoreNulls) { ... } else { ... } } } ``` This PR want unify the prepare framework for `OffsetWindowFunctionFrame` ### Why are the changes needed? Unify the prepare framework for `OffsetWindowFunctionFrame` ### Does this PR introduce _any_ user-facing change? 'No'. Inner update. ### How was this patch tested? Exists test cases. ### Was this patch authored or co-authored using generative AI tooling? 'No'. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
cloud-fan commented on code in PR #43507: URL: https://github.com/apache/spark/pull/43507#discussion_r1400162514 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +224,15 @@ class FrameLessOffsetWindowFunctionFrame( offset: Int, ignoreNulls: Boolean = false) extends OffsetWindowFunctionFrameBase( -target, ordinal, expressions, inputSchema, newMutableProjection, offset) { +target, ordinal, expressions, inputSchema, newMutableProjection, offset, ignoreNulls) { - override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { -resetStates(rows) -if (ignoreNulls) { - if (Math.abs(offset) > rows.length) { -fillDefaultValue(EmptyRow) Review Comment: This PR is not a pure refactor. Some logic is changed. e.g. before this PR, `fillDefaultValue(EmptyRow)` is never called if `ignoreNulls == false`. We need to understand the code better before making these changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
cloud-fan commented on PR #43507: URL: https://github.com/apache/spark/pull/43507#issuecomment-1820405407 This has been reverted. See the reason in https://issues.apache.org/jira/browse/SPARK-45649 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
beliefer commented on PR #43507: URL: https://github.com/apache/spark/pull/43507#issuecomment-1783758706 @cloud-fan Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
cloud-fan closed pull request #43507: [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` URL: https://github.com/apache/spark/pull/43507 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
cloud-fan commented on PR #43507: URL: https://github.com/apache/spark/pull/43507#issuecomment-1782942277 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
beliefer commented on code in PR #43507: URL: https://github.com/apache/spark/pull/43507#discussion_r1374123448 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { Review Comment: Got it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
cloud-fan commented on code in PR #43507: URL: https://github.com/apache/spark/pull/43507#discussion_r1374095537 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (inputIndex >= 0 && input != null && inputIndex < input.length) { Review Comment: I feel the code is a bit subtle now. In the branch above here, there is no null check: `while (nextSelectedRow == EmptyRow && inputIndex < input.length) {` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
cloud-fan commented on code in PR #43507: URL: https://github.com/apache/spark/pull/43507#discussion_r1374093866 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { Review Comment: shall we overwrite `doWrite` do skip if `input == null`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
beliefer commented on code in PR #43507: URL: https://github.com/apache/spark/pull/43507#discussion_r1374004543 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (inputIndex >= 0 && input != null && inputIndex < input.length) { Review Comment: Because the origin code always call `resetStates`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
cloud-fan commented on code in PR #43507: URL: https://github.com/apache/spark/pull/43507#discussion_r1373100286 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (inputIndex >= 0 && input != null && inputIndex < input.length) { Review Comment: Why we don't need the null check before? where did we instantiate `input`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
beliefer commented on PR #43507: URL: https://github.com/apache/spark/pull/43507#issuecomment-1780978940 The GA failure is unrelated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
beliefer commented on code in PR #43507: URL: https://github.com/apache/spark/pull/43507#discussion_r1372717736 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (inputIndex >= 0 && input != null && inputIndex < input.length) { Review Comment: If `Math.abs(offset) > rows.length` is true, no need to assign a value to `input`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
cloud-fan commented on code in PR #43507: URL: https://github.com/apache/spark/pull/43507#discussion_r1372650267 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame( } } else { (current: InternalRow) => - if (inputIndex >= 0 && inputIndex < input.length) { + if (inputIndex >= 0 && input != null && inputIndex < input.length) { Review Comment: why this change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
cloud-fan commented on code in PR #43507: URL: https://github.com/apache/spark/pull/43507#discussion_r1372646818 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -175,6 +176,23 @@ abstract class OffsetWindowFunctionFrameBase( } } + override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { +if (offset > rows.length) { Review Comment: shall we use `Math.abs(offset)`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
beliefer commented on PR #43507: URL: https://github.com/apache/spark/pull/43507#issuecomment-1778923842 ping @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
beliefer commented on PR #43507: URL: https://github.com/apache/spark/pull/43507#issuecomment-1778721347 The GA failure is unrelated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]
beliefer opened a new pull request, #43507: URL: https://github.com/apache/spark/pull/43507 ### What changes were proposed in this pull request? Currently, the implementation of the `prepare` of all the `OffsetWindowFunctionFrame` have the same code logic show below. ``` override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = { if (offset > rows.length) { fillDefaultValue(EmptyRow) } else { resetStates(rows) if (ignoreNulls) { ... } else { ... } } } ``` This PR want unify the prepare framework for `OffsetWindowFunctionFrame` ### Why are the changes needed? Unify the prepare framework for `OffsetWindowFunctionFrame` ### Does this PR introduce _any_ user-facing change? 'No'. Inner update. ### How was this patch tested? Exists test cases. ### Was this patch authored or co-authored using generative AI tooling? 'No'. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org