Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-11 Thread via GitHub


beliefer commented on PR #43958:
URL: https://github.com/apache/spark/pull/43958#issuecomment-1849916583

   Merged to Master.
   @cloud-fan Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-11 Thread via GitHub


beliefer closed pull request #43958: [SPARK-45649][SQL] Unify the prepare 
framework for OffsetWindowFunctionFrame
URL: https://github.com/apache/spark/pull/43958


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-10 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421875728


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -230,7 +242,9 @@ class FrameLessOffsetWindowFunctionFrame(
 // 7. current row -> z, next selected row -> empty, output: null;
 // ... next selected row is empty, all following return null.
 (current: InternalRow) =>
-  if (nextSelectedRow == EmptyRow) {
+  if (absOffset > input.length) {

Review Comment:
   you are right. Shall we do it in the `def write` then?
   ```
   if (absOffset > input.length) {
 // do nothing
   } else {
 doWrite(current)
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-10 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421871101


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -230,7 +242,9 @@ class FrameLessOffsetWindowFunctionFrame(
 // 7. current row -> z, next selected row -> empty, output: null;
 // ... next selected row is empty, all following return null.
 (current: InternalRow) =>
-  if (nextSelectedRow == EmptyRow) {
+  if (absOffset > input.length) {

Review Comment:
   We can't do this. The `input` has not been initialized yet.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421686370


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -230,7 +242,9 @@ class FrameLessOffsetWindowFunctionFrame(
 // 7. current row -> z, next selected row -> empty, output: null;
 // ... next selected row is empty, all following return null.
 (current: InternalRow) =>
-  if (nextSelectedRow == EmptyRow) {
+  if (absOffset > input.length) {

Review Comment:
   let's not repeat this code in all branches, we can just have a new branch 
for this
   ```
   if (absOffset > input.length) {
 (current: InternalRow) => _
   } else if (ignoreNulls && offset > 0) {
 ...
   } ...
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421675231


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -176,6 +179,31 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+resetStates(rows)
+if (absOffset > rows.length) {

Review Comment:
   should do nothing.
   Let's optimize them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421674337


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (absOffset > input.length) {

Review Comment:
   Good idea.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421504478


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (absOffset > input.length) {

Review Comment:
   should the entire write path do nothing if `absOffset > input.length`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421504291


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -176,6 +179,31 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+resetStates(rows)
+if (absOffset > rows.length) {

Review Comment:
   what does the write path may need to do if `absOffset > rows.length`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421503302


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (absOffset > input.length) {

Review Comment:
   Then let's restore this behavior



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421503302


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (absOffset > input.length) {

Review Comment:
   Then let's restore this behavior



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421502414


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -176,6 +179,31 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+resetStates(rows)
+if (absOffset > rows.length) {
+  fillDefaultValue(EmptyRow)
+} else {
+  if (ignoreNulls) {
+prepareForIgnoreNulls()
+  } else {
+prepareForRespectNulls()
+  }
+}
+  }
+
+  protected def prepareForIgnoreNulls(): Unit = findNextRowWithNonNullInput()
+
+  protected def prepareForRespectNulls(): Unit = {
+// drain the first few rows if offset is larger than one
+while (inputIndex < offset) {
+  nextSelectedRow = WindowFunctionFrame.getNextOrNull(inputIterator)
+  inputIndex += 1
+}
+// In order to match the correct index during the write phase even if the 
offset is negative.

Review Comment:
   ```
   // `inputIndex` starts as 0, but the `offset` can be negative and we may not 
enter the
   // while loop at all. We need to make sure `inputIndex` ends up as `offset` 
to meet the
   // assumption of the write path.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421502414


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -176,6 +179,31 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+resetStates(rows)
+if (absOffset > rows.length) {
+  fillDefaultValue(EmptyRow)
+} else {
+  if (ignoreNulls) {
+prepareForIgnoreNulls()
+  } else {
+prepareForRespectNulls()
+  }
+}
+  }
+
+  protected def prepareForIgnoreNulls(): Unit = findNextRowWithNonNullInput()
+
+  protected def prepareForRespectNulls(): Unit = {
+// drain the first few rows if offset is larger than one
+while (inputIndex < offset) {
+  nextSelectedRow = WindowFunctionFrame.getNextOrNull(inputIterator)
+  inputIndex += 1
+}
+// In order to match the correct index during the write phase even if the 
offset is negative.

Review Comment:
   Without any context, it seems reasonable to do `inputIndex = offset`, so 
that even if we do not enter the while loop, we still make sure `inputIndex` 
will end up as `offset `



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1405787265


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+resetStates(rows)

Review Comment:
   I improved the behavior. only `FrameLessOffsetWindowFunctionFrame` need 
`resetStates` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421411335


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -317,34 +342,24 @@ class UnboundedOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
   assert(offset > 0)
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
-if (offset > rows.length) {
+  override def prepareForIgnoreNulls(): Unit = {
+super.prepareForIgnoreNulls()
+if (nextSelectedRow == EmptyRow) {
+  // Use default values since the offset row whose input value is not null 
does not exist.
   fillDefaultValue(EmptyRow)
 } else {
-  resetStates(rows)
-  if (ignoreNulls) {
-findNextRowWithNonNullInput()
-if (nextSelectedRow == EmptyRow) {
-  // Use default values since the offset row whose input value is not 
null does not exist.
-  fillDefaultValue(EmptyRow)
-} else {
-  projection(nextSelectedRow)
-}
-  } else {
-var selectedRow: UnsafeRow = null
-// drain the first few rows if offset is larger than one
-while (inputIndex < offset) {
-  selectedRow = WindowFunctionFrame.getNextOrNull(inputIterator)
-  inputIndex += 1
-}
-projection(selectedRow)
-  }
+  projection(nextSelectedRow)

Review Comment:
   Because this is unbounded frame. So the output for each row is the same.
   We can get the offset value of unbounded frame in read phase. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421410047


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (absOffset > input.length) {

Review Comment:
   This refactor need the change.
   Before this refactor, `inputIndex = offset` if `offset > rows.length`. 
   After this refactor, `inputIndex = 0` if `offset > rows.length`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-09 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421397347


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+  override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): 
Unit = {
+// During the write phase, always access the current window group. So need 
reset the states.

Review Comment:
   SGTM.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-08 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421258729


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -317,34 +342,24 @@ class UnboundedOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
   assert(offset > 0)
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
-if (offset > rows.length) {
+  override def prepareForIgnoreNulls(): Unit = {
+super.prepareForIgnoreNulls()
+if (nextSelectedRow == EmptyRow) {
+  // Use default values since the offset row whose input value is not null 
does not exist.
   fillDefaultValue(EmptyRow)
 } else {
-  resetStates(rows)
-  if (ignoreNulls) {
-findNextRowWithNonNullInput()
-if (nextSelectedRow == EmptyRow) {
-  // Use default values since the offset row whose input value is not 
null does not exist.
-  fillDefaultValue(EmptyRow)
-} else {
-  projection(nextSelectedRow)
-}
-  } else {
-var selectedRow: UnsafeRow = null
-// drain the first few rows if offset is larger than one
-while (inputIndex < offset) {
-  selectedRow = WindowFunctionFrame.getNextOrNull(inputIterator)
-  inputIndex += 1
-}
-projection(selectedRow)
-  }
+  projection(nextSelectedRow)

Review Comment:
   not related to this PR, but let's fully understand the code before 
refactoring. Why does this window frame need to do `projection` at the read 
path? What's special about this window frame? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-08 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421255750


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (absOffset > input.length) {

Review Comment:
   Let's avoid changing the write path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-08 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421255704


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+  override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): 
Unit = {
+// During the write phase, always access the current window group. So need 
reset the states.
 resetStates(rows)
-if (ignoreNulls) {
-  if (Math.abs(offset) > rows.length) {
-fillDefaultValue(EmptyRow)
-  } else {
-findNextRowWithNonNullInput()
-  }
-} else {
-  // drain the first few rows if offset is larger than zero
-  while (inputIndex < offset) {
-if (inputIterator.hasNext) inputIterator.next()
-inputIndex += 1
-  }
-  inputIndex = offset
-}
+super.prepareForDefaultValue(rows)
+  }
+
+  override def prepareForRespectNulls(): Unit = {
+super.prepareForRespectNulls()
+// In order to match the correct index during the write phase even if the 
offset is lager than
+// the window group size or is negative.

Review Comment:
   ditto, why can't we always do it? IMO it's more overhead to add these 
abstractions for such subtle perf difference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-08 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1421255497


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+  override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): 
Unit = {
+// During the write phase, always access the current window group. So need 
reset the states.

Review Comment:
   `resetStates` is very cheap, why not we always call it at the beginning? `if 
(absOffset > rows.length)` is not a common case and perf doesn't matter there.



##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+  override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): 
Unit = {
+// During the write phase, always access the current window group. So need 
reset the states.

Review Comment:
   `resetStates` is very cheap, why not we always call it at the beginning? `if 
(absOffset > rows.length)` is not a common case and perf doesn't matter too 
much there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1413464275


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -317,32 +339,17 @@ class UnboundedOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
   assert(offset > 0)
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
-if (offset > rows.length) {
-  fillDefaultValue(EmptyRow)
-} else {
-  resetStates(rows)
-  if (ignoreNulls) {
-findNextRowWithNonNullInput()
-if (nextSelectedRow == EmptyRow) {
-  // Use default values since the offset row whose input value is not 
null does not exist.
-  fillDefaultValue(EmptyRow)

Review Comment:
   After `findNextRowWithNonNullInput`, `nextSelectedRow` must be one of 
`EmptyRow` and others.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1413423471


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -317,32 +339,17 @@ class UnboundedOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
   assert(offset > 0)
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
-if (offset > rows.length) {
-  fillDefaultValue(EmptyRow)
-} else {
-  resetStates(rows)
-  if (ignoreNulls) {
-findNextRowWithNonNullInput()
-if (nextSelectedRow == EmptyRow) {
-  // Use default values since the offset row whose input value is not 
null does not exist.
-  fillDefaultValue(EmptyRow)

Review Comment:
   This code logic is gone after refactor. Is it safe?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1413422198


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +225,16 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+  override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): 
Unit = {
 resetStates(rows)
-if (ignoreNulls) {
-  if (Math.abs(offset) > rows.length) {
-fillDefaultValue(EmptyRow)
-  } else {
-findNextRowWithNonNullInput()
-  }
-} else {
-  // drain the first few rows if offset is larger than zero
-  while (inputIndex < offset) {
-if (inputIterator.hasNext) inputIterator.next()
-inputIndex += 1
-  }
-  inputIndex = offset
-}
+super.prepareForDefaultValue(rows)
+  }
+
+  override def prepareForRespectNulls(): Unit = {
+super.prepareForRespectNulls()
+inputIndex = offset

Review Comment:
   same here, we should add comments to explain the reason of override.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1413421839


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +225,16 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+  override def prepareForDefaultValue(rows: ExternalAppendOnlyUnsafeRowArray): 
Unit = {
 resetStates(rows)

Review Comment:
   can we add comments to explain why this window frame need to reset states in 
`prepareForDefaultValue`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub


beliefer commented on PR #43958:
URL: https://github.com/apache/spark/pull/43958#issuecomment-1837895892

   In fact, not only can it reduce code, but it can also reuse code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-01 Thread via GitHub


beliefer commented on PR #43958:
URL: https://github.com/apache/spark/pull/43958#issuecomment-1836985636

   > I don't see much value of this refactor. It doesn't reduce the code size 
much, and doesn't make the code more readable to me.
   
   OK. This PR also suppose to add two test cases. We missing the bug at origin 
refactor.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-01 Thread via GitHub


cloud-fan commented on PR #43958:
URL: https://github.com/apache/spark/pull/43958#issuecomment-1836129316

   I don't see much value of this refactor. It doesn't reduce the code size 
much, and doesn't make the code more readable to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-27 Thread via GitHub


beliefer commented on PR #43958:
URL: https://github.com/apache/spark/pull/43958#issuecomment-1828916353

   The GA failure is unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-27 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1405787265


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+resetStates(rows)

Review Comment:
   I improved the behavior. only `FrameLessOffsetWindowFunctionFrame` need 
`resetStates` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-26 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1405543679


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
-resetStates(rows)
-if (ignoreNulls) {
-  if (Math.abs(offset) > rows.length) {
-fillDefaultValue(EmptyRow)
-  } else {
-findNextRowWithNonNullInput()
-  }
-} else {
-  // drain the first few rows if offset is larger than zero
-  while (inputIndex < offset) {

Review Comment:
   It's true. Before the refactor, we don't call `fillDefaultValue` if 
`ignoreNulls` is false.
   Then we increase the `inputIndex`.
   If offset is negative, so the `inputIndex` is negative too. We will call 
`fillDefaultValue` at the write phase.
   If offset is positive and greater than or equals to `rows.length`, 
`inputIndex` is the same as offset value. We will call `fillDefaultValue` at 
the write phase too.
   Otherwise, `inputIndex` is in range [0, offset). We could iterator 
`inputIterator` safely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-26 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1405543679


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
-resetStates(rows)
-if (ignoreNulls) {
-  if (Math.abs(offset) > rows.length) {
-fillDefaultValue(EmptyRow)
-  } else {
-findNextRowWithNonNullInput()
-  }
-} else {
-  // drain the first few rows if offset is larger than zero
-  while (inputIndex < offset) {

Review Comment:
   It's true. Before the refactor, we don't call `fillDefaultValue` if 
`ignoreNulls` is false.
   Then we increase the `inputIndex`.
   If offset is negative, so the `inputIndex` is negative too. We will call 
`fillDefaultValue` at the write phase.
   If offset is positive and greater than or equals to `rows.length`, 
`inputIndex` is the same as offset value. We will `fillDefaultValue` at the 
write phase too.
   Otherwise, `inputIndex` is in range [0, offset). We could iterator 
inputIterator safely.



##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
-resetStates(rows)
-if (ignoreNulls) {
-  if (Math.abs(offset) > rows.length) {
-fillDefaultValue(EmptyRow)
-  } else {
-findNextRowWithNonNullInput()
-  }
-} else {
-  // drain the first few rows if offset is larger than zero
-  while (inputIndex < offset) {

Review Comment:
   It's true. Before the refactor, we don't call `fillDefaultValue` if 
`ignoreNulls` is false.
   Then we increase the `inputIndex`.
   If offset is negative, so the `inputIndex` is negative too. We will call 
`fillDefaultValue` at the write phase.
   If offset is positive and greater than or equals to `rows.length`, 
`inputIndex` is the same as offset value. We will `fillDefaultValue` at the 
write phase too.
   Otherwise, `inputIndex` is in range [0, offset). We could iterator 
`inputIterator` safely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-26 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1405407470


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
-resetStates(rows)
-if (ignoreNulls) {
-  if (Math.abs(offset) > rows.length) {
-fillDefaultValue(EmptyRow)
-  } else {
-findNextRowWithNonNullInput()
-  }
-} else {
-  // drain the first few rows if offset is larger than zero
-  while (inputIndex < offset) {

Review Comment:
   I'll review super carefully this time. So before the refactor, we don't call 
`fillDefaultValue` if `ignoreNulls` is false. Can you explain why your refactor 
is safe to change it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-26 Thread via GitHub


beliefer commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1405397072


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+resetStates(rows)

Review Comment:
   Yes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-25 Thread via GitHub


cloud-fan commented on code in PR #43958:
URL: https://github.com/apache/spark/pull/43958#discussion_r1405122383


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+resetStates(rows)

Review Comment:
   so the fix is always call `resetStates`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-24 Thread via GitHub


beliefer commented on PR #43958:
URL: https://github.com/apache/spark/pull/43958#issuecomment-1825562137

   > Can you explain what gets fixed in this new attempt?
   
   I added some description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-23 Thread via GitHub


cloud-fan commented on PR #43958:
URL: https://github.com/apache/spark/pull/43958#issuecomment-1824445121

   Can you explain what gets fixed in this new attempt?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-22 Thread via GitHub


beliefer opened a new pull request, #43958:
URL: https://github.com/apache/spark/pull/43958

   ### What changes were proposed in this pull request?
   Currently, the implementation of the `prepare` of all the 
`OffsetWindowFunctionFrame` have the same code logic show below.
   ```
 override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
   resetStates(rows)
   if (offset > rows.length) {
 fillDefaultValue(EmptyRow)
   } else {
 if (ignoreNulls) {
   ...
 } else {
   ...
 }
   }
 }
   ```
   This PR want unify the prepare framework for `OffsetWindowFunctionFrame`
   
   ### Why are the changes needed?
   Unify the prepare framework for `OffsetWindowFunctionFrame`
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   Inner update.
   
   
   ### How was this patch tested?
   Exists test cases.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-11-21 Thread via GitHub


cloud-fan commented on code in PR #43507:
URL: https://github.com/apache/spark/pull/43507#discussion_r1400162514


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -196,24 +224,15 @@ class FrameLessOffsetWindowFunctionFrame(
 offset: Int,
 ignoreNulls: Boolean = false)
   extends OffsetWindowFunctionFrameBase(
-target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
+target, ordinal, expressions, inputSchema, newMutableProjection, offset, 
ignoreNulls) {
 
-  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
-resetStates(rows)
-if (ignoreNulls) {
-  if (Math.abs(offset) > rows.length) {
-fillDefaultValue(EmptyRow)

Review Comment:
   This PR is not a pure refactor. Some logic is changed. e.g. before this PR, 
`fillDefaultValue(EmptyRow)` is never called if `ignoreNulls == false`. We need 
to understand the code better before making these changes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-11-20 Thread via GitHub


cloud-fan commented on PR #43507:
URL: https://github.com/apache/spark/pull/43507#issuecomment-1820405407

   This has been reverted. See the reason in 
https://issues.apache.org/jira/browse/SPARK-45649


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-28 Thread via GitHub


beliefer commented on PR #43507:
URL: https://github.com/apache/spark/pull/43507#issuecomment-1783758706

   @cloud-fan Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-27 Thread via GitHub


cloud-fan closed pull request #43507: [SPARK-45649][SQL] Unify the prepare 
framework for `OffsetWindowFunctionFrame`
URL: https://github.com/apache/spark/pull/43507


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-27 Thread via GitHub


cloud-fan commented on PR #43507:
URL: https://github.com/apache/spark/pull/43507#issuecomment-1782942277

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-27 Thread via GitHub


beliefer commented on code in PR #43507:
URL: https://github.com/apache/spark/pull/43507#discussion_r1374123448


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {

Review Comment:
   Got it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-26 Thread via GitHub


cloud-fan commented on code in PR #43507:
URL: https://github.com/apache/spark/pull/43507#discussion_r1374095537


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (inputIndex >= 0 && input != null && inputIndex < input.length) {

Review Comment:
   I feel the code is a bit subtle now. In the branch above here, there is no 
null check: `while (nextSelectedRow == EmptyRow && inputIndex < input.length) {`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-26 Thread via GitHub


cloud-fan commented on code in PR #43507:
URL: https://github.com/apache/spark/pull/43507#discussion_r1374093866


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {

Review Comment:
   shall we overwrite `doWrite` do skip if `input == null`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-26 Thread via GitHub


beliefer commented on code in PR #43507:
URL: https://github.com/apache/spark/pull/43507#discussion_r1374004543


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (inputIndex >= 0 && input != null && inputIndex < input.length) {

Review Comment:
   Because the origin code always call `resetStates`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-26 Thread via GitHub


cloud-fan commented on code in PR #43507:
URL: https://github.com/apache/spark/pull/43507#discussion_r1373100286


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (inputIndex >= 0 && input != null && inputIndex < input.length) {

Review Comment:
   Why we don't need the null check before? where did we instantiate `input`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-26 Thread via GitHub


beliefer commented on PR #43507:
URL: https://github.com/apache/spark/pull/43507#issuecomment-1780978940

   The GA failure is unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-26 Thread via GitHub


beliefer commented on code in PR #43507:
URL: https://github.com/apache/spark/pull/43507#discussion_r1372717736


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (inputIndex >= 0 && input != null && inputIndex < input.length) {

Review Comment:
   If `Math.abs(offset) > rows.length` is true, no need to assign a value to 
`input`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-26 Thread via GitHub


cloud-fan commented on code in PR #43507:
URL: https://github.com/apache/spark/pull/43507#discussion_r1372650267


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -284,7 +293,7 @@ class FrameLessOffsetWindowFunctionFrame(
   }
   } else {
 (current: InternalRow) =>
-  if (inputIndex >= 0 && inputIndex < input.length) {
+  if (inputIndex >= 0 && input != null && inputIndex < input.length) {

Review Comment:
   why this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-26 Thread via GitHub


cloud-fan commented on code in PR #43507:
URL: https://github.com/apache/spark/pull/43507#discussion_r1372646818


##
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala:
##
@@ -175,6 +176,23 @@ abstract class OffsetWindowFunctionFrameBase(
 }
   }
 
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+if (offset > rows.length) {

Review Comment:
   shall we use `Math.abs(offset)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-25 Thread via GitHub


beliefer commented on PR #43507:
URL: https://github.com/apache/spark/pull/43507#issuecomment-1778923842

   ping @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-25 Thread via GitHub


beliefer commented on PR #43507:
URL: https://github.com/apache/spark/pull/43507#issuecomment-1778721347

   The GA failure is unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-45649][SQL] Unify the prepare framework for `OffsetWindowFunctionFrame` [spark]

2023-10-24 Thread via GitHub


beliefer opened a new pull request, #43507:
URL: https://github.com/apache/spark/pull/43507

   ### What changes were proposed in this pull request?
   Currently, the implementation of the `prepare` of all the 
`OffsetWindowFunctionFrame` have the same code logic show below.
   ```
 override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
   if (offset > rows.length) {
 fillDefaultValue(EmptyRow)
   } else {
 resetStates(rows)
 if (ignoreNulls) {
   ...
 } else {
   ...
 }
   }
 }
   ```
   This PR want unify the prepare framework for `OffsetWindowFunctionFrame`
   
   ### Why are the changes needed?
   Unify the prepare framework for `OffsetWindowFunctionFrame`
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   Inner update.
   
   
   ### How was this patch tested?
   Exists test cases.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org