[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16818

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r102115145

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --
```

cc @hvanhovell and @gatorsmile
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101948606

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --
```

cc @hvanhovell
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101451202

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --
```

cc @hvanhovell, any ideas?
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101187326

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --
```

In fact, the type of `start` and `end` should not be `Long` here, but we cannot change it for compatibility reasons.
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101121242

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --
```

Yea, the doc is in `rangeBetween`.
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101121167

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --
```

Shall we throw an exception if `x < Int.MinValue` and `x > Long.MinValue`? @hvanhovell what do you think? BTW I remember we have documentation explaining this behavior; we should update that too.
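The exception-based mapping suggested here could look roughly like the following. This is a hypothetical sketch, not the actual patch; the case classes are simplified stand-ins for the Catalyst frame boundary types:

```scala
// Hypothetical sketch: map a Long frame bound to a boundary, rejecting values
// that fit neither in Int nor the Long.MinValue sentinel, instead of silently
// truncating them. The boundary types below are simplified stand-ins.
sealed trait FrameBoundary
case object CurrentRow extends FrameBoundary
case object UnboundedPreceding extends FrameBoundary
final case class ValuePreceding(offset: Int) extends FrameBoundary
final case class ValueFollowing(offset: Int) extends FrameBoundary

def startBoundary(start: Long): FrameBoundary = start match {
  case 0 => CurrentRow
  case Long.MinValue => UnboundedPreceding
  // x > Int.MinValue guarantees -x fits in an Int
  case x if x > Int.MinValue && x < 0 => ValuePreceding((-x).toInt)
  case x if x > 0 && x <= Int.MaxValue => ValueFollowing(x.toInt)
  case x => throw new IllegalArgumentException(
    s"Boundary start $x is outside the supported Int range")
}
```

With this mapping, a bound like `-3000000000L` fails loudly instead of wrapping into a positive offset.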
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r100539109

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/BoundOrdering.scala ---
@@ -25,18 +25,22 @@ import org.apache.spark.sql.catalyst.expressions.Projection
  * Function for comparing boundary values.
  */
 private[window] abstract class BoundOrdering {
-  def compare(inputRow: InternalRow, inputIndex: Int, outputRow: InternalRow, outputIndex: Int): Int
+  def compare(
+      inputRow: InternalRow,
+      inputIndex: Long,
+      outputRow: InternalRow,
+      outputIndex: Long): Long
 }

 /**
  * Compare the input index to the bound of the output index.
  */
-private[window] final case class RowBoundOrdering(offset: Int) extends BoundOrdering {
+private[window] final case class RowBoundOrdering(offset: Long) extends BoundOrdering {
--- End diff --
```

It does not make any sense to make the offsets longs. This is an execution detail: these are buffer indexes (which are Int-bound), and you really should not be messing with those. Try to keep your change more local, and only modify `WindowExec.createBoundOrdering` and the code generating the `WindowExec.windowFrameExpressionFactoryPairs`. That should be enough.
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r100049000

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/BoundOrdering.scala ---
@@ -25,18 +25,22 @@ import org.apache.spark.sql.catalyst.expressions.Projection
  * Function for comparing boundary values.
  */
 private[window] abstract class BoundOrdering {
-  def compare(inputRow: InternalRow, inputIndex: Int, outputRow: InternalRow, outputIndex: Int): Int
+  def compare(
+      inputRow: InternalRow,
+      inputIndex: Long,
+      outputRow: InternalRow,
+      outputIndex: Long): Long
 }

 /**
  * Compare the input index to the bound of the output index.
  */
-private[window] final case class RowBoundOrdering(offset: Int) extends BoundOrdering {
+private[window] final case class RowBoundOrdering(offset: Long) extends BoundOrdering {
--- End diff --
```

@hvanhovell `rowBuffer` is used to buffer all the rows of one partition. As you said, we only support 32-bit values (less than `(1 << 31) - 1`). But the row frame or range frame only restricts the range used to fetch the proper values from `rowBuffer`. This PR changes the type of the offset from `Int` to `Long`, which would make the `rowBuffer` index overflow. Besides, how to support a 64-bit buffer is another topic. Let me know if I have understood this wrong. Thanks!
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r100046288

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/BoundOrdering.scala ---
@@ -25,18 +25,22 @@ import org.apache.spark.sql.catalyst.expressions.Projection
  * Function for comparing boundary values.
  */
 private[window] abstract class BoundOrdering {
-  def compare(inputRow: InternalRow, inputIndex: Int, outputRow: InternalRow, outputIndex: Int): Int
+  def compare(
+      inputRow: InternalRow,
+      inputIndex: Long,
+      outputRow: InternalRow,
+      outputIndex: Long): Long
 }

 /**
  * Compare the input index to the bound of the output index.
  */
-private[window] final case class RowBoundOrdering(offset: Int) extends BoundOrdering {
+private[window] final case class RowBoundOrdering(offset: Long) extends BoundOrdering {
--- End diff --
```

If we are going to support 64-bit values in a row frame, then we also need to support a buffer that can store that many rows. `WindowExec` in its current form assumes that a buffer contains fewer than `(1 << 31) - 1` values (which is actually smaller than a 32-bit range can be). I have yet to see a use case where the buffer needs to be larger. The current PR does not make all the changes necessary for `WindowExec` to support a 64-bit buffer (see `RowBuffer.size`, for instance), and I am slightly worried about overflows. It will also be a daunting task to test this properly (you would need to create a buffer with more than 2 billion elements). So I prefer to keep this change local to range frames only.
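Keeping the change local, as suggested, might amount to saturating a 64-bit range offset into the `Int` range before it ever reaches the `Int`-indexed row buffer. A hypothetical sketch (an assumption about one possible approach, not the actual patch):

```scala
// Hypothetical sketch: saturate a Long range-frame offset into the Int range.
// If the buffer can never hold more than Int.MaxValue rows, any offset beyond
// the Int range selects the same rows as the clamped value would.
def clampToInt(offset: Long): Int =
  if (offset > Int.MaxValue) Int.MaxValue
  else if (offset < Int.MinValue) Int.MinValue
  else offset.toInt
```

Unlike `offset.toInt`, this never flips the sign of an out-of-range bound.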
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r100043629

```
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/BoundOrdering.scala ---
@@ -25,18 +25,22 @@ import org.apache.spark.sql.catalyst.expressions.Projection
  * Function for comparing boundary values.
  */
 private[window] abstract class BoundOrdering {
-  def compare(inputRow: InternalRow, inputIndex: Int, outputRow: InternalRow, outputIndex: Int): Int
+  def compare(
+      inputRow: InternalRow,
+      inputIndex: Long,
+      outputRow: InternalRow,
+      outputIndex: Long): Long
 }

 /**
  * Compare the input index to the bound of the output index.
  */
-private[window] final case class RowBoundOrdering(offset: Int) extends BoundOrdering {
+private[window] final case class RowBoundOrdering(offset: Long) extends BoundOrdering {
--- End diff --
```

@hvanhovell do you mean this is unnecessary as we can only support `Int` anyway?
[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16818

[SPARK-19451][SQL][Core] Underlying integer overflow in Window function

## What changes were proposed in this pull request?

Reproduction code:

```
val tw = Window.orderBy("date")
  .partitionBy("id")
  .rangeBetween(from, 0)
```

Everything works as long as the `from` value is not too large, even though the `rangeBetween()` method accepts `Long` parameters. But if I set `from` to -216000L, it does not work! There seems to be an underlying integer overflow issue here, i.e. a narrowing conversion from `Long` to `Int`:

```
private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
  val boundaryStart = start match {
    case 0 => CurrentRow
    case Long.MinValue => UnboundedPreceding
    case x if x < 0 => ValuePreceding(-start.toInt)
    case x if x > 0 => ValueFollowing(start.toInt)
  }

  val boundaryEnd = end match {
    case 0 => CurrentRow
    case Long.MaxValue => UnboundedFollowing
    case x if x < 0 => ValuePreceding(-end.toInt)
    case x if x > 0 => ValueFollowing(end.toInt)
  }

  new WindowSpec(
    partitionSpec,
    orderSpec,
    SpecifiedWindowFrame(typ, boundaryStart, boundaryEnd))
}
```

This PR changes the type of the index from `Int` to `Long`.

BTW: Is there any reason why the type of the index is `Int`? I do not see a strong argument for it.

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19451

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16818.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16818

commit ea1f44026dc9f5b0f8e660b5adf8824b8deb94df
Author: uncleGen
Date: 2017-02-06T01:36:29Z

    change the type of index from Int to Long
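For reference, the silent value change the description complains about is plain JVM `Long`-to-`Int` narrowing, which keeps only the low 32 bits. A minimal illustration (the value below is chosen to sit just outside the `Int` range; it is not taken from the report):

```scala
// Long-to-Int narrowing keeps only the low 32 bits of the value, so a frame
// bound just below Int.MinValue silently flips into a large *positive* Int.
val from: Long = -2160000000L   // below Int.MinValue (-2147483648)
val truncated: Int = from.toInt // keeps the low 32 bits only
println(truncated)              // prints 2134967296: a preceding bound became following
```

This is why a large "preceding" range bound can behave like an enormous "following" bound once it has been truncated.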