[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-07-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16818





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-20 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r102115145
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

cc @hvanhovell and @gatorsmile 





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-19 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101948606
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

cc @hvanhovell 





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101451202
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

cc @hvanhovell any ideas?





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-14 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101187326
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

In fact, `start` and `end` should not be typed as `Long` here, but we cannot change that for compatibility reasons.
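
For context, the compatibility constraint is the public API itself, which already exposes `Long` bounds to every caller; a minimal caller sketch (the column names are illustrative):

```scala
import org.apache.spark.sql.expressions.Window

// rangeBetween(start: Long, end: Long) is public API, so narrowing the
// parameters to Int would break existing callers such as:
val w = Window.partitionBy("id").orderBy("date").rangeBetween(-1000L, 0L)
```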





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101121242
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

yea, the doc is in `rangeBetween`





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101121167
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
     val boundaryStart = start match {
       case 0 => CurrentRow
-      case Long.MinValue => UnboundedPreceding
-      case x if x < 0 => ValuePreceding(-start.toInt)
-      case x if x > 0 => ValueFollowing(start.toInt)
+      case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

shall we throw an exception if `x < Int.MinValue` and `x > Long.MinValue`? 
@hvanhovell what do you think?

BTW, I remember we have documentation explaining this behavior; we should 
update that too.
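
A minimal sketch of what such a guard could look like, extending the quoted match above (the exception type and message here are assumptions, not the merged code):

```scala
val boundaryStart = start match {
  case 0 => CurrentRow
  case Long.MinValue => UnboundedPreceding  // explicit "unbounded" sentinel
  case x if x < Int.MinValue =>
    // Hypothetical guard for the range Long.MinValue < x < Int.MinValue:
    throw new IllegalArgumentException(
      s"Frame boundary start $x is outside the supported Int range")
  case x if x < 0 => ValuePreceding(-start.toInt)
  case x if x > 0 => ValueFollowing(start.toInt)
}
```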





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r100539109
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/BoundOrdering.scala ---
@@ -25,18 +25,22 @@ import org.apache.spark.sql.catalyst.expressions.Projection
  * Function for comparing boundary values.
  */
 private[window] abstract class BoundOrdering {
-  def compare(inputRow: InternalRow, inputIndex: Int, outputRow: InternalRow, outputIndex: Int): Int
+  def compare(
+      inputRow: InternalRow,
+      inputIndex: Long,
+      outputRow: InternalRow,
+      outputIndex: Long): Long
 }
 
 /**
  * Compare the input index to the bound of the output index.
  */
-private[window] final case class RowBoundOrdering(offset: Int) extends BoundOrdering {
+private[window] final case class RowBoundOrdering(offset: Long) extends BoundOrdering {
--- End diff --

It does not make any sense to make the offsets longs. These are execution 
details, buffer indexes (which are integer-bound), and you really should not 
be messing with those.

Try to keep your change more local: only modify 
`WindowExec.createBoundOrdering` and the code generating the 
`WindowExec.windowFrameExpressionFactoryPairs`. That should be enough.
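
To illustrate the distinction (a standalone sketch with hypothetical names, not Spark's actual classes): a row frame addresses buffer positions, so `Int` suffices, while a range frame offsets the ordering value and can carry the full `Long` without touching buffer indexes.

```scala
sealed trait BoundOrderingSketch {
  def compare(inputIndex: Int, inputValue: Long,
              outputIndex: Int, outputValue: Long): Int
}

// Row frame: pure index arithmetic, bounded by the Int-indexed buffer.
final case class RowBound(offset: Int) extends BoundOrderingSketch {
  def compare(inputIndex: Int, inputValue: Long,
              outputIndex: Int, outputValue: Long): Int =
    inputIndex - (outputIndex + offset)
}

// Range frame: compares ordering values, where a Long offset is safe
// (real code would also guard the addition against Long overflow).
final case class RangeBound(offset: Long) extends BoundOrderingSketch {
  def compare(inputIndex: Int, inputValue: Long,
              outputIndex: Int, outputValue: Long): Int =
    java.lang.Long.compare(inputValue, outputValue + offset)
}
```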





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-08 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r100049000
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/BoundOrdering.scala ---
@@ -25,18 +25,22 @@ import org.apache.spark.sql.catalyst.expressions.Projection
  * Function for comparing boundary values.
  */
 private[window] abstract class BoundOrdering {
-  def compare(inputRow: InternalRow, inputIndex: Int, outputRow: InternalRow, outputIndex: Int): Int
+  def compare(
+      inputRow: InternalRow,
+      inputIndex: Long,
+      outputRow: InternalRow,
+      outputIndex: Long): Long
 }
 
 /**
  * Compare the input index to the bound of the output index.
  */
-private[window] final case class RowBoundOrdering(offset: Int) extends BoundOrdering {
+private[window] final case class RowBoundOrdering(offset: Long) extends BoundOrdering {
--- End diff --

@hvanhovell `rowBuffer` is used to buffer all the rows of one partition. As 
you said, we only support 32-bit sizes (fewer than `(1 << 31) - 1` rows). But 
the row frame or range frame only restricts the range from which to fetch the 
proper values out of `rowBuffer`. This PR changes the type of the offset from 
`Int` to `Long`, which will make the `rowBuffer` index overflow. Besides, how 
to support a 64-bit buffer is another topic. Let me know if I have 
misunderstood. Thanks!





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r100046288
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/BoundOrdering.scala ---
@@ -25,18 +25,22 @@ import org.apache.spark.sql.catalyst.expressions.Projection
  * Function for comparing boundary values.
  */
 private[window] abstract class BoundOrdering {
-  def compare(inputRow: InternalRow, inputIndex: Int, outputRow: InternalRow, outputIndex: Int): Int
+  def compare(
+      inputRow: InternalRow,
+      inputIndex: Long,
+      outputRow: InternalRow,
+      outputIndex: Long): Long
 }
 
 /**
  * Compare the input index to the bound of the output index.
  */
-private[window] final case class RowBoundOrdering(offset: Int) extends BoundOrdering {
+private[window] final case class RowBoundOrdering(offset: Long) extends BoundOrdering {
--- End diff --

If we are going to support 64-bit values in a row frame, then we also need to 
support a buffer that can store that many rows. `WindowExec` in its current 
form assumes that a buffer contains fewer than `(1 << 31) - 1` values (which 
is actually smaller than what a 32-bit range can address). I have yet to see 
a use case where the buffer needs to be larger.

The current PR does not make all the changes necessary for `WindowExec` to 
support a 64-bit buffer (see `RowBuffer.size`, for instance), and I am 
slightly worried about overflows. It will also be a daunting task to test 
this properly (you would need to create a buffer with more than 2 billion 
elements). So I prefer to keep this change local to range frames only.
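
The `(1 << 31) - 1` ceiling ultimately comes from JVM arrays being `Int`-indexed; a two-line illustration, independent of Spark:

```scala
// Any array-backed row buffer is capped at Int.MaxValue elements;
// a Long size does not even type-check.
val maxBufferedRows: Int = Int.MaxValue          // 2147483647 rows at most
// new Array[AnyRef](maxBufferedRows.toLong + 1) // error: found Long, required Int
```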





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r100043629
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/BoundOrdering.scala
 ---
@@ -25,18 +25,22 @@ import 
org.apache.spark.sql.catalyst.expressions.Projection
  * Function for comparing boundary values.
  */
 private[window] abstract class BoundOrdering {
-  def compare(inputRow: InternalRow, inputIndex: Int, outputRow: 
InternalRow, outputIndex: Int): Int
+  def compare(
+  inputRow: InternalRow,
+  inputIndex: Long,
+  outputRow: InternalRow,
+  outputIndex: Long): Long
 }
 
 /**
  * Compare the input index to the bound of the output index.
  */
-private[window] final case class RowBoundOrdering(offset: Int) extends 
BoundOrdering {
+private[window] final case class RowBoundOrdering(offset: Long) extends 
BoundOrdering {
--- End diff --

@hvanhovell do you mean this is unnecessary, as we can only support `Int` 
anyway?





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-06 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16818

[SPARK-19451][SQL][Core] Underlying integer overflow in Window function

## What changes were proposed in this pull request?

Code to reproduce:

```scala
val tw = Window.orderBy("date")
  .partitionBy("id")
  .rangeBetween(from, 0)
```

Everything seems OK as long as the `from` value is not too large, even though 
the `rangeBetween()` method accepts `Long` parameters. But if I set `from` to 
-216000L, it does not work!

It seems there is an underlying integer overflow issue here, i.e. a `Long` 
being converted to an `Int`:

```scala
private def between(typ: FrameType, start: Long, end: Long): WindowSpec = {
  val boundaryStart = start match {
    case 0 => CurrentRow
    case Long.MinValue => UnboundedPreceding
    case x if x < 0 => ValuePreceding(-start.toInt)
    case x if x > 0 => ValueFollowing(start.toInt)
  }

  val boundaryEnd = end match {
    case 0 => CurrentRow
    case Long.MaxValue => UnboundedFollowing
    case x if x < 0 => ValuePreceding(-end.toInt)
    case x if x > 0 => ValueFollowing(end.toInt)
  }

  new WindowSpec(
    partitionSpec,
    orderSpec,
    SpecifiedWindowFrame(typ, boundaryStart, boundaryEnd))
}
```

This PR changes the type of the index from `Int` to `Long`.

BTW: is there any reason why the type of the index is `Int`? I do not see a 
strong argument for keeping it that way.
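
To make the failure mode concrete, here is a standalone sketch of the silent truncation (`-3000000000L` is an illustrative bound below `Int.MinValue`, not the value from the report above):

```scala
// Plain Scala, no Spark required: Long.toInt keeps only the low 32 bits.
val start = -3000000000L      // hypothetical preceding bound
val truncated = start.toInt   // 1294967296: wrapped AND sign-flipped
// A large *preceding* bound silently becomes a large *following* bound,
// which is exactly the overflow this PR addresses.
println(s"$start.toInt == $truncated")
```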

## How was this patch tested?

Existing unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19451

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16818


commit ea1f44026dc9f5b0f8e660b5adf8824b8deb94df
Author: uncleGen 
Date:   2017-02-06T01:36:29Z

change the type of index from Int to Long



