[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18540 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130224334 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1179,32 +1179,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create or resolve a [[FrameBoundary]]. Simple math expressions are allowed for Value - * Preceding/Following boundaries. These expressions must be constant (foldable) and return an - * integer value. + * Create or resolve a frame boundary expressions. */ - override def visitFrameBound(ctx: FrameBoundContext): FrameBoundary = withOrigin(ctx) { -// We currently only allow foldable integers. -def value: Int = { + override def visitFrameBound(ctx: FrameBoundContext): Expression = withOrigin(ctx) { +def value: Expression = { val e = expression(ctx.expression) - validate(e.resolved && e.foldable && e.dataType == IntegerType, -"Frame bound value must be a constant integer.", -ctx) - e.eval().asInstanceOf[Int] + validate(e.resolved && e.foldable, "Frame bound value must be a literal.", ctx) + e } -// Create the FrameBoundary ctx.boundType.getType match { case SqlBaseParser.PRECEDING if ctx.UNBOUNDED != null => UnboundedPreceding case SqlBaseParser.PRECEDING => -ValuePreceding(value) +UnaryMinus(value) case SqlBaseParser.CURRENT => CurrentRow case SqlBaseParser.FOLLOWING if ctx.UNBOUNDED != null => UnboundedFollowing case SqlBaseParser.FOLLOWING => -ValueFollowing(value) +value --- End diff -- It sounds like we already allowed it in the previous release. Thus, we need to follow what we have now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130217250 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql]( */ // Note: when updating the doc for this method, also update Window.rangeBetween. def rangeBetween(start: Long, end: Long): WindowSpec = { --- End diff -- We'll get and compute empty frames. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130217244 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1179,32 +1179,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create or resolve a [[FrameBoundary]]. Simple math expressions are allowed for Value - * Preceding/Following boundaries. These expressions must be constant (foldable) and return an - * integer value. + * Create or resolve a frame boundary expressions. */ - override def visitFrameBound(ctx: FrameBoundContext): FrameBoundary = withOrigin(ctx) { -// We currently only allow foldable integers. -def value: Int = { + override def visitFrameBound(ctx: FrameBoundContext): Expression = withOrigin(ctx) { +def value: Expression = { val e = expression(ctx.expression) - validate(e.resolved && e.foldable && e.dataType == IntegerType, -"Frame bound value must be a constant integer.", -ctx) - e.eval().asInstanceOf[Int] + validate(e.resolved && e.foldable, "Frame bound value must be a literal.", ctx) + e } -// Create the FrameBoundary ctx.boundType.getType match { case SqlBaseParser.PRECEDING if ctx.UNBOUNDED != null => UnboundedPreceding case SqlBaseParser.PRECEDING => -ValuePreceding(value) +UnaryMinus(value) case SqlBaseParser.CURRENT => CurrentRow case SqlBaseParser.FOLLOWING if ctx.UNBOUNDED != null => UnboundedFollowing case SqlBaseParser.FOLLOWING => -ValueFollowing(value) +value --- End diff -- May I ask how should we parse it into an `unsigned-integer`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130217194 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1179,32 +1179,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create or resolve a [[FrameBoundary]]. Simple math expressions are allowed for Value - * Preceding/Following boundaries. These expressions must be constant (foldable) and return an - * integer value. + * Create or resolve a frame boundary expressions. */ - override def visitFrameBound(ctx: FrameBoundContext): FrameBoundary = withOrigin(ctx) { -// We currently only allow foldable integers. -def value: Int = { + override def visitFrameBound(ctx: FrameBoundContext): Expression = withOrigin(ctx) { +def value: Expression = { val e = expression(ctx.expression) - validate(e.resolved && e.foldable && e.dataType == IntegerType, -"Frame bound value must be a constant integer.", -ctx) - e.eval().asInstanceOf[Int] + validate(e.resolved && e.foldable, "Frame bound value must be a literal.", ctx) --- End diff -- added --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130217171 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -351,20 +356,25 @@ abstract class OffsetWindowFunction override def nullable: Boolean = default == null || default.nullable || input.nullable - override lazy val frame = { -// This will be triggered by the Analyzer. -val offsetValue = offset.eval() match { - case o: Int => o - case x => throw new AnalysisException( -s"Offset expression must be a foldable integer expression: $x") -} + override lazy val frame: WindowFrame = { val boundary = direction match { - case Ascending => ValueFollowing(offsetValue) - case Descending => ValuePreceding(offsetValue) + case Ascending => offset + case Descending => UnaryMinus(offset) } SpecifiedWindowFrame(RowFrame, boundary, boundary) } + override def checkInputDataTypes(): TypeCheckResult = { +val check = super.checkInputDataTypes() +if (check.isFailure) { + check +} else if (!offset.foldable) { + TypeCheckFailure(s"Offset expression '$offset' must be a literal.") --- End diff -- Currently it's only used by lead()/lag() functions, that both checked the input types, so we're not able to test this from sql. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130217065 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,161 +101,171 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType - -/** - * The trait used to represent the type of a Window Frame Boundary. - */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" } /** - * Extractor for making working with frame boundaries easier. + * The trait used to represent special boundaries used in a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override def children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } - - override def toString: String = "UNBOUNDED PRECEDING" +/** UNBOUNDED boundary. */ +case object UnboundedPreceding extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED PRECEDING" } -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => false -case ValuePreceding(anotherValue) => value >= anotherValue -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } - - override def toString: String = s"$value PRECEDING" +case object UnboundedFollowing extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED FOLLOWING" } /** CURRENT ROW boundary. */ -case object CurrentRow extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => false -case vp: ValuePreceding => false -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130216990 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,58 +42,54 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec :+ frameSpecification override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && frameSpecification.isInstanceOf[SpecifiedWindowFrame] override def nullable: Boolean = true override def foldable: Boolean = false - override def dataType: DataType = throw new UnsupportedOperationException + override def dataType: DataType = throw new UnsupportedOperationException("dataType") - override def sql: String = { -val partition = if (partitionSpec.isEmpty) { - "" -} else { - "PARTITION BY " + partitionSpec.map(_.sql).mkString(", ") + " " + override def checkInputDataTypes(): TypeCheckResult = { +frameSpecification match { + case UnspecifiedFrame => +TypeCheckFailure( + "Cannot use an UnspecifiedFrame. This should have been converted during analysis. " + +"Please file a bug report.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && !f.isUnbounded && +orderSpec.isEmpty => +TypeCheckFailure( + "A range window frame cannot be used in an unordered window specification.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +orderSpec.size > 1 => +TypeCheckFailure( + s"A range window frame with value boundaries cannot be used in a window specification " + +s"with multiple order by expressions: ${orderSpec.mkString(",")}") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +!isValidFrameType(f.valueBoundary.head.dataType) => +TypeCheckFailure( + s"The data type '${orderSpec.head.dataType}' used in the order specification does " + +s"not match the data type '${f.valueBoundary.head.dataType}' which is used in the " + +"range frame.") --- End diff -- The first case is for defensive guard, I'll add test sql for the two negative cases related to orderBy in RangeFrame. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130216635 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -121,15 +119,10 @@ trait CheckAnalysis extends PredicateHelper { // function. e match { case _: AggregateExpression | _: OffsetWindowFunction | _: AggregateWindowFunction => +w case _ => failAnalysis(s"Expression '$e' not supported within a window function.") } -// Make sure the window specification is valid. -s.validate match { --- End diff -- yea --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130215803 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql]( */ // Note: when updating the doc for this method, also update Window.rangeBetween. def rangeBetween(start: Long, end: Long): WindowSpec = { --- End diff -- What happens if `start` is larger than `end`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130215695 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1179,32 +1179,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create or resolve a [[FrameBoundary]]. Simple math expressions are allowed for Value - * Preceding/Following boundaries. These expressions must be constant (foldable) and return an - * integer value. + * Create or resolve a frame boundary expressions. */ - override def visitFrameBound(ctx: FrameBoundContext): FrameBoundary = withOrigin(ctx) { -// We currently only allow foldable integers. -def value: Int = { + override def visitFrameBound(ctx: FrameBoundContext): Expression = withOrigin(ctx) { +def value: Expression = { val e = expression(ctx.expression) - validate(e.resolved && e.foldable && e.dataType == IntegerType, -"Frame bound value must be a constant integer.", -ctx) - e.eval().asInstanceOf[Int] + validate(e.resolved && e.foldable, "Frame bound value must be a literal.", ctx) + e } -// Create the FrameBoundary ctx.boundType.getType match { case SqlBaseParser.PRECEDING if ctx.UNBOUNDED != null => UnboundedPreceding case SqlBaseParser.PRECEDING => -ValuePreceding(value) +UnaryMinus(value) --- End diff -- The same here. Do we allow users assign a negative value? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130215692 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1179,32 +1179,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create or resolve a [[FrameBoundary]]. Simple math expressions are allowed for Value - * Preceding/Following boundaries. These expressions must be constant (foldable) and return an - * integer value. + * Create or resolve a frame boundary expressions. */ - override def visitFrameBound(ctx: FrameBoundContext): FrameBoundary = withOrigin(ctx) { -// We currently only allow foldable integers. -def value: Int = { + override def visitFrameBound(ctx: FrameBoundContext): Expression = withOrigin(ctx) { +def value: Expression = { val e = expression(ctx.expression) - validate(e.resolved && e.foldable && e.dataType == IntegerType, -"Frame bound value must be a constant integer.", -ctx) - e.eval().asInstanceOf[Int] + validate(e.resolved && e.foldable, "Frame bound value must be a literal.", ctx) + e } -// Create the FrameBoundary ctx.boundType.getType match { case SqlBaseParser.PRECEDING if ctx.UNBOUNDED != null => UnboundedPreceding case SqlBaseParser.PRECEDING => -ValuePreceding(value) +UnaryMinus(value) case SqlBaseParser.CURRENT => CurrentRow case SqlBaseParser.FOLLOWING if ctx.UNBOUNDED != null => UnboundedFollowing case SqlBaseParser.FOLLOWING => -ValueFollowing(value) +value --- End diff -- It should be an `unsigned-integer` based on ANSI SQL --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130215611 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1179,32 +1179,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create or resolve a [[FrameBoundary]]. Simple math expressions are allowed for Value - * Preceding/Following boundaries. These expressions must be constant (foldable) and return an - * integer value. + * Create or resolve a frame boundary expressions. */ - override def visitFrameBound(ctx: FrameBoundContext): FrameBoundary = withOrigin(ctx) { -// We currently only allow foldable integers. -def value: Int = { + override def visitFrameBound(ctx: FrameBoundContext): Expression = withOrigin(ctx) { +def value: Expression = { val e = expression(ctx.expression) - validate(e.resolved && e.foldable && e.dataType == IntegerType, -"Frame bound value must be a constant integer.", -ctx) - e.eval().asInstanceOf[Int] + validate(e.resolved && e.foldable, "Frame bound value must be a literal.", ctx) --- End diff -- Any test case? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130215526 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -351,20 +356,25 @@ abstract class OffsetWindowFunction override def nullable: Boolean = default == null || default.nullable || input.nullable - override lazy val frame = { -// This will be triggered by the Analyzer. -val offsetValue = offset.eval() match { - case o: Int => o - case x => throw new AnalysisException( -s"Offset expression must be a foldable integer expression: $x") -} + override lazy val frame: WindowFrame = { val boundary = direction match { - case Ascending => ValueFollowing(offsetValue) - case Descending => ValuePreceding(offsetValue) + case Ascending => offset + case Descending => UnaryMinus(offset) } SpecifiedWindowFrame(RowFrame, boundary, boundary) } + override def checkInputDataTypes(): TypeCheckResult = { +val check = super.checkInputDataTypes() +if (check.isFailure) { + check +} else if (!offset.foldable) { + TypeCheckFailure(s"Offset expression '$offset' must be a literal.") --- End diff -- Having a test case? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130215469 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,161 +101,171 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType - -/** - * The trait used to represent the type of a Window Frame Boundary. - */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" } /** - * Extractor for making working with frame boundaries easier. + * The trait used to represent special boundaries used in a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override def children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } - - override def toString: String = "UNBOUNDED PRECEDING" +/** UNBOUNDED boundary. */ +case object UnboundedPreceding extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED PRECEDING" } -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => false -case ValuePreceding(anotherValue) => value >= anotherValue -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } - - override def toString: String = s"$value PRECEDING" +case object UnboundedFollowing extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED FOLLOWING" } /** CURRENT ROW boundary. */ -case object CurrentRow extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => false -case vp: ValuePreceding => false -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing =>
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130215418 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,161 +101,171 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType - -/** - * The trait used to represent the type of a Window Frame Boundary. - */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval --- End diff -- uh, we also support `CalendarInterval`. Do we have a test case to verify it works on `CalendarInterval`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130215342 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,58 +42,54 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec :+ frameSpecification override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && frameSpecification.isInstanceOf[SpecifiedWindowFrame] override def nullable: Boolean = true override def foldable: Boolean = false - override def dataType: DataType = throw new UnsupportedOperationException + override def dataType: DataType = throw new UnsupportedOperationException("dataType") - override def sql: String = { -val partition = if (partitionSpec.isEmpty) { - "" -} else { - "PARTITION BY " + partitionSpec.map(_.sql).mkString(", ") + " " + override def checkInputDataTypes(): TypeCheckResult = { +frameSpecification match { + case UnspecifiedFrame => +TypeCheckFailure( + "Cannot use an UnspecifiedFrame. This should have been converted during analysis. " + +"Please file a bug report.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && !f.isUnbounded && +orderSpec.isEmpty => +TypeCheckFailure( + "A range window frame cannot be used in an unordered window specification.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +orderSpec.size > 1 => +TypeCheckFailure( + s"A range window frame with value boundaries cannot be used in a window specification " + +s"with multiple order by expressions: ${orderSpec.mkString(",")}") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +!isValidFrameType(f.valueBoundary.head.dataType) => +TypeCheckFailure( + s"The data type '${orderSpec.head.dataType}' used in the order specification does " + +s"not match the data type '${f.valueBoundary.head.dataType}' which is used in the " + +"range frame.") --- End diff -- Just want to confirm whether we have at least four negatives test cases to respectively cover these cases? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130215242 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,58 +42,54 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec :+ frameSpecification override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && frameSpecification.isInstanceOf[SpecifiedWindowFrame] override def nullable: Boolean = true override def foldable: Boolean = false - override def dataType: DataType = throw new UnsupportedOperationException + override def dataType: DataType = throw new UnsupportedOperationException("dataType") - override def sql: String = { -val partition = if (partitionSpec.isEmpty) { - "" -} else { - "PARTITION BY " + partitionSpec.map(_.sql).mkString(", ") + " " + override def checkInputDataTypes(): TypeCheckResult = { +frameSpecification match { + case UnspecifiedFrame => +TypeCheckFailure( + "Cannot use an UnspecifiedFrame. This should have been converted during analysis. " + +"Please file a bug report.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && !f.isUnbounded && +orderSpec.isEmpty => +TypeCheckFailure( + "A range window frame cannot be used in an unordered window specification.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +orderSpec.size > 1 => +TypeCheckFailure( + s"A range window frame with value boundaries cannot be used in a window specification " + +s"with multiple order by expressions: ${orderSpec.mkString(",")}") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +!isValidFrameType(f.valueBoundary.head.dataType) => --- End diff -- Personally, I do not like many long `if` conditions. Let us add extra two spaces in line 71, 66, and 62. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130214990 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -805,4 +806,27 @@ object TypeCoercion { Option(ret) } } + + /** + * Cast WindowFrame boundaries to the type they operate upon. + */ + object WindowFrameCoercion extends Rule[LogicalPlan] { +def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions { + case s @ WindowSpecDefinition(_, Seq(order), SpecifiedWindowFrame(RangeFrame, lower, upper)) +if order.resolved => --- End diff -- Nit: add two more spaces before `if` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130214960 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -805,4 +806,27 @@ object TypeCoercion { Option(ret) } } + + /** + * Cast WindowFrame boundaries to the type they operate upon. + */ + object WindowFrameCoercion extends Rule[LogicalPlan] { +def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions { + case s @ WindowSpecDefinition(_, Seq(order), SpecifiedWindowFrame(RangeFrame, lower, upper)) +if order.resolved => +s.copy(frameSpecification = SpecifiedWindowFrame( + RangeFrame, + createBoundaryCast(lower, order.dataType), + createBoundaryCast(upper, order.dataType))) +} + +private def createBoundaryCast(boundary: Expression, dt: DataType): Expression = { + boundary match { +case e: Expression if e.dataType != dt && Cast.canCast(e.dataType, dt) && + !e.isInstanceOf[SpecialFrameBoundary] => --- End diff -- Splitting `if` in the middle looks weird. How about? ```Scala case e: SpecialFrameBoundary => e case e: Expression if e.dataType != dt && Cast.canCast(e.dataType, dt) => Cast(e, dt) case _ => boundary ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130214846 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -121,15 +119,10 @@ trait CheckAnalysis extends PredicateHelper { // function. e match { case _: AggregateExpression | _: OffsetWindowFunction | _: AggregateWindowFunction => +w case _ => failAnalysis(s"Expression '$e' not supported within a window function.") } -// Make sure the window specification is valid. -s.validate match { --- End diff -- The verification is moved to `checkInputDataTypes`, right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130214797 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -109,10 +109,8 @@ trait CheckAnalysis extends PredicateHelper { failAnalysis(s"Distinct window functions are not supported: $w") case w @ WindowExpression(_: OffsetWindowFunction, WindowSpecDefinition(_, order, - SpecifiedWindowFrame(frame, - FrameBoundary(l), - FrameBoundary(h - if order.isEmpty || frame != RowFrame || l != h => + frame: SpecifiedWindowFrame)) --- End diff -- Nit: Cutting in the middle looks weird. Please keep them in the same line. `WindowSpecDefinition(_, order, frame: SpecifiedWindowFrame))` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130078786 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1179,32 +1179,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create or resolve a [[FrameBoundary]]. Simple math expressions are allowed for Value - * Preceding/Following boundaries. These expressions must be constant (foldable) and return an - * integer value. + * Create or resolve a frame boundary expressions. */ - override def visitFrameBound(ctx: FrameBoundContext): FrameBoundary = withOrigin(ctx) { -// We currently only allow foldable integers. -def value: Int = { + override def visitFrameBound(ctx: FrameBoundContext): Expression = withOrigin(ctx) { +def value: Expression = { val e = expression(ctx.expression) - validate(e.resolved && e.foldable && e.dataType == IntegerType, -"Frame bound value must be a constant integer.", -ctx) - e.eval().asInstanceOf[Int] + validate(e.resolved && e.foldable, "Frame bound value must be a literal.", ctx) --- End diff -- sure --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130076316 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1179,32 +1179,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create or resolve a [[FrameBoundary]]. Simple math expressions are allowed for Value - * Preceding/Following boundaries. These expressions must be constant (foldable) and return an - * integer value. + * Create or resolve a frame boundary expressions. */ - override def visitFrameBound(ctx: FrameBoundContext): FrameBoundary = withOrigin(ctx) { -// We currently only allow foldable integers. -def value: Int = { + override def visitFrameBound(ctx: FrameBoundContext): Expression = withOrigin(ctx) { +def value: Expression = { val e = expression(ctx.expression) - validate(e.resolved && e.foldable && e.dataType == IntegerType, -"Frame bound value must be a constant integer.", -ctx) - e.eval().asInstanceOf[Int] + validate(e.resolved && e.foldable, "Frame bound value must be a literal.", ctx) --- End diff -- How about keep this so we can fail earlier? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130074867 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1179,32 +1179,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create or resolve a [[FrameBoundary]]. Simple math expressions are allowed for Value - * Preceding/Following boundaries. These expressions must be constant (foldable) and return an - * integer value. + * Create or resolve a frame boundary expressions. */ - override def visitFrameBound(ctx: FrameBoundContext): FrameBoundary = withOrigin(ctx) { -// We currently only allow foldable integers. -def value: Int = { + override def visitFrameBound(ctx: FrameBoundContext): Expression = withOrigin(ctx) { +def value: Expression = { val e = expression(ctx.expression) - validate(e.resolved && e.foldable && e.dataType == IntegerType, -"Frame bound value must be a constant integer.", -ctx) - e.eval().asInstanceOf[Int] + validate(e.resolved && e.foldable, "Frame bound value must be a literal.", ctx) --- End diff -- is it necessary? I think analyzer can detect and report this failure too? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130073540 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,161 +113,176 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType - -/** - * The trait used to represent the type of a Window Frame Boundary. - */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" } /** - * Extractor for making working with frame boundaries easier. + * The trait used to represent special boundaries used in a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false + + def notFollows(other: Expression): Boolean } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** UNBOUNDED boundary. */ +case object UnboundedPreceding extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED PRECEDING" - override def toString: String = "UNBOUNDED PRECEDING" + override def notFollows(other: Expression): Boolean = true } -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => false -case ValuePreceding(anotherValue) => value >= anotherValue -case CurrentRow => true -case vf: ValueFollowing => true +case object UnboundedFollowing extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED FOLLOWING" + + override def notFollows(other: Expression): Boolean = other match { case UnboundedFollowing => true +case _ => false } - - override def toString: String = s"$value PRECEDING" } /** CURRENT ROW boundary. */ -case object CurrentRow extends FrameBoundary { - def notFollows(other:
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130073180 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,161 +113,176 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType - -/** - * The trait used to represent the type of a Window Frame Boundary. - */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" } /** - * Extractor for making working with frame boundaries easier. + * The trait used to represent special boundaries used in a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false + + def notFollows(other: Expression): Boolean } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** UNBOUNDED boundary. */ +case object UnboundedPreceding extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED PRECEDING" - override def toString: String = "UNBOUNDED PRECEDING" + override def notFollows(other: Expression): Boolean = true } -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => false -case ValuePreceding(anotherValue) => value >= anotherValue -case CurrentRow => true -case vf: ValueFollowing => true +case object UnboundedFollowing extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED FOLLOWING" + + override def notFollows(other: Expression): Boolean = other match { case UnboundedFollowing => true +case _ => false } - - override def toString: String = s"$value PRECEDING" } /** CURRENT ROW boundary. */ -case object CurrentRow extends FrameBoundary { - def notFollows(other:
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130073049 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,161 +113,176 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType - -/** - * The trait used to represent the type of a Window Frame Boundary. - */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" } /** - * Extractor for making working with frame boundaries easier. + * The trait used to represent special boundaries used in a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil --- End diff -- nit: `def` instead of `lazy val` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130073011 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,161 +113,176 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType - -/** - * The trait used to represent the type of a Window Frame Boundary. - */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" } /** - * Extractor for making working with frame boundaries easier. + * The trait used to represent special boundaries used in a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false + + def notFollows(other: Expression): Boolean } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** UNBOUNDED boundary. */ +case object UnboundedPreceding extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED PRECEDING" - override def toString: String = "UNBOUNDED PRECEDING" + override def notFollows(other: Expression): Boolean = true } -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => false -case ValuePreceding(anotherValue) => value >= anotherValue -case CurrentRow => true -case vf: ValueFollowing => true +case object UnboundedFollowing extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED FOLLOWING" + + override def notFollows(other: Expression): Boolean = other match { case UnboundedFollowing => true +case _ => false } - - override def toString: String = s"$value PRECEDING" } /** CURRENT ROW boundary. */ -case object CurrentRow extends FrameBoundary { - def notFollows(other:
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130036535 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,161 +113,176 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType - -/** - * The trait used to represent the type of a Window Frame Boundary. - */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" } /** - * Extractor for making working with frame boundaries easier. + * The trait used to represent special boundaries used in a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false + + def notFollows(other: Expression): Boolean } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** UNBOUNDED boundary. */ +case object UnboundedPreceding extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED PRECEDING" - override def toString: String = "UNBOUNDED PRECEDING" + override def notFollows(other: Expression): Boolean = true } -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => false -case ValuePreceding(anotherValue) => value >= anotherValue -case CurrentRow => true -case vf: ValueFollowing => true +case object UnboundedFollowing extends SpecialFrameBoundary { + override def sql: String = "UNBOUNDED FOLLOWING" + + override def notFollows(other: Expression): Boolean = other match { case UnboundedFollowing => true +case _ => false } - - override def toString: String = s"$value PRECEDING" } /** CURRENT ROW boundary. */ -case object CurrentRow extends FrameBoundary { - def notFollows(other:
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130036232 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,57 +42,65 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec :+ frameSpecification override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && frameSpecification.isInstanceOf[SpecifiedWindowFrame] override def nullable: Boolean = true override def foldable: Boolean = false - override def dataType: DataType = throw new UnsupportedOperationException + override def dataType: DataType = throw new UnsupportedOperationException("dataType") - override def sql: String = { -val partition = if (partitionSpec.isEmpty) { - "" -} else { - "PARTITION BY " + partitionSpec.map(_.sql).mkString(", ") + " " + override def checkInputDataTypes(): TypeCheckResult = { +frameSpecification match { + case UnspecifiedFrame => +TypeCheckFailure( + "Cannot use an UnspecifiedFrame. This should have been converted during analysis. " + +"Please file a bug report.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && !f.isUnbounded && +orderSpec.isEmpty => +TypeCheckFailure( + "A range window frame cannot be used in an unordered window specification.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +orderSpec.size > 1 => +TypeCheckFailure( + s"A range window frame with value boundaries cannot be used in a window specification " + +s"with multiple order by expressions: ${orderSpec.mkString(",")}") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +!isValidFrameType(f.valueBoundary.head.dataType) => +TypeCheckFailure( + s"The data type '${orderSpec.head.dataType}' used in the order specification does " + +s"not match the data type '${f.valueBoundary.head.dataType}' which is used in the " + +"range frame.") + case f: SpecifiedWindowFrame if !isValidFrameBoundary(f.lower, f.upper) => +TypeCheckFailure(s"The upper bound of the window frame is '${f.upper.sql}', which is " + + s"smaller than the lower bound '${f.lower.sql}'.") + case _ => TypeCheckSuccess } + } -val order = if (orderSpec.isEmpty) { - "" -} else { - "ORDER BY " + orderSpec.map(_.sql).mkString(", ") + " " + override def sql: String = { +def toSql(exprs: Seq[Expression], prefix: String): Seq[String] = { + Seq(exprs).filter(_.nonEmpty).map(_.map(_.sql).mkString(prefix, ", ", "")) } -s"($partition$order${frameSpecification.toString})" +val elements = + toSql(partitionSpec, "PARTITION BY ") ++ +toSql(orderSpec, "ORDER BY ") ++ +
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130035488 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,58 +42,54 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec :+ frameSpecification override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && frameSpecification.isInstanceOf[SpecifiedWindowFrame] override def nullable: Boolean = true override def foldable: Boolean = false - override def dataType: DataType = throw new UnsupportedOperationException + override def dataType: DataType = throw new UnsupportedOperationException("dataType") - override def sql: String = { -val partition = if (partitionSpec.isEmpty) { - "" -} else { - "PARTITION BY " + partitionSpec.map(_.sql).mkString(", ") + " " + override def checkInputDataTypes(): TypeCheckResult = { +frameSpecification match { + case UnspecifiedFrame => +TypeCheckFailure( + "Cannot use an UnspecifiedFrame. This should have been converted during analysis. " + +"Please file a bug report.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && !f.isUnbounded && --- End diff -- Sorry I was wrong --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r130035366 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,57 +42,65 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec :+ frameSpecification override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && frameSpecification.isInstanceOf[SpecifiedWindowFrame] override def nullable: Boolean = true override def foldable: Boolean = false - override def dataType: DataType = throw new UnsupportedOperationException + override def dataType: DataType = throw new UnsupportedOperationException("dataType") - override def sql: String = { -val partition = if (partitionSpec.isEmpty) { - "" -} else { - "PARTITION BY " + partitionSpec.map(_.sql).mkString(", ") + " " + override def checkInputDataTypes(): TypeCheckResult = { +frameSpecification match { + case UnspecifiedFrame => +TypeCheckFailure( + "Cannot use an UnspecifiedFrame. This should have been converted during analysis. " + +"Please file a bug report.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && !f.isUnbounded && +orderSpec.isEmpty => +TypeCheckFailure( + "A range window frame cannot be used in an unordered window specification.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +orderSpec.size > 1 => +TypeCheckFailure( + s"A range window frame with value boundaries cannot be used in a window specification " + +s"with multiple order by expressions: ${orderSpec.mkString(",")}") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound && +!isValidFrameType(f.valueBoundary.head.dataType) => +TypeCheckFailure( + s"The data type '${orderSpec.head.dataType}' used in the order specification does " + +s"not match the data type '${f.valueBoundary.head.dataType}' which is used in the " + +"range frame.") + case f: SpecifiedWindowFrame if !isValidFrameBoundary(f.lower, f.upper) => +TypeCheckFailure(s"The upper bound of the window frame is '${f.upper.sql}', which is " + + s"smaller than the lower bound '${f.lower.sql}'.") + case _ => TypeCheckSuccess } + } -val order = if (orderSpec.isEmpty) { - "" -} else { - "ORDER BY " + orderSpec.map(_.sql).mkString(", ") + " " + override def sql: String = { +def toSql(exprs: Seq[Expression], prefix: String): Seq[String] = { + Seq(exprs).filter(_.nonEmpty).map(_.map(_.sql).mkString(prefix, ", ", "")) } -s"($partition$order${frameSpecification.toString})" +val elements = + toSql(partitionSpec, "PARTITION BY ") ++ +toSql(orderSpec, "ORDER BY ") ++ +
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r129209847 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,58 +42,54 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec :+ frameSpecification override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && frameSpecification.isInstanceOf[SpecifiedWindowFrame] override def nullable: Boolean = true override def foldable: Boolean = false - override def dataType: DataType = throw new UnsupportedOperationException + override def dataType: DataType = throw new UnsupportedOperationException("dataType") - override def sql: String = { -val partition = if (partitionSpec.isEmpty) { - "" -} else { - "PARTITION BY " + partitionSpec.map(_.sql).mkString(", ") + " " + override def checkInputDataTypes(): TypeCheckResult = { +frameSpecification match { + case UnspecifiedFrame => +TypeCheckFailure( + "Cannot use an UnspecifiedFrame. This should have been converted during analysis. " + +"Please file a bug report.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && !f.isUnbounded && --- End diff -- I think this should be `!f.isValueBound`? basically `current row and current row` is not unbound but should be allowed here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r129078561 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +101,167 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false } +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary + /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. + */ +case class SpecifiedWindowFrame( +frameType: FrameType, +lower: Expression, +upper: Expression) + extends WindowFrame { -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r129005758 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala --- @@ -267,16 +267,17 @@ class ExpressionParserSuite extends PlanTest { // Range/Row val frameTypes = Seq(("rows", RowFrame), ("range", RangeFrame)) val boundaries = Seq( - ("10 preceding", ValuePreceding(10), CurrentRow), - ("3 + 1 following", ValueFollowing(4), CurrentRow), // Will fail during analysis - ("unbounded preceding", UnboundedPreceding, CurrentRow), - ("unbounded following", UnboundedFollowing, CurrentRow), // Will fail during analysis - ("between unbounded preceding and current row", UnboundedPreceding, CurrentRow), + ("10 preceding", -Literal(10), CurrentRow), + ("2147483648 preceding", -Literal(2147483648L), CurrentRow), + ("3 + 1 following", Add(Literal(3), Literal(1)), CurrentRow), // Will fail during analysis + ("unbounded preceding", Unbounded, CurrentRow), + ("unbounded following", Unbounded, CurrentRow), // Will fail during analysis --- End diff -- Well the idea was that the unboundedness was tied to the location in which it was used, so for example unbounded in the first position would mean unbounded preceding. However this is completely opposite to how we interpret literal bounds, it might be better to reintroduce special boundaries for unbounded preceding and unbounded following. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r129005233 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala --- @@ -267,16 +267,17 @@ class ExpressionParserSuite extends PlanTest { // Range/Row val frameTypes = Seq(("rows", RowFrame), ("range", RangeFrame)) val boundaries = Seq( - ("10 preceding", ValuePreceding(10), CurrentRow), - ("3 + 1 following", ValueFollowing(4), CurrentRow), // Will fail during analysis - ("unbounded preceding", UnboundedPreceding, CurrentRow), - ("unbounded following", UnboundedFollowing, CurrentRow), // Will fail during analysis - ("between unbounded preceding and current row", UnboundedPreceding, CurrentRow), + ("10 preceding", -Literal(10), CurrentRow), + ("2147483648 preceding", -Literal(2147483648L), CurrentRow), + ("3 + 1 following", Add(Literal(3), Literal(1)), CurrentRow), // Will fail during analysis --- End diff -- Do you think that will improve the UX? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r129005122 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +101,167 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false } +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary + /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. + */ +case class SpecifiedWindowFrame( +frameType: FrameType, +lower: Expression, +upper: Expression) + extends WindowFrame { -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r129004992 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +101,167 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false } +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary + /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. + */ +case class SpecifiedWindowFrame( +frameType: FrameType, +lower: Expression, +upper: Expression) + extends WindowFrame { -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r129002630 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala --- @@ -267,16 +267,17 @@ class ExpressionParserSuite extends PlanTest { // Range/Row val frameTypes = Seq(("rows", RowFrame), ("range", RangeFrame)) val boundaries = Seq( - ("10 preceding", ValuePreceding(10), CurrentRow), - ("3 + 1 following", ValueFollowing(4), CurrentRow), // Will fail during analysis - ("unbounded preceding", UnboundedPreceding, CurrentRow), - ("unbounded following", UnboundedFollowing, CurrentRow), // Will fail during analysis - ("between unbounded preceding and current row", UnboundedPreceding, CurrentRow), + ("10 preceding", -Literal(10), CurrentRow), + ("2147483648 preceding", -Literal(2147483648L), CurrentRow), + ("3 + 1 following", Add(Literal(3), Literal(1)), CurrentRow), // Will fail during analysis + ("unbounded preceding", Unbounded, CurrentRow), + ("unbounded following", Unbounded, CurrentRow), // Will fail during analysis --- End diff -- In fact this is problematic, we would generate the same result for both `unbounded preceding` and `unbounded following`. @hvanhovell any idea on resolving this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r129002412 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala --- @@ -267,16 +267,17 @@ class ExpressionParserSuite extends PlanTest { // Range/Row val frameTypes = Seq(("rows", RowFrame), ("range", RangeFrame)) val boundaries = Seq( - ("10 preceding", ValuePreceding(10), CurrentRow), - ("3 + 1 following", ValueFollowing(4), CurrentRow), // Will fail during analysis - ("unbounded preceding", UnboundedPreceding, CurrentRow), - ("unbounded following", UnboundedFollowing, CurrentRow), // Will fail during analysis - ("between unbounded preceding and current row", UnboundedPreceding, CurrentRow), + ("10 preceding", -Literal(10), CurrentRow), + ("2147483648 preceding", -Literal(2147483648L), CurrentRow), + ("3 + 1 following", Add(Literal(3), Literal(1)), CurrentRow), // Will fail during analysis --- End diff -- The lower boundary would be higher than the upper boundary, previously it would fail, but we have removed this check, should add it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r129000692 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala --- @@ -436,21 +436,22 @@ class TreeNodeSuite extends SparkFunSuite { "bucketColumnNames" -> "[bucket]", "sortColumnNames" -> "[sort]")) -// Converts FrameBoundary to JSON -assertJSON( - ValueFollowing(3), - JObject( -"product-class" -> classOf[ValueFollowing].getName, -"value" -> 3)) - // Converts WindowFrame to JSON assertJSON( - SpecifiedWindowFrame(RowFrame, UnboundedFollowing, CurrentRow), - JObject( -"product-class" -> classOf[SpecifiedWindowFrame].getName, -"frameType" -> JObject("object" -> JString(RowFrame.getClass.getName)), -"frameStart" -> JObject("object" -> JString(UnboundedFollowing.getClass.getName)), -"frameEnd" -> JObject("object" -> JString(CurrentRow.getClass.getName + SpecifiedWindowFrame(RowFrame, Unbounded, CurrentRow), + List( +JObject( + "class" -> classOf[SpecifiedWindowFrame].getName, + "num-children" -> 2, + "frameType" -> JObject("object" -> JString(RowFrame.getClass.getName)), + "lower" -> 0, + "upper" -> 1), --- End diff -- After this PR, `SpecialFrameBoundary` and `WindowFrame` are made `Expression`s, thus they are `TreeNode`s, so the field values are made value index in the TreeNode.children. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r128947347 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +101,167 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false } +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary + /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. + */ +case class SpecifiedWindowFrame( +frameType: FrameType, +lower: Expression, +upper: Expression) + extends WindowFrame { -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r128894965 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala --- @@ -436,21 +436,22 @@ class TreeNodeSuite extends SparkFunSuite { "bucketColumnNames" -> "[bucket]", "sortColumnNames" -> "[sort]")) -// Converts FrameBoundary to JSON -assertJSON( - ValueFollowing(3), - JObject( -"product-class" -> classOf[ValueFollowing].getName, -"value" -> 3)) - // Converts WindowFrame to JSON assertJSON( - SpecifiedWindowFrame(RowFrame, UnboundedFollowing, CurrentRow), - JObject( -"product-class" -> classOf[SpecifiedWindowFrame].getName, -"frameType" -> JObject("object" -> JString(RowFrame.getClass.getName)), -"frameStart" -> JObject("object" -> JString(UnboundedFollowing.getClass.getName)), -"frameEnd" -> JObject("object" -> JString(CurrentRow.getClass.getName + SpecifiedWindowFrame(RowFrame, Unbounded, CurrentRow), + List( +JObject( + "class" -> classOf[SpecifiedWindowFrame].getName, + "num-children" -> 2, + "frameType" -> JObject("object" -> JString(RowFrame.getClass.getName)), + "lower" -> 0, + "upper" -> 1), --- End diff -- why lower and upper is 0 and 1? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r128894924 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala --- @@ -267,16 +267,17 @@ class ExpressionParserSuite extends PlanTest { // Range/Row val frameTypes = Seq(("rows", RowFrame), ("range", RangeFrame)) val boundaries = Seq( - ("10 preceding", ValuePreceding(10), CurrentRow), - ("3 + 1 following", ValueFollowing(4), CurrentRow), // Will fail during analysis - ("unbounded preceding", UnboundedPreceding, CurrentRow), - ("unbounded following", UnboundedFollowing, CurrentRow), // Will fail during analysis - ("between unbounded preceding and current row", UnboundedPreceding, CurrentRow), + ("10 preceding", -Literal(10), CurrentRow), + ("2147483648 preceding", -Literal(2147483648L), CurrentRow), + ("3 + 1 following", Add(Literal(3), Literal(1)), CurrentRow), // Will fail during analysis + ("unbounded preceding", Unbounded, CurrentRow), + ("unbounded following", Unbounded, CurrentRow), // Will fail during analysis --- End diff -- ditto, why? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r128894916 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala --- @@ -267,16 +267,17 @@ class ExpressionParserSuite extends PlanTest { // Range/Row val frameTypes = Seq(("rows", RowFrame), ("range", RangeFrame)) val boundaries = Seq( - ("10 preceding", ValuePreceding(10), CurrentRow), - ("3 + 1 following", ValueFollowing(4), CurrentRow), // Will fail during analysis - ("unbounded preceding", UnboundedPreceding, CurrentRow), - ("unbounded following", UnboundedFollowing, CurrentRow), // Will fail during analysis - ("between unbounded preceding and current row", UnboundedPreceding, CurrentRow), + ("10 preceding", -Literal(10), CurrentRow), + ("2147483648 preceding", -Literal(2147483648L), CurrentRow), + ("3 + 1 following", Add(Literal(3), Literal(1)), CurrentRow), // Will fail during analysis --- End diff -- why this will fail during analysis? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r128894905 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala --- @@ -1109,6 +1109,31 @@ class TypeCoercionSuite extends AnalysisTest { EqualTo(Literal(Array(1, 2)), Literal("123")), EqualTo(Literal(Array(1, 2)), Literal("123"))) } + + test("cast WindowFrame boundaries to the type they operate upon") { +// Can cast frame boundaries to order dataType. +ruleTest(WindowFrameCoercion, + windowSpec( +Seq(UnresolvedAttribute("a")), +Seq(SortOrder(Literal(1L), Ascending)), +SpecifiedWindowFrame(RangeFrame, Literal(3), Literal(2147483648L))), + windowSpec( +Seq(UnresolvedAttribute("a")), +Seq(SortOrder(Literal(1L), Ascending)), +SpecifiedWindowFrame(RangeFrame, Cast(3, LongType), Literal(2147483648L))) +) +// Cannot cast frame boundaries to order dataType. +ruleTest(WindowFrameCoercion, + windowSpec( +Seq(UnresolvedAttribute("a")), +Seq(SortOrder(Literal.default(DateType), Ascending)), +SpecifiedWindowFrame(RangeFrame, Literal(10.0), Literal(2147483648L))), + windowSpec( +Seq(UnresolvedAttribute("a")), +Seq(SortOrder(Literal.default(DateType), Ascending)), +SpecifiedWindowFrame(RangeFrame, Literal(10.0), Literal(2147483648L))) +) --- End diff -- can we add some more test cases with special window frame boundary? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r128894802 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +101,167 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false } +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary + /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. + */ +case class SpecifiedWindowFrame( +frameType: FrameType, +lower: Expression, +upper: Expression) + extends WindowFrame { -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r128894791 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +101,167 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false } +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary + /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. + */ +case class SpecifiedWindowFrame( +frameType: FrameType, +lower: Expression, +upper: Expression) + extends WindowFrame { -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r128894779 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +101,167 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean +sealed trait SpecialFrameBoundary extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = NullType + override def foldable: Boolean = false + override def nullable: Boolean = false } +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary + /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. + */ +case class SpecifiedWindowFrame( +frameType: FrameType, +lower: Expression, +upper: Expression) + extends WindowFrame { -/** PRECEDING boundary. */ -case class ValuePreceding(value: Int) extends FrameBoundary { - def
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r128894061 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,58 +42,54 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec ++ Seq(frameSpecification) --- End diff -- nit: `partitionSpec ++ orderSpec :+ frameSpecification` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r127453702 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +105,164 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean -} +sealed trait SpecialFrameBoundary + +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. --- End diff -- Sure. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r127451425 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +105,164 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean -} +sealed trait SpecialFrameBoundary + +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. --- End diff -- can we do this in `WindowFrameCoercion`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r127449331 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +105,164 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean -} +sealed trait SpecialFrameBoundary + +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. --- End diff -- I tried that. The problem is that you will need to make them have a proper data type. I tried to make them `case object .. {}` with data type null, but I ended with these replaced with a null literal. All I am saying that this will require a little bit more coding. Since you need to resolve the data type of the boundary. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r127143764 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,57 +42,57 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec ++ Seq(frameSpecification) override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && frameSpecification.isInstanceOf[SpecifiedWindowFrame] override def nullable: Boolean = true override def foldable: Boolean = false - override def dataType: DataType = throw new UnsupportedOperationException + override def dataType: DataType = throw new UnsupportedOperationException("dataType") - override def sql: String = { -val partition = if (partitionSpec.isEmpty) { - "" -} else { - "PARTITION BY " + partitionSpec.map(_.sql).mkString(", ") + " " + override def checkInputDataTypes(): TypeCheckResult = { +frameSpecification match { + case UnspecifiedFrame => +TypeCheckFailure( + "Cannot use an UnspecifiedFrame. This should have been converted during analysis. " + +"Please file a bug report.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && !f.isUnbounded +&& orderSpec.isEmpty => +TypeCheckFailure( + "A range window frame cannot be used in an unordered window specification.") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound +&& orderSpec.size > 1 => +TypeCheckFailure( + s"A range window frame with value boundaries cannot be used in a window specification " + +s"with multiple order by expressions: ${orderSpec.mkString(",")}") + case f: SpecifiedWindowFrame if f.frameType == RangeFrame && f.isValueBound +&& !isValidFrameType(f.children.head.dataType) => +TypeCheckFailure( + s"The data type '${orderSpec.head.dataType}' used in the order specification does " + +s"not match the data type '${f.children.head.dataType}' which is used in the " + +"range frame.") + case _ => TypeCheckSuccess } + } -val order = if (orderSpec.isEmpty) { - "" -} else { - "ORDER BY " + orderSpec.map(_.sql).mkString(", ") + " " + override def sql: String = { +def toSql(exprs: Seq[Expression], prefix: String): Seq[String] = { + Seq(exprs).filter(_.nonEmpty).map(_.map(_.sql).mkString(prefix, ", ", "")) } -s"($partition$order${frameSpecification.toString})" +val elements = + toSql(partitionSpec, "PARTITION BY ") ++ +toSql(orderSpec, "ORDER BY ") ++ +Seq(frameSpecification.sql) +elements.mkString("(", " ", ")") + } + + private def isValidFrameType(ft: DataType): Boolean = (orderSpec.head.dataType, ft) match { +case (DateType, IntegerType) => true +case (TimestampType,
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r127142649 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +105,164 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean -} +sealed trait SpecialFrameBoundary + +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** + * A specified Window Frame. The val lower/uppper can be either a foldable [[Expression]] or a + * [[SpecialFrameBoundary]]. --- End diff -- can we make `SpecialFrameBoundary` an expression? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail:
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r126680805 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -805,4 +806,24 @@ object TypeCoercion { Option(ret) } } + + /** + * Cast WindowFrame boundaries to the type they operate upon. + */ + object WindowFrameCoercion extends Rule[LogicalPlan] { --- End diff -- Hmmm that would be kind of weird. So a user will get type coercion in its select but not in the range clause. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r126187356 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql]( */ // Note: when updating the doc for this method, also update Window.rangeBetween. def rangeBetween(start: Long, end: Long): WindowSpec = { --- End diff -- Let's rule it out of the scope of this PR and address this in a follow-up. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r126016128 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql]( */ // Note: when updating the doc for this method, also update Window.rangeBetween. def rangeBetween(start: Long, end: Long): WindowSpec = { --- End diff -- i was trying to avoid introducing a special value, but maybe you can do that. How important is it to fix this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r126016260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -805,4 +806,24 @@ object TypeCoercion { Option(ret) } } + + /** + * Cast WindowFrame boundaries to the type they operate upon. + */ + object WindowFrameCoercion extends Rule[LogicalPlan] { --- End diff -- do we really need this? can we just require strict types? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125901045 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -151,6 +151,48 @@ class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { Row(2.0d), Row(2.0d))) } + test("row between should accept integer values as boundary") { --- End diff -- We can only add the test cases after we have finalized the API change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125900833 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql]( */ // Note: when updating the doc for this method, also update Window.rangeBetween. def rangeBetween(start: Long, end: Long): WindowSpec = { --- End diff -- Maybe we can use `Literal(0)` to represent `CurrentRow`? And a sufficient large number(like `Literal(Long.MaxValue)`) for `Unbounded`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125735576 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql]( */ // Note: when updating the doc for this method, also update Window.rangeBetween. def rangeBetween(start: Long, end: Long): WindowSpec = { --- End diff -- It might be the easiest to make it take a `Column`, so `rangeBetween(begin: Column, end: Column)`, only downside to this is that we need some way to express the special boundaries (`current row`, `unbounded`). Also cc @rxin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125688475 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -151,6 +151,48 @@ class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { Row(2.0d), Row(2.0d))) } + test("row between should accept integer values as boundary") { --- End diff -- Will do this tomorrow. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125688053 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql]( */ // Note: when updating the doc for this method, also update Window.rangeBetween. def rangeBetween(start: Long, end: Long): WindowSpec = { --- End diff -- I totally agree that's what we definitely should do, but I'd suggest we address this in a follow-up work, and focus on resolving the overflow issue on Long frame boundaries in `rangeBetween` in this PR. One major concern is the `WindowSpec` API is marked `Stable`, so I'm wondering what's the proper procedure to make a change to this interface? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125602833 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala --- @@ -109,46 +109,54 @@ case class WindowExec( * * This method uses Code Generation. It can only be used on the executor side. * - * @param frameType to evaluate. This can either be Row or Range based. - * @param offset with respect to the row. + * @param frame to evaluate. This can either be a Row or Range frame. + * @param bound with respect to the row. * @return a bound ordering object. */ - private[this] def createBoundOrdering(frameType: FrameType, offset: Int): BoundOrdering = { -frameType match { - case RangeFrame => -val (exprs, current, bound) = if (offset == 0) { - // Use the entire order expression when the offset is 0. - val exprs = orderSpec.map(_.child) - val buildProjection = () => newMutableProjection(exprs, child.output) - (orderSpec, buildProjection(), buildProjection()) -} else if (orderSpec.size == 1) { - // Use only the first order expression when the offset is non-null. - val sortExpr = orderSpec.head - val expr = sortExpr.child - // Create the projection which returns the current 'value'. - val current = newMutableProjection(expr :: Nil, child.output) - // Flip the sign of the offset when processing the order is descending - val boundOffset = sortExpr.direction match { -case Descending => -offset -case Ascending => offset - } - // Create the projection which returns the current 'value' modified by adding the offset. - val boundExpr = Add(expr, Cast(Literal.create(boundOffset, IntegerType), expr.dataType)) - val bound = newMutableProjection(boundExpr :: Nil, child.output) - (sortExpr :: Nil, current, bound) -} else { - sys.error("Non-Zero range offsets are not supported for windows " + -"with multiple order expressions.") + private[this] def createBoundOrdering(frame: FrameType, bound: AnyRef): BoundOrdering = { +(frame, bound) match { + case (RowFrame, CurrentRow) => +RowBoundOrdering(0) + + case (RowFrame, IntegerLiteral(offset)) => +RowBoundOrdering(offset) + + case (RangeFrame, CurrentRow) => +val ordering = newOrdering(orderSpec, child.output) +RangeBoundOrdering(ordering, IdentityProjection, IdentityProjection) + + case (RangeFrame, offset: Expression) if orderSpec.size == 1 => +// Use only the first order expression when the offset is non-null. +val sortExpr = orderSpec.head +val expr = sortExpr.child + +// Create the projection which returns the current 'value'. +val current = newMutableProjection(expr :: Nil, child.output) + +// Flip the sign of the offset when processing the order is descending +val boundOffset = sortExpr.direction match { + case Descending => UnaryMinus(offset) + case Ascending => offset +} + +// Create the projection which returns the current 'value' modified by adding the offset. +val boundExpr = (expr.dataType, boundOffset.dataType) match { + case (DateType, IntegerType) => DateAdd(expr, boundOffset) --- End diff -- Both `DateType` and `TimestampType` expressions are going to need a time zone. I was wondering if we can use a `GMT` because these are just offset calculation? cc @ueshin If we can't then we either need to thread through the session local timezone, or it might be easier to put the entire offset calculation in the frame. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125603187 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -151,6 +151,48 @@ class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { Row(2.0d), Row(2.0d))) } + test("row between should accept integer values as boundary") { --- End diff -- We should also add tests for dates/doubles/timestamps. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125598577 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -43,57 +42,57 @@ case class WindowSpecDefinition( orderSpec: Seq[SortOrder], frameSpecification: WindowFrame) extends Expression with WindowSpec with Unevaluable { - def validate: Option[String] = frameSpecification match { -case UnspecifiedFrame => - Some("Found a UnspecifiedFrame. It should be converted to a SpecifiedWindowFrame " + -"during analysis. Please file a bug report.") -case frame: SpecifiedWindowFrame => frame.validate.orElse { - def checkValueBasedBoundaryForRangeFrame(): Option[String] = { -if (orderSpec.length > 1) { - // It is not allowed to have a value-based PRECEDING and FOLLOWING - // as the boundary of a Range Window Frame. - Some("This Range Window Frame only accepts at most one ORDER BY expression.") -} else if (orderSpec.nonEmpty && !orderSpec.head.dataType.isInstanceOf[NumericType]) { - Some("The data type of the expression in the ORDER BY clause should be a numeric type.") -} else { - None -} - } - - (frame.frameType, frame.frameStart, frame.frameEnd) match { -case (RangeFrame, vp: ValuePreceding, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, vf: ValueFollowing, _) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vp: ValuePreceding) => checkValueBasedBoundaryForRangeFrame() -case (RangeFrame, _, vf: ValueFollowing) => checkValueBasedBoundaryForRangeFrame() -case (_, _, _) => None - } -} - } - - override def children: Seq[Expression] = partitionSpec ++ orderSpec + override def children: Seq[Expression] = partitionSpec ++ orderSpec ++ Seq(frameSpecification) override lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess && frameSpecification.isInstanceOf[SpecifiedWindowFrame] override def nullable: Boolean = true override def foldable: Boolean = false - override def dataType: DataType = throw new UnsupportedOperationException + override def dataType: DataType = throw new UnsupportedOperationException("dataType") - override def sql: String = { -val partition = if (partitionSpec.isEmpty) { - "" -} else { - "PARTITION BY " + partitionSpec.map(_.sql).mkString(", ") + " " + override def checkInputDataTypes(): TypeCheckResult = { +frameSpecification match { + case UnspecifiedFrame => +TypeCheckFailure( + "Cann't use an UnspecifiedFrame. This should have been converted during analysis. " + --- End diff -- NIT: typo `Cannot` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125599886 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -106,173 +105,161 @@ case class WindowSpecReference(name: String) extends WindowSpec /** * The trait used to represent the type of a Window Frame. */ -sealed trait FrameType +sealed trait FrameType { + def inputType: AbstractDataType + def sql: String +} /** - * RowFrame treats rows in a partition individually. When a [[ValuePreceding]] - * or a [[ValueFollowing]] is used as its [[FrameBoundary]], the value is considered - * as a physical offset. + * RowFrame treats rows in a partition individually. Values used in a row frame are considered + * to be physical offsets. * For example, `ROW BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a 3-row frame, * from the row that precedes the current row to the row that follows the current row. */ -case object RowFrame extends FrameType +case object RowFrame extends FrameType { + override def inputType: AbstractDataType = IntegerType + override def sql: String = "ROWS" +} /** - * RangeFrame treats rows in a partition as groups of peers. - * All rows having the same `ORDER BY` ordering are considered as peers. - * When a [[ValuePreceding]] or a [[ValueFollowing]] is used as its [[FrameBoundary]], - * the value is considered as a logical offset. + * RangeFrame treats rows in a partition as groups of peers. All rows having the same `ORDER BY` + * ordering are considered as peers. Values used in a range frame are considered to be logical + * offsets. * For example, assuming the value of the current row's `ORDER BY` expression `expr` is `v`, * `RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING` represents a frame containing rows whose values * `expr` are in the range of [v-1, v+1]. * * If `ORDER BY` clause is not defined, all rows in the partition are considered as peers * of the current row. */ -case object RangeFrame extends FrameType +case object RangeFrame extends FrameType { + override def inputType: AbstractDataType = TypeCollection.NumericAndInterval + override def sql: String = "RANGE" +} /** - * The trait used to represent the type of a Window Frame Boundary. + * The trait used to represent special boundaries used in a window frame. */ -sealed trait FrameBoundary { - def notFollows(other: FrameBoundary): Boolean -} +sealed trait SpecialFrameBoundary + +/** UNBOUNDED boundary. */ +case object Unbounded extends SpecialFrameBoundary + +/** CURRENT ROW boundary. */ +case object CurrentRow extends SpecialFrameBoundary /** - * Extractor for making working with frame boundaries easier. + * Represents a window frame. */ -object FrameBoundary { - def apply(boundary: FrameBoundary): Option[Int] = unapply(boundary) - def unapply(boundary: FrameBoundary): Option[Int] = boundary match { -case CurrentRow => Some(0) -case ValuePreceding(offset) => Some(-offset) -case ValueFollowing(offset) => Some(offset) -case _ => None - } +sealed trait WindowFrame extends Expression with Unevaluable { + override lazy val children: Seq[Expression] = Nil + override def dataType: DataType = throw new UnsupportedOperationException("dataType") + override def foldable: Boolean = false + override def nullable: Boolean = false } -/** UNBOUNDED PRECEDING boundary. */ -case object UnboundedPreceding extends FrameBoundary { - def notFollows(other: FrameBoundary): Boolean = other match { -case UnboundedPreceding => true -case vp: ValuePreceding => true -case CurrentRow => true -case vf: ValueFollowing => true -case UnboundedFollowing => true - } +/** Used as a placeholder when a frame specification is not defined. */ +case object UnspecifiedFrame extends WindowFrame - override def toString: String = "UNBOUNDED PRECEDING" -} +/** A specified Window Frame. */ +case class SpecifiedWindowFrame( +frameType: FrameType, +lower: AnyRef, --- End diff -- I made `lower` and `upper` `AnyRef`, this was to allow the use of both of (foldable) `Expressions` and `SpecialFrameBoundary`. This works rather well with things like constant folding. The reason for not making `SpecialFrameBoundary` an `Expression` is that this cannot have a type (unless you make it a case class I suppose) and that it showed some weird behavior during analysis/optimization. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r125602934 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql]( */ // Note: when updating the doc for this method, also update Window.rangeBetween. def rangeBetween(start: Long, end: Long): WindowSpec = { --- End diff -- We should also create API's that allow for other types of literals, at least one for `CalendarInterval`s. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/18540 [SPARK-19451][SQL] rangeBetween method should accept Long value as boundary ## What changes were proposed in this pull request? Long values can be passed to `rangeBetween` as range frame boundaries, but we silently convert it to Int values, this can cause wrong results and we should fix this. Further more, we should accept any legal literal values as range frame boundaries. In this PR, we make it possible for Long values, and make accepting other DataTypes really easy to add. ## How was this patch tested? Add new tests in `DataFrameWindowFunctionsSuite` and `TypeCoercionSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jiangxb1987/spark rangeFrame Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18540.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18540 commit 52c52895cfe3cae58f8b029a350949cc153971a7 Author: Xingbo JiangDate: 2017-07-05T09:02:51Z rangeBetween accept literal values. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org