[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76335953 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" + } case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" --- End diff -- I have created SPARK-17246 to track this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14777 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76334925 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" + } case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" --- End diff -- Ok, lets do that. I created SPARK-17246 to track this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76332385 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" + } case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" --- End diff -- BigDecimal literals are a good idea. Given that there are multiple overlapping / complimentary approaches here, I think we should fork this discussion and defer any decimal changes to a separate PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76326911 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" + } case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" --- End diff -- A third option would be to add support for Hive's BigDecimal literals. Any number ending with a `BD` would be treated as a `BigDecimal` literal. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76313519 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" + } case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" --- End diff -- According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-FloatingPointTypes: > Floating point literals are assumed to be DOUBLE. Scientific notation is not yet supported. However, the professed lack of support for scientific notation seems to be contradicted by https://issues.apache.org/jira/browse/HIVE-2536 and manual tests. Here's a test query which demonstrates the precision issues in decimal literals: ``` SELECT CAST(-0.06688467811848818630 as DECIMAL(38, 36)), CAST(-6.688467811848818630E-18 AS DECIMAL(38, 36)) ``` In Hive, these both behave equivalently: both forms of the number are interpreted as double so we lose precision and both cases wind up as `0.06688467811848818` (with the final three digits lost). In Spark 2.0, the first expanded form is parsed as a decimal literal, while the scientific notation form is parsed as a double, so the expanded form correctly preserves the decimal while the scientific notation causes precision loss (as in Hive). I think there's two possible fixes here: we could either emit the fully-expanded form or could update Spark's parser to treat scientific notation floating point literals as decimals. From a consistency point, I'm in favor of the latter approach because I don't think it makes sense for `1.1` and `1.1e0` to be treated differently. Given all of this, I think that it would certainly be _safe_ to emit fully-expanded forms of the decimal but I'm not sure if this is the optimal fix because it doesn't resolve inconsistencies between Spark and Hive and results in really ugly, hard-to-read expressions. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76305691 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" + } case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" --- End diff -- Actually, let me go ahead and quickly confirm whether Hive will support full expansion... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76284138 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" + } case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" --- End diff -- Hmmm as discussed, that's going to look very ugly but might be more compatible with Postgres and won't be lossy for very precise decimals. I say that we defer to followup for now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76240220 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" + } case (v: Decimal, t: DecimalType) => s"CAST($v AS ${t.sql})" --- End diff -- Should we also prevent that a Decimal gets dumped in scientific notation? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76104663 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" --- End diff -- I think you'd have to use `CAST(x as DOUBLE PRECISION)`, but Spark doesn't seem to support `DOUBLE PRECISION` and neither does Hive (AFAIK). Postgres doesn't understand the `D` suffix and instead treats it as a column name. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76100396 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" --- End diff -- What would PostgreSQL use? I don't think it would be bad to increase compatibility with PostgreSQL. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/14777#discussion_r76005098 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala --- @@ -251,8 +251,21 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression with case (v: Short, ShortType) => v + "S" case (v: Long, LongType) => v + "L" // Float type doesn't have a suffix -case (v: Float, FloatType) => s"CAST($v AS ${FloatType.sql})" -case (v: Double, DoubleType) => v + "D" +case (v: Float, FloatType) => + val castedValue = v match { +case _ if v.isNaN => "'NaN'" +case Float.PositiveInfinity => "'Infinity'" +case Float.NegativeInfinity => "'-Infinity'" +case _ => v + } + s"CAST($castedValue AS ${FloatType.sql})" +case (v: Double, DoubleType) => + v match { +case _ if v.isNaN => s"CAST('NaN' AS ${DoubleType.sql})" +case Double.PositiveInfinity => s"CAST('Infinity' AS ${DoubleType.sql})" +case Double.NegativeInfinity => s"CAST('-Infinity' AS ${DoubleType.sql})" +case _ => v + "D" --- End diff -- As in the original code, this is intended to work with Spark / Hive; Postgres would use a slightly different form. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14777: [SPARK-17205] Literal.sql should handle Infinity ...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/14777 [SPARK-17205] Literal.sql should handle Infinity and NaN This patch updates `Literal.sql` to properly generate SQL for `NaN` and `Infinity` float and double literals: these special values need to be handled separately from regular values. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark SPARK-17205 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14777.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14777 commit 26e036af512e7e21a1365cdf665cb5e9dca39c66 Author: Josh Rosen Date: 2016-08-24T00:44:36Z [SPARK-17205] Literal.sql should handle Infinity and NaN --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org