[ https://issues.apache.org/jira/browse/SPARK-42579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697216#comment-17697216 ]
Yang Jie edited comment on SPARK-42579 at 3/7/23 3:09 AM:
----------------------------------------------------------

[~hvanhovell] In fact, if we compare it with `org.apache.spark.sql.functions#lit` in the sql module, I think this function is already complete.

[https://github.com/apache/spark/blob/201e08c03a31c763e3120540ac1b1ca8ef252e6b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L116-L126]

{code:java}
def lit(literal: Any): Column = literal match {
  case c: Column => c
  case s: Symbol => new ColumnName(s.name)
  case _ =>
    // This is different from `typedlit`. `typedlit` calls `Literal.create` to use
    // `ScalaReflection` to get the type of `literal`. However, since we use `Any` in this method,
    // `typedLit[Any](literal)` will always fail and fallback to `Literal.apply`. Hence, we can
    // just manually call `Literal.apply` to skip the expensive `ScalaReflection` code. This is
    // significantly better when there are many threads calling `lit` concurrently.
    Column(Literal(literal))
}
{code}

[https://github.com/apache/spark/blob/201e08c03a31c763e3120540ac1b1ca8ef252e6b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L64-L102]

{code:java}
def apply(v: Any): Literal = v match {
  case i: Int => Literal(i, IntegerType)
  case l: Long => Literal(l, LongType)
  case d: Double => Literal(d, DoubleType)
  case f: Float => Literal(f, FloatType)
  case b: Byte => Literal(b, ByteType)
  case s: Short => Literal(s, ShortType)
  case s: String => Literal(UTF8String.fromString(s), StringType)
  case s: UTF8String => Literal(s, StringType)
  case c: Char => Literal(UTF8String.fromString(c.toString), StringType)
  case ac: Array[Char] => Literal(UTF8String.fromString(String.valueOf(ac)), StringType)
  case b: Boolean => Literal(b, BooleanType)
  case d: BigDecimal =>
    val decimal = Decimal(d)
    Literal(decimal, DecimalType.fromDecimal(decimal))
  case d: JavaBigDecimal =>
    val decimal = Decimal(d)
    Literal(decimal, DecimalType.fromDecimal(decimal))
  case d: Decimal => Literal(d, DecimalType(Math.max(d.precision, d.scale), d.scale))
  case i: Instant => Literal(instantToMicros(i), TimestampType)
  case t: Timestamp => Literal(DateTimeUtils.fromJavaTimestamp(t), TimestampType)
  case l: LocalDateTime => Literal(DateTimeUtils.localDateTimeToMicros(l), TimestampNTZType)
  case ld: LocalDate => Literal(ld.toEpochDay.toInt, DateType)
  case d: Date => Literal(DateTimeUtils.fromJavaDate(d), DateType)
  case d: Duration => Literal(durationToMicros(d), DayTimeIntervalType())
  case p: Period => Literal(periodToMonths(p), YearMonthIntervalType())
  case a: Array[Byte] => Literal(a, BinaryType)
  case a: collection.mutable.WrappedArray[_] => apply(a.array)
  case a: Array[_] =>
    val elementType = componentTypeToDataType(a.getClass.getComponentType())
    val dataType = ArrayType(elementType)
    val convert = CatalystTypeConverters.createToCatalystConverter(dataType)
    Literal(convert(a), dataType)
  case i: CalendarInterval => Literal(i, CalendarIntervalType)
  case null => Literal(null, NullType)
  case v: Literal => v
  case _ => throw QueryExecutionErrors.literalTypeUnsupportedError(v)
}
{code}

For nested data types, `org.apache.spark.sql.functions#lit` only supports `Array[_]`. For example, running

{code:java}
val column = org.apache.spark.sql.functions.lit(Map(1 -> "v1", 2 -> "v2"))
{code}

throws the following error:

{code:java}
[UNSUPPORTED_FEATURE.LITERAL_TYPE] The feature is not supported: Literal for 'Map(1 -> v1, 2 -> v2)' of class scala.collection.immutable.Map$Map2.
org.apache.spark.SparkRuntimeException: [UNSUPPORTED_FEATURE.LITERAL_TYPE] The feature is not supported: Literal for 'Map(1 -> v1, 2 -> v2)' of class scala.collection.immutable.Map$Map2.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.literalTypeUnsupportedError(QueryExecutionErrors.scala:327)
  at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:101)
  at org.apache.spark.sql.functions$.lit(functions.scala:125)
{code}

Also, per the code comments, other types should use `typedLit` instead of `lit`.

> Extend function.lit() to match Literal.apply()
> ----------------------------------------------
>
>                 Key: SPARK-42579
>                 URL: https://issues.apache.org/jira/browse/SPARK-42579
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Herman van Hövell
>            Priority: Major
>         Attachments: image-2023-03-07-11-00-39-429.png, image-2023-03-07-11-00-49-379.png
>
> function.lit should support the same conversions as the original.
> This requires an addition to the connect protocol, since it does not support nested type literals at the moment.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
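To make the `typedLit` recommendation above concrete, here is a minimal sketch (not from the original comment, and assuming a running `SparkSession` on the classpath) contrasting the failing `lit` call with the working `typedLit` call for a `Map` literal:

{code:java}
import org.apache.spark.sql.functions.{lit, typedLit}

// `lit(Any)` dispatches to `Literal.apply`, which has no case for Map,
// so this line would throw [UNSUPPORTED_FEATURE.LITERAL_TYPE]:
// val bad = lit(Map(1 -> "v1", 2 -> "v2"))

// `typedLit` instead goes through `Literal.create`, which uses
// `ScalaReflection` to derive the type from the Scala type parameter,
// so the map literal is supported:
val ok = typedLit(Map(1 -> "v1", 2 -> "v2"))
{code}

The trade-off, as the comment in `functions.scala` explains, is that `typedLit` pays the `ScalaReflection` cost that `lit` deliberately skips.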