GideonPotok commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1777467063
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala:
##########
@@ -86,6 +91,53 @@ case class Mode(
     buffer
   }
 
+  private def getCollationAwareBuffer(
+      childDataType: DataType,
+      buffer: OpenHashMap[AnyRef, Long]): Iterable[(AnyRef, Long)] = {
+    def groupAndReduceBuffer(groupingFunction: AnyRef => _): Iterable[(AnyRef, Long)] = {
+      buffer.groupMapReduce(t =>
+        groupingFunction(t._1))(x => x)((x, y) => (x._1, x._2 + y._2)).values
+    }
+    def determineBufferingFunction(
+        childDataType: DataType): Option[AnyRef => _] = {
+      childDataType match {
+        case _ if UnsafeRowUtils.isBinaryStable(child.dataType) => None
+        case _ => Some(collationAwareTransform(_, childDataType))
+      }
+    }
+    determineBufferingFunction(childDataType).map(groupAndReduceBuffer).getOrElse(buffer)
+  }
+
+  private def collationAwareTransform(data: AnyRef, dataType: DataType): AnyRef = {
+    dataType match {
+      case _ if UnsafeRowUtils.isBinaryStable(dataType) => data
+      case st: StructType =>
+        processStructTypeWithBuffer(data.asInstanceOf[InternalRow].toSeq(st).zip(st.fields))
+      case at: ArrayType => processArrayTypeWithBuffer(at, data.asInstanceOf[ArrayData])
+      case st: StringType =>
+        CollationFactory.getCollationKey(data.asInstanceOf[UTF8String], st.collationId)
+      case _ =>
+        throw new SparkUnsupportedOperationException(
+          "UNSUPPORTED_MODE_DATA_TYPE",

Review Comment:
@MaxGekk @uros-db I am having a lot of trouble with this one! I implemented it as `COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.NO_INPUT`, and I plan to create a different subclass that represents this situation (maybe `BAD_INPUT`) once I have it working.
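For reference, here is my rough mental model of how a dotted error class name such as `MAIN.SUB` gets resolved. It is a simplified, self-contained sketch and not the actual `ErrorClassesJsonReader` code: the `ErrorInfo` shape, the registry contents, the messages, and the `EXAMPLE_SUB_CLASS` name are all assumptions made up for illustration.

```scala
import scala.util.Try

// Hypothetical stand-in for one entry of the error classes JSON.
final case class ErrorInfo(message: String, subClass: Option[Map[String, String]])

object ErrorClassLookupSketch {
  // Hypothetical registry: the main class exists and declares sub classes,
  // but NO_INPUT is not among them.
  val errorInfoMap: Map[String, ErrorInfo] = Map(
    "COMPLEX_EXPRESSION_UNSUPPORTED_INPUT" -> ErrorInfo(
      "Main message.",
      Some(Map("EXAMPLE_SUB_CLASS" -> "Sub message."))))

  def getMessageTemplate(errorClass: String): String = {
    // "MAIN" or "MAIN.SUB": split on the literal dot.
    val parts = errorClass.split("\\.")
    require(parts.length == 1 || parts.length == 2, s"Malformed error class '$errorClass'")
    val mainClass = parts.head
    val subClass = parts.tail.headOption

    val info = errorInfoMap.getOrElse(
      mainClass,
      throw new IllegalStateException(s"Cannot find main error class '$errorClass'"))

    // A main class that declares sub classes must be referenced with one,
    // and a main class without sub classes must be referenced without one.
    assert(info.subClass.isDefined == subClass.isDefined)

    subClass match {
      case None => info.message
      case Some(sub) =>
        val subMessage = info.subClass.get.getOrElse(
          sub,
          throw new IllegalStateException(s"Cannot find sub error class '$errorClass'"))
        info.message + " " + subMessage
    }
  }

  def main(args: Array[String]): Unit = {
    // Sub class is not registered under the main class: lookup fails.
    println(Try(getMessageTemplate("COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.NO_INPUT")))
    // Main class declares sub classes but none was given: the assertion fails.
    println(Try(getMessageTemplate("COMPLEX_EXPRESSION_UNSUPPORTED_INPUT")))
    // Registered main and sub class: the two templates are concatenated.
    println(Try(getMessageTemplate("COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.EXAMPLE_SUB_CLASS")))
  }
}
```

If this model is roughly right, a dotted name only resolves when the sub class is registered under the main class in the JSON, and a bare main class name only resolves when that class declares no sub classes at all.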
However, when I throw the following:

```
SparkUnsupportedOperationException(
  errorClass = "COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.NO_INPUT",
```

I end up getting:

```
// org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot
// find sub error class 'COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.NO_INPUT' SQLSTATE: XX000
```

Yet if I do

```
SparkUnsupportedOperationException(
  errorClass = "COMPLEX_EXPRESSION_UNSUPPORTED_INPUT",
```

then `org.apache.spark.ErrorClassesJsonReader#getMessageTemplate` fails on the assertion `assert(errorInfo.subClass.isDefined == subErrorClass.isDefined)`, as the subClass is missing.

Would either of you be able to tell me whether this is the right pattern? For example, maybe it should be `ComplexExpressionException("BAD_INPUT")`; I am not sure which is the "old" pattern and which is the preferred one.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: reviews-h...@spark.apache.org