Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/19439 @hhbyyh regarding the data representation, one could indeed have each of the representations encoded with its proper typed array information. This adds complexity for complex UDFs, though, because they would need to select the proper field, and the target implementations in C++ or TensorFlow can already cast the field to the proper type. I suggest we keep `bytes[]` for now and see whether a more refined representation is ever needed. For the `origin` field, @dakirsa or @imatiach-msft should have more context.
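To illustrate the point about consumers casting for themselves, here is a minimal sketch (not code from the PR) of what reinterpreting an opaque `bytes[]` payload looks like on the consumer side. It assumes numpy; the data values and shapes are purely illustrative:

```python
import numpy as np

# Raw pixel payload stored as an opaque bytes[] column (illustrative
# values, not the PR's actual schema): a 2x2 grayscale image, uint8 pixels.
raw_u8 = bytes([0, 64, 128, 255])

# The consumer reinterprets the bytes with whatever dtype the image
# metadata implies -- no per-type schema field is needed.
pixels_u8 = np.frombuffer(raw_u8, dtype=np.uint8).reshape(2, 2)

# The same bytes[] approach covers float32 images: 4 bytes per pixel,
# just a different dtype at the cast site.
raw_f32 = np.array([0.0, 0.5], dtype=np.float32).tobytes()
pixels_f32 = np.frombuffer(raw_f32, dtype=np.float32)
```

The design trade-off is that the schema stays simple (one binary field) while each consumer supplies the dtype at read time, which matches how C++ or TensorFlow pipelines already handle raw buffers.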