Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3640#issuecomment-66251380 Appreciate a lot for fixing this case! The serialization wrapper class makes sense. However, would like to make some refactoring. A summary of my comments above: 1. Renaming `HiveFunctionCache` to `HiveFunctionWrapper` 2. Moving `HiveFunctionWrapper` to the shim layer, and keep Hive 0.12.0 code path exactly the same as before. Considering Hive 0.12.0 is not affected by this issue, and 1.2.0 release is really close, I'd like to lower the possibility of breaking 0.12.0 code paths as much as possible. 3. Add a `HiveSimpleUdfWrapper`, which inherits from `HiveFunctionWrapper`. As you've mentioned in the code, `HiveSimpleUdf` is a special case, it shouldn't be serialized, and should always create a fresh object on executor side. Currently this special case complicates `HiveFunctionWrapper` implementation and makes it somewhat confusing. Defining a special subclass for `HvieSimpleUdf` helps making `HiveFunctionWrapper` simpler (e.g. no need to serialize the boolean flag any more). 4. Avoid using `Utilities.de/serializePlan` by mimicking `Utilities.de/serializeObjectByKryo` in the `read/writeExternal` methods of the wrapper class, so that we can remove the expensive `HiveConf` instantiation. In a word, we can add two classes, `HiveFunctionWrapper` and `HiveSImpleUdfWrapper` into the shim layer, and make sure that the 0.12.0 version behaves exactly the same as before.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org