dtenedor opened a new pull request, #43204: URL: https://github.com/apache/spark/pull/43204
### What changes were proposed in this pull request?

This PR adds a Python UDTF API that lets the `analyze` method return a buffer, which is then consumed each time the UDTF class is created.

* The `AnalyzeResult` class now contains a new string field `prepare_buffer`.
* If this field is assigned a non-empty value, the UDTF should have another method, `prepare`, that accepts the string argument. It is called after `__init__` when the class is created.
* The format of the buffer is opaque and known only to the UDTF. Common use cases include serializing protocol buffers or JSON objects into the buffer to help organize its contents.

For example, this UDTF accepts a constant scalar string argument, then assigns this value to the buffer.

```
from pyspark.sql.functions import udtf
from pyspark.sql.types import IntegerType, Row, StringType, StructType
from pyspark.sql.udtf import AnalyzeResult

@udtf
class TestUDTF:
    def __init__(self):
        self._total = 0
        self._buffer = None

    @staticmethod
    def analyze(argument, _):
        # Forward the constant string argument through the buffer.
        return AnalyzeResult(
            schema=StructType().add("total", IntegerType()).add("buffer", StringType()),
            prepare_buffer=argument.value,
            with_single_partition=True)

    def prepare(self, buffer):
        # Called after __init__ with the buffer produced by analyze.
        self._buffer = buffer
        self._total = len(buffer)

    def eval(self, argument, row: Row):
        self._total += 1

    def terminate(self):
        yield self._total, self._buffer
```

### Why are the changes needed?

In this way, the UDTF can perform potentially expensive initialization logic in the `analyze` method just once and return the result of that initialization, rather than repeating the initialization in `eval`.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds new unit test coverage.

### Was this patch authored or co-authored using generative AI tooling?

No.
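To illustrate the JSON use case mentioned above, here is a minimal sketch in plain Python that does not use Spark at all. `FakeAnalyzeResult`, `JsonBufferUDTF`, and `run_once` are hypothetical names invented for this illustration; `run_once` only mimics the call order described in this PR (`analyze`, then `__init__`, then `prepare`, then `eval` per row, then `terminate`) and is not how Spark itself drives a UDTF.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeAnalyzeResult:
    """Hypothetical stand-in for AnalyzeResult; only the buffer field is modeled."""
    prepare_buffer: str = ""


class JsonBufferUDTF:
    def __init__(self):
        self._config = {}
        self._count = 0

    @staticmethod
    def analyze(pattern: str) -> FakeAnalyzeResult:
        # Stand-in for the expensive one-time work; the result is
        # serialized to JSON so that it fits in a string buffer.
        config = {"pattern": pattern, "pattern_length": len(pattern)}
        return FakeAnalyzeResult(prepare_buffer=json.dumps(config))

    def prepare(self, buffer: str) -> None:
        # Decode the opaque buffer; only this class knows its format.
        self._config = json.loads(buffer)

    def eval(self, value: str) -> None:
        # Count the rows that contain the analyzed pattern.
        if self._config["pattern"] in value:
            self._count += 1

    def terminate(self):
        yield self._count, self._config["pattern_length"]


def run_once(udtf_cls, analyze_arg, rows):
    """Hypothetical driver that mimics the call order described above."""
    result = udtf_cls.analyze(analyze_arg)
    instance = udtf_cls()
    if result.prepare_buffer:
        # prepare is only called when the buffer is non-empty.
        instance.prepare(result.prepare_buffer)
    for row in rows:
        instance.eval(row)
    return list(instance.terminate())


print(run_once(JsonBufferUDTF, "ab", ["abc", "xyz", "cab"]))  # [(2, 2)]
```

Because the buffer is an ordinary string, any serialization format works as long as `analyze` and `prepare` agree on it.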
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org