Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21477#discussion_r193886200

    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None, continuous=None):
                 self._jwrite = self._jwrite.trigger(jTrigger)
                 return self

    +    def foreach(self, f):
    +        """
    +        Sets the output of the streaming query to be processed using the provided writer ``f``.
    +        This is often used to write the output of a streaming query to arbitrary storage systems.
    +        The processing logic can be specified in two ways.
    +
    +        #. A **function** that takes a row as input.
    --- End diff --

    Python APIs anyway have slight divergences from the Scala/Java APIs in order to provide a better experience for Python users. For example, `StreamingQuery.lastProgress` returns an object of type `StreamingQueryProgress` in Java/Scala but returns a dict in Python, because Python users are more used to dealing with dicts, while Java/Scala users (typed languages) are more comfortable with structured types. Similarly, in `DataFrame.select`, you can refer to a column in Scala as `$"columnName"`, but in Python you can refer to it as `df.columnName`. If we strictly adhere to pure consistency, then we cannot make things convenient for users in different languages, and ultimately convenience is what matters for the user experience. So it's okay to have a superset of supported types in Python compared to Java/Scala. Personally, I think we should also add the lambda variant to Scala as well. But that decision for Scala is independent of this PR, as there is enough justification for adding the lambda variant for Python.
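    The two processing variants the diff describes (a plain function that takes a row, or a writer object with `open`/`process`/`close` methods) can be sketched in plain Python. This is an illustrative sketch only, not the PySpark implementation: the names `ForeachSink` and `run_batch` are hypothetical, chosen to mirror the open -> process* -> close lifecycle of Scala's `ForeachWriter`.

    ```python
    # Hypothetical sketch of how a foreach() sink could accept either a plain
    # function or a writer object with open/process/close methods.
    # `ForeachSink` and `run_batch` are illustrative names, not PySpark API.

    class ForeachSink:
        def __init__(self, f):
            if hasattr(f, "process"):
                # Writer-object variant: use its open/process/close methods,
                # falling back to no-op open/close if they are absent.
                self._process = f.process
                self._open = getattr(f, "open", lambda partition_id, epoch_id: True)
                self._close = getattr(f, "close", lambda error: None)
            elif callable(f):
                # Function variant: each row is simply passed to the function.
                self._process = f
                self._open = lambda partition_id, epoch_id: True
                self._close = lambda error: None
            else:
                raise TypeError("f must be a function or an object with a 'process' method")

        def run_batch(self, rows, partition_id=0, epoch_id=0):
            # open() may veto processing for this partition/epoch (e.g. for
            # deduplication); close() always runs and receives any error.
            error = None
            if self._open(partition_id, epoch_id):
                try:
                    for row in rows:
                        self._process(row)
                except Exception as e:
                    error = e
            self._close(error)

    seen = []
    ForeachSink(lambda row: seen.append(row)).run_batch([1, 2, 3])
    # seen is now [1, 2, 3]
    ```

    The function variant stays as convenient as a lambda, while the object variant keeps the full lifecycle hooks for users who need per-partition setup and teardown.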