Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21477#discussion_r193886200

    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None, continuous=None):
                 self._jwrite = self._jwrite.trigger(jTrigger)
                 return self

    +    def foreach(self, f):
    +        """
    +        Sets the output of the streaming query to be processed using the provided writer ``f``.
    +        This is often used to write the output of a streaming query to arbitrary storage systems.
    +        The processing logic can be specified in two ways.
    +
    +        #. A **function** that takes a row as input.
    --- End diff --

    Python APIs anyway have slight divergences from the Scala/Java APIs in order to provide a better experience for Python users. For example, `StreamingQuery.lastProgress` returns an object of type `StreamingQueryProgress` in Java/Scala but returns a dict in Python, because Python users are more used to dealing with dicts, while Java/Scala users (typed languages) are more comfortable with structured types. Similarly, in `DataFrame.select`, you can refer to a column in Scala as `$"columnName"`, but in Python you can refer to it as `df.columnName`. If we strictly adhere to pure consistency, then we cannot make things convenient for users in different languages, and ultimately convenience is what matters for the user experience. So it's okay to have a superset of supported types in Python compared to Java/Scala. Personally, I think we should also add the lambda variant to Scala as well. But that decision for Scala is independent of this PR, as there is enough justification for adding the lambda variant for Python.
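    The two processing variants the diff describes (a plain function that takes a row, or a writer object with `open`/`process`/`close` methods) can be sketched in plain Python. This is an illustrative sketch only, not the PySpark implementation: the names `ForeachSink` and `run_batch` are hypothetical, chosen to mirror the open -> process* -> close lifecycle of Scala's `ForeachWriter`.

    ```python
    # Hypothetical sketch of how a foreach() sink could accept either a plain
    # function or a writer object with open/process/close methods.
    # `ForeachSink` and `run_batch` are illustrative names, not PySpark API.

    class ForeachSink:
        def __init__(self, f):
            if hasattr(f, "process"):
                # Writer-object variant: use its open/process/close methods,
                # falling back to no-op open/close if they are absent.
                self._process = f.process
                self._open = getattr(f, "open", lambda partition_id, epoch_id: True)
                self._close = getattr(f, "close", lambda error: None)
            elif callable(f):
                # Function variant: each row is simply passed to the function.
                self._process = f
                self._open = lambda partition_id, epoch_id: True
                self._close = lambda error: None
            else:
                raise TypeError("f must be a function or an object with a 'process' method")

        def run_batch(self, rows, partition_id=0, epoch_id=0):
            # open() may veto processing for this partition/epoch (e.g. for
            # deduplication); close() always runs and receives any error.
            error = None
            if self._open(partition_id, epoch_id):
                try:
                    for row in rows:
                        self._process(row)
                except Exception as e:
                    error = e
            self._close(error)

    seen = []
    ForeachSink(lambda row: seen.append(row)).run_batch([1, 2, 3])
    # seen is now [1, 2, 3]
    ```

    The function variant stays as convenient as a lambda, while the object variant keeps the full lifecycle hooks for users who need per-partition setup and teardown.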