Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21477#discussion_r193631209
  
    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None, continuous=None):
             self._jwrite = self._jwrite.trigger(jTrigger)
             return self
     
    +    def foreach(self, f):
    +        """
    +        Sets the output of the streaming query to be processed using the provided writer ``f``.
    +        This is often used to write the output of a streaming query to arbitrary storage systems.
    +        The processing logic can be specified in two ways.
    +
    +        #. A **function** that takes a row as input.
    +            This is a simple way to express your processing logic. Note that this does
    +            not allow you to deduplicate generated data when failures cause reprocessing of
    +            some input data. That would require you to specify the processing logic in the next
    +            way.
    +
    +        #. An **object** with a ``process`` method and optional ``open`` and ``close`` methods.
    --- End diff --
    
    I discussed this with @marmbrus. If there is a ForeachWriter class in Python, then users will have to additionally import it. That's just another overhead that can be avoided by simply allowing any class with the appropriate methods. One less step for Python users.
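    
    To illustrate the duck-typed approach, here is a minimal sketch of such a writer object: no base class to import, just a class exposing ``open``/``process``/``close``. The class name ``RowPrinter`` and the direct method calls at the bottom are purely illustrative (in real use the instance would be passed to ``DataStreamWriter.foreach()``), and the ``open(partition_id, epoch_id)`` / ``process(row)`` / ``close(error)`` signatures are assumed from this PR's proposal:
    
    ```python
    class RowPrinter:
        """Illustrative duck-typed writer; in real use it would be passed as
        ``query = df.writeStream.foreach(RowPrinter()).start()``."""
    
        def open(self, partition_id, epoch_id):
            # Optional. Return True to process this partition/epoch; returning
            # False is how a writer can skip reprocessed data to deduplicate.
            self.opened = (partition_id, epoch_id)
            return True
    
        def process(self, row):
            # Required. Called once per row when open() returned True.
            print(row)
    
        def close(self, error):
            # Optional. Called at the end, with the error raised during
            # processing (or None on success).
            self.closed_with = error
    
    # Simulate the calls the engine would make for one partition/epoch.
    writer = RowPrinter()
    if writer.open(partition_id=0, epoch_id=0):
        writer.process({"value": "hello"})
    writer.close(None)
    ```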

