[spark] branch branch-3.2 updated: [MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations

viirya Thu, 28 Oct 2021 09:24:53 -0700

This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 9bfc5b1  [MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations
9bfc5b1 is described below

commit 9bfc5b14c9b0fbb50dd537a509f2e094e1c5779e
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Thu Oct 28 09:22:42 2021 -0700

    [MINOR][SS][DOCS] Point to correct examples of Arbitrary Stateful Operations
    
    ### What changes were proposed in this pull request?
    
    This fixes incorrect example links in the Structured Streaming Programming Guide.
    
    ### Why are the changes needed?
    
    StructuredSessionization.scala and JavaStructuredSessionization.java now use the session window expression instead of `flatMapGroupsWithState`. Because this section covers arbitrary stateful operations, it should point to other examples.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Doc change only.
    
    Closes #34408 from viirya/fix-ss-doc.
    
    Authored-by: Liang-Chi Hsieh <[email protected]>
    Signed-off-by: Liang-Chi Hsieh <[email protected]>
    (cherry picked from commit 5b2bbcef6854c495c32b37e383dd5f1f6ce23dd4)
    Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
 docs/structured-streaming-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 18dfbec..4642d44 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1806,7 +1806,7 @@ However, as a side effect, data from the slower streams will be aggressively dro
 this configuration judiciously.
 
 ### Arbitrary Stateful Operations
-Many usecases require more advanced stateful operations than aggregations. For example, in many usecases, you have to track sessions from data streams of events. For doing such sessionization, you will have to save arbitrary types of data as state, and perform arbitrary operations on the state using the data stream events in every trigger. Since Spark 2.2, this can be done using the operation `mapGroupsWithState` and the more powerful operation `flatMapGroupsWithState`. Both operations a [...]
+Many usecases require more advanced stateful operations than aggregations. For example, in many usecases, you have to track sessions from data streams of events. For doing such sessionization, you will have to save arbitrary types of data as state, and perform arbitrary operations on the state using the data stream events in every trigger. Since Spark 2.2, this can be done using the operation `mapGroupsWithState` and the more powerful operation `flatMapGroupsWithState`. Both operations a [...]
 
 Though Spark cannot check and force it, the state function should be implemented with respect to the semantics of the output mode. For example, in Update mode Spark doesn't expect that the state function will emit rows which are older than current watermark plus allowed late record delay, whereas in Append mode the state function can emit these rows.
 

