[storm] 01/05: Update Common-patterns.md

rzo1 Fri, 25 Aug 2023 03:23:42 -0700

This is an automated email from the ASF dual-hosted git repository.

rzo1 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/storm.git


commit c7cb4139a60c58232e61deaf6c1870d0cbded999
Author: PoojaChandak <[email protected]>
AuthorDate: Tue Sep 15 09:54:42 2020 +0530

    Update Common-patterns.md
    
    typo/grammatical changes
---
 docs/Common-patterns.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/Common-patterns.md b/docs/Common-patterns.md
index e0e8c1f66..92c8c0075 100644
--- a/docs/Common-patterns.md
+++ b/docs/Common-patterns.md
@@ -39,11 +39,11 @@ builder.setBolt("expand", new ExpandUrl(), parallelism)
   .fieldsGrouping("urls", new Fields("url"));
 ```
 
-The second approach will have vastly more effective caches, since the same URL 
will always go to the same task. This avoids having duplication across any of 
the caches in the tasks and makes it much more likely that a short URL will hit 
the cache.
+The second approach will have vastly more effective caches since the same URL 
will always go to the same task. This avoids having duplication across any of 
the caches in the tasks and makes it much more likely that a short URL will hit 
the cache.
 
 ### Streaming top N
 
-A common continuous computation done on Storm is a "streaming top N" of some 
sort. Suppose you have a bolt that emits tuples of the form ["value", "count"] 
and you want a bolt that emits the top N tuples based on count. The simplest 
way to do this is to have a bolt that does a global grouping on the stream and 
maintains a list in memory of the top N items.
+A common continuous computation done on Storm is a "streaming top N" of some 
sort. Suppose you have a bolt that emits tuples of the form ["value", "count"] 
and you want a bolt that emits the top N tuples based on the count. The 
simplest way to do this is to have a bolt that does a global grouping on the 
stream and maintains a list in memory of the top N items.
 
 This approach obviously doesn't scale to large streams since the entire stream 
has to go through one task. A better way to do the computation is to do many 
top N's in parallel across partitions of the stream, and then merge those top 
N's together to get the global top N. The pattern looks like this:
 
@@ -56,7 +56,7 @@ builder.setBolt("merge", new MergeObjects())
 
 This pattern works because of the fields grouping done by the first bolt which 
gives the partitioning you need for this to be semantically correct. You can 
see an example of this pattern in storm-starter 
[here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/RollingTopWords.java).
 
-If however you have a known skew in the data being processed it can be 
advantageous to use partialKeyGrouping instead of fieldsGrouping.  This will 
distribute the load for each key between two downstream bolts instead of a 
single one.
+If however, you have a known skew in the data being processed it can be 
advantageous to use partialKeyGrouping instead of fieldsGrouping.  This will 
distribute the load for each key between two downstream bolts instead of a 
single one.
 
 ```java
 builder.setBolt("count", new CountObjects(), parallelism)
@@ -67,11 +67,11 @@ builder.setBolt("merge", new MergeRanksObjects())
   .globalGrouping("rank");
 ``` 
 
-The topology needs an extra layer of processing to aggregate the partial 
counts from the upstream bolts but this only processes aggregated values now so 
the bolt it is not subject to the load caused by the skewed data. You can see 
an example of this pattern in storm-starter 
[here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/SkewedRollingTopWords.java).
+The topology needs an extra layer of processing to aggregate the partial 
counts from the upstream bolts but this only processes aggregated values now so 
the bolt is not subject to the load caused by the skewed data. You can see an 
example of this pattern in storm-starter 
[here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/SkewedRollingTopWords.java).
 
 ### TimeCacheMap for efficiently keeping a cache of things that have been 
recently updated
 
-You sometimes want to keep a cache in memory of items that have been recently 
"active" and have items that have been inactive for some time be automatically 
expires. [TimeCacheMap](javadocs/org/apache/storm/utils/TimeCacheMap.html) is 
an efficient data structure for doing this and provides hooks so you can insert 
callbacks whenever an item is expired.
+You sometimes want to keep a cache in memory of items that have been recently 
"active" and have items that have been inactive for some time automatically 
expire. [TimeCacheMap](javadocs/org/apache/storm/utils/TimeCacheMap.html) is an 
efficient data structure for doing this and provides hooks so you can insert 
callbacks whenever an item is expired.
 
 ### CoordinatedBolt and KeyedFairBolt for Distributed RPC
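
The partitioned "streaming top N" pattern touched by this diff can be illustrated outside Storm. The following is a minimal plain-Java sketch, not Storm code. The class and methods (`PartitionedTopN`, `topN`, `partitionOf`, `globalTopN`) are hypothetical names. Routing by `hashCode()` modulo the partition count stands in for a fields grouping; Storm's actual task selection may hash differently, but it is likewise a deterministic function of the key. Every key lands in exactly one partition, and all partitions rank with the same total order. As a result, each global top N entry is also in its own partition's local top N, so merging the local lists gives the exact answer.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionedTopN {
    // Count descending, ties broken by key, so every partition uses the same total order.
    static final Comparator<Map.Entry<String, Long>> BY_COUNT_DESC =
            Map.Entry.<String, Long>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Long>comparingByKey());

    // Local top N of one partition: what each parallel ranking task would keep.
    static List<Map.Entry<String, Long>> topN(Collection<Map.Entry<String, Long>> entries, int n) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(entries);
        sorted.sort(BY_COUNT_DESC);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Hypothetical stand-in for a fields grouping: the same key always maps to the same partition.
    static int partitionOf(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    static List<Map.Entry<String, Long>> globalTopN(Map<String, Long> counts, int partitions, int n) {
        List<Map<String, Long>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            parts.add(new HashMap<>());
        }
        // Split the counts into disjoint partitions: each key lands in exactly one.
        counts.forEach((key, count) -> parts.get(partitionOf(key, partitions)).put(key, count));

        // Merge step: the global top N is the top N of the union of the local top N lists.
        List<Map.Entry<String, Long>> candidates = new ArrayList<>();
        for (Map<String, Long> part : parts) {
            candidates.addAll(topN(part.entrySet(), n));
        }
        return topN(candidates, n);
    }
}
```

For instance, suppose a small word-count map is split across any number of partitions. `globalTopN(counts, p, 2)` returns the same two entries as ranking the unpartitioned map directly.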

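The skewed-data variant can be sketched the same way. This plain-Java sketch assumes a routing rule in the spirit of partialKeyGrouping: each key has two distinct candidate tasks, and each tuple goes to whichever candidate has received fewer tuples so far. The two hash functions and the names `PartialKeyCounting`, `candidates`, `partialCounts`, and `aggregate` are hypothetical, and Storm's implementation may differ in detail. The second stage plays the role of the extra aggregation layer, summing partial counts per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class PartialKeyCounting {
    // Two distinct candidate tasks per key (hypothetical hashing, for illustration only).
    static int[] candidates(String key, int tasks) {
        int first = Math.floorMod(key.hashCode(), tasks);
        // An offset in [1, tasks - 1] guarantees a different second task.
        int offset = 1 + Math.floorMod(Objects.hash(key, "salt"), tasks - 1);
        return new int[] {first, (first + offset) % tasks};
    }

    // Stage 1: route each tuple to the less-loaded of its two candidates;
    // each task only ever holds partial counts.
    static List<Map<String, Long>> partialCounts(List<String> words, int tasks) {
        if (tasks < 2) {
            throw new IllegalArgumentException("need at least 2 tasks");
        }
        long[] load = new long[tasks];
        List<Map<String, Long>> partial = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            partial.add(new HashMap<>());
        }
        for (String word : words) {
            int[] c = candidates(word, tasks);
            int target = load[c[0]] <= load[c[1]] ? c[0] : c[1];
            load[target]++;
            partial.get(target).merge(word, 1L, Long::sum);
        }
        return partial;
    }

    // Stage 2: the extra aggregation layer sums partial counts, so it sees
    // at most one value per key per task rather than every raw tuple.
    static Map<String, Long> aggregate(List<Map<String, Long>> partial) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> p : partial) {
            p.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }
}
```

Consider a stream of 100 copies of one hot key across four tasks. The hot key is split 50/50 between its two candidate tasks, while `aggregate` still reports a total of 100.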
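
The expiring-cache idea described for TimeCacheMap can be sketched with rotating buckets. This is a simplified, single-threaded illustration and not Storm's class. `RotatingCacheSketch` and its methods are hypothetical, and `rotate()` is called explicitly here, whereas a real implementation would rotate on a timer. Each `put` moves a key into the newest bucket. An entry that is not written again is expired, and passed to the callback, on the `numBuckets`-th rotation after its last `put`.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.function.BiConsumer;

class RotatingCacheSketch<K, V> {
    // Newest bucket first, oldest bucket last.
    private final LinkedList<Map<K, V>> buckets = new LinkedList<>();
    private final BiConsumer<K, V> onExpire;

    RotatingCacheSketch(int numBuckets, BiConsumer<K, V> onExpire) {
        if (numBuckets < 2) {
            throw new IllegalArgumentException("need at least 2 buckets");
        }
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new HashMap<>());
        }
        this.onExpire = onExpire;
    }

    // Writing a key marks it as recently active: it moves to the newest bucket.
    void put(K key, V value) {
        for (Map<K, V> bucket : buckets) {
            bucket.remove(key);
        }
        buckets.getFirst().put(key, value);
    }

    // Reads do not refresh an entry in this sketch.
    V get(K key) {
        for (Map<K, V> bucket : buckets) {
            if (bucket.containsKey(key)) {
                return bucket.get(key);
            }
        }
        return null;
    }

    // One "tick": drop the oldest bucket, fire the callback for each expired
    // entry, and open a fresh bucket for new writes.
    void rotate() {
        Map<K, V> expired = buckets.removeLast();
        buckets.addFirst(new HashMap<>());
        expired.forEach(onExpire);
    }
}
```

With two buckets, a key written once and never updated is reported to the callback on the second `rotate()`. A key that keeps being rewritten stays in the cache.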