This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new dad1cd6  [MINOR][DOC][SS] Correct description of minPartitions in Kafka option

dad1cd6 is described below

commit dad1cd691fcb846f99818cab078808e07f560f6e
Author: Jungtaek Lim (HeartSaVioR) <kabh...@gmail.com>
AuthorDate: Fri Aug 2 09:12:54 2019 -0700

    [MINOR][DOC][SS] Correct description of minPartitions in Kafka option

    ## What changes were proposed in this pull request?

    `minPartitions` has been used as a hint, and the relevant method
    (`KafkaOffsetRangeCalculator.getRanges`) doesn't guarantee that the resulting
    number of partitions will be equal to or greater than the given value.

    https://github.com/apache/spark/blob/d67b98ea016e9b714bef68feaac108edd08159c9/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala#L32-L46

    This patch makes clear that the configuration is a hint, and that the actual
    number of partitions could be less or more.

    ## How was this patch tested?

    Just a documentation change.

    Closes #25332 from HeartSaVioR/MINOR-correct-kafka-structured-streaming-doc-minpartition.

    Authored-by: Jungtaek Lim (HeartSaVioR) <kabh...@gmail.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
    (cherry picked from commit 7ffc00ccc37fc94a45b7241bb3c6a17736b55ba3)
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 docs/structured-streaming-kafka-integration.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/structured-streaming-kafka-integration.md b/docs/structured-streaming-kafka-integration.md
index 680fe78..1a3ee85 100644
--- a/docs/structured-streaming-kafka-integration.md
+++ b/docs/structured-streaming-kafka-integration.md
@@ -379,10 +379,12 @@ The following configurations are optional:
   <td>int</td>
   <td>none</td>
   <td>streaming and batch</td>
-  <td>Minimum number of partitions to read from Kafka.
+  <td>Desired minimum number of partitions to read from Kafka.
   By default, Spark has a 1-1 mapping of topicPartitions to Spark partitions consuming from Kafka.
   If you set this option to a value greater than your topicPartitions, Spark will divvy up large
-  Kafka partitions to smaller pieces.</td>
+  Kafka partitions to smaller pieces. Please note that this configuration is like a `hint`: the
+  number of Spark tasks will be **approximately** `minPartitions`. It can be less or more depending on
+  rounding errors or Kafka partitions that didn't receive any new data.</td>
 </tr>
 </table>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
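For readers wondering why the task count is only approximate, the effect can be sketched in a few lines of plain Python. This is a hypothetical, simplified model of proportional range splitting, not Spark's actual `KafkaOffsetRangeCalculator.getRanges`; the function name and the rounding rule are assumptions made for illustration only.

```python
def split_ranges(offset_ranges, min_partitions):
    """Sketch: split Kafka offset ranges into roughly min_partitions pieces.

    offset_ranges: list of (topic_partition, from_offset, until_offset).
    Hypothetical helper loosely modeled on the idea of proportional
    splitting; not Spark's real implementation.
    """
    # Empty ranges (no new data) produce no task at all.
    nonempty = [(tp, lo, hi) for tp, lo, hi in offset_ranges if hi > lo]
    total = sum(hi - lo for _, lo, hi in nonempty)
    if total == 0 or len(nonempty) >= min_partitions:
        return nonempty  # already at or above the hint; nothing to split

    out = []
    for tp, lo, hi in nonempty:
        size = hi - lo
        # Pieces per range, proportional to its share of the total data.
        # Rounding here is exactly why the final count is approximate.
        parts = max(1, round(min_partitions * size / total))
        step = size / parts
        for i in range(parts):
            piece_lo = lo + int(i * step)
            piece_hi = lo + int((i + 1) * step) if i < parts - 1 else hi
            out.append((tp, piece_lo, piece_hi))
    return out


# Three topic partitions with one record each, hint of 4:
# each range is too small to split, so only 3 tasks result.
tasks = split_ranges([("t-0", 0, 1), ("t-1", 0, 1), ("t-2", 0, 1)], 4)
print(len(tasks))  # -> 3, fewer than the minPartitions hint of 4
```

Under this toy model a single large partition of 100 records with a hint of 4 splits cleanly into 4 equal pieces, while several tiny or empty partitions fall short of the hint, matching the "less or more depending on rounding errors or Kafka partitions that didn't receive any new data" wording in the patched documentation.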