[
https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688246#comment-15688246
]
Ofir Manor commented on SPARK-18475:
------------------------------------
Cody, for me you are the main gatekeeper for everything Kafka and the main
Kafka expert, so I wanted your perspective, not Michael's (except on the generic
"order" guarantee, which I still think does not exist).
I thought that if someone made the effort of building, testing and trying to
contribute it, it is an indication that it hurts in the real world, especially
since you said it is a repeated request. I guess in many places, getting read
access to a potentially huge, shared topic is not the same as having Kafka
admin rights, or being the only or main consumer, or being able to easily fix
bad past decisions around partitions and keys...
Anyway, it is totally up to you; you'll have to maintain it. I personally have
no use for this feature.
> Be able to provide higher parallelization for StructuredStreaming Kafka Source
> ------------------------------------------------------------------------------
>
> Key: SPARK-18475
> URL: https://issues.apache.org/jira/browse/SPARK-18475
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 2.0.2, 2.1.0
> Reporter: Burak Yavuz
>
> Right now the Structured Streaming Kafka Source creates as many Spark tasks as
> there are TopicPartitions that we're going to read from Kafka.
> This doesn't work well when we have data skew, and there is no reason why we
> shouldn't be able to increase parallelism further, i.e. have multiple Spark
> tasks reading from the same Kafka TopicPartition.
> This means we won't be able to use the "CachedKafkaConsumer" for its intended
> purpose (being cached) in this use case, but the extra overhead is worth it
> for handling data skew and increasing parallelism, especially in ETL use
> cases.
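The core of the proposal above is to split one TopicPartition's offset range
across several tasks. A minimal sketch of that splitting logic (illustrative
only; the class and method names here are hypothetical, not Spark's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: divide a Kafka offset range [start, end) for a single
// TopicPartition into k contiguous sub-ranges, one per Spark task.
public class OffsetRangeSplit {

    // Each sub-range is a {fromOffset, untilOffset} pair, untilOffset exclusive.
    public static List<long[]> split(long start, long end, int k) {
        List<long[]> ranges = new ArrayList<>();
        long total = end - start;
        long base = total / k;   // minimum number of offsets per task
        long extra = total % k;  // the first `extra` tasks get one more offset
        long from = start;
        for (int i = 0; i < k; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new long[]{from, from + size});
            from += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // e.g. a skewed partition with offsets [0, 10), read by 3 tasks
        for (long[] r : split(0, 10, 3)) {
            System.out.println(r[0] + ".." + r[1]);
        }
        // prints 0..4, 4..7, 7..10
    }
}
```

Each task would then seek its own consumer to `fromOffset` and stop at
`untilOffset`, which is why the per-TopicPartition CachedKafkaConsumer can no
longer be shared in this mode, as the description notes.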
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]