[ 
https://issues.apache.org/jira/browse/SAMZA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488250#comment-15488250
 ] 

Hai commented on SAMZA-967:
---------------------------

> In this case there is no ordering among these files. Let's imagine that, 
> instead of writing to HDFS, we write to Kafka. Then you also have no ordering 
> within the samza topic partition when the events are coming from different 
> upstream producers.

>> Ok. Let's say my repartitioner writes to a partition directory. If there is 
>> no implicit ordering defined in the output itself, how does a downstream 
>> HDFS consumer guarantee deterministic consumption? That is what I am not 
>> clear about.

>>> You brought up a good point. There is no guarantee of deterministic 
>>> consumption if repartitioning happens. But my point is that we are not 
>>> able to solve this problem for Kafka either. Let's say we do 
>>> repartitioning for a job that reads from Kafka and writes to Kafka: how do 
>>> you guarantee a consistent result today? You could argue that a 
>>> deterministic repartitioning result is not needed in the Kafka case (a 
>>> stream processing job) but is relevant in the HDFS case (essentially a 
>>> batch processing job). I have to admit that I don't have a good solution 
>>> to your question as of now :(
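
To make the ordering problem above concrete, here is a minimal Python sketch. It
is an illustration only and uses no Samza, Kafka, or HDFS API; the function name
and the sample events are hypothetical. It enumerates every order in which events
from two upstream producers can land in one shared partition while each
producer's own order is preserved.

```python
from itertools import combinations


def all_interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    # Choose which positions of the merged log are taken by events from a.
    for a_slots in combinations(range(n), len(a)):
        slots = set(a_slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


# Hypothetical events from two upstream producers (or repartitioner tasks).
producer_a = ["a1", "a2"]
producer_b = ["b1", "b2"]

orders = list(all_interleavings(producer_a, producer_b))
for order in orders:
    print(order)
print(len(orders))  # 6
```

Every one of the six orders is a valid content for the partition, so a
downstream consumer that replays it cannot rely on one particular order unless
something outside the data fixes it.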

> Add HDFS system consumer to Samza
> ---------------------------------
>
>                 Key: SAMZA-967
>                 URL: https://issues.apache.org/jira/browse/SAMZA-967
>             Project: Samza
>          Issue Type: Sub-task
>            Reporter: Hai
>            Assignee: Hai
>             Fix For: 0.12.0
>
>         Attachments: HDFSSystemConsumer.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
