[ 
https://issues.apache.org/jira/browse/STORM-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201176#comment-15201176
 ] 

ASF GitHub Bot commented on STORM-1464:
---------------------------------------

Github user arunmahadevan commented on the pull request:

    https://github.com/apache/storm/pull/1044#issuecomment-198255363
  
    @dossett I went through the patch again and I have one question regarding 
the writer key. I see that you maintain a separate writer per writer key. In 
the docs you mention "The avro bolt will write records to separate files based 
on the schema of the record being processed.  In other words, if the bolt 
receives records with two different schemas, it will write to two separate 
files."
    
    Is each writer expected to write to a separate file? If so, I don't see 
that happening, because in `getBasePathForNextFile` the file name is based on 
the partition path and rotation id alone, and this path is then passed to 
`makeNewWriter`. So multiple writers could be writing to the same file, in 
fact at overlapping offsets (each based on that writer's own offset), and may 
corrupt the file. Can you help me understand whether several writers can write 
to the same file, or whether each is always supposed to write to a different 
file?
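    
    To make the concern concrete, here is a minimal sketch. It is not the 
actual storm-hdfs code: the class `WriterPathSketch`, the methods 
`pathWithoutKey` and `pathWithKey`, and the file name pattern are all 
hypothetical, and only illustrate what happens when the writer key is left out 
of the file name.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: these names are assumptions, not storm-hdfs APIs.
public class WriterPathSketch {

    // File name built from the partition path and rotation id alone,
    // which is how the comment above reads getBasePathForNextFile.
    static String pathWithoutKey(String partitionPath, int rotation) {
        return partitionPath + "/data-" + rotation + ".avro";
    }

    // One possible alternative: include the writer key in the file name.
    static String pathWithKey(String partitionPath, String writerKey, int rotation) {
        return partitionPath + "/data-" + writerKey + "-" + rotation + ".avro";
    }

    public static void main(String[] args) {
        String[] writerKeys = {"schemaA", "schemaB"};
        Map<String, String> withoutKey = new HashMap<>();
        Map<String, String> withKey = new HashMap<>();

        // One writer per writer key, all in the same partition and rotation.
        for (String key : writerKeys) {
            withoutKey.put(key, pathWithoutKey("/storm/out", 0));
            withKey.put(key, pathWithKey("/storm/out", key, 0));
        }

        // Without the key, both writers are handed the same path.
        System.out.println(withoutKey);
        // With the key, each writer gets its own path.
        System.out.println(withKey);
    }
}
```

    In this sketch the two writers share a path unless the writer key (or 
something derived from it) is part of the file name.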
    



> storm-hdfs should support writing to multiple files
> ---------------------------------------------------
>
>                 Key: STORM-1464
>                 URL: https://issues.apache.org/jira/browse/STORM-1464
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-hdfs
>            Reporter: Aaron Dossett
>            Assignee: Aaron Dossett
>              Labels: avro
>
> Examples of when this is needed include:
> - One avro bolt writing multiple schemas, each of which requires a different 
> file. Schema evolution is a common use of avro and the avro bolt should 
> support that seamlessly.
> - Partitioning output to different directories based on the tuple contents.  
> For example, if the tuple contains a "USER" field, it should be possible to 
> partition based on that value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
