[GitHub] storm pull request: STORM-1464: Support multiple file outputs

dossett Tue, 15 Mar 2016 14:07:53 -0700

Github user dossett commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1044#discussion_r56240831
  
    --- Diff: external/storm-hdfs/README.md ---
    @@ -240,6 +240,23 @@ If you are using Trident and sequence files you can do something like this:
                     .addRotationAction(new MoveFileAction().withDestination("/dest2/"));
     ```
     
    +### Data Partitioning
    +Data can be partitioned to different HDFS directories based on characteristics of the tuple being processed or purely
    +external factors, such as system time.  To partition your data, write a class that implements the ```Partitioner```
    +interface and pass it to the withPartitioner() method of your bolt. The getPartitionPath() method returns a partition
    +path for a given tuple.
    +
    +Here's an example of a Partitioner that operates on a specific field of data:
    +
    +```java
    +
    +    Partitioner partitioner = new Partitioner() {
    +            @Override
    +            public String getPartitionPath(Tuple tuple) {
    +                return Path.SEPARATOR + "city=" + tuple.getStringByField("city");
    --- End diff --
    
    The "city=" prefix came from a Hive-specific use case I was working on.
    The partitioner returned "city=<name>" so that the resulting directory
    layout would be usable as the partitioning scheme for a Hive external
    table.  I will remove it from the standard documentation and maybe add a
    Hive-specific note.
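
For readers unfamiliar with the Hive convention mentioned above, here is a minimal, self-contained sketch of how such a path segment could be built. It assumes the usual Hive layout in which each partition directory is named `column=value`. The class `HivePartitionPath` and the method `forColumn` are hypothetical names used only for illustration; they are not part of storm-hdfs, and a plain `"/"` stands in for Hadoop's `Path.SEPARATOR` so the snippet runs without Hadoop or Storm on the classpath.

```java
// Hypothetical helper for illustration only; not part of storm-hdfs.
final class HivePartitionPath {

    // Stand-in for org.apache.hadoop.fs.Path.SEPARATOR, so that this
    // sketch compiles without a Hadoop dependency.
    static final String SEPARATOR = "/";

    private HivePartitionPath() {
    }

    // Builds a Hive-style partition directory segment: "/<column>=<value>".
    static String forColumn(String column, String value) {
        return SEPARATOR + column + "=" + value;
    }

    public static void main(String[] args) {
        // In a real Partitioner the value would come from the tuple,
        // e.g. tuple.getStringByField("city").
        System.out.println(forColumn("city", "Boston"));
    }
}
```

Inside a real `Partitioner`, the body of `getPartitionPath()` would do the same concatenation with the field value read from the tuple, as in the diff above.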



