Github user dossett commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1044#discussion_r56241176

    --- Diff: external/storm-hdfs/README.md ---
    @@ -240,6 +240,23 @@ If you are using Trident and sequence files you can do something like this:
                     .addRotationAction(new MoveFileAction().withDestination("/dest2/"));
     ```
     
    +### Data Partitioning
    +Data can be partitioned to different HDFS directories based on characteristics of the tuple being processed or purely
    +external factors, such as system time.  To partition your your data, write a class that implements the ```Partitioner```
    +interface and pass it to the withPartitioner() method of your bolt. The getPartitionPath() method returns a partition
    +path for a given tuple.
    +
    +Here's an example of a Partitioner that operates on a specific field of data:
    +
    +```java
    +
    +    Partitioner partitoner = new Partitioner() {
    +            @Override
    +            public String getPartitionPath(Tuple tuple) {
    +                return Path.SEPARATOR + "city=" + tuple.getStringByField("city");
    --- End diff --

    I thought about having Partitioner return an actual path but decided against it for two reasons:
    - I liked the idea of the "partition" being solely a function of the tuple, without reference to anything else.
    - Since end users implement a Partitioner, having it return a complete path would give the user access to details otherwise hidden from their code.
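
The first reason can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: the nested `Partitioner` is a hypothetical stand-in that takes a plain `Map` instead of Storm's `Tuple` so the example runs without Storm on the classpath, and `resolveDirectory` is a hypothetical helper standing in for whatever the bolt does internally to combine its configured base path with the partition path. It is not the code from the pull request.

```java
import java.util.Map;

public class PartitionPathSketch {

    /** Hypothetical stand-in for the Partitioner interface; a Map replaces Storm's Tuple. */
    interface Partitioner {
        String getPartitionPath(Map<String, String> tuple);
    }

    /**
     * Hypothetical helper: the caller (the bolt, in the real design) owns the base path
     * and appends the relative partition path, so user code never sees the base path.
     */
    static String resolveDirectory(String basePath, Partitioner partitioner, Map<String, String> tuple) {
        return basePath + partitioner.getPartitionPath(tuple);
    }

    public static void main(String[] args) {
        // The partition is purely a function of the tuple's "city" field.
        Partitioner byCity = tuple -> "/" + "city=" + tuple.get("city");

        String dir = resolveDirectory("/data/events", byCity, Map.of("city", "Oslo"));
        System.out.println(dir); // prints "/data/events/city=Oslo"
    }
}
```

Because the partitioner returns only a relative `String`, the same implementation can be reused under any base path, and it can be unit-tested with nothing but a tuple.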