Matthew Basil created CRUNCH-337:
------------------------------------
Summary: Make it easier to use multiple input paths
Key: CRUNCH-337
URL: https://issues.apache.org/jira/browse/CRUNCH-337
Project: Crunch
Issue Type: Improvement
Components: Core
Affects Versions: 0.9.0
Reporter: Matthew Basil
Assignee: Josh Wills
Priority: Minor
It would be more user-friendly, especially for newbies, to provide methods on
{{From}} for creating sources from multiple {{Path}}s. I'm currently
writing my first Crunch Pipeline, which needs to read from multiple
paths using a custom input format, and I had to dig into the source for
{{From.formattedFile}} to see that I need to do something like this:
{code}
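// Build the table type from the key and value PTypes.
PTableType<K, V> tableType = keyType.getFamily().tableOf(keyType, valueType);
// Construct the source directly, passing all input paths rather than a single one.
return new FileTableSourceImpl<K, V>(paths, tableType, formatClass);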
{code}
I don't particularly mind, but other potential new users might be a bit put off
by having to look at the source on the first line of their first pipeline. If
it's undesirable to double the number of methods in {{From}} by doing this
(which is understandable), it might be nice to add a note on multiple input
paths to the section of the user guide on {{Source}}s.
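To illustrate the shape such overloads could take, here is a rough sketch. It
deliberately uses hypothetical stand-in types rather than Crunch's real classes:
{{PathSource}} and {{Sources}} are invented for this example, and paths and the
input format are plain strings. The only point is that a varargs convenience
method can delegate to a single list-based factory, so supporting multiple
paths need not mean duplicating the construction logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a Crunch source; NOT a real Crunch class.
// Paths and the format are plain strings to keep the sketch self-contained.
final class PathSource {
    private final List<String> paths;
    private final String formatName;

    PathSource(List<String> paths, String formatName) {
        if (paths.isEmpty()) {
            throw new IllegalArgumentException("at least one input path is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.paths = Collections.unmodifiableList(new ArrayList<>(paths));
        this.formatName = formatName;
    }

    List<String> getPaths() { return paths; }

    String getFormatName() { return formatName; }
}

// Hypothetical factory class playing the role that From plays in the issue.
final class Sources {
    private Sources() {}

    // List-based factory: the single place where the source is built.
    static PathSource formattedFile(List<String> paths, String formatName) {
        return new PathSource(paths, formatName);
    }

    // Varargs convenience overload: accepts one path or many and delegates.
    static PathSource formattedFile(String formatName, String... paths) {
        return formattedFile(Arrays.asList(paths), formatName);
    }
}
```

With something along these lines, a call such as
{{Sources.formattedFile("text", "/data/a", "/data/b")}} and a call that passes
a prepared list both end up in the same list-based method.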
Thanks!
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)