It shouldn't, but that'll help me recreate it. Thanks!

On Wednesday, February 20, 2013, Mike Barretta <[email protected]> wrote:
> just once - each of the parallelDo's happens within the run() of my Tool, and I kick it off with the pipeline.done() vs pipeline.run() - any difference there?
>
> On Wed, Feb 20, 2013 at 3:25 PM, Josh Wills <[email protected]> wrote:
>
> Ah, okay. I just got on a train, so I'll have to do a bit of local debugging.
>
> Curious: are you explicitly calling run() between each of these jobs, or just once after they've all been defined?
>
> On Wednesday, February 20, 2013, Mike Barretta <[email protected]> wrote:
>> okay, well, things turned for the worse quickly :)
>> Following the same output above, the following jobs were created:
>> 13/02/20 19:25:26 INFO exec.CrunchJob: Running job "com.digitalreasoning.petal.extract.SynthesysKBExtractor: SeqFile(/Synthesys/MessageData)+[[S1+Text(/Synthesys/export/Contexts)]/[S0+Text(/Synthesys/export/MessageData)]/[S2+Text(/Synthesys/export/ContextualElements)]]"
>> 13/02/20 19:25:26 INFO exec.CrunchJob: Job status available at: <snip>
>> 13/02/20 19:25:28 INFO input.FileInputFormat: Total input paths to process : 40
>> 13/02/20 19:25:29 INFO exec.CrunchJob: Running job "com.digitalreasoning.petal.extract.SynthesysKBExtractor: SeqFile(/Synthesys/ElementData)+S5+Text(/Synthesys/export/ElementData)"
>> 13/02/20 19:25:29 INFO exec.CrunchJob: Job status available at: <snip>
>> 13/02/20 19:25:32 INFO input.FileInputFormat: Total input paths to process : 40
>> 13/02/20 19:25:32 INFO exec.CrunchJob: Running job "com.digitalreasoning.petal.extract.SynthesysKBExtractor: SeqFile(/Synthesys/RelationshipData)+S3+Text(/Synthesys/export/RelationshipData)"
>> notice that the first (MessageData) shows all three output paths while the last (RelationshipData) only shows one. This is despite the previous log messages showing:
>> 13/02/20 19:25:04 INFO extract.SynthesysKBExtractor: reading [RelationshipData]
>> 13/02/20 19:25:04 INFO impl.FileTargetImpl: Will write output files to new path: /Synthesys/export/RelationshipData
>> 13/02/20 19:25:04 INFO impl.FileTargetImpl: Will write output files to new path: /Synthesys/export/RelationshipStructures
>> *forgive the mismatched paths between this email and my previous - am shorting for brevity, and trying to convey the difference between input and export paths
>>
>> On Wed, Feb 20, 2013 at 2:30 PM, Mike Barretta <[email protected]> wrote:
>>
>> Was using a very early 0.5.0-incubating build, with hadoop 0.20.2, but just did a fresh git pull and now with 0.6.0-incubating, things look better (MessageData and RelationshipData are my parents with children):
>> 13/02/20 19:25:04 INFO extract.SynthesysKBExtractor: reading [MessageData]
>> 13/02/20 19:25:04 INFO impl.FileTargetImpl: Will write output files to new path: /Synthesys/MessageData
>> 13/02/20 19:25:04 INFO impl.FileTargetImpl: Will write output files to new path: /Synthesys/Contexts
>> 13/02/20 19:25:04 INFO impl.FileTargetImpl: Will write output files to new path: /Synthesys/ContextualElements
>> 13/02/20 19:25:04 INFO extract.SynthesysKBExtractor: reading [RelationshipData]
>> 13/02/20 19:25:04 INFO impl.FileTargetImpl: Will write output files to new path: /Synthesys/RelationshipData
>> 13/02/20 19:25:04 INFO impl.FileTargetImpl: Will write output files to new path: /Synthesys/RelationshipStructures
>> 13/02/20 19:25:04 INFO extract.SynthesysKBExtractor: reading [ElementData]
>> 13/02/20 19:25:04 INFO impl.FileTargetImpl: Will write output files to new path: /Synthesys/ElementData
>> 13/02/20 19:25:04 INFO extract.SynthesysKBExtractor: reading [ConceptData]
>> 13/02/20 19:25:04 INFO impl.FileTargetImpl: Will write output files to new path: /Synthesys/ConceptData
>> I'll try a few more times and let you know if anything funky happens.
>> Thanks, as always, for your prompt responses,
>> Mike
>>
>> On Wed, Feb 20, 2013 at 1:06 PM, Josh Wills <[email protected]> wrote:
>>
>> Hey Mike,
>> I can't replicate this problem using the MultipleOutputIT (which I think we added as a test for this problem.) Which version of Crunch and Hadoop are you using? The 0.5.0-incubating release should be up on the maven repos if you want to try that out.
>> J
>>
>> On Wed, Feb 20, 2013 at 6:43 AM, Josh Wills <jwills
--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
