[ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054641#comment-13054641 ]
jirapos...@reviews.apache.org commented on MAPREDUCE-1347:
----------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/953/#review910
-----------------------------------------------------------

I think the problem with the test case is that you're calling tMOF.getRecordWriter() inside each thread. Each thread therefore gets its own record writer and hence its own computing map, which would explain why apply() runs again for the same keys. Call getRecordWriter() once, outside the threads, and have the threads only do the writing.

- Todd

On 2011-06-24 19:06:46, Harsh J wrote:
bq.
bq.  (Updated 2011-06-24 19:06:46)
bq.
bq.  Review request for hadoop-mapreduce and Todd Lipcon.
bq.
bq.  Summary
bq.  -------
bq.
bq.  Used makeComputingMap from Guava's MapMaker to provide a thread-safe way of creating a RecordWriter cache.
bq.
bq.  For some reason, the map does not actually cache the writers; it calls apply() over and over again for the same key-value pairs.
bq.
bq.  This addresses bug MAPREDUCE-1347.
bq.      http://issues.apache.org/jira/browse/MAPREDUCE-1347
bq.
bq.  Diffs
bq.  -----
bq.
bq.    mapreduce/ivy.xml 85ee014
bq.    mapreduce/ivy/libraries.properties 9d40aaa
bq.    mapreduce/src/java/org/apache/hadoop/mapred/lib/MultipleOutputFormat.java b8944f1
bq.    mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMultipleTextOutputFormat.java 14c097d
bq.
bq.  Diff: https://reviews.apache.org/r/953/diff
bq.
bq.  Testing
bq.  -------
bq.
bq.  Added a test case, but it fails with the current behavior of MapMaker's makeComputingMap() (it would pass if the caching worked as expected).
bq.
bq.  Thanks,
bq.
bq.  Harsh
> Missing synchronization in MultipleOutputFormat
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1347
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1347
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Harsh J
>         Attachments: MAPREDUCE-1347.r2.diff, MAPREDUCE-1347.r3.diff, mapreduce.1347.r1.diff
>
>
> MultipleOutputFormat's RecordWriter implementation doesn't use
> synchronization when accessing the recordWriters member. When using
> multithreaded mappers or reducers, this can result in problems where two
> threads both try to create the same file, causing
> AlreadyBeingCreatedException. Synchronizing at a finer granularity than
> the whole method is probably a good idea, so that multithreaded mappers
> can actually achieve parallelism when writing into separate output streams.
> From what I can tell, the new API's MultipleOutputs seems not to have this
> issue.
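
The review comment and the issue description both turn on where the writer cache lives, so a small sketch may help. It is not the patch under review and does not use Hadoop or Guava: as an assumption for illustration, the JDK's ConcurrentHashMap.computeIfAbsent stands in for the computing map, and WriterCache, countCreations, and the file name "part-a" are hypothetical names. It only shows that a per-key cache creates each writer once when all threads share one cache instance, and once per thread when every thread builds its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedWriterCacheSketch {

    /** Hypothetical stand-in for the object returned by getRecordWriter(): one cache per instance. */
    static final class WriterCache {
        private final ConcurrentMap<String, StringBuilder> writers = new ConcurrentHashMap<>();
        private final AtomicInteger creations;

        WriterCache(AtomicInteger creations) {
            this.creations = creations;
        }

        void write(String fileName, String record) {
            // computeIfAbsent runs the factory at most once per key for THIS map instance,
            // without locking the whole cache for writes to other keys.
            StringBuilder out = writers.computeIfAbsent(fileName, name -> {
                creations.incrementAndGet(); // counts "file creations"
                return new StringBuilder();
            });
            synchronized (out) { // StringBuilder itself is not thread safe
                out.append(record).append('\n');
            }
        }
    }

    /** Runs threadCount writer threads and returns how many times a writer was created. */
    static int countCreations(boolean shareCache, int threadCount) {
        AtomicInteger creations = new AtomicInteger();
        // Shared case: the cache is created once, outside the threads.
        WriterCache shared = shareCache ? new WriterCache(creations) : null;
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                // Unshared case: each thread builds its own cache, so nothing is reused across threads.
                WriterCache cache = shareCache ? shared : new WriterCache(creations);
                for (int j = 0; j < 100; j++) {
                    cache.write("part-a", "record " + j);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return creations.get();
    }

    public static void main(String[] args) {
        System.out.println("per-thread caches: " + countCreations(false, 4)); // prints 4
        System.out.println("shared cache: " + countCreations(true, 4));       // prints 1
    }
}
```

In the unshared case the factory runs once per thread for the same key, which matches the symptom reported in the review request. Against a real file system, that repeated creation is the kind of race the issue associates with AlreadyBeingCreatedException.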