[jira] [Commented] (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat

[email protected] (JIRA) Fri, 24 Jun 2011 12:40:21 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054641#comment-13054641
 ]

[email protected] commented on MAPREDUCE-1347:
----------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/953/#review910
-----------------------------------------------------------

I think your issue with the test case is that you're calling 
tMOF.getRecordWriter() inside the thread. So each thread has its own record 
writer and hence its own computing map. You should call it once, outside 
the threads, and have the threads only do the writing.

- Todd
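
A minimal sketch of the difference described above, not the actual test from the review. CachingWriter is a hypothetical stand-in for the RecordWriter returned by getRecordWriter(), and a plain ConcurrentHashMap stands in for Guava's computing map so the example has no external dependencies. Because each writer instance owns its own cache, creating a writer per thread runs the "open" step once per thread, while sharing one writer runs it once per path.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedWriterSketch {
    // Counts how many times the (hypothetical) "open output" step runs.
    static final AtomicInteger opens = new AtomicInteger();

    // Hypothetical stand-in for the record writer: every instance has its own cache.
    static class CachingWriter {
        private final ConcurrentMap<String, StringBuffer> cache = new ConcurrentHashMap<>();

        void write(String path, String value) {
            // computeIfAbsent runs the factory at most once per key for this map.
            cache.computeIfAbsent(path, p -> {
                opens.incrementAndGet();
                return new StringBuffer();
            }).append(value);
        }
    }

    // Starts nThreads threads that all write to the same path; returns the open count.
    static int run(int nThreads, boolean shareWriter) throws InterruptedException {
        opens.set(0);
        CachingWriter shared = new CachingWriter(); // created outside the threads
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                // Creating the writer inside the thread gives each thread a private cache.
                CachingWriter w = shareWriter ? shared : new CachingWriter();
                w.write("part-a", "x");
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return opens.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("writer per thread: " + run(4, false) + " opens");
        System.out.println("shared writer:     " + run(4, true) + " opens");
    }
}
```

With a writer per thread the cache never gets a chance to be hit by another thread, which would look like the factory being applied repeatedly for the same key.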

On 2011-06-24 19:06:46, Harsh J wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/953/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-06-24 19:06:46)
bq.  
bq.  
bq.  Review request for hadoop-mapreduce and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Used the makeComputingMap from Guava's MapMaker to provide a thread-safe way of creating a RecordWriter cache.
bq.  
bq.  For some reason, the map is not actually caching the writers; it calls apply() over and over again for the same key-value pairs.
bq.  
bq.  
bq.  This addresses bug MAPREDUCE-1347.
bq.      http://issues.apache.org/jira/browse/MAPREDUCE-1347
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    mapreduce/ivy.xml 85ee014 
bq.    mapreduce/ivy/libraries.properties 9d40aaa 
bq.    mapreduce/src/java/org/apache/hadoop/mapred/lib/MultipleOutputFormat.java b8944f1 
bq.    mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMultipleTextOutputFormat.java 14c097d 
bq.  
bq.  Diff: https://reviews.apache.org/r/953/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Added a test case, but it fails with the current behavior of MapMaker's makeComputingMap() (would pass if it's all right)
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Harsh
bq.  
bq.

> Missing synchronization in MultipleOutputFormat
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1347
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1347
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Harsh J
>         Attachments: MAPREDUCE-1347.r2.diff, MAPREDUCE-1347.r3.diff, 
> mapreduce.1347.r1.diff
>
>
> MultipleOutputFormat's RecordWriter implementation doesn't use 
> synchronization when accessing the recordWriters member. When using 
> multithreaded mappers or reducers, this can result in problems where two 
> threads both try to create the same file, causing 
> AlreadyBeingCreatedException. Synchronizing at a finer granularity than the 
> whole method is probably a good idea, so that multithreaded mappers can 
> actually achieve parallelism when writing into separate output streams.
> From what I can tell, the new API's MultipleOutputs does not seem to have this 
> issue.
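
One way the finer-grained synchronization suggested in the description could look, as a sketch under stated assumptions rather than the patch attached to this issue. Output is a hypothetical stand-in for a per-file record writer, and the class and method names are invented for illustration. Creation is atomic per path, so two threads never create the same output, and each write locks only the output it touches, so writes to different paths do not block each other.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PerPathLockingSketch {
    // Hypothetical stand-in for one output stream; not thread-safe on its own.
    static class Output {
        final StringBuilder data = new StringBuilder();
    }

    private final ConcurrentMap<String, Output> outputs = new ConcurrentHashMap<>();
    final AtomicInteger creations = new AtomicInteger();

    void write(String path, String record) {
        // Atomic per key: only one thread ever creates the output for a given path.
        Output out = outputs.computeIfAbsent(path, p -> {
            creations.incrementAndGet();
            return new Output();
        });
        // Lock only this output, so writers to other paths proceed in parallel.
        synchronized (out) {
            out.data.append(record).append('\n');
        }
    }

    int length(String path) {
        Output out = outputs.get(path);
        if (out == null) {
            return 0;
        }
        synchronized (out) {
            return out.data.length();
        }
    }

    // Even-numbered threads write to "even", odd-numbered threads to "odd".
    static PerPathLockingSketch demo(int nThreads, int recordsPerThread)
            throws InterruptedException {
        PerPathLockingSketch writer = new PerPathLockingSketch();
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            final String path = (i % 2 == 0) ? "even" : "odd";
            threads[i] = new Thread(() -> {
                for (int r = 0; r < recordsPerThread; r++) {
                    writer.write(path, "r");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return writer;
    }

    public static void main(String[] args) throws InterruptedException {
        PerPathLockingSketch writer = demo(4, 100);
        System.out.println("outputs created: " + writer.creations.get());
        System.out.println("chars in 'even': " + writer.length("even"));
    }
}
```

Synchronizing the whole write method instead would also be correct, but it would serialize all threads even when they write to different files.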

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
