[jira] [Commented] (AVRO-1215) AvroMultipleOutputs not working when specifying baseOutputPath

shahnawaj akhtar (JIRA) Thu, 20 Aug 2015 12:12:05 -0700

    [ https://issues.apache.org/jira/browse/AVRO-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705582#comment-14705582 ]


shahnawaj akhtar commented on AVRO-1215:
----------------------------------------

Hi, I am facing another problem. My mapper writes its output like this:
Mapper:
avroMultipleOutputs.write("partmap", avroKey, NullWritable.get(), baseOutputPath);

Launcher:
AvroMultipleOutputs.addNamedOutput(job, "partmap", AvroKeyOutputFormat.class,
    Avro_0_Prod_Top_L1.getClassSchema(), Schema.create(Schema.Type.NULL));

When the mapper generates its output, the named output ("partmap") is not used
in the file names. The output file names look like this:
/20150818/00/Map_Out/-m-00000.avro
Has anyone faced this issue? Is there a patch already? I looked but could not
find one. Thanks in advance.
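
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the baseOutputPath passed to write() ends in "/", and the file name is formed by appending "-m-<partition>.avro" to that path (the naming pattern Hadoop's FileOutputFormat is understood to follow), then the name part is empty and the named output never appears. The sketch below only models that string composition; the class and the uniqueFile helper are hypothetical and are not Avro or Hadoop code.

```java
// Illustrative model only: mimics the "<base>-<m|r>-<5-digit partition><ext>"
// naming pattern. It does not call Hadoop or Avro.
class UniqueFileNameSketch {

    // Hypothetical helper: builds a task output file name from a base output path.
    static String uniqueFile(String baseOutputPath, char taskType, int partition, String extension) {
        return String.format("%s-%c-%05d%s", baseOutputPath, taskType, partition, extension);
    }

    public static void main(String[] args) {
        String dir = "/20150818/00/Map_Out/";

        // Base path ends in "/": the file name starts directly with "-m-".
        System.out.println(uniqueFile(dir, 'm', 0, ".avro"));

        // Base path ends in the desired prefix: the prefix shows up in the file name.
        System.out.println(uniqueFile(dir + "partmap", 'm', 0, ".avro"));
    }
}
```

Under that assumption, passing a baseOutputPath that already ends in the desired file prefix (for example the directory followed by "partmap") would give the expected names.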


> AvroMultipleOutputs not working when specifying baseOutputPath
> --------------------------------------------------------------
>
>                 Key: AVRO-1215
>                 URL: https://issues.apache.org/jira/browse/AVRO-1215
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.2
>            Reporter: Matthew Hayes
>            Assignee: Ashish Nagavaram
>              Labels: avro, mapreduce
>             Fix For: 1.7.4
>
>         Attachments: AVRO-1215-v3.patch, AVRO-1215.patch, AVRO-1215.patch, 
> AVRO-1215.patch, AVRO-1215_final.patch
>
>
> I'm calling the write() method of AvroMultipleOutputs which takes the 
> baseOutputPath.  The reducer appears to hang once it tries writing 
> to a baseOutputPath value not already encountered.  It then fails with:
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
> create file ... because current leaseholder is trying to recreate file.
> I think the problem has to do with this line in AvroMultipleOutputs:
> {code}
> // get the record writer from context output format
> //FileOutputFormat.setOutputName(taskContext, baseFileName);
> {code}
> This line is not commented out in the similar code from Hadoop.  So I think 
> the baseOutputPath is ignored.  As a result when each record writer is 
> created it uses the same path, leading to the exception.
> Uncommenting this line does not work because the method is not accessible 
> from this class.  All the method does, however, is set 
> "mapreduce.output.basename", and setting that property directly doesn't 
> work either.
> After digging through the Avro code I found that AvroOutputFormatBase uses 
> "avro.mo.config.namedOutput" to create the path.  If I replace the 
> commented-out line with this, it seems to work:
> {code}
> taskContext.getConfiguration().set("avro.mo.config.namedOutput", baseFileName);
> {code}
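
To illustrate why setting that key would matter, here is a minimal sketch in which a plain Map stands in for the Hadoop Configuration. The outputFile helper and the "part" default are assumptions made for illustration; only the key name "avro.mo.config.namedOutput" comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: a Map stands in for the Hadoop Configuration.
// It does not call Hadoop or Avro.
class NamedOutputKeySketch {
    static final String KEY = "avro.mo.config.namedOutput";

    // Hypothetical stand-in for the path lookup in the output format.
    // The "part" default is an assumption.
    static String outputFile(Map<String, String> conf, int partition) {
        String name = conf.getOrDefault(KEY, "part");
        return String.format("%s-r-%05d.avro", name, partition);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Key never set: writers for two different base paths resolve to the same file.
        String first = outputFile(conf, 0);
        String second = outputFile(conf, 0);
        System.out.println("same file: " + first.equals(second));

        // Key set to the base file name before each writer is created: distinct files.
        conf.put(KEY, "2015/a");
        String a = outputFile(conf, 0);
        conf.put(KEY, "2015/b");
        String b = outputFile(conf, 0);
        System.out.println(a + " vs " + b);
    }
}
```

In this model, two writers that resolve to the same file correspond to the situation described above, where each record writer is created with the same path.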



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
