[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830383#action_12830383 ]
Gaurav Jain commented on PIG-1115:
----------------------------------

Proposed Solution:

-- Zebra will implement a ZebraOutputCommitter.
-- The Zebra front end will create all the final directories and schema files:
     $basicTable/.btschema
     $basicTable/CG0/.schema
     $basicTable/CG1/.schema
-- Zebra will create one temporary directory per BasicTable, and RecordWriter.write() will write all data under it:
     $basicTable/_temporary/CG0/part-0000
     $basicTable/_temporary/CG1/part-0000
-- The _temporary directory will always be created directly under $basicTable.
-- In the back end, Zebra creates RecordWriters, which in turn create CGInserters. A CGInserter works on a directory we call 'workOutputPath', $basicTable/_temporary/$CG/. However, it needs the .schema file, which lives two levels up in the final column-group directory, so it reads the schema file from $basicTable/$workOutputPath.getName()/.schema.
-- In CGInserter.close(), the finished part file is moved to its final location:
     $basicTable/_temporary/CG0/part-0000 -----> $basicTable/CG0/part-0000
-- In ZebraOutputCommitter.cleanupJob(), BasicTableOutputFormat.close() will be called.
-- BasicTableOutputFormat.close() removes $basicTable/_temporary/.

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>
> Temp files created by Zebra during table creation are not cleaned up when any task fails, which wastes disk space.
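The directory protocol in the proposed solution can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Zebra's implementation: it uses the local disk through java.nio.file as a stand-in for Hadoop's FileSystem, and the class ZebraTempLayoutSketch and its methods (createTable, workOutputPath, schemaFileFor, promote, cleanup) are hypothetical names. The leftover part-0001 file models output from a failed task attempt that never reached CGInserter.close().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the proposed layout; local java.nio.file stands in for HDFS. */
class ZebraTempLayoutSketch {
    static final String TEMP_DIR = "_temporary";

    /** Front end: create the final directories and schema files ($basicTable/.btschema, $basicTable/$CG/.schema). */
    static void createTable(Path basicTable, List<String> columnGroups) throws IOException {
        Files.createDirectories(basicTable);
        Files.write(basicTable.resolve(".btschema"), "btschema".getBytes(StandardCharsets.UTF_8));
        for (String cg : columnGroups) {
            Path cgDir = Files.createDirectories(basicTable.resolve(cg));
            Files.write(cgDir.resolve(".schema"), cg.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** workOutputPath = $basicTable/_temporary/$CG/, where a CGInserter writes its data. */
    static Path workOutputPath(Path basicTable, String cg) {
        return basicTable.resolve(TEMP_DIR).resolve(cg);
    }

    /** The schema is two levels up: $basicTable/<name of workOutputPath>/.schema. */
    static Path schemaFileFor(Path workOutputPath) {
        Path basicTable = workOutputPath.getParent().getParent();
        return basicTable.resolve(workOutputPath.getFileName()).resolve(".schema");
    }

    /** CGInserter.close(): move $basicTable/_temporary/$CG/part-N to $basicTable/$CG/part-N. */
    static Path promote(Path workOutputPath, String partName) throws IOException {
        Path finalCgDir = schemaFileFor(workOutputPath).getParent();
        return Files.move(workOutputPath.resolve(partName), finalCgDir.resolve(partName),
                StandardCopyOption.REPLACE_EXISTING);
    }

    /** cleanupJob() -> close(): recursively delete $basicTable/_temporary, children before parents. */
    static void cleanup(Path basicTable) throws IOException {
        Path temp = basicTable.resolve(TEMP_DIR);
        if (!Files.exists(temp)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(temp)) {
            // Reverse lexicographic order lists every file before its enclosing directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("zebra").resolve("basicTable");
        createTable(table, List.of("CG0", "CG1"));

        Path work = Files.createDirectories(workOutputPath(table, "CG0"));
        // A successful task: its part file is written to the temp dir, then promoted on close.
        Files.write(work.resolve("part-0000"), "rows".getBytes(StandardCharsets.UTF_8));
        promote(work, "part-0000");
        // A failed task attempt: its part file is never promoted and stays under _temporary.
        Files.write(work.resolve("part-0001"), "partial".getBytes(StandardCharsets.UTF_8));

        cleanup(table);
        System.out.println(Files.exists(table.resolve("CG0").resolve("part-0000"))); // true
        System.out.println(Files.exists(table.resolve(TEMP_DIR)));                   // false
    }
}
```

Because every in-progress file lives under a single _temporary directory, cleanup is one recursive delete regardless of which tasks failed, and committed output in $basicTable/$CG/ is never touched.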