[ https://issues.apache.org/jira/browse/ACCUMULO-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Vines updated ACCUMULO-55:
-------------------------------

          Description: 
In conjunction with ACCUMULO-52, large numbers of empty files can cause 
problems. In short, when a reducer receives no data because of the partitioner 
in use, its output file is still created. We do not want empty files 
lingering around, and we especially do not want them bulk imported. The fix 
should be as simple as either not creating the file until a write to it is 
attempted (more complex) or deleting the file at close time if no records were 
written (simpler, but with more overhead from creating and then deleting the 
file).

Due to the complexity of the patch, I do not think it should be applied before 
version 1.4. It should simply delete the file after closing it if nothing was 
written to it.
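
A minimal sketch of the delete-on-close option, offered only as an 
illustration. It uses plain java.nio rather than the Accumulo or Hadoop 
classes, and the class name DeleteOnCloseWriter, the tab-separated record 
layout, and the record counter are all assumptions, not the actual patch.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer (not Accumulo's RecordWriter): the file is created
// eagerly in the constructor and removed in close() if no record was written.
class DeleteOnCloseWriter implements AutoCloseable {
    private final Path path;
    private final BufferedWriter out;
    private long records = 0; // number of records written so far

    DeleteOnCloseWriter(Path path) throws IOException {
        this.path = path;
        // The file exists on disk from this point on, even if it stays empty.
        this.out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
    }

    public void write(String key, String value) throws IOException {
        out.write(key + "\t" + value);
        out.newLine();
        records++;
    }

    @Override
    public void close() throws IOException {
        out.close();
        // Nothing was written, so remove the empty file instead of leaving
        // it behind for a later bulk import to pick up.
        if (records == 0) {
            Files.deleteIfExists(path);
        }
    }
}
```

The cost noted above is visible here: an empty reducer still pays for one 
file creation and one deletion.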

EDIT: As of 1.4 we now delete empty files on close() in the RecordWriter. I 
would like to implement a more robust version which does not create a file 
until the first write. I will do this for version 1.5 so as not to worry about 
breaking things.
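
A rough sketch of the "do not create a file until the first write" idea, 
under the same caveats: plain java.nio stands in for the real output format, 
and LazyCreateWriter and its record layout are hypothetical names chosen for 
illustration, not the planned 1.5 implementation.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer (not Accumulo's RecordWriter): the underlying file is
// only created when the first record arrives.
class LazyCreateWriter implements AutoCloseable {
    private final Path path;
    private BufferedWriter out; // stays null until the first write

    LazyCreateWriter(Path path) {
        this.path = path; // no file system access here
    }

    public void write(String key, String value) throws IOException {
        if (out == null) {
            // First record: create the file now.
            out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }
        out.write(key + "\t" + value);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        // If nothing was ever written, there is no file to close or delete.
        if (out != null) {
            out.close();
        }
    }
}
```

With this shape an empty reducer never touches the file system, which avoids 
the create-then-delete overhead at the price of a null check on every write.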

  was:
In conjunction with ACCUMULO-52, large numbers of empty files can cause 
problems. In short, when a reducer receives no data because of the partitioner 
in use, its output file is still created. We do not want empty files 
lingering around, and we especially do not want them bulk imported. The fix 
should be as simple as either not creating the file until a write to it is 
attempted (more complex) or deleting the file at close time if no records were 
written (simpler, but with more overhead from creating and then deleting the 
file).

Due to the complexity of the patch, I do not think it should be applied before 
version 1.4. It should simply delete the file after closing it if nothing was 
written to it.

    Affects Version/s:     (was: 1.3.5)
                           (was: 1.4.0)
        Fix Version/s:     (was: 1.4.0)
                       1.5.0
    
> Accumulo Output Format can create numerous empty files
> ------------------------------------------------------
>
>                 Key: ACCUMULO-55
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-55
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.5.0
>            Reporter: John Vines
>            Assignee: John Vines
>              Labels: empty, file, output_format
>             Fix For: 1.5.0
>
>
> In conjunction with ACCUMULO-52, large numbers of empty files can cause 
> problems. In short, when a reducer receives no data because of the 
> partitioner in use, its output file is still created. We do not want empty 
> files lingering around, and we especially do not want them bulk imported. The 
> fix should be as simple as either not creating the file until a write to it 
> is attempted (more complex) or deleting the file at close time if no records 
> were written (simpler, but with more overhead from creating and then deleting 
> the file).
> Due to the complexity of the patch, I do not think it should be applied 
> before version 1.4. It should simply delete the file after closing it if 
> nothing was written to it.
> EDIT: As of 1.4 we now delete empty files on close() in the RecordWriter. I 
> would like to implement a more robust version which does not create a file 
> until the first write. I will do this for version 1.5 so as not to worry 
> about breaking things.
