[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl
[ https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1495#comment-1495 ]

Jan Filipiak commented on HDFS-8836:

[~ajisakaa] Thank you for rethinking this issue; the comments on the path you suggest make sense. I was always tempted to do something along the lines of:

{code}
delimiter = cf.getOpt("skip-empty-file") ? "" : "\n";
{code}

But I don't have very strong opinions about how this should look in the end.

> Skip newline on empty files with getMerge -nl
>
> Key: HDFS-8836
> URL: https://issues.apache.org/jira/browse/HDFS-8836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.6.0, 2.7.1
> Reporter: Jan Filipiak
> Assignee: Kanaka Kumar Avvaru
> Priority: Trivial
> Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch, HDFS-8836-03.patch, HDFS-8836-04.patch, HDFS-8836-05.patch
>
> Hello everyone,
> I recently needed to use the newline option -nl with getMerge because the files I needed to merge simply didn't have one. I was merging all the files from one directory, and unfortunately this directory also included empty files, which effectively led to multiple newlines being appended after some files. I needed to remove them manually afterwards.
> In this situation it might be good to have another argument that allows skipping empty files.
> Things one could try to implement this feature:
> The call IOUtils.copyBytes(in, out, getConf(), false); doesn't return the number of bytes copied. That would be convenient, as one could skip appending the newline when 0 bytes were copied; alternatively, one could check the file size beforehand.
> I posted this idea on the mailing list http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E but I didn't really get many responses, so I thought I might try this way.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
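The `{code}` line in the comment above picks a delimiter from an option. One possible reading of such an option, sketched here under the assumption that the decision is made per file so that only empty files lose their trailing newline, models files as in-memory byte arrays. `MergeSketch`, `merge`, `newline` and `skipEmpty` are hypothetical names, not Hadoop's Merge command or its option parsing.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch of getmerge-style concatenation; not Hadoop's Merge command. */
public class MergeSketch {

    /**
     * Concatenates the sources in order. With newline=true, '\n' follows each
     * source; with skipEmpty=true as well, empty sources get no delimiter.
     */
    static String merge(List<byte[]> sources, boolean newline, boolean skipEmpty) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] src : sources) {
            out.write(src, 0, src.length);
            boolean isEmpty = src.length == 0;
            if (newline && !(skipEmpty && isEmpty)) {
                out.write('\n');
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<byte[]> files = List.of(
            "a".getBytes(StandardCharsets.UTF_8),
            new byte[0], // an empty file in the source directory
            "b".getBytes(StandardCharsets.UTF_8));
        // Escape newlines so the difference is visible on one line.
        System.out.println(merge(files, true, false).replace("\n", "\\n")); // a\n\nb\n
        System.out.println(merge(files, true, true).replace("\n", "\\n"));  // a\nb\n
    }
}
```

Keeping the empty-file check next to the delimiter write means `-nl` on its own keeps its current behavior, and only the additional flag changes the output.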
[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl
[ https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1497#comment-1497 ]

Jan Filipiak commented on HDFS-8836:

Sorry for so many typos.
[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl
[ https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900303#comment-14900303 ]

Jan Filipiak commented on HDFS-8836:

[~ajisakaa] Your approach is quite similar to the one followed in the ticket: find zero-size files and treat them differently. Ideally I would like to avoid the empty files from the moment they get created, but this is 1) impractical, as many different applications create empty files and all of them would have to be fixed, and 2) sometimes these empty files are required for some purpose and are only harmful during the getmerge step.

To explain case 2 a little more, imagine an application that uses directory A as an intermediate output that is consumed by many other applications. Sqoop is a good example of this. One could set up many Oozie coordinators that wait for A/_SUCCESS and then start processing. There would be no safe time to delete the file, as one always risks that one of the coordinators is not executed because it didn't find its "dataset" file.

Those are the two main reasons I consider this patch very helpful. If namespace size becomes a problem, one can always tackle it at a different level. Applying the default hidden-file filter would help in my case, but this would need an option as well, and just skipping all the empty files is semantically more correct in this case.
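The distinction drawn above between a hidden-file filter and skipping empty files can be sketched on a made-up listing of directory A. This assumes the common Hadoop convention that names starting with `_` or `.` are hidden; `FilterSketch`, `Entry`, `NOT_HIDDEN`, `NOT_EMPTY` and the file names are illustrative only, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical comparison of two policies for choosing which files to merge. */
public class FilterSketch {

    /** Minimal stand-in for a directory listing entry: file name and length in bytes. */
    static final class Entry {
        final String name;
        final long length;

        Entry(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    /** Assumed convention: names starting with "_" or "." are treated as hidden. */
    static final Predicate<Entry> NOT_HIDDEN =
        e -> !e.name.startsWith("_") && !e.name.startsWith(".");

    /** Empty-file policy: keep only files that contain at least one byte. */
    static final Predicate<Entry> NOT_EMPTY = e -> e.length > 0;

    /** Returns the names the given policy would merge, in listing order. */
    static List<String> select(List<Entry> listing, Predicate<Entry> keep) {
        List<String> names = new ArrayList<>();
        for (Entry e : listing) {
            if (keep.test(e)) {
                names.add(e.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Entry> dirA = List.of(
            new Entry("_SUCCESS", 0),      // marker file other jobs wait for
            new Entry("part-m-00000", 12),
            new Entry("part-m-00001", 0)); // an empty output file
        System.out.println(select(dirA, NOT_HIDDEN)); // [part-m-00000, part-m-00001]
        System.out.println(select(dirA, NOT_EMPTY));  // [part-m-00000]
    }
}
```

In this made-up listing the hidden-file filter drops `_SUCCESS` but still merges the empty `part-m-00001`, which would again produce a stray newline under `-nl`; the length check drops both.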
[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl
[ https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724779#comment-14724779 ]

Jan Filipiak commented on HDFS-8836:

Thanks for taking this into consideration, [~kanaka]. One could probably think about skipping the open and the readFully call in the zero-length case. OTOH, that is probably a rare case, and I don't think one really needs to pay attention to it. Looking forward to using this feature in an official release. Thanks

> Skip newline on empty files with getMerge -nl
>
> Key: HDFS-8836
> URL: https://issues.apache.org/jira/browse/HDFS-8836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.6.0, 2.7.1
> Reporter: Jan Filipiak
> Assignee: kanaka kumar avvaru
> Priority: Trivial
> Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch, HDFS-8836-03.patch
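A minimal sketch of the suggestion to skip the open and the read entirely for zero-length files, assuming the length is already known from the directory listing. `SkipOpenSketch` and its `Source` class (which counts opens for illustration) are hypothetical stand-ins, not Hadoop's `PathData` or input stream classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch: zero-length sources are never opened or read. */
public class SkipOpenSketch {

    /** Stand-in for a listed file: length known up front, content opened lazily. */
    static final class Source {
        final long length;
        private final byte[] data;
        int opens = 0; // number of open() calls, for illustration only

        Source(byte[] data) {
            this.data = data;
            this.length = data.length;
        }

        InputStream open() {
            opens++;
            return new ByteArrayInputStream(data);
        }
    }

    /** Concatenates non-empty sources, appending '\n' after each when newline is true. */
    static String merge(List<Source> sources, boolean newline) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Source s : sources) {
            if (s.length == 0) {
                continue; // zero length: no open, no read, no delimiter
            }
            try (InputStream in = s.open()) {
                in.transferTo(out); // Java 9+
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (newline) {
                out.write('\n');
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The saving is one open per empty file; as the comment notes, this is probably a rare case.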
[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl
[ https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14722953#comment-14722953 ]

Jan Filipiak commented on HDFS-8836:

Hi [~kanaka], thank you for looking into this and for your patch. I was just wondering whether the call src.fs.getFileStatus is really necessary: the comment above processPath, and processPath itself, indicate that the FileStatus is already set for the PathData objects in the srcs list. It looks like this extra round trip to the NN could be skipped.

Skip newline on empty files with getMerge -nl

Key: HDFS-8836
URL: https://issues.apache.org/jira/browse/HDFS-8836
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs-client
Affects Versions: 2.6.0, 2.7.1
Reporter: Jan Filipiak
Assignee: kanaka kumar avvaru
Priority: Trivial
Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch
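The point about the extra round trip can be illustrated with a hypothetical in-memory "NameNode" that counts calls. `FakeNameNode`, `PathEntry` (a loose analogue of `PathData` holding the status captured at listing time) and both counting methods are assumptions for illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: reuse the status from the listing instead of asking again. */
public class CachedStatusSketch {

    /** NameNode stand-in that counts every metadata call as one round trip. */
    static final class FakeNameNode {
        private final Map<String, Long> lengths;
        int roundTrips = 0;

        FakeNameNode(Map<String, Long> lengths) {
            this.lengths = new LinkedHashMap<>(lengths);
        }

        /** One listing call returns each child path together with its length. */
        List<PathEntry> listStatus() {
            roundTrips++;
            List<PathEntry> entries = new ArrayList<>();
            for (Map.Entry<String, Long> e : lengths.entrySet()) {
                entries.add(new PathEntry(e.getKey(), e.getValue()));
            }
            return entries;
        }

        /** A separate per-path lookup, analogous to an extra getFileStatus call. */
        long getLength(String path) {
            roundTrips++;
            return lengths.get(path);
        }
    }

    /** A path plus the length captured when it was listed. */
    static final class PathEntry {
        final String path;
        final long length;

        PathEntry(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Counts non-empty files by re-querying the NameNode for every entry. */
    static int countNonEmptyWithLookups(FakeNameNode nn) {
        int n = 0;
        for (PathEntry e : nn.listStatus()) {
            if (nn.getLength(e.path) > 0) {
                n++;
            }
        }
        return n;
    }

    /** Counts non-empty files using only the length already held by each entry. */
    static int countNonEmptyCached(FakeNameNode nn) {
        int n = 0;
        for (PathEntry e : nn.listStatus()) {
            if (e.length > 0) {
                n++;
            }
        }
        return n;
    }
}
```

With three files, re-querying costs one listing plus three lookups (4 round trips), while reading the cached length costs only the listing (1).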
[jira] [Created] (HDFS-8836) Skip newline on empty files with getMerge -nl
Jan Filipiak created HDFS-8836:

Summary: Skip newline on empty files with getMerge -nl
Key: HDFS-8836
URL: https://issues.apache.org/jira/browse/HDFS-8836
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs-client
Affects Versions: 2.7.1, 2.6.0
Reporter: Jan Filipiak
Priority: Trivial

Hello everyone,
I recently needed to use the newline option -nl with getMerge because the files I needed to merge simply didn't have one. I was merging all the files from one directory, and unfortunately this directory also included empty files, which effectively led to multiple newlines being appended after some files. I needed to remove them manually afterwards.
In this situation it might be good to have another argument that allows skipping empty files.
Things one could try to implement this feature:
The call IOUtils.copyBytes(in, out, getConf(), false); doesn't return the number of bytes copied. That would be convenient, as one could skip appending the newline when 0 bytes were copied; alternatively, one could check the file size beforehand.
I posted this idea on the mailing list http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E but I didn't really get many responses, so I thought I might try this way.
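The description notes that `IOUtils.copyBytes(in, out, getConf(), false)` does not report how many bytes it copied. In plain Java, `InputStream.transferTo` (Java 9 and later) does return the byte count, so a minimal sketch of the "skip the newline when 0 bytes were copied" option could look like this. `CountingMergeSketch` and its methods are hypothetical, and in-memory strings stand in for HDFS files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch: decide on the newline from the number of bytes copied. */
public class CountingMergeSketch {

    /** Copies in to out, then appends '\n' only if at least one byte was copied. */
    static long copyWithNewline(InputStream in, OutputStream out) throws IOException {
        long copied = in.transferTo(out); // returns the number of bytes transferred
        if (copied > 0) {
            out.write('\n');
        }
        return copied;
    }

    /** Merges in-memory "files" in order; empty ones contribute nothing. */
    static String merge(List<String> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String content : files) {
            byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
            try (InputStream in = new ByteArrayInputStream(bytes)) {
                copyWithNewline(in, out);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Unlike checking the file size first, this variant still opens and reads each empty file, but it needs no metadata beyond the stream itself.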
[jira] [Updated] (HDFS-8836) Skip newline on empty files with getMerge -nl
[ https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Filipiak updated HDFS-8836:
---
Description:
Hello everyone,
I recently needed to use the newline option -nl with getMerge because the files I needed to merge simply didn't have one. I was merging all the files from one directory, and unfortunately this directory also included empty files, which effectively led to multiple newlines being appended after some files. I needed to remove them manually afterwards.
In this situation it might be good to have another argument that allows skipping empty files.
Things one could try to implement this feature:
The call IOUtils.copyBytes(in, out, getConf(), false); doesn't return the number of bytes copied. That would be convenient, as one could skip appending the newline when 0 bytes were copied; alternatively, one could check the file size beforehand.
I posted this idea on the mailing list http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E but I didn't really get many responses, so I thought I might try this way.

was:
Hello everyone,
I recently needed to use the newline option -nl with getMerge because the files I needed to merge simply didn't have one. I was merging all the files from one directory, and unfortunately this directory also included empty files, which effectively led to multiple newlines being appended after some files. I needed to remove them manually afterwards.
In this situation it might be good to have another argument that allows skipping empty files.
Things one could try to implement this feature:
The call IOUtils.copyBytes(in, out, getConf(), false); doesn't return the number of bytes copied. That would be convenient, as one could skip appending the newline when 0 bytes were copied; alternatively, one could check the file size beforehand.

I posted this idea on the mailing list http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E but I didn't really get many responses, so I thought I might try this way.

Skip newline on empty files with getMerge -nl

Key: HDFS-8836
URL: https://issues.apache.org/jira/browse/HDFS-8836
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs-client
Affects Versions: 2.6.0, 2.7.1
Reporter: Jan Filipiak
Priority: Trivial