[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl

2015-10-21 Thread Jan Filipiak (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1495#comment-1495
 ] 

Jan Filipiak commented on HDFS-8836:


[~ajisakaa] thank you for rethinking this issue; the comments on the path you 
suggest make sense. I was always tempted to do something along the lines of: 

{code}
delimiter = cf.getOpt("skip-empty-file") ? "" : "\n";
{code}

But I don't have very strong opinions about how this should look in the end. 
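
One possible reading of the snippet above, sketched in plain Java without any Hadoop classes: the delimiter is decided per source file, so an empty file gets no newline when skip-empty-file is set. The class and method names (DelimiterChoice, delimiterFor) and the use of a Set<String> for the parsed options are assumptions made for illustration; only the option names nl and skip-empty-file come from this thread.

```java
import java.util.Set;

class DelimiterChoice {
    // Hypothetical helper: returns what to write after one source file.
    // opts holds the option names that were passed, without leading dashes.
    static String delimiterFor(Set<String> opts, long fileLength) {
        if (!opts.contains("nl")) {
            return ""; // no newline was requested at all
        }
        boolean skip = opts.contains("skip-empty-file") && fileLength == 0;
        return skip ? "" : "\n";
    }

    public static void main(String[] args) {
        Set<String> opts = Set.of("nl", "skip-empty-file");
        // Empty file: the delimiter is the empty string.
        System.out.println(delimiterFor(opts, 0).isEmpty());
        // Non-empty file: the delimiter is a newline.
        System.out.println(delimiterFor(opts, 5).equals("\n"));
    }
}
```

Deciding per file keeps the existing -nl behaviour unchanged for anyone who does not pass the new option.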

> Skip newline on empty files with getMerge -nl
> -
>
> Key: HDFS-8836
> URL: https://issues.apache.org/jira/browse/HDFS-8836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.6.0, 2.7.1
> Reporter: Jan Filipiak
> Assignee: Kanaka Kumar Avvaru
> Priority: Trivial
> Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch, 
> HDFS-8836-03.patch, HDFS-8836-04.patch, HDFS-8836-05.patch
>
>
> Hello everyone,
> I recently needed to use the newline option -nl with getMerge 
> because the files I needed to merge simply didn't have one. I was merging all 
> the files from one directory and unfortunately this directory also included 
> empty files, which effectively led to multiple newlines being appended after 
> some files. I needed to remove them manually afterwards.
> In this situation it may be good to have another argument that allows 
> skipping empty files.
> One thing one could try to implement this feature:
> The call IOUtils.copyBytes(in, out, getConf(), false); doesn't
> return the number of bytes copied, which would be convenient, as one could
> skip appending the newline when 0 bytes were copied. Alternatively, one could 
> check the file size beforehand.
> I posted this idea on the mailing list 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E
>  but I didn't really get many responses, so I thought I might try this way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl

2015-10-21 Thread Jan Filipiak (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1497#comment-1497
 ] 

Jan Filipiak commented on HDFS-8836:


Sorry for the many typos.



[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl

2015-09-21 Thread Jan Filipiak (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900303#comment-14900303
 ] 

Jan Filipiak commented on HDFS-8836:


[~ajisakaa]
Your approach is quite similar to the one followed in the ticket: find 
zero-size files and treat them differently.
Ideally I would like to skip the empty files from the moment they get 
created, but 1) this is impractical, as many different applications create 
empty files and all of them would have to be fixed, and 2) 
sometimes these empty files are required for some purpose and are only harmful 
during the getmerge step. To explain case 2 a little more, imagine an 
application that uses directory A as an intermediate output that gets used by 
many other applications. Sqoop makes a good example of this. One could set up 
many Oozie coordinators that wait for A/_SUCCESS and then start 
processing it. There would be no safe time to delete the file, as one is always 
in danger of one of the coordinators not being executed because it didn't find 
its "dataset" file. 

Those two are the main reasons I consider this patch very helpful. If 
namespace size becomes a problem, one can always start tackling it at a 
different level. Applying the default Hiddenfilefilter would help in my case, 
but this would need an option as well, and just skipping all the empty files 
is semantically more correct in this case.
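
To make the difference between the two filters concrete, here is a small sketch in plain Java. It assumes the usual convention that hidden names start with '_' or '.'; the class SourceFilters, its methods and the sample directory listing are hypothetical and are not taken from any of the patches.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SourceFilters {
    // Assumed convention: a name starting with '_' or '.' counts as hidden.
    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Keeps every file whose name is not hidden, whatever its size.
    static List<String> keepVisible(Map<String, Long> sizes) {
        return sizes.keySet().stream()
                .filter(name -> !isHidden(name))
                .collect(Collectors.toList());
    }

    // Keeps every file that has at least one byte, whatever its name.
    static List<String> keepNonEmpty(Map<String, Long> sizes) {
        return sizes.entrySet().stream()
                .filter(e -> e.getValue() > 0)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Hypothetical listing of directory A: file name mapped to length in bytes.
    static Map<String, Long> sample() {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("_SUCCESS", 0L);
        sizes.put("part-00000", 12L);
        sizes.put("part-00001", 0L);
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(keepVisible(sample()));
        System.out.println(keepNonEmpty(sample()));
    }
}
```

In this sample the hidden-name filter drops _SUCCESS but still keeps the empty part-00001, which would again produce a stray newline, while the size-based filter drops both.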



[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl

2015-08-31 Thread Jan Filipiak (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724779#comment-14724779
 ] 

Jan Filipiak commented on HDFS-8836:


Thanks for taking this into consideration [~kanaka]. One could probably think 
about skipping the open + the readFully call in the zero-length case. OTOH that 
is probably a rare case and I don't think one really needs to pay attention to 
it. Looking forward to using this feature in an official release. Thanks
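
For illustration, skipping the open in the zero-length case might look like the sketch below. It uses java.nio.file on the local file system as a stand-in for HDFS, so Files.size plays the role of the length that is already known from the file status; SizeCheckedMerge, mergeInto and demo are hypothetical names, not code from the patch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class SizeCheckedMerge {
    // Merges the sources into target and returns how many sources were opened.
    static int mergeInto(List<Path> sources, Path target) {
        int opened = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path src : sources) {
                if (Files.size(src) == 0) {
                    continue; // zero length: no open, no copy, no newline
                }
                Files.copy(src, out); // opens and streams the source
                out.write('\n');
                opened++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return opened;
    }

    // Builds three temporary files ("a", empty, "b") and returns the merge result.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("getmerge-sketch");
            Path a = Files.writeString(dir.resolve("part-0"), "a");
            Path empty = Files.writeString(dir.resolve("part-1"), "");
            Path b = Files.writeString(dir.resolve("part-2"), "b");
            Path target = dir.resolve("merged");
            mergeInto(List.of(a, empty, b), target);
            return Files.readString(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```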



> Skip newline on empty files with getMerge -nl
> -
>
> Key: HDFS-8836
> URL: https://issues.apache.org/jira/browse/HDFS-8836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.6.0, 2.7.1
> Reporter: Jan Filipiak
> Assignee: kanaka kumar avvaru
> Priority: Trivial
> Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch, 
> HDFS-8836-03.patch
>


[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl

2015-08-30 Thread Jan Filipiak (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14722953#comment-14722953
 ] 

Jan Filipiak commented on HDFS-8836:


Hi [~kanaka]

Thank you for looking into this and for your patch. I was just wondering 
whether the call to src.fs.getFileStatus is really necessary: the comment above 
processPath and processPath itself indicate that the FileStatus is already set 
for the PathData objects in the srcs list. It looks like this extra round trip 
to the NN could be skipped. 

> Skip newline on empty files with getMerge -nl
> -
>
> Key: HDFS-8836
> URL: https://issues.apache.org/jira/browse/HDFS-8836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.6.0, 2.7.1
> Reporter: Jan Filipiak
> Assignee: kanaka kumar avvaru
> Priority: Trivial
> Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch




[jira] [Created] (HDFS-8836) Skip newline on empty files with getMerge -nl

2015-07-29 Thread Jan Filipiak (JIRA)
Jan Filipiak created HDFS-8836:
--

 Summary: Skip newline on empty files with getMerge -nl
 Key: HDFS-8836
 URL: https://issues.apache.org/jira/browse/HDFS-8836
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.7.1, 2.6.0
Reporter: Jan Filipiak
Priority: Trivial


Hello everyone,

I recently needed to use the newline option -nl with getMerge 
because the files I needed to merge simply didn't have one. I was merging all 
the files from one directory and unfortunately this directory also included 
empty files, which effectively led to multiple newlines being appended after 
some files. I needed to remove them manually afterwards.

In this situation it may be good to have another argument that allows
skipping empty files.

One thing one could try to implement this feature:

The call IOUtils.copyBytes(in, out, getConf(), false); doesn't
return the number of bytes copied, which would be convenient, as one could
skip appending the newline when 0 bytes were copied. Alternatively, one could 
check the file size beforehand.
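
A rough sketch of the first idea, written with plain java.io streams rather than Hadoop's IOUtils: a copy helper that reports how many bytes it moved, so the caller can decide whether to append the newline. The names CountingMerge, copyCounting and merge are made up for this example, and in-memory byte arrays stand in for the source files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

class CountingMerge {
    // Copies everything from in to out and returns the number of bytes copied.
    static long copyCounting(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Concatenates the sources, writing a newline only after non-empty ones.
    static byte[] merge(List<byte[]> sources) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (byte[] src : sources) {
                long copied = copyCounting(new ByteArrayInputStream(src), out);
                if (copied > 0) {
                    out.write('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] merged = merge(List.of(
                "a".getBytes(StandardCharsets.UTF_8),
                new byte[0],
                "b".getBytes(StandardCharsets.UTF_8)));
        // Show newlines as a visible escape sequence.
        System.out.println(new String(merged, StandardCharsets.UTF_8).replace("\n", "\\n"));
    }
}
```

Counting during the copy avoids a separate size lookup, at the cost of still opening every empty file.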

I posted this idea on the mailing list 
http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E
 but I didn't really get many responses, so I thought I might try this way.





[jira] [Updated] (HDFS-8836) Skip newline on empty files with getMerge -nl

2015-07-29 Thread Jan Filipiak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Filipiak updated HDFS-8836:
---
Description: 
Hello everyone,

I recently needed to use the newline option -nl with getMerge 
because the files I needed to merge simply didn't have one. I was merging all 
the files from one directory and unfortunately this directory also included 
empty files, which effectively led to multiple newlines being appended after 
some files. I needed to remove them manually afterwards.

In this situation it may be good to have another argument that allows 
skipping empty files.
One thing one could try to implement this feature:

The call IOUtils.copyBytes(in, out, getConf(), false); doesn't
return the number of bytes copied, which would be convenient, as one could
skip appending the newline when 0 bytes were copied. Alternatively, one could 
check the file size beforehand.

I posted this idea on the mailing list 
http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E
 but I didn't really get many responses, so I thought I might try this way.

