[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-09-09 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608719#comment-16608719
 ] 

Sunil Govindan commented on HDFS-13744:
---

Hi [~mackrorysd] Looks like patch is committed, but issue is not closed. Could 
you please close this if its fine. Thank you.

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch, 
> HDFS-13744.03.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-09-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607517#comment-16607517
 ] 

Hudson commented on HDFS-13744:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14899 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14899/])
HDFS-13744. OIV tool should better handle control characters present in 
(mackrorysd: rev 410dd3faa55660b60e1292e2816b135cb476751e)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageDelimitedTextWriter.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java


> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch, 
> HDFS-13744.03.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-09-07 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607503#comment-16607503
 ] 

Sean Mackrory commented on HDFS-13744:
--

I concur. Committed.

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch, 
> HDFS-13744.03.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-09-06 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605831#comment-16605831
 ] 

Zsolt Venczel commented on HDFS-13744:
--

Thanks a lot [~mackrorysd] for the review and the fix!

I was a bit puzzled on the specification about how to escape a CRLF properly as 
it's not specified exactly (there's an example to replace it character by 
character which is your approach but there's another example here: 
https://tools.ietf.org/html/rfc2234#section-2.3).
>From a usability perspective I think you're approach is the best as it clearly 
>displays all special characters. For debugging purposes this is the most 
>valuable.

Test failures are unrelated.



> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch, 
> HDFS-13744.03.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-09-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603795#comment-16603795
 ] 

Hadoop QA commented on HDFS-13744:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}101m 29s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13744 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938366/HDFS-13744.03.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 402aec829876 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9964e33 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24956/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24956/testReport/ |
| Max. process+thread count | 2881 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24956/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was 

[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-09-04 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603615#comment-16603615
 ] 

Sean Mackrory commented on HDFS-13744:
--

Looks good to me, except CR/LF was being escaped as LF, so I attached .003. 
with a trivial change that escapes both characters. If it's cool with you, I'll 
commit that version.

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch, 
> HDFS-13744.03.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-30 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597451#comment-16597451
 ] 

Zsolt Venczel commented on HDFS-13744:
--

Thank you very much for the review [~mackrorysd]. Please let me know if you 
have any preference about the direction this solution should take.
In my latest patch I added StringUtils.CR support as you suggested.

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-28 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595938#comment-16595938
 ] 

Sunil Govindan commented on HDFS-13744:
---

Hi [~zvenczel]

As this jira is marked for 3.2 as a critical, cud u pls help to take this 
forward or move out if its not feasible to finish in coming weeks. 3.2 code 
freeze date is nearby in a weeks. Kindly help to check the same.

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-22 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588981#comment-16588981
 ] 

Sean Mackrory commented on HDFS-13744:
--

We should probably also handle StringUtils.CR. Hadoop is sometimes used from 
Windows clients too.

I'm a little bit torn about not escaping the XML. If someone is embedding 
control characters in filenames, even if that is technically allowed and there 
are standards specifying how that is to be encoded / decoded, I think it's 
likely to cause problems, and I would want those characters to show up 
obviously in a report. I suspect there's a good chance that those characters 
are the reason someone is trying to inspect the image in the first place :) But 
I also don't want to cause practical problems in XML parsers. I can see an 
argument either way - like I said I'm a bit torn and want to think about it...

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-16 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582928#comment-16582928
 ] 

Zsolt Venczel commented on HDFS-13744:
--

I could not reproduce the above test failure with or without the patch 
therefore it should be unrelated:
{code}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 74.774 s 
- in 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-16 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582916#comment-16582916
 ] 

genericqa commented on HDFS-13744:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 41s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13744 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935885/HDFS-13744.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d676efd6de41 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cb21eaa |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24793/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24793/testReport/ |
| Max. process+thread count | 3100 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24793/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   

[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-16 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582703#comment-16582703
 ] 

Zsolt Venczel commented on HDFS-13744:
--

After doing some more analysis it turns out that very few CSV and XML clients 
are following the LF character encoding specifications.

This can have the following impact:

* For the XML processor:
Escaping the LF character following the specification can distort an XML parser 
to correctly reproduce a file name. It can also modify filenames when using the 
ReverseXML processor. *I would not recommend escaping here.*

* For the Delimited processor:
The output of the Delimited processor is handy for report creation and grepping 
where a wrongly displayed filename or directory name having LF can cause more 
problems than the appearance of an escaped LF character therefore *I would 
recommend escaping in this scenario*.

In my uploaded patch I added escaping for the Delimited processor only.

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * CSV processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org