[ 
https://issues.apache.org/jira/browse/IO-670?focusedWorklogId=464108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-464108
 ]

ASF GitHub Bot logged work on IO-670:
-------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Jul/20 17:36
            Start Date: 29/Jul/20 17:36
    Worklog Time Spent: 10m 
      Work Description: sebbASF commented on a change in pull request #118:
URL: https://github.com/apache/commons-io/pull/118#discussion_r462250040



##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -882,15 +894,26 @@ public static boolean contentEqualsIgnoreEOL(final Reader 
input1, final Reader i
                     nowPos2 += nowRead2;
                 }
             }
+
             if (readEnd1) {
+                // if input1 ended.

Review comment:
       This comment is not necessary comment if readEnd1 is properly commented
   

##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -814,9 +814,21 @@ public static boolean contentEquals(final Reader input1, 
final Reader input2)
     }
 
     private enum LastState {
-        r,
-        normal,
-        newLine;
+        /**
+         * If last char is '\r'.
+         */
+        R,
+
+        /**
+         * If last char is not '\r' nor '\n'.
+         */
+        NORMAL,
+
+        /**
+         * If we just moved to a new line.
+         * It might sounds weird but after you see the codes you can know it.
+         */
+        NEW_LINE;

Review comment:
       This could be NL to agree with CR above

##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -814,9 +814,21 @@ public static boolean contentEquals(final Reader input1, 
final Reader input2)
     }
 
     private enum LastState {
-        r,
-        normal,
-        newLine;
+        /**
+         * If last char is '\r'.
+         */
+        R,
+
+        /**
+         * If last char is not '\r' nor '\n'.
+         */
+        NORMAL,
+
+        /**
+         * If we just moved to a new line.
+         * It might sounds weird but after you see the codes you can know it.

Review comment:
       This comment is not useful

##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -882,15 +894,26 @@ public static boolean contentEqualsIgnoreEOL(final Reader 
input1, final Reader i
                     nowPos2 += nowRead2;
                 }
             }
+
             if (readEnd1) {
+                // if input1 ended.
                 if (readEnd2) {
+                    // if both input1 and input2 ended here, we just return 
true.
                     return true;
                 } else {
+                    // if input2 not ended.
                     switch (lastState1) {
-                        case r:
-                        case newLine:
+                        case R:
+                            // if last state of input1 is "\r"

Review comment:
       Unnecessary if the enum value has a sensible name and comment

##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -1147,31 +1201,43 @@ public static boolean contentEqualsIgnoreEOL(final 
Reader input1, final Reader i
                             }
                             continue;
                         default:
-                            if (lastState1 == LastState.r) {
-                                lastState1 = LastState.newLine;
+                            // if input2's next is normal.
+                            //  if input1's last is '\r', then it can become 
"\r\n", then legal.
+                            //  otherwise illegal.
+                            if (lastState1 == LastState.R) {
+                                lastState1 = LastState.NEW_LINE;
                                 nowCheck1++;
                                 continue;
                             } else {
                                 return false;
                             }
                     }
                 default:
+                    // if input1's next is normal.
                     switch (charArray2[nowCheck2]) {
                         case '\n':
-                            if (lastState2 == LastState.r) {
-                                lastState2 = LastState.newLine;
+                            // if input2's next is '\n'.
+                            //  if input2's last is '\r', then it can become 
"\r\n", then legal.
+                            //  otherwise illegal.
+                            if (lastState2 == LastState.R) {
+                                lastState2 = LastState.NEW_LINE;
                                 nowCheck2++;
                                 continue;
                             } else {
                                 return false;
                             }
                         case '\r':
+                            // if input2's next is '\r'.
+                            // illegal.
                             return false;
                         default:
+                            // if input2's next is normal.
+                            //  if equal then legal.
+                            //  otherwise illegal.
                             if (charArray1[nowCheck1] != 
charArray2[nowCheck2]) {
                                 return false;
                             }
-                            lastState1 = lastState2 = LastState.normal;
+                            lastState1 = lastState2 = LastState.NORMAL;
                             nowCheck1++;
                             nowCheck2++;
                             continue;

Review comment:
       Most of the comments above are unnecessary if the variables are well 
named and commented.
   What is missing is how the algorithm works, and what the various 
combinations of state actually mean in terms of the algorithm.
   
   I'm not clear why the code sometimes checks for '\r' and sometimes uses the 
enum.
   Why not use a variable containing the last character?
   
   ==
   
   I suspect the code could be much simplified by using a suitable filter on 
the inputs.
   There are several examples in IO, and NET has FromNetASCIIInputStream which 
deals with CRLF conversions

##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -882,15 +894,26 @@ public static boolean contentEqualsIgnoreEOL(final Reader 
input1, final Reader i
                     nowPos2 += nowRead2;
                 }
             }
+

Review comment:
       To prevent accidental run-away, it would be safer to use >= to compare 
incremented indexes

##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -814,9 +814,21 @@ public static boolean contentEquals(final Reader input1, 
final Reader input2)
     }
 
     private enum LastState {
-        r,
-        normal,
-        newLine;
+        /**
+         * If last char is '\r'.
+         */
+        R,

Review comment:
       Why is this not called CR or CARRIAGE_RETURN?

##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -882,15 +894,26 @@ public static boolean contentEqualsIgnoreEOL(final Reader 
input1, final Reader i
                     nowPos2 += nowRead2;
                 }
             }
+

Review comment:
       If the index starts at a value below the limit, and you increment it by 
1 between each check, then I agree that the values must be equal at some point, 
assuming single-threaded code.
   
   However if there is a bug in the code that can cause the index to be 
incremented twice, it may never match.
   Comparison using >= avoids this potential error.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 464108)
    Time Spent: 4h 20m  (was: 4h 10m)

> IOUtils.contentEquals is of low performance. I will refine it.
> --------------------------------------------------------------
>
>                 Key: IO-670
>                 URL: https://issues.apache.org/jira/browse/IO-670
>             Project: Commons IO
>          Issue Type: Improvement
>            Reporter: Jin Xu
>            Priority: Critical
>         Attachments: jmh-result.org.apache.json
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/commons-io/pull/118]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to