[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user britter commented on the pull request:

    https://github.com/apache/commons-lang/pull/75#issuecomment-102940510

@rikles thank you for your thorough feedback. Give me some time to go through your comments. I'll have time to have a look later this week.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user rikles commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30461235

    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String str, final String open, final
             return list.toArray(new String[list.size()]);
         }

    +    /**
    +     * <p>Split a String into an array, using an array of fixed string lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum of split lengths).
    +     * All empty substrings other than zero length requested, are returned {@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    +     *
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 1)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefghij", 2, 4, 1)  = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 5)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdef", 2, 4, 1)      = ["ab", "cdef", null]
    +     *
    +     * StringUtils.splitByLength(" abcdef", 2, 4, 1)     = [" a", "bcde", "f"]
    +     * StringUtils.splitByLength("abcdef ", 2, 4, 1)     = ["ab", "cdef", " "]
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 0, 1)  = ["ab", "cdef", "", "g"]
    +     * StringUtils.splitByLength("abcdefg", -1)          = {@link IllegalArgumentException}
    --- End diff --

It's true that negative numbers could easily be promoted to 0. But this is not like `StringUtils.split(String, String, int)`, where the int value indicates the maximum number of results we want and where passing zero or a negative value commonly means _no limit_.

Here, we request several column lengths. I thought it was natural and easier to treat negative lengths as a coding mistake. I even wondered whether we could interpret negative values as:

* _unlimited length_. But what about `StringUtils.splitByLength("abcdefg", -1, 2, -1, -1)`?
* _backward move_, as in `StringUtils.splitByLength("abcde", 3, -2, 4) = ["abc", "bcde"]`. But what about `StringUtils.splitByLength("abcdef", 1, -4, 3)`?

I don't know what to think about it...
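To make the "negative length is a coding mistake" option concrete, here is a minimal sketch of the semantics listed in the quoted Javadoc. It is not the code from the pull request: the class name `SplitByLengthStrict` is hypothetical, and the handling of short inputs (truncated last column, then `null` slots) is inferred from the Javadoc examples only.

```java
// Hypothetical sketch; it mirrors the Javadoc examples, not the PR's actual implementation.
final class SplitByLengthStrict {

    static String[] splitByLength(final String str, final int... lengths) {
        if (str == null) {
            return null; // same convention as the other StringUtils.split* methods
        }
        if (lengths == null || lengths.length == 0) {
            return new String[0];
        }
        // Validate first, so a bad argument never produces a partial result.
        for (final int length : lengths) {
            if (length < 0) {
                throw new IllegalArgumentException("Negative length: " + length);
            }
        }
        final String[] result = new String[lengths.length];
        int pos = 0;
        for (int i = 0; i < lengths.length; i++) {
            if (lengths[i] == 0) {
                result[i] = ""; // an explicitly requested empty column
            } else if (pos < str.length()) {
                // Use long arithmetic so a huge length cannot overflow the end index.
                final int end = (int) Math.min((long) pos + lengths[i], str.length());
                result[i] = str.substring(pos, end);
                pos = end;
            }
            // Otherwise the input is exhausted and the slot stays null.
        }
        return result;
    }
}
```

With this sketch, `splitByLength("abcdef", 2, 4, 1)` gives `["ab", "cdef", null]` and `splitByLength("abcdefg", -1)` throws `IllegalArgumentException`, matching the examples in the Javadoc.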
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user rikles commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30461086

    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String str, final String open, final
             return list.toArray(new String[list.size()]);
         }

    +    /**
    +     * <p>Split a String into an array, using an array of fixed string lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum of split lengths).
    +     * All empty substrings other than zero length requested, are returned {@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    +     *
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 1)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefghij", 2, 4, 1)  = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 5)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdef", 2, 4, 1)      = ["ab", "cdef", null]
    --- End diff --

Good point. My idea was to signal that there are no more characters left to fill an explicitly requested column. On the other hand, this can cause a `NullPointerException` if the returned array is used without a check...

Why I used this approach: with `null` values and hard-coded lengths, we can simply check the returned array with a _for each_ loop, even later in another piece of code:

```java
String[] cols = StringUtils.splitByLength(input, 2, 3, 1);
// ...
for (String col : cols) {
    if (col == null) {
        break; // the input ran out before this column
    }
    // Do something
}
```

Without `null` values, we must keep a reference to the lengths array:

```java
int[] LENGTHS = { 2, 3, 0, 1 };
String[] cols = StringUtils.splitByLength(input, LENGTHS);
int index = 0;
for (String col : cols) {
    // An empty column is only an error when a non-zero length was requested.
    if (col.length() == 0 && LENGTHS[index] > 0) {
        break;
    }
    index++;
    // Do something
}
```

Of course, we can also check the input string length before calling `StringUtils.splitByLength`, but then we have to compute the sum of the lengths. And what about this case: `StringUtils.splitByLength("abcd", 1, 2, 2)`?

I don't know which is best... What do you think?
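The third option mentioned in this comment, checking the input length against the sum of the lengths before splitting, could look like the following sketch. The class and method names are hypothetical and nothing like them is proposed in the pull request; the sum is kept in a `long` so that large lengths cannot overflow.

```java
// Hypothetical pre-check helper; not part of StringUtils or of the pull request.
final class ColumnLengthCheck {

    /** Sum of the requested column lengths, as a long to avoid int overflow. */
    static long totalLength(final int... lengths) {
        long sum = 0;
        for (final int length : lengths) {
            sum += length;
        }
        return sum;
    }

    /** True when the input has at least as many characters as the columns request. */
    static boolean coversAllColumns(final String input, final int... lengths) {
        return input != null && input.length() >= totalLength(lengths);
    }
}
```

For the case raised in the comment, `coversAllColumns("abcd", 1, 2, 2)` is `false`: the columns ask for five characters and only four are available, so the last column would be cut short rather than missing entirely. A pre-check of this kind cannot tell those two situations apart, which is part of the trade-off being discussed.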
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user rikles commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30460736

    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String str, final String open, final
             return list.toArray(new String[list.size()]);
         }

    +    /**
    +     * <p>Split a String into an array, using an array of fixed string lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum of split lengths).
    +     * All empty substrings other than zero length requested, are returned {@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    +     *
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 1)     = ["ab", "cdef", "g"]
    --- End diff --

As the next line of the Javadoc shows, extra characters are dropped: `StringUtils.splitByLength("abcdefghij", 2, 4, 1) = ["ab", "cdef", "g"]`. So `StringUtils.splitByLength("abcdefg", 2, 2)` will return `["ab", "cd"]`.

I asked myself this question during development: do we discard the extra characters? I think it would be nice to let users decide. Moreover, depending on the use case, it could be useful to keep or discard the leading extra characters (for example when parsing a string that is commented out on a single line).

I propose to:

* add a private `splitByLengthWorker(String string, boolean splitFromEnd, boolean discardExtraChar, int ... lengths)`
* keep the logic of this `splitByLength(String, int ...)` method as the default: `return splitByLengthWorker(string, false, true, lengths)`. So, by default, the returned array has the same size as the `int ... lengths` array param, which is the useful behavior when parsing strings with fixed column lengths.
* add a `splitByLengthKeepExtraChar(String, int ...)`: `return splitByLengthWorker(string, false, false, lengths)`
* add a `splitByLengthFromEnd(String, int ...)`: `return splitByLengthWorker(string, true, true, lengths)`
* add a `splitByLengthFromEndKeepExtraChar(String, int ...)`: `return splitByLengthWorker(string, true, false, lengths)`

A question: for the _split from end_ methods, which result do you think is more logical: _right aligned_ (RA) or _end to start_ (E2S) lengths, _reversed_ (R) or _not reversed_ (NR) result?

* `StringUtils.splitByLengthFromEndKeepExtraChar("__abcdef", 1, 2, 3) = ["__", "a", "bc", "def"]` - (RA, NR)
* `StringUtils.splitByLengthFromEndKeepExtraChar("__abcdef", 1, 2, 3) = ["def", "bc", "a", "__"]` - (RA, R)
* `StringUtils.splitByLengthFromEndKeepExtraChar("__abcdef", 1, 2, 3) = ["f", "de", "abc", "__"]` - (E2S, R)
* `StringUtils.splitByLengthFromEndKeepExtraChar("__abcdef", 1, 2, 3) = ["__", "abc", "de", "f"]` - (E2S, NR)

I think the first one is the most readable, because the split can be read off visually, though it may be less intuitive:

```
StringUtils.splitByLengthFromEnd("ABCDEFGHIJKLM", 3, 4, 5) = ["BCD", "EFGH", "IJKLM"]

 [3][4_][_5_]
ABCDEFGHIJKLM
```
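The proposed worker could be sketched roughly as follows. This is only an illustration of the _right aligned, not reversed_ variant: the worker signature comes from the comment, but the class name, the rejection of negative lengths, the choice to add the extra characters only when there are some, and the handling of inputs shorter than the sum of the lengths are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed worker (right aligned, not reversed); not the PR's code.
final class SplitByLengthWorkerSketch {

    static String[] splitByLengthWorker(final String str, final boolean splitFromEnd,
            final boolean discardExtraChar, final int... lengths) {
        if (str == null) {
            return null;
        }
        if (lengths == null || lengths.length == 0) {
            return new String[0];
        }
        long total = 0;
        for (final int length : lengths) {
            if (length < 0) {
                throw new IllegalArgumentException("Negative length: " + length);
            }
            total += length;
        }
        // From the end: the columns are right aligned, so skip the leading surplus.
        // Assumption: an input shorter than the total is simply cut from its start.
        final int start = splitFromEnd ? (int) Math.max(0L, str.length() - total) : 0;
        final List<String> parts = new ArrayList<>();
        if (splitFromEnd && !discardExtraChar && start > 0) {
            parts.add(str.substring(0, start)); // leading extra characters
        }
        int pos = start;
        for (final int length : lengths) {
            if (length == 0) {
                parts.add("");
            } else if (pos < str.length()) {
                final int end = (int) Math.min((long) pos + length, str.length());
                parts.add(str.substring(pos, end));
                pos = end;
            } else {
                parts.add(null); // input exhausted before this column
            }
        }
        if (!splitFromEnd && !discardExtraChar && pos < str.length()) {
            parts.add(str.substring(pos)); // trailing extra characters
        }
        return parts.toArray(new String[0]);
    }
}
```

Under these assumptions, `splitByLengthWorker("__abcdef", true, false, 1, 2, 3)` yields `["__", "a", "bc", "def"]`, and `splitByLengthWorker("ABCDEFGHIJKLM", true, true, 3, 4, 5)` yields `["BCD", "EFGH", "IJKLM"]`.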
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user rikles commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30467016

    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String str, final String open, final
             return list.toArray(new String[list.size()]);
         }

    +    /**
    +     * <p>Split a String into an array, using an array of fixed string lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum of split lengths).
    +     * All empty substrings other than zero length requested, are returned {@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    +     *
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 1)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefghij", 2, 4, 1)  = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 5)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdef", 2, 4, 1)      = ["ab", "cdef", null]
    +     *
    +     * StringUtils.splitByLength(" abcdef", 2, 4, 1)     = [" a", "bcde", "f"]
    +     * StringUtils.splitByLength("abcdef ", 2, 4, 1)     = ["ab", "cdef", " "]
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 0, 1)  = ["ab", "cdef", "", "g"]
    +     * StringUtils.splitByLength("abcdefg", -1)          = {@link IllegalArgumentException}
    --- End diff --

Another way to deal with negative values could be to treat them as _discard # characters_: `StringUtils.splitByLength("abcdef", 2, -3, 1) = ["ab", "f"]`
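The _discard # characters_ reading of negative lengths can be sketched as below. This is an illustration only: the class name is hypothetical, and the behavior for zero lengths and exhausted input is borrowed from the quoted Javadoc rather than specified by this comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a negative length skips that many characters and adds no element.
final class SplitSkippingNegatives {

    static String[] splitByLength(final String str, final int... lengths) {
        if (str == null) {
            return null;
        }
        if (lengths == null || lengths.length == 0) {
            return new String[0];
        }
        final List<String> parts = new ArrayList<>();
        int pos = 0;
        for (final int length : lengths) {
            if (length < 0) {
                // Skip |length| characters; long arithmetic also covers Integer.MIN_VALUE.
                pos = (int) Math.min((long) pos - length, str.length());
            } else if (length == 0) {
                parts.add("");
            } else if (pos < str.length()) {
                final int end = (int) Math.min((long) pos + length, str.length());
                parts.add(str.substring(pos, end));
                pos = end;
            } else {
                parts.add(null); // input exhausted before this column
            }
        }
        return parts.toArray(new String[0]);
    }
}
```

Here `splitByLength("abcdef", 2, -3, 1)` returns `["ab", "f"]`, as in the comment. Note that under this interpretation the result array is no longer the same size as the lengths array, which the Javadoc currently promises.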
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user rikles commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30460281

    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String str, final String open, final
             return list.toArray(new String[list.size()]);
         }

    +    /**
    +     * <p>Split a String into an array, using an array of fixed string lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum of split lengths).
    +     * All empty substrings other than zero length requested, are returned {@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    --- End diff --

I followed the same logic as the other `StringUtils.split*(...)` methods: return `null` if the input string is `null`. So `StringUtils.splitByLength(null, *)` returns `null`.
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user britter commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30060436

    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String str, final String open, final
             return list.toArray(new String[list.size()]);
         }

    +    /**
    +     * <p>Split a String into an array, using an array of fixed string lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum of split lengths).
    +     * All empty substrings other than zero length requested, are returned {@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    --- End diff --

For consistency with `StringUtils.split(String, char)`, this should rather return an empty array.
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user britter commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30061214

    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String str, final String open, final
             return list.toArray(new String[list.size()]);
         }

    +    /**
    +     * <p>Split a String into an array, using an array of fixed string lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum of split lengths).
    +     * All empty substrings other than zero length requested, are returned {@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    +     *
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 1)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefghij", 2, 4, 1)  = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 5)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdef", 2, 4, 1)      = ["ab", "cdef", null]
    +     *
    +     * StringUtils.splitByLength(" abcdef", 2, 4, 1)     = [" a", "bcde", "f"]
    +     * StringUtils.splitByLength("abcdef ", 2, 4, 1)     = ["ab", "cdef", " "]
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 0, 1)  = ["ab", "cdef", "", "g"]
    +     * StringUtils.splitByLength("abcdefg", -1)          = {@link IllegalArgumentException}
    --- End diff --

I'm undecided about this. The `StringUtils.split(String, String, int)` method does not fail if negative values are provided. But I'm not sure how to behave for an input like `[3, -3, 3]`. I think we should handle this the way the `ArrayUtils` class does: it promotes negative indices to 0. So a negative number should add an empty string to the resulting array.
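A minimal sketch of this "promote negatives to 0" option, applied to the `[3, -3, 3]` input mentioned in the comment. The class name is hypothetical, and everything apart from the clamping follows the quoted Javadoc rather than this comment.

```java
// Hypothetical sketch: negative lengths are promoted to 0 and therefore yield "".
final class SplitClampingNegatives {

    static String[] splitByLength(final String str, final int... lengths) {
        if (str == null) {
            return null;
        }
        if (lengths == null) {
            return new String[0];
        }
        final String[] result = new String[lengths.length];
        int pos = 0;
        for (int i = 0; i < lengths.length; i++) {
            final int length = Math.max(0, lengths[i]); // promote negative values to 0
            if (length == 0) {
                result[i] = "";
            } else if (pos < str.length()) {
                final int end = (int) Math.min((long) pos + length, str.length());
                result[i] = str.substring(pos, end);
                pos = end;
            }
            // Otherwise the input is exhausted and the slot stays null.
        }
        return result;
    }
}
```

With this variant, `splitByLength("abcdef", 3, -3, 3)` returns `["abc", "", "def"]`, and the result keeps the same size as the lengths array.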
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user thiagoh commented on the pull request:

    https://github.com/apache/commons-lang/pull/75#issuecomment-101048576

Is this finished?
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user britter commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30061452

    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String str, final String open, final
             return list.toArray(new String[list.size()]);
         }

    +    /**
    +     * <p>Split a String into an array, using an array of fixed string lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum of split lengths).
    +     * All empty substrings other than zero length requested, are returned {@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    +     *
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 1)     = ["ab", "cdef", "g"]
    --- End diff --

What would be the result of `StringUtils.splitByLength("abcdefg", 2, 2)`? I think it should be `["ab", "cd", "efg"]`.
[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...
Github user britter commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30060493

    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String str, final String open, final
             return list.toArray(new String[list.size()]);
         }

    +    /**
    +     * <p>Split a String into an array, using an array of fixed string lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum of split lengths).
    +     * All empty substrings other than zero length requested, are returned {@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    +     *
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 1)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefghij", 2, 4, 1)  = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 5)     = ["ab", "cdef", "g"]
    +     * StringUtils.splitByLength("abcdef", 2, 4, 1)      = ["ab", "cdef", null]
    --- End diff --

I'm not sure whether it is a good idea to add null entries to the result.