[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
GitHub user obermeier reopened a pull request: https://github.com/apache/spark/pull/19408 [SPARK-22180][CORE] Allow IPv6 address in org.apache.spark.util.Utils.parseHostPort External applications like Apache Cassandra are able to deal with IPv6 addresses. Libraries like spark-cassandra-connector combine Apache Cassandra with Apache Spark. This combination is very useful IMHO. One problem is that `org.apache.spark.util.Utils.parseHostPort(hostPort: String)` takes the last colon to separate the port from the host part. This conflicts with literal IPv6 addresses. I think we can take `hostPort` as a literal IPv6 address if it contains two or more colons. If IPv6 addresses are enclosed in square brackets, a port definition is still possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/obermeier/spark issue/SPARK-22180 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19408.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19408 commit 453e104c3e6ed6a5ca310f599c274d6c66a3d3c8 Author: Stefan Obermeier Date: 2017-10-01T14:28:58Z [SPARK-22180][CORE] Allow IPv6 address in org.apache.spark.util.Utils.parseHostPort ## What changes were proposed in this pull request? Take ```hostPort``` as a literal IPv6 address if it contains two or more colons. If IPv6 addresses are enclosed in square brackets, a port definition is still possible. ## How was this patch tested? 
Added a new test case to UtilsSuite Remove comment commit 1e6623f518b43df4079bdc2680288063c6be13c6 Author: obermeier Date: 2017-12-14T22:52:21Z Merge branch 'master' into issue/SPARK-22180 commit 1400299808631da0196a61f3588ede786dd0b041 Author: Stefan Obermeier Date: 2017-12-15T07:39:45Z Fix build problem commit 68c322129305a35c9d3e04f8cacc011be5fbaec4 Author: Stefan Obermeier Date: 2017-12-18T13:51:04Z Fix style check violation. Remove whitespace at end of line. commit 8220d95a99f4564e735f22947cb1cb698613efa5 Author: Stefan Obermeier Date: 2018-10-09T21:54:20Z Add log message if hostname:port is not valid --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
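The parsing rule described in the PR above (treat `hostPort` as a literal IPv6 address when it contains two or more colons, unless the address is enclosed in square brackets) can be sketched as a standalone snippet. This is an illustrative reconstruction, not the actual `Utils.parseHostPort` implementation; the object and method names are hypothetical.

```scala
// Illustrative sketch of the heuristic proposed in the PR; not the real Spark code.
object HostPortSketch {
  def parseHostPortSketch(hostPort: String): (String, Int) = {
    // Two or more colons without brackets: a bare IPv6 literal, so no port.
    val indx: Int =
      if (hostPort.matches("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}")) -1
      else hostPort.lastIndexOf(':') // otherwise the last colon starts the port
    if (indx < 0) {
      (hostPort, 0) // no port given
    } else {
      (hostPort.substring(0, indx), hostPort.substring(indx + 1).toInt)
    }
  }
}
```

Under this sketch, `parseHostPortSketch("::1")` yields `("::1", 0)`, while the bracketed form `parseHostPortSketch("[::1]:123")` yields `("[::1]", 123)`, matching the behavior the PR description outlines.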
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier closed the pull request at: https://github.com/apache/spark/pull/19408
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r150909772 --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala --- @@ -1146,6 +1146,20 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging { } } + test("parseHostPort") { +assert(Utils.parseHostPort("abc:123") === (("abc", 123))) +assert(Utils.parseHostPort("example.com") === (("example.com", 0))) +assert(Utils.parseHostPort("example.com:123") === (("example.com", 123))) +assert(Utils.parseHostPort("127.0.0.1")=== (("127.0.0.1", 0))) +assert(Utils.parseHostPort("127.0.0.1:123")=== (("127.0.0.1", 123))) +assert(Utils.parseHostPort("2001:db8::1") === (("2001:db8::1", 0))) +assert(Utils.parseHostPort("2001:DB8::1") === (("2001:DB8::1", 0))) +assert(Utils.parseHostPort("2001:dB8::1") === (("2001:dB8::1", 0))) +assert(Utils.parseHostPort("0:0:0:0:0:0:0:0") === (("0:0:0:0:0:0:0:0", 0))) +assert(Utils.parseHostPort("::1") === (("::1", 0))) +assert(Utils.parseHostPort("[::1]:123")=== (("[::1]", 123))) +assert(Utils.parseHostPort("[2001:db8:42::1]:123") === (("[2001:db8:42::1]", 123))) --- End diff -- What is the preferred way to handle this kind of parse error in Spark? Changing the signature of this method to something like `: Try[..]` or `: Option[..]` is not an option, is it? Error log messages? Unchecked exceptions? ...
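For context on the error-handling question above, here is one hedged illustration of the `Try`-shaped alternative mentioned. The object and method names are hypothetical, and this is a sketch of the trade-off being discussed, not a proposal for the actual Spark API.

```scala
import scala.util.Try

// Hypothetical Try-returning variant: a malformed port becomes a Failure
// value instead of an unchecked exception or a log message.
object SafeHostPort {
  def parse(hostPort: String): Try[(String, Int)] = Try {
    val idx = hostPort.lastIndexOf(':')
    if (idx < 0) (hostPort, 0)
    else (hostPort.substring(0, idx), hostPort.substring(idx + 1).toInt)
  }
}
```

With this shape, `SafeHostPort.parse("abc:oops")` returns a `Failure` wrapping the `NumberFormatException` rather than throwing, which callers can then pattern-match on.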
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r150907312 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -982,7 +982,19 @@ private[spark] object Utils extends Logging { return cached } -val indx: Int = hostPort.lastIndexOf(':') +val indx: Int = + // Interpret hostPort as literal IPv6 address if it contains multiple colons. + // IPv6 addresses enclosed in square brackets like [::1]:123 are not included. + // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace + if (hostPort.matches("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}")) { + // scalastyle:on SingleSpaceBetweenRParenAndLCurlyBrace +-1 + } else { +// Else last colon defines start of port definition +hostPort.lastIndexOf(':') + } + + // This is potentially broken - when dealing with ipv6 addresses for example, sigh ... // but then hadoop does not support ipv6 right now. // For now, we assume that if port exists, then it is valid - not check if it is an int > 0 --- End diff -- I removed this comment
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r150798868 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -982,7 +982,19 @@ private[spark] object Utils extends Logging { return cached } -val indx: Int = hostPort.lastIndexOf(':') +val indx: Int = + // Interpret hostPort as literal IPv6 address if it contains multiple colons. + // IPv6 addresses enclosed in square brackets like [::1]:123 are not included. + // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace + if (hostPort.matches("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}")) { + // scalastyle:on SingleSpaceBetweenRParenAndLCurlyBrace +-1 + } else { +// Else last colon defines start of port definition +hostPort.lastIndexOf(':') + } + + // This is potentially broken - when dealing with ipv6 addresses for example, sigh ... // but then hadoop does not support ipv6 right now. // For now, we assume that if port exists, then it is valid - not check if it is an int > 0 --- End diff -- I think yes, because some checks are not implemented. E.g. there is no check of how many colons are used, a double colon should appear only once, ...
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r150787836 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -982,7 +982,19 @@ private[spark] object Utils extends Logging { return cached } -val indx: Int = hostPort.lastIndexOf(':') +val indx: Int = + // Interpret hostPort as literal IPv6 address if it contains multiple colons. + // IPv6 addresses enclosed in square brackets like [::1]:123 are not included. + // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace + if (hostPort.matches("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}")) { + // scalastyle:on SingleSpaceBetweenRParenAndLCurlyBrace +-1 + } else { +// Else last colon defines start of port definition +hostPort.lastIndexOf(':') + } + + // This is potentially broken - when dealing with ipv6 addresses for example, sigh ... // but then hadoop does not support ipv6 right now. // For now, we assume that if port exists, then it is valid - not check if it is an int > 0 --- End diff -- is this comment still valid?
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r150743565 --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala --- @@ -1146,6 +1146,20 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging { } } + test("parseHostPort") { +assert(Utils.parseHostPort("abc:123") === (("abc", 123))) +assert(Utils.parseHostPort("example.com") === (("example.com", 0))) +assert(Utils.parseHostPort("example.com:123") === (("example.com", 123))) +assert(Utils.parseHostPort("127.0.0.1")=== (("127.0.0.1", 0))) +assert(Utils.parseHostPort("127.0.0.1:123")=== (("127.0.0.1", 123))) +assert(Utils.parseHostPort("2001:db8::1") === (("2001:db8::1", 0))) +assert(Utils.parseHostPort("2001:DB8::1") === (("2001:DB8::1", 0))) +assert(Utils.parseHostPort("2001:dB8::1") === (("2001:dB8::1", 0))) +assert(Utils.parseHostPort("0:0:0:0:0:0:0:0") === (("0:0:0:0:0:0:0:0", 0))) +assert(Utils.parseHostPort("::1") === (("::1", 0))) +assert(Utils.parseHostPort("[::1]:123")=== (("[::1]", 123))) +assert(Utils.parseHostPort("[2001:db8:42::1]:123") === (("[2001:db8:42::1]", 123))) --- End diff -- nit: we should add more test cases to cover the invalid cases.
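On the invalid-case point above, a small probe of the colon-count pattern from the diff shows which malformed strings it would still classify as IPv6 literals. This is an illustrative snippet; `RegexProbe` and `looksLikeIpv6` are just local names for exercising the pattern under review.

```scala
object RegexProbe {
  // The pattern from the diff under review.
  val ipv6Like = "(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}"

  // True when the whole string is hex digits and colons with 2+ colons,
  // since each repetition of the group consumes exactly one colon.
  def looksLikeIpv6(s: String): Boolean = s.matches(ipv6Like)
}
```

Valid-looking literals such as `"::1"` and `"2001:db8::1"` match, but so do strings no resolver would accept, such as `":::"`, because repeated double colons are not rejected; a plain `"abc:123"` does not match because it has only one colon.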
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user obermeier commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r142047496 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -981,7 +981,13 @@ private[spark] object Utils extends Logging { return cached } -val indx: Int = hostPort.lastIndexOf(':') +val indx: Int = + // Interpret hostPort as literal IPv6 address if it contains two ore more colons + // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace + if (hostPort.matches("(([0-9a-f]*):([0-9a-f]*)){2,}")) -1 --- End diff -- Yes, a real parser would be much better!! I hope the methods will check the input, like the name resolver... At this point I thought more about the separation of the port. I think it is important to check that two colons exist, otherwise this expression accepts hostnames like abc:123
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r142038575 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -981,7 +981,13 @@ private[spark] object Utils extends Logging { return cached } -val indx: Int = hostPort.lastIndexOf(':') +val indx: Int = + // Interpret hostPort as literal IPv6 address if it contains two ore more colons + // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace + if (hostPort.matches("(([0-9a-f]*):([0-9a-f]*)){2,}")) -1 + // Else last colon defines start of port definition + else hostPort.lastIndexOf(':') --- End diff -- Final nit, use braces for both parts of the if-else
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r142038567 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -981,7 +981,13 @@ private[spark] object Utils extends Logging { return cached } -val indx: Int = hostPort.lastIndexOf(':') +val indx: Int = + // Interpret hostPort as literal IPv6 address if it contains two ore more colons + // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace + if (hostPort.matches("(([0-9a-f]*):([0-9a-f]*)){2,}")) -1 --- End diff -- Hex digits can be uppercase, right? Should the pattern not be more like `[0-9a-f]*(:[0-9a-f]*)+` -- match a number, then colon-number, colon-number pairs, not number-colon-number number-colon-number sequences? It might end up being equivalent because the match is for 0 or more digits. This allows some strings that it shouldn't, like "", but the purpose isn't to catch every possible case, I guess. It would fail name resolution. I thought Inet6Address would just provide parsing for this, but I guess not.
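The reviewer's alternative shape can be compared against the diff's pattern directly. In the sketch below, both patterns use case-insensitive hex digits and a `{2,}` is added to the reviewer's shape so the comparison requires two or more colons in both cases; the object and value names are illustrative, not from the actual code.

```scala
object PatternCompare {
  val pairForm  = "(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}" // shape from the diff
  val chainForm = "[0-9a-fA-F]*(:[0-9a-fA-F]*){2,}"     // reviewer's shape, with {2,}

  // On hex-and-colon strings the two agree: each repetition in either form
  // consumes exactly one colon, so both accept exactly the strings made of
  // hex digits and colons that contain two or more colons.
  def agree(s: String): Boolean = s.matches(pairForm) == s.matches(chainForm)
}
```

Spot-checking `"::1"`, `"0:0:0:0:0:0:0:0"`, `"abc:123"`, and `"example.com"` shows agreement on all of them; both forms also accept degenerate strings like `":::"`, which, as noted in the comment, would later fail name resolution anyway.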
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r142038556 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -981,7 +981,13 @@ private[spark] object Utils extends Logging { return cached } -val indx: Int = hostPort.lastIndexOf(':') +val indx: Int = + // Interpret hostPort as literal IPv6 address if it contains two ore more colons + // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace --- End diff -- Turn the rule back on after the block?
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19408#discussion_r142038519 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -981,7 +981,13 @@ private[spark] object Utils extends Logging { return cached } -val indx: Int = hostPort.lastIndexOf(':') +val indx: Int = + // Interpret hostPort as literal IPv6 address if it contains two ore more colons --- End diff -- Nit: ore You might note here that you're checking that you _don't_ have a `[::1]:123` IPv6 address here -- the braces are key.
[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6
GitHub user obermeier opened a pull request: https://github.com/apache/spark/pull/19408 [SPARK-22180][CORE] Allow IPv6 External applications like Apache Cassandra are able to deal with IPv6 addresses. Libraries like spark-cassandra-connector combine Apache Cassandra with Apache Spark. This combination is very useful IMHO. One problem is that `org.apache.spark.util.Utils.parseHostPort(hostPort: String)` takes the last colon to separate the port from the host part. This conflicts with literal IPv6 addresses. I think we can take `hostPort` as a literal IPv6 address if it contains two or more colons. If IPv6 addresses are enclosed in square brackets, a port definition is still possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/obermeier/spark issue/SPARK-22180 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19408.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19408 commit 3783f85b18540ed8746c078ad9c2f12d7167be9d Author: Stefan Obermeier Date: 2017-10-01T14:28:58Z [SPARK-22180][CORE] Allow IPv6