[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2018-10-17 Thread obermeier
GitHub user obermeier reopened a pull request:

https://github.com/apache/spark/pull/19408

[SPARK-22180][CORE] Allow IPv6 address in 
org.apache.spark.util.Utils.parseHostPort

External applications like Apache Cassandra are able to deal with IPv6 
addresses. Libraries like spark-cassandra-connector combine Apache Cassandra 
with Apache Spark.
This combination is very useful IMHO.

One problem is that `org.apache.spark.util.Utils.parseHostPort(hostPort:` 
`String)` takes the last colon to sepperate the port from host path. This 
conflicts with literal IPv6 addresses.

I think we can take `hostPort` as literal IPv6 address if it contains tow 
ore more colons. If IPv6 addresses are enclosed in square brackets port 
definition is still possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/obermeier/spark issue/SPARK-22180

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19408.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19408


commit 453e104c3e6ed6a5ca310f599c274d6c66a3d3c8
Author: Stefan Obermeier 
Date:   2017-10-01T14:28:58Z

[SPARK-22180][CORE] Allow IPv6 address in 
org.apache.spark.util.Utils.parseHostPort

## What changes were proposed in this pull request?

Take ```hostPort``` as literal IPv6 address if it contains tow ore more 
colons. If IPv6 addresses are enclosed in square brackets port definition is 
still possible.

## How was this patch tested?

Added a new test case into UtilsSuite

Remove comment

commit 1e6623f518b43df4079bdc2680288063c6be13c6
Author: obermeier 
Date:   2017-12-14T22:52:21Z

Merge branch 'master' into issue/SPARK-22180

commit 1400299808631da0196a61f3588ede786dd0b041
Author: Stefan Obermeier 
Date:   2017-12-15T07:39:45Z

Fix build problem

commit 68c322129305a35c9d3e04f8cacc011be5fbaec4
Author: Stefan Obermeier 
Date:   2017-12-18T13:51:04Z

Fix style checks violation.

Remove whitespace at end of line.

commit 8220d95a99f4564e735f22947cb1cb698613efa5
Author: Stefan Obermeier 
Date:   2018-10-09T21:54:20Z

Add log message if hostname:port is not valid




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2018-10-13 Thread obermeier
Github user obermeier closed the pull request at:

https://github.com/apache/spark/pull/19408


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-11-14 Thread obermeier
Github user obermeier commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r150909772
  
--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
@@ -1146,6 +1146,20 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 }
   }
 
+  test("parseHostPort") {
+assert(Utils.parseHostPort("abc:123")  === (("abc", 123)))
+assert(Utils.parseHostPort("example.com")  === (("example.com", 
0)))
+assert(Utils.parseHostPort("example.com:123")  === (("example.com", 
123)))
+assert(Utils.parseHostPort("127.0.0.1")=== (("127.0.0.1", 0)))
+assert(Utils.parseHostPort("127.0.0.1:123")=== (("127.0.0.1", 
123)))
+assert(Utils.parseHostPort("2001:db8::1")  === (("2001:db8::1", 
0)))
+assert(Utils.parseHostPort("2001:DB8::1")  === (("2001:DB8::1", 
0)))
+assert(Utils.parseHostPort("2001:dB8::1")  === (("2001:dB8::1", 
0)))
+assert(Utils.parseHostPort("0:0:0:0:0:0:0:0")  === 
(("0:0:0:0:0:0:0:0", 0)))
+assert(Utils.parseHostPort("::1")  === (("::1", 0)))
+assert(Utils.parseHostPort("[::1]:123")=== (("[::1]", 123)))
+assert(Utils.parseHostPort("[2001:db8:42::1]:123")  === 
(("[2001:db8:42::1]", 123)))
--- End diff --

What is the prefered way to handle this kind of parse errors in Spark?
Changing the signature of this method to something like :Try[..], :Option 
... is no option!?
Error log messages?
Unchecked Exceptions?
...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-11-14 Thread obermeier
Github user obermeier commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r150907312
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -982,7 +982,19 @@ private[spark] object Utils extends Logging {
   return cached
 }
 
-val indx: Int = hostPort.lastIndexOf(':')
+val indx: Int =
+  // Interpret hostPort as literal IPv6 address if it contains 
multiple colons.
+  // IPv6 addresses  enclosed in square brackets like [::1]:123 are 
not included.
+  // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
+  if (hostPort.matches("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}")) {
+  // scalastyle:on SingleSpaceBetweenRParenAndLCurlyBrace
+-1
+  } else {
+// Else last colon defines start of port definition
+hostPort.lastIndexOf(':')
+  }
+
+
 // This is potentially broken - when dealing with ipv6 addresses for 
example, sigh ...
 // but then hadoop does not support ipv6 right now.
 // For now, we assume that if port exists, then it is valid - not 
check if it is an int > 0
--- End diff --

I removed this comment


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-11-14 Thread obermeier
Github user obermeier commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r150798868
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -982,7 +982,19 @@ private[spark] object Utils extends Logging {
   return cached
 }
 
-val indx: Int = hostPort.lastIndexOf(':')
+val indx: Int =
+  // Interpret hostPort as literal IPv6 address if it contains 
multiple colons.
+  // IPv6 addresses  enclosed in square brackets like [::1]:123 are 
not included.
+  // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
+  if (hostPort.matches("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}")) {
+  // scalastyle:on SingleSpaceBetweenRParenAndLCurlyBrace
+-1
+  } else {
+// Else last colon defines start of port definition
+hostPort.lastIndexOf(':')
+  }
+
+
 // This is potentially broken - when dealing with ipv6 addresses for 
example, sigh ...
 // but then hadoop does not support ipv6 right now.
 // For now, we assume that if port exists, then it is valid - not 
check if it is an int > 0
--- End diff --

I think yes because some checks are not implemented 
E.g. no check how many colons are used,  double colon should appear only 
once ...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-11-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r150787836
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -982,7 +982,19 @@ private[spark] object Utils extends Logging {
   return cached
 }
 
-val indx: Int = hostPort.lastIndexOf(':')
+val indx: Int =
+  // Interpret hostPort as literal IPv6 address if it contains 
multiple colons.
+  // IPv6 addresses  enclosed in square brackets like [::1]:123 are 
not included.
+  // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
+  if (hostPort.matches("(([0-9a-fA-F]*):([0-9a-fA-F]*)){2,}")) {
+  // scalastyle:on SingleSpaceBetweenRParenAndLCurlyBrace
+-1
+  } else {
+// Else last colon defines start of port definition
+hostPort.lastIndexOf(':')
+  }
+
+
 // This is potentially broken - when dealing with ipv6 addresses for 
example, sigh ...
 // but then hadoop does not support ipv6 right now.
 // For now, we assume that if port exists, then it is valid - not 
check if it is an int > 0
--- End diff --

does this comment still valid?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-11-13 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r150743565
  
--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
@@ -1146,6 +1146,20 @@ class UtilsSuite extends SparkFunSuite with 
ResetSystemProperties with Logging {
 }
   }
 
+  test("parseHostPort") {
+assert(Utils.parseHostPort("abc:123")  === (("abc", 123)))
+assert(Utils.parseHostPort("example.com")  === (("example.com", 
0)))
+assert(Utils.parseHostPort("example.com:123")  === (("example.com", 
123)))
+assert(Utils.parseHostPort("127.0.0.1")=== (("127.0.0.1", 0)))
+assert(Utils.parseHostPort("127.0.0.1:123")=== (("127.0.0.1", 
123)))
+assert(Utils.parseHostPort("2001:db8::1")  === (("2001:db8::1", 
0)))
+assert(Utils.parseHostPort("2001:DB8::1")  === (("2001:DB8::1", 
0)))
+assert(Utils.parseHostPort("2001:dB8::1")  === (("2001:dB8::1", 
0)))
+assert(Utils.parseHostPort("0:0:0:0:0:0:0:0")  === 
(("0:0:0:0:0:0:0:0", 0)))
+assert(Utils.parseHostPort("::1")  === (("::1", 0)))
+assert(Utils.parseHostPort("[::1]:123")=== (("[::1]", 123)))
+assert(Utils.parseHostPort("[2001:db8:42::1]:123")  === 
(("[2001:db8:42::1]", 123)))
--- End diff --

nit: we should add more test cases to cover the invalid cases.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-10-01 Thread obermeier
Github user obermeier commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r142047496
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -981,7 +981,13 @@ private[spark] object Utils extends Logging {
   return cached
 }
 
-val indx: Int = hostPort.lastIndexOf(':')
+val indx: Int =
+  // Interpret hostPort as literal IPv6 address if it contains two ore 
more colons
+  // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
+  if (hostPort.matches("(([0-9a-f]*):([0-9a-f]*)){2,}")) -1
--- End diff --

Yes a real parser would be much better!! 
I hope the methods will check the input. Like the name resolver... 

At this point I thought more about the separation of the port.
I think it is important to check if two colons exists, otherwise this 
expression accepts hostnames like abc:123


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-10-01 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r142038575
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -981,7 +981,13 @@ private[spark] object Utils extends Logging {
   return cached
 }
 
-val indx: Int = hostPort.lastIndexOf(':')
+val indx: Int =
+  // Interpret hostPort as literal IPv6 address if it contains two ore 
more colons
+  // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
+  if (hostPort.matches("(([0-9a-f]*):([0-9a-f]*)){2,}")) -1
+  // Else last colon defines start of port definition
+  else hostPort.lastIndexOf(':')
--- End diff --

Final nit, use braces for both parts of the if-else


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-10-01 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r142038567
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -981,7 +981,13 @@ private[spark] object Utils extends Logging {
   return cached
 }
 
-val indx: Int = hostPort.lastIndexOf(':')
+val indx: Int =
+  // Interpret hostPort as literal IPv6 address if it contains two ore 
more colons
+  // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
+  if (hostPort.matches("(([0-9a-f]*):([0-9a-f]*)){2,}")) -1
--- End diff --

Hex digits can be uppercase right?
Should the pattern it not be more like `[0-9a-f]*(:[0-9a-f]*)+` match a 
number, then colon-number colon-number pairs, not number-colon-number 
number-colon-number sequences?
It might end up being equivalent because the match is for 0 or more digits.

This allows some strings that it shouldn't like "", but, the purpose 
isn't to catch every possible case I guess. It would fail name resolution.

I thought Inet6Address would just provide parsing for this but I guess not.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-10-01 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r142038556
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -981,7 +981,13 @@ private[spark] object Utils extends Logging {
   return cached
 }
 
-val indx: Int = hostPort.lastIndexOf(':')
+val indx: Int =
+  // Interpret hostPort as literal IPv6 address if it contains two ore 
more colons
+  // scalastyle:off SingleSpaceBetweenRParenAndLCurlyBrace
--- End diff --

Turn the rule back on after the block?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6 address in org.apa...

2017-10-01 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19408#discussion_r142038519
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -981,7 +981,13 @@ private[spark] object Utils extends Logging {
   return cached
 }
 
-val indx: Int = hostPort.lastIndexOf(':')
+val indx: Int =
+  // Interpret hostPort as literal IPv6 address if it contains two ore 
more colons
--- End diff --

Nit: ore
You might note here that you're checking that you _don't_ have a 
`[::1]:123` IPv6 address here -- the braces are key.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19408: [SPARK-22180][CORE] Allow IPv6

2017-10-01 Thread obermeier
GitHub user obermeier opened a pull request:

https://github.com/apache/spark/pull/19408

[SPARK-22180][CORE] Allow IPv6

External applications like Apache Cassandra are able to deal with IPv6 
addresses. Libraries like spark-cassandra-connector combine Apache Cassandra 
with Apache Spark.
This combination is very useful IMHO.

One problem is that `org.apache.spark.util.Utils.parseHostPort(hostPort:` 
`String)` takes the last colon to sepperate the port from host path. This 
conflicts with literal IPv6 addresses.

I think we can take `hostPort` as literal IPv6 address if it contains tow 
ore more colons. If IPv6 addresses are enclosed in square brackets port 
definition is still possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/obermeier/spark issue/SPARK-22180

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19408.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19408


commit 3783f85b18540ed8746c078ad9c2f12d7167be9d
Author: Stefan Obermeier 
Date:   2017-10-01T14:28:58Z

[SPARK-22180][CORE] Allow IPv6




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org