Re: [PR] [SPARK-48218][SHUFFLE] TransportClientFactory.createClient may NPE cause FetchFailedException [spark]

2024-05-09 Thread via GitHub


mridulm commented on code in PR #46506:
URL: https://github.com/apache/spark/pull/46506#discussion_r1595887602


##
common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:
##
@@ -169,8 +169,10 @@ public TransportClient createClient(String remoteHost, int remotePort, boolean f
   // this code was able to update things.
   TransportChannelHandler handler = cachedClient.getChannel().pipeline()
     .get(TransportChannelHandler.class);
-  synchronized (handler) {
-    handler.getResponseHandler().updateTimeOfLastRequest();
+  if (handler != null) {

Review Comment:
   `synchronized` will throw an NPE if called on `null` - and `handler` is a 
local variable at this point, so the `null` check should be fine.
   
   This should be a fairly rare occurrence - if I am not wrong, this is due to 
a socket close between the `isActive` call and the subsequent 
`pipeline().get(TransportChannelHandler.class)` call - but I agree with 
@dongjoon-hyun, we have to make sure the exception is at the `synchronized` 
statement itself: which version was this observed against?
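
   The suspected window between the `isActive` check and the handler lookup can be sketched in isolation. This is a minimal illustration, not Spark or Netty code: `CheckThenActDemo`, `FakeChannel`, `Handler` and `refresh` are hypothetical names, and the assumption that closing the channel removes its handler is only simulated here. The `Runnable` argument stands in for whatever another thread might do between the two calls.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins; none of these types are Spark or Netty classes.
class CheckThenActDemo {

  /** Stand-in for the handler whose timestamp gets refreshed. */
  static final class Handler {
    private long lastRequestNanos;

    void updateTimeOfLastRequest() {
      lastRequestNanos = System.nanoTime();
    }
  }

  /** Stand-in for a channel that drops its handler when it is closed. */
  static final class FakeChannel {
    private volatile boolean active = true;
    private final AtomicReference<Handler> handler = new AtomicReference<>(new Handler());

    boolean isActive() { return active; }
    Handler getHandler() { return handler.get(); }
    void close() { active = false; handler.set(null); }
  }

  /**
   * Returns true if the timestamp was refreshed, false if the channel was
   * inactive or its handler was already gone. The Runnable runs in the window
   * between the two calls, where a concurrent close could land.
   */
  static boolean refresh(FakeChannel channel, Runnable betweenCheckAndLookup) {
    if (!channel.isActive()) {
      return false;
    }
    betweenCheckAndLookup.run();
    // Local snapshot: no other thread can reassign this variable.
    Handler handler = channel.getHandler();
    if (handler == null) {
      return false;
    }
    synchronized (handler) {
      handler.updateTimeOfLastRequest();
    }
    return true;
  }

  public static void main(String[] args) {
    FakeChannel open = new FakeChannel();
    System.out.println(refresh(open, () -> { }));       // nothing happens in the window

    FakeChannel racing = new FakeChannel();
    System.out.println(refresh(racing, racing::close)); // close lands in the window
  }
}
```

   In the second call the channel still reports active when checked, yet the lookup returns `null`; the guard on the local variable turns what would have been an NPE at `synchronized` into a skipped update.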



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48218][SHUFFLE] TransportClientFactory.createClient may NPE cause FetchFailedException [spark]

2024-05-09 Thread via GitHub


mridulm commented on code in PR #46506:
URL: https://github.com/apache/spark/pull/46506#discussion_r1595887602


##
common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:
##
@@ -169,8 +169,10 @@ public TransportClient createClient(String remoteHost, int remotePort, boolean f
   // this code was able to update things.
   TransportChannelHandler handler = cachedClient.getChannel().pipeline()
     .get(TransportChannelHandler.class);
-  synchronized (handler) {
-    handler.getResponseHandler().updateTimeOfLastRequest();
+  if (handler != null) {

Review Comment:
   `synchronized` will throw an NPE if called on `null` - and `handler` is a 
local variable at this point, so the `null` check should be fine.
   
   This should be a fairly rare occurrence - if I am not wrong, this is due to 
a socket close between the `isActive` call and the subsequent 
`pipeline().get(TransportChannelHandler.class)` call - but I agree with 
@dongjoon-hyun, we have to make sure the exception is at the `synchronized` 
statement itself: which version was this observed against?
   
   I am fine with the change, assuming @dongjoon-hyun does not have concerns.






Re: [PR] [SPARK-48218][SHUFFLE] TransportClientFactory.createClient may NPE cause FetchFailedException [spark]

2024-05-09 Thread via GitHub


mridulm commented on code in PR #46506:
URL: https://github.com/apache/spark/pull/46506#discussion_r1595887602


##
common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:
##
@@ -169,8 +169,10 @@ public TransportClient createClient(String remoteHost, int remotePort, boolean f
   // this code was able to update things.
   TransportChannelHandler handler = cachedClient.getChannel().pipeline()
     .get(TransportChannelHandler.class);
-  synchronized (handler) {
-    handler.getResponseHandler().updateTimeOfLastRequest();
+  if (handler != null) {

Review Comment:
   `synchronized` will throw an NPE if called on `null` - and `handler` is a 
local variable at this point, so the `null` check should be fine.
   
   This should be a fairly rare occurrence - if I am not wrong, this is due to 
a socket close between the `isActive` call and the subsequent 
`pipeline().get(TransportChannelHandler.class)` call - but I agree with 
@dongjoon-hyun, we have to make sure the exception is at the `synchronized` 
statement itself.
   
   I am fine with the change, assuming @dongjoon-hyun does not have concerns.






Re: [PR] [SPARK-48218][SHUFFLE] TransportClientFactory.createClient may NPE cause FetchFailedException [spark]

2024-05-09 Thread via GitHub


mridulm commented on code in PR #46506:
URL: https://github.com/apache/spark/pull/46506#discussion_r1595887602


##
common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:
##
@@ -169,8 +169,10 @@ public TransportClient createClient(String remoteHost, int remotePort, boolean f
   // this code was able to update things.
   TransportChannelHandler handler = cachedClient.getChannel().pipeline()
     .get(TransportChannelHandler.class);
-  synchronized (handler) {
-    handler.getResponseHandler().updateTimeOfLastRequest();
+  if (handler != null) {

Review Comment:
   `synchronized` will throw an NPE if called on `null` - and `handler` is a 
local variable at this point, so the `null` check should be fine.
   
   This should be a fairly rare occurrence - if I am not wrong, this is due to 
a socket close between the `isActive` call and the subsequent 
`pipeline().get(TransportChannelHandler.class)` call.
   
   I am fine with the change, assuming @dongjoon-hyun does not have concerns.






Re: [PR] [SPARK-48218][SHUFFLE] TransportClientFactory.createClient may NPE cause FetchFailedException [spark]

2024-05-09 Thread via GitHub


mridulm commented on code in PR #46506:
URL: https://github.com/apache/spark/pull/46506#discussion_r1595887602


##
common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:
##
@@ -169,8 +169,10 @@ public TransportClient createClient(String remoteHost, int remotePort, boolean f
   // this code was able to update things.
   TransportChannelHandler handler = cachedClient.getChannel().pipeline()
     .get(TransportChannelHandler.class);
-  synchronized (handler) {
-    handler.getResponseHandler().updateTimeOfLastRequest();
+  if (handler != null) {

Review Comment:
   `synchronized` will throw an NPE if called on `null` - and `handler` is a 
local variable at this point, so the `null` check should be fine.
   
   This should be a fairly rare occurrence - if I am not wrong, this is due to 
a socket close between the `isActive` call and the subsequent 
`pipeline().get(TransportChannelHandler.class)` call?
   
   I am fine with the change, assuming @dongjoon-hyun does not have concerns.






Re: [PR] [SPARK-48218][SHUFFLE] TransportClientFactory.createClient may NPE cause FetchFailedException [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on code in PR #46506:
URL: https://github.com/apache/spark/pull/46506#discussion_r1595629230


##
common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:
##
@@ -169,8 +169,10 @@ public TransportClient createClient(String remoteHost, int remotePort, boolean f
   // this code was able to update things.
   TransportChannelHandler handler = cachedClient.getChannel().pipeline()
     .get(TransportChannelHandler.class);
-  synchronized (handler) {
-    handler.getResponseHandler().updateTimeOfLastRequest();
+  if (handler != null) {

Review Comment:
   Please let me know if you are observing that the `synchronized(handler)` 
line is the root cause of the NPE. I assumed that the NPE happens at 
`handler.getRes...`.






Re: [PR] [SPARK-48218][SHUFFLE] TransportClientFactory.createClient may NPE cause FetchFailedException [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on code in PR #46506:
URL: https://github.com/apache/spark/pull/46506#discussion_r1595628352


##
common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:
##
@@ -169,8 +169,10 @@ public TransportClient createClient(String remoteHost, int remotePort, boolean f
   // this code was able to update things.
   TransportChannelHandler handler = cachedClient.getChannel().pipeline()
     .get(TransportChannelHandler.class);
-  synchronized (handler) {
-    handler.getResponseHandler().updateTimeOfLastRequest();
+  if (handler != null) {

Review Comment:
   I'm not sure if this is a safe replacement.
   
   Although I don't know your code, if we need to check nullability, shall we 
do it inside the `synchronized` block?
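
   On the placement question, a small sketch may help. Java evaluates the lock expression and acquires the monitor before running the block body, and a `null` lock expression throws a `NullPointerException` at that point, so a check placed inside the block is never reached for a `null` reference. `CheckPlacementDemo`, `checkInside` and `checkOutside` are hypothetical names used only for illustration; the `trace` list records whether the block body ran.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from TransportClientFactory.
class CheckPlacementDemo {

  /** Null check placed inside the block: the NPE fires before the check can run. */
  static String checkInside(Object lock, List<String> trace) {
    try {
      synchronized (lock) {
        trace.add("entered block");
        if (lock == null) {
          return "skipped";
        }
        return "updated";
      }
    } catch (NullPointerException e) {
      return "NPE";
    }
  }

  /** Null check placed before the block: a null reference never reaches synchronized. */
  static String checkOutside(Object lock, List<String> trace) {
    if (lock == null) {
      return "skipped";
    }
    synchronized (lock) {
      trace.add("entered block");
      return "updated";
    }
  }

  public static void main(String[] args) {
    List<String> trace = new ArrayList<>();
    System.out.println(checkInside(null, trace));
    System.out.println(checkOutside(null, trace));
    System.out.println(checkOutside(new Object(), trace));
    System.out.println(trace);
  }
}
```

   Because the reference being locked is a local variable, checking it before the block cannot be invalidated by another thread.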






Re: [PR] [SPARK-48218][SHUFFLE] TransportClientFactory.createClient may NPE cause FetchFailedException [spark]

2024-05-09 Thread via GitHub


dongjoon-hyun commented on code in PR #46506:
URL: https://github.com/apache/spark/pull/46506#discussion_r1595626768


##
common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java:
##
@@ -169,8 +169,10 @@ public TransportClient createClient(String remoteHost, int remotePort, boolean f
   // this code was able to update things.
   TransportChannelHandler handler = cachedClient.getChannel().pipeline()
     .get(TransportChannelHandler.class);
-  synchronized (handler) {

Review Comment:
   Your error message doesn't match the code. Maybe it came from your own 
fork?
   ```
   Caused by: java.lang.NullPointerException
       at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:178)
   ```


