[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939841#comment-16939841 ]

Supratim Deka commented on HDDS-2175:
-------------------------------------

Note from [~aengineer] posted on the github PR:

Also, are these call stacks something that the end user should ever see? As a user, I have always found a call stack useless. It might be useful to the developer for debugging purposes, but clients are generally things used by real users. Maybe if these stacks are not logged in the ozone.log, we can log them, provided we guard that with a config key and do not do it by default.

> Propagate System Exceptions from the OzoneManager
> -------------------------------------------------
>
>                 Key: HDDS-2175
>                 URL: https://issues.apache.org/jira/browse/HDDS-2175
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: Ozone Manager
>            Reporter: Supratim Deka
>            Assignee: Supratim Deka
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Exceptions encountered while processing requests on the OM are categorized as
> business exceptions and system exceptions. All of the business exceptions are
> captured as OMException and have an associated status code which is returned
> to the client. The handling of these is not going to be changed.
> Currently system exceptions are returned as INTERNAL ERROR to the client with
> a 1 line message string from the exception. The scope of this jira is to
> capture system exceptions and propagate the related information (including the
> complete stack trace) back to the client.
> There are 3 sub-tasks required to achieve this:
> 1. Separate capture and handling for OMException and the other
> exceptions (IOException). For system exceptions, use the Hadoop IPC
> ServiceException mechanism to send the stack trace to the client.
> 2. Track exceptions inside the Ratis OzoneManagerStateMachine and
> propagate them up to the OzoneManager layer (on the leader). Currently, these
> exceptions are not being tracked.
> 3. Handle and propagate exceptions from Ratis.
> Will raise a jira for each sub-task.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
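Sub-task 1 in the issue description, separate capture for business and system exceptions, could look roughly like the sketch below. It is only an illustration and assumes several things that are not in the issue: `BusinessException` stands in for OMException, `SystemFailure` stands in for the Hadoop IPC ServiceException wrapper, and `Reply`, `Operation`, and the status strings are hypothetical. No real Ozone or Hadoop classes are used.

```java
import java.io.IOException;

// Hypothetical stand-in for OMException: a business error carrying a status code.
class BusinessException extends IOException {
  final String status;

  BusinessException(String status, String message) {
    super(message);
    this.status = status;
  }
}

// Hypothetical stand-in for the ServiceException wrapper named in sub-task 1.
class SystemFailure extends RuntimeException {
  SystemFailure(Throwable cause) {
    super(cause);
  }
}

// Hypothetical response type: a status code plus a one-line message.
final class Reply {
  final String status;
  final String message;

  Reply(String status, String message) {
    this.status = status;
    this.message = message;
  }
}

final class SeparateHandlingSketch {
  // A unit of request processing that may fail with either kind of exception.
  interface Operation {
    String run() throws IOException;
  }

  static Reply handle(Operation op) {
    try {
      return new Reply("OK", op.run());
    } catch (BusinessException e) {
      // Business exceptions keep the existing contract: status code and message.
      return new Reply(e.status, e.getMessage());
    } catch (IOException e) {
      // System exceptions are wrapped with the cause intact, so a transport
      // layer could carry the complete stack trace to the client.
      throw new SystemFailure(e);
    }
  }
}
```

The order of the catch clauses matters here: because the business exception type extends IOException in this sketch, it has to be caught first.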
[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939880#comment-16939880 ]

Arpit Agarwal commented on HDDS-2175:
-------------------------------------

I feel that call stacks are invaluable when included in the bug report to the developer.
[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941392#comment-16941392 ]

Anu Engineer commented on HDDS-2175:
------------------------------------

bq. I feel that call stacks are invaluable when included in the bug report to the developer.

I completely agree. As I mentioned in my comment on the GitHub PR, they are very useful tools for debugging. But we have to weigh the pros and cons of the approach. Here are some downsides:

1. Code and style consistency - Generally, errors are propagated via an error code and message (Golang, C, etc.) or via exceptions (Java, C++, etc.). When we developed this interface, we chose the error code and message approach instead of exceptions. Mixing these two approaches creates very inconsistent code flows.
2. Prevent Java server abstractions from leaking to the client side - Java exceptions are very Java specific; it is hard to parse these exceptions even when they are part of normal log files. It is difficult to read through a printed stack and understand the issue, and this gets compounded when exceptions stack. When we were writing this client interface, we wanted to make sure it is easy to write clients in other languages. A simple error code and message is universal: every language understands it, and it is easy to write clients in other languages that speak this protocol.
3. The current code experience - There are several parts of this code where the clients print out these messages to the users. If we add exceptions to those strings, the human readability of those error messages goes down.
4. If we want to move to exceptions instead of error codes, it is possible (even though I think our future clients will suffer), but we would need to move away from the error/message model. That is a lot of work with very little benefit, other than a consistent experience and exceptions flowing to the client side.

I had a chat with [~sdeka] and I said that I am all for increasing the fidelity of the error codes, that is we can add more error codes if we want to fine tune these messages. I am also all for logging more on the server side. So I am not against the patch; I just want to avoid *server side Java exceptions crossing over to the client side*. I prefer a clear, simple contract between the server and client, I think it makes it easier for future clients to be developed more easily.
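The error code and message contract described in the comment above, combined with more logging on the server side, might be sketched as follows. This is a minimal illustration under stated assumptions: `ReplyStatus`, `ErrorReply`, and `toReply` are hypothetical names, the three status values are invented for the example and are not the actual OM result codes, and `java.util.logging` is used only to keep the sketch free of dependencies.

```java
import java.util.concurrent.TimeoutException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical status codes; the real set of OM result codes is not shown here.
enum ReplyStatus { OK, TIMEOUT, INTERNAL_ERROR }

// Language-neutral reply: any client can read a code and a one-line message.
final class ErrorReply {
  final ReplyStatus status;
  final String message;

  ErrorReply(ReplyStatus status, String message) {
    this.status = status;
    this.message = message;
  }
}

final class ErrorCodeContractSketch {
  private static final Logger LOG = Logger.getLogger("ErrorCodeContractSketch");

  static ErrorReply toReply(Throwable t) {
    // The full stack trace is written to the server log and never leaves the server.
    LOG.log(Level.SEVERE, "Request failed", t);
    // Raising the fidelity of the contract means adding one more branch and code here.
    ReplyStatus status =
        (t instanceof TimeoutException) ? ReplyStatus.TIMEOUT : ReplyStatus.INTERNAL_ERROR;
    return new ErrorReply(status, String.valueOf(t.getMessage()));
  }
}
```

The trade-off debated in this thread is visible in `toReply`: the client contract stays small and language neutral, but every failure that has no branch of its own collapses into `INTERNAL_ERROR`.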
[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941456#comment-16941456 ]

Arpit Agarwal commented on HDDS-2175:
-------------------------------------

bq. it is hard to parse these exceptions even when they are part of normal log files.

And yet these exceptions are a godsend. I would rather see one exception than 10 obscure log messages, since it tells me exactly when something 'exceptional' happened and the code path leading to the occurrence.

bq. If we add exceptions to those strings, the human readability of those error messages goes down.

The readability goes up. You now get a sense of what actually went wrong instead of some generic message.

bq. I had a chat with Supratim Deka and I said that I am all for increasing the fidelity of the error codes, that is we can add more error codes if we want to fine tune these messages.

That is a lot more work with inferior results. Error codes are terrible in layered systems [since multiple layers will often wind up translating codes|https://twitter.com/Obdurodon/status/1161700056740876289]. The only way to maintain full fidelity is to add a new error code for every single failure path, an impossible task. Instead, just present the original exception as it happened. This is friendlier for your end users and painless for developers.

bq. I prefer a clear, simple contract between the server and client, I think it makes it easier for future clients to be developed more easily.

Exceptions as added here will make development of future clients super easy. Since the exception is stringified and propagated over the wire, all the client has to do is print the string without any interpretation. The fears seem unfounded to me.
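The "stringified and propagated over the wire" idea from the comment above can be illustrated with plain JDK calls. This is a sketch only: `stringify` and `report` are hypothetical helper names, and how the patch actually serializes the exception is not shown in this thread.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class StringifiedExceptionSketch {
  // Server side: render the exception, its stack frames, and any causes as text.
  static String stringify(Throwable t) {
    StringWriter buffer = new StringWriter();
    try (PrintWriter out = new PrintWriter(buffer)) {
      t.printStackTrace(out);
    }
    return buffer.toString();
  }

  // Client side: no parsing or interpretation, just show what the server sent.
  static void report(String wireText) {
    System.err.print(wireText);
  }
}
```

A client written in another language would only need the equivalent of `report`, since the payload is ordinary text by the time it leaves the server.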
[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941476#comment-16941476 ]

Anu Engineer commented on HDDS-2175:
------------------------------------

It is something that I disagree with. But if you feel strongly about this; please go ahead.
[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942165#comment-16942165 ]

Arpit Agarwal commented on HDDS-2175:
-------------------------------------

C++ exceptions are [widely considered broken|http://yosefk.com/c++fqa/defective.html#defect-10], so we can't directly compare C++ best practices with Java. Golang not having exceptions is a step backwards for debuggability. Perhaps it works well for Google, but for mere mortals like me exceptions are a boon. :) They are especially valuable in this phase of Ozone, where we are stabilizing it.

bq. But as I said; I think the disagreement is a question of taste; so I do not want perfect to be the enemy of good

Thanks for giving the option to go ahead. One thing we can do is make this behavior configurable. In the future we can turn it off entirely if it turns out not to be useful.
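Making the behavior configurable, as suggested in the comment above, might look roughly like this. The key name `example.om.propagate.stack.trace` is invented for this sketch and is not an actual Ozone configuration key, `java.util.Properties` stands in for the real configuration class, and defaulting to "on" is an assumption that the thread does not settle.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Properties;

final class ConfigurableTraceSketch {
  // Hypothetical key name, not an actual Ozone configuration key.
  static final String PROPAGATE_KEY = "example.om.propagate.stack.trace";

  static String describe(Throwable t, Properties conf) {
    // Assumption: propagation defaults to on; any value other than "true" turns it off.
    boolean propagate = Boolean.parseBoolean(conf.getProperty(PROPAGATE_KEY, "true"));
    if (!propagate) {
      // One-line form: exception class name and message only.
      return t.toString();
    }
    StringWriter buffer = new StringWriter();
    try (PrintWriter out = new PrintWriter(buffer)) {
      t.printStackTrace(out);
    }
    return buffer.toString();
  }
}
```

With the key set to `false`, the client would receive only the one-line form, which is close to the behavior the issue description calls the current one.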
[jira] [Commented] (HDDS-2175) Propagate System Exceptions from the OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942183#comment-16942183 ]

Arpit Agarwal commented on HDDS-2175:
-------------------------------------

Thank you for the link to the paper. It looks like a great weekend read. This quote from chapter 1 stands out:

bq. While it is widely accepted that exception handling has a number of problems, it is the best we currently have available[38, 72].