[ 
https://issues.apache.org/jira/browse/HDFS-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673749#comment-16673749
 ] 

CR Hota commented on HDFS-13834:
--------------------------------

[~elgoiri] Yeah, this is what I was mentioning will get into tricky area. For 
ex after 5 secs, we cannot simply assume that thread has picked up the task and 
worked, we would again check the queue and assert if task is still present, 
(its possible that even after 5sec task is still present), if task is present 
then we cant assert that thread is alive, coz the thread will surely be alive 
as it hasn't done any work and we cant assert if it will be fine if it were to 
pick the task and work on it.

 

> RBF: Connection creator thread should catch Throwable
> -----------------------------------------------------
>
>                 Key: HDFS-13834
>                 URL: https://issues.apache.org/jira/browse/HDFS-13834
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: CR Hota
>            Assignee: CR Hota
>            Priority: Critical
>         Attachments: HDFS-13834-HDFS-13891.0.patch, 
> HDFS-13834-HDFS-13891.1.patch, HDFS-13834-HDFS-13891.2.patch, 
> HDFS-13834-HDFS-13891.3.patch, HDFS-13834-HDFS-13891.4.patch, 
> HDFS-13834.0.patch, HDFS-13834.1.patch
>
>
> Connection creator thread is a single thread thats responsible for creating 
> all downstream namenode connections.
> This is very critical thread and hence should not die understand 
> exception/error scenarios.
> We saw this behavior in production systems where the thread died leaving the 
> router process in bad state.
> The thread should also catch a generic error/exception.
> {code}
>     @Override
>     public void run() {
>       while (this.running) {
>         try {
>           ConnectionPool pool = this.queue.take();
>           try {
>             int total = pool.getNumConnections();
>             int active = pool.getNumActiveConnections();
>             if (pool.getNumConnections() < pool.getMaxSize() &&
>                 active >= MIN_ACTIVE_RATIO * total) {
>               ConnectionContext conn = pool.newConnection();
>               pool.addConnection(conn);
>             } else {
>               LOG.debug("Cannot add more than {} connections to {}",
>                   pool.getMaxSize(), pool);
>             }
>           } catch (IOException e) {
>             LOG.error("Cannot create a new connection", e);
>           }
>         } catch (InterruptedException e) {
>           LOG.error("The connection creator was interrupted");
>           this.running = false;
>         }
>       }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to