[ 
https://issues.apache.org/jira/browse/TEZ-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-988:
---------------------------------

    Attachment: TEZ-988-v5.patch

>> We read the error stream mainly for keep-alive reuse, so it makes sense to 
>> wrap the System.setProperty call within the keep-alive check. (Done)
>> Added "&keepAlive=true"

>> http.maxConnections
- This refers to the number of connection entries (protocol + host + port) that 
can be maintained in the keep-alive cache.  The default value is 5, which is 
very low for a large cluster (e.g. for a cluster of 20 or 500 nodes, keeping 
only 5 connections in the pool is far too few).  This is also a per-JVM 
setting (i.e. all HttpURLConnection instances in the same JVM internally share 
one KeepAliveCache).  Ideally this value should be set equal to the number of 
nodes in the cluster. 

- The number of connections per host is determined by the "keep-alive: max" 
header.  If nothing is specified, this defaults to 5 as per the JDK's 
implementation.  NodeManager's shuffle handler does not tweak this on the 
server side.  We do not need to tweak it either, as maintaining more than 1 
connection to a host from the same JVM might not be beneficial.
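
A minimal sketch of the guarded property setup described above, not the code 
in the attached patch.  The class name KeepAliveConfig and the method 
configure are hypothetical; only the "http.maxConnections" system property 
and System.setProperty come from the discussion.  The JDK appears to read 
this property once and cache it, so the call would need to happen before the 
first HTTP connection is opened in the JVM.

```java
// Hypothetical helper; names are illustrative and not taken from the patch.
final class KeepAliveConfig {
    static final String MAX_CONNECTIONS_PROPERTY = "http.maxConnections";

    private KeepAliveConfig() {}

    /**
     * Sets http.maxConnections only when keep-alive is enabled.
     * Returns true if the system property was set, false if it was skipped.
     */
    static boolean configure(boolean keepAliveEnabled, int maxConnections) {
        if (!keepAliveEnabled) {
            // Without keep-alive there is no connection cache to size.
            return false;
        }
        if (maxConnections <= 0) {
            throw new IllegalArgumentException(
                "maxConnections must be positive: " + maxConnections);
        }
        // JVM-wide setting: affects every HttpURLConnection in this process.
        System.setProperty(MAX_CONNECTIONS_PROPERTY,
            String.valueOf(maxConnections));
        return true;
    }
}
```

For example, KeepAliveConfig.configure(true, 500) would size the cache for a 
500-node cluster, while configure(false, 500) leaves the property untouched.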

> http.maxConnections needs to be configurable in Tez Fetcher & read from 
> errorstream to make the connection reusable
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-988
>                 URL: https://issues.apache.org/jira/browse/TEZ-988
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-988-v1.patch, TEZ-988-v2.patch, TEZ-988-v3.patch, 
> TEZ-988-v4.patch, TEZ-988-v5.patch
>
>
> 1. Currently http.maxConnections is set to 5 (default).  Make this 
> configurable in Fetcher.java.  This will help in running larger queries
> 2. The error stream has to be read completely in order to make the connection 
> reusable (when keepAlive is enabled).  Currently, we do not read the error 
> stream.
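
Item 2 could look roughly like the sketch below, which reads the error stream 
to the end and closes it so the underlying socket can go back to the 
keep-alive cache.  The class ErrorStreamDrainer, its methods, and the 4 KB 
buffer size are assumptions for illustration, not the code in the attached 
patch; HttpURLConnection.getErrorStream() is the standard JDK call and may 
return null.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical helper; names are illustrative and not taken from the patch.
final class ErrorStreamDrainer {
    private ErrorStreamDrainer() {}

    /** Reads the stream to EOF, closes it, and returns the bytes consumed. */
    static long drain(InputStream in) throws IOException {
        if (in == null) {
            return 0L; // nothing to drain
        }
        byte[] buffer = new byte[4096];
        long total = 0L;
        try (InputStream stream = in) {
            int n;
            while ((n = stream.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Drains the connection's error stream, but only when keep-alive is on. */
    static void drainErrorStream(HttpURLConnection connection,
                                 boolean keepAliveEnabled) {
        if (!keepAliveEnabled) {
            return; // connection will not be reused, so skip the extra reads
        }
        try {
            drain(connection.getErrorStream());
        } catch (IOException ignored) {
            // Best effort: a failed drain only means the socket is not reused.
        }
    }
}
```

A fetcher would call drainErrorStream(connection, keepAlive) from its error 
handling path after a failed request, before moving on to the next host.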



--
This message was sent by Atlassian JIRA
(v6.2#6252)
