[jira] [Commented] (HDFS-13234) Remove renew configuration instance in ConfiguredFailoverProxyProvider and reduce memory footprint for client

2018-03-11 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394755#comment-16394755
 ] 

He Xiaoqiao commented on HDFS-13234:


Thanks [~kihwal] for your detailed comments.
HADOOP-11223 and HADOOP-9570 are interesting issues for resolving duplicated 
Configuration instances, but I am not sure they are a complete solution for 
the huge memory footprint in the case mentioned above. Beyond HADOOP-11223 
and HADOOP-9570, I think it is necessary to maintain incremental changes for 
Configuration, so that Configuration::getDefault() plus the incremental 
changes form the complete configuration. That would avoid unintended conf 
update propagation while also reducing the memory footprint. Please correct 
me if I am wrong.
Thanks again.

> Remove renew configuration instance in ConfiguredFailoverProxyProvider and 
> reduce memory footprint for client
> -
>
> Key: HDFS-13234
> URL: https://issues.apache.org/jira/browse/HDFS-13234
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, ha, hdfs-client
>Reporter: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13234.001.patch
>
>
> The memory footprint of #DFSClient can be considerable in some scenarios, 
> since many #Configuration instances are created and occupy a lot of memory 
> (in an extreme case we observed, org.apache.hadoop.conf.Configuration 
> occupied over 600MB under HDFS Federation and HA with QJM with dozens of 
> NameNodes). I think some of these new Configuration instances are 
> unnecessary, for example in #ConfiguredFailoverProxyProvider initialization:
> {code:java}
>   public ConfiguredFailoverProxyProvider(Configuration conf, URI uri,
>       Class<T> xface, HAProxyFactory<T> factory) {
>     this.xface = xface;
>     this.conf = new Configuration(conf);
>     ..
>   }
> {code}
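The defensive copy above exists because sharing one mutable conf would leak writes between clients. A toy model with plain java.util maps (not the real org.apache.hadoop.conf.Configuration) illustrates the hazard the copy guards against:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of why the constructor copies the conf: if two clients share one
// mutable map, a set() by one silently changes what the other reads.
public class SharedConfHazard {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.client.failover.max.attempts", "3");

        Map<String, String> clientA = conf;                  // shared reference
        Map<String, String> clientB = new HashMap<>(conf);   // defensive copy

        clientA.put("dfs.client.failover.max.attempts", "10"); // A tunes retries

        // The shared reference leaks A's change; the copy stays isolated.
        System.out.println(conf.get("dfs.client.failover.max.attempts"));    // 10
        System.out.println(clientB.get("dfs.client.failover.max.attempts")); // 3
    }
}
```

The copy buys isolation at the cost of duplicating every key, which is what multiplies the footprint across providers.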



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13234) Remove renew configuration instance in ConfiguredFailoverProxyProvider and reduce memory footprint for client

2018-03-08 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391519#comment-16391519
 ] 

Kihwal Lee commented on HDFS-13234:
---

[~jlowe] and I discussed the conf issue a bit this morning. Configuration has 
both performance and memory footprint issues, but coming up with a single 
generic solution for all use cases is difficult, if not impossible. That is 
one of the roadblocks many previous improvement attempts have hit. For use 
cases that do not require refreshing, we can have a single mutable instance 
load/reload all resources, instead of duplicating them for each config 
instance. Each new conf can have its own "overlay" map internally to keep 
track of locally set keys/values; keys not found in this map are looked up in 
the base instance. Look-ups get a bit more expensive, but it avoids the 
problems of multiple resource reloads and object duplication. Since this 
might not work well with refreshable configs, it would be better to make it a 
new feature (i.e., a new version of the ctor) offered on an opt-in basis. I 
think most client-side code will be able to take advantage of this.
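A minimal sketch of that overlay idea (class and method names here are hypothetical, not an existing Hadoop API): each conf keeps only its locally set keys and falls back to one shared, loaded-once base for everything else.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical overlay conf: per-client overrides on top of a shared base.
public class OverlayConf {
    private final Map<String, String> base;                      // shared base instance
    private final Map<String, String> overlay = new HashMap<>(); // locally set keys

    public OverlayConf(Map<String, String> base) {
        this.base = base;          // reference only; resources are loaded once
    }

    public void set(String key, String value) {
        overlay.put(key, value);   // local write, never touches the base
    }

    public String get(String key) {
        String v = overlay.get(key);
        return (v != null) ? v : base.get(key);  // fall back to the base instance
    }
}
```

The trade-off noted above is visible in get(): every miss in the overlay costs a second lookup, in exchange for never reloading resources and never copying the base.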

Related: HADOOP-11223 and HADOOP-9570

We can start a design/feasibility discussion, if there is enough interest.




[jira] [Commented] (HDFS-13234) Remove renew configuration instance in ConfiguredFailoverProxyProvider and reduce memory footprint for client

2018-03-07 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390739#comment-16390739
 ] 

He Xiaoqiao commented on HDFS-13234:


Thanks [~kihwal], [~elgoiri] for your comments.
{quote}How big is a single instance in your use case? Bloated conf in dfs 
client is obviously a serious issue, but it can create bigger issues in 
apps/jobs.{quote}
This is the YARN log upload service. A single {{Configuration}} instance on 
the NodeManager is about 120KB, but the total across all {{Configuration}} 
instances bloats to 600MB due to two factors:
a. HDFS Federation + HA with QJM with dozens of nameservices (~20); the 
client creates a {{ConfiguredFailoverProxyProvider}} instance for each 
nameservice, which doubles the number of {{Configuration}} instances;
b. up to 150 threads execute the upload of YARN logs to HDFS.
So, in the extreme case, the {{Configuration}} instances occupy about 
~20 * 2 * 150 * 120KB of memory.
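Checking the arithmetic above (the ~120KB per-instance figure is the one reported in this comment):

```java
// ~20 nameservices x 2 (the provider's conf copy doubles the count)
// x 150 upload threads x ~120 KB per Configuration instance.
public class ConfFootprint {
    public static void main(String[] args) {
        long instances = 20L * 2 * 150;       // 6000 Configuration objects
        long totalKb = instances * 120;       // 720000 KB
        System.out.println(totalKb / 1024 + " MB"); // ~703 MB
    }
}
```

That lands just above the 600MB observed in the heap, consistent with the numbers reported.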

{quote}New conf objects are created to prevent unintended conf update 
propagation.{quote}
It is true that this prevents unintended conf update propagation, but I think 
there are other ways to avoid cloning the whole conf for only two parameters 
in {{ConfiguredFailoverProxyProvider}} and {{IPFailoverProxyProvider}}, which 
probably wastes huge memory resources as you mentioned. Are there any 
suggestions? [~kihwal]

Thanks again.




[jira] [Commented] (HDFS-13234) Remove renew configuration instance in ConfiguredFailoverProxyProvider and reduce memory footprint for client

2018-03-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390298#comment-16390298
 ] 

Kihwal Lee commented on HDFS-13234:
---

New conf objects are created to prevent unintended conf update propagation. 
If we had an overlay config feature, we could achieve the same thing without 
duplicating the entire conf object. Configuration has something overlay-like, 
but I was told it does not work the way we want.




[jira] [Commented] (HDFS-13234) Remove renew configuration instance in ConfiguredFailoverProxyProvider and reduce memory footprint for client

2018-03-07 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390147#comment-16390147
 ] 

Kihwal Lee commented on HDFS-13234:
---

bq. Configuration occupies over 600MB
How big is a single instance in your use case? A bloated conf in the dfs 
client is obviously a serious issue, but it can create bigger issues in 
apps/jobs. Sometimes a conf can get embedded in another conf. Avoiding 
unnecessarily duplicated confs is a good thing, but looking into what is 
causing the bloat and fixing that will also be important.




[jira] [Commented] (HDFS-13234) Remove renew configuration instance in ConfiguredFailoverProxyProvider and reduce memory footprint for client

2018-03-07 Thread Íñigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389843#comment-16389843
 ] 

Íñigo Goiri commented on HDFS-13234:


[~kihwal], you had some good points on HDFS-13195 on a somewhat related topic.
Do you mind chiming in here?




[jira] [Commented] (HDFS-13234) Remove renew configuration instance in ConfiguredFailoverProxyProvider and reduce memory footprint for client

2018-03-07 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389346#comment-16389346
 ] 

He Xiaoqiao commented on HDFS-13234:


Uploaded patch v1 for trunk; pending Jenkins.
