[jira] [Commented] (HDFS-13234) Remove renew configuration instance in ConfiguredFailoverProxyProvider and reduce memory footprint for client
[ https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394755#comment-16394755 ]

He Xiaoqiao commented on HDFS-13234:
------------------------------------

Thanks [~kihwal] for your detailed comments. HADOOP-11223 and HADOOP-9570 are interesting issues for resolving duplicated Configuration instances, but I am not sure they are a complete solution for the huge memory footprint in the case mentioned above. Beyond HADOOP-11223 and HADOOP-9570, I think it is necessary to maintain an incremental change set for each Configuration; Configuration::getDefault() plus the incremental changes would then form the complete configuration, avoiding unintended conf update propagation while also reducing the memory footprint. If I am wrong, please correct me. Thanks again.

> Remove renew configuration instance in ConfiguredFailoverProxyProvider and
> reduce memory footprint for client
> --------------------------------------------------------------------------
>
>                 Key: HDFS-13234
>                 URL: https://issues.apache.org/jira/browse/HDFS-13234
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: fs, ha, hdfs-client
>            Reporter: He Xiaoqiao
>            Priority: Major
>         Attachments: HDFS-13234.001.patch
>
> The memory footprint of DFSClient can be considerable in some scenarios,
> since many Configuration instances are created and occupy a lot of memory
> (in an extreme case we met under HDFS Federation and HA with QJM, with
> dozens of NameNodes, org.apache.hadoop.conf.Configuration occupied over
> 600 MB). Some of these new Configuration instances are unnecessary, such
> as in ConfiguredFailoverProxyProvider initialization:
> {code:java}
> public ConfiguredFailoverProxyProvider(Configuration conf, URI uri,
>     Class<T> xface, HAProxyFactory<T> factory) {
>   this.xface = xface;
>   this.conf = new Configuration(conf);
>   ...
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[ https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391519#comment-16391519 ]

Kihwal Lee commented on HDFS-13234:
-----------------------------------

[~jlowe] and I discussed the conf issue a bit this morning. Configuration has both performance and memory footprint issues, but coming up with a single generic solution for all use cases is difficult, if not impossible. That is one of the roadblocks many previous improvement attempts have hit.

For use cases that do not require refreshing, we can have a single mutable instance load/reload all resources, instead of duplicating them for each config instance. Each new conf can have its own "overlay" map internally to keep track of locally set keys/values. Keys not found in this map are looked up in the base instance. Look-ups get a bit more expensive, but this avoids the problem of multiple resource reloads and object duplication.

Since this might not work well with refreshable configs, it would be better to make it a new feature (i.e. a new version of the ctor) and offer it on an opt-in basis. I think most client-side code would be able to take advantage of it.

Related: HADOOP-11223 and HADOOP-9570.

We can start a design/feasibility discussion if there is enough interest.
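A minimal sketch of the overlay idea described above (the class and method names here are hypothetical illustrations, not Hadoop API): each lightweight conf keeps only its locally set keys in its own map and falls back to a shared base for everything else, so local writes never propagate and the base properties are never duplicated.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Hadoop code: one shared base map holds the
// properties loaded from resources once; each OverlayConf stores only
// its locally set keys and falls back to the shared base on lookup.
class OverlayConf {
    private final Map<String, String> base;                               // shared, loaded/reloaded once
    private final Map<String, String> overlay = new ConcurrentHashMap<>(); // per-instance local changes

    OverlayConf(Map<String, String> base) {
        this.base = base;
    }

    // Local writes go to the overlay only, so they cannot propagate to
    // other confs sharing the same base.
    void set(String key, String value) {
        overlay.put(key, value);
    }

    // Slightly more expensive lookup: check the overlay first, then the base.
    String get(String key) {
        String v = overlay.get(key);
        return (v != null) ? v : base.get(key);
    }
}
{code}

As noted, this would need to be opt-in (e.g. a new constructor variant), since a shared mutable base behaves differently from today's copy-everything semantics once configs are refreshed.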
[ https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390739#comment-16390739 ]

He Xiaoqiao commented on HDFS-13234:
------------------------------------

Thanks [~kihwal], [~elgoiri] for your comments.

{quote}How big is a single instance in your use case? Bloated conf in dfs client is obviously a serious issue, but it can create bigger issues in apps/jobs.{quote}

This is actually the YARN log upload service. The size of a single {{Configuration}} instance on the NodeManager is about 120 KB, but across all {{Configuration}} instances this bloats to 600 MB due to two factors:
a. HDFS Federation + HA with QJM with dozens of nameservices (~20); the client creates a {{ConfiguredFailoverProxyProvider}} instance per nameservice, which doubles the number of {{Configuration}} instances;
b. up to 150 threads upload YARN logs to HDFS concurrently.
So in the extreme case, the {{Configuration}} instances occupy roughly 20 * 2 * 150 * 120 KB, i.e. about 700 MB.

{quote}New conf objects are created to prevent unintended conf update propagation.{quote}

True, but I think there are other ways to prevent unintended propagation without cloning the whole conf for only two parameters in {{ConfiguredFailoverProxyProvider}} and {{IPFailoverProxyProvider}}, which can waste a huge amount of memory as described. Are there any suggestions? [~kihwal] Thanks again.
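The arithmetic above can be checked directly; all the numbers (20 nameservices, 2 Configuration instances per nameservice due to the proxy-provider copy, 150 upload threads, ~120 KB per instance) are taken from the comment:

{code:java}
public class ConfFootprint {
    // Total Configuration footprint in KB for the scenario in the comment.
    static long totalKb(long nameservices, long confsPerNs, long threads, long kbPerConf) {
        return nameservices * confsPerNs * threads * kbPerConf;
    }

    public static void main(String[] args) {
        long kb = totalKb(20, 2, 150, 120); // 720,000 KB
        System.out.println(kb / 1024 + " MB"); // prints "703 MB", consistent with the 600+ MB observed
    }
}
{code}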
[ https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390298#comment-16390298 ]

Kihwal Lee commented on HDFS-13234:
-----------------------------------

New conf objects are created to prevent unintended conf update propagation. If we had an overlay config feature, we could achieve the same thing without duplicating the entire conf object. Configuration has something overlay-like, but I was told it does not work the way we want.
[ https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390147#comment-16390147 ]

Kihwal Lee commented on HDFS-13234:
-----------------------------------

bq. Configuration occupies over 600MB

How big is a single instance in your use case? A bloated conf in the dfs client is obviously a serious issue, but it can create bigger issues in apps/jobs. Sometimes a conf gets embedded in another conf. Avoiding unnecessarily duplicated confs is a good thing, but looking into what is causing the bloat and fixing that will also be important.
[ https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389843#comment-16389843 ]

Íñigo Goiri commented on HDFS-13234:
------------------------------------

[~kihwal], you had some good points on HDFS-13195 about a somewhat related topic. Do you mind chiming in here?
[ https://issues.apache.org/jira/browse/HDFS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389346#comment-16389346 ]

He Xiaoqiao commented on HDFS-13234:
------------------------------------

Uploaded patch v1 for trunk; pending Jenkins.