[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054247#comment-13054247 ]

Aaron T. Myers commented on HDFS-2092:
--------------------------------------

Note: I'm not necessarily opposed to this change, but please justify its usefulness. From what I can tell so far, this patch seems to be optimizing something that's not actually an issue.

bq. That was just a sample of measurement for a day.

Sure, but what was it actually measuring? Increase in child heap size per task attempt? Increase in heap size per TT? Something else?

bq. Also, Going forward, PIG 0.9 will store lots of meta data in the conf and also one can embed the PIG script itself in the conf.

I don't know much about Pig, but that sounds like a bad idea on its part. Maybe I'm wrong about that.

bq. This can potentially blow the TT.

Can it? I've seen users have a lot of different problems with Hadoop, but Task Trackers falling over because of conf objects being too large isn't one I can recall.

bq. Since one can store anything in the job conf, we should be careful with the references to this object - we should not hold for long duration.

At most these references will be held for the lifetime of a task attempt, right? So not so long?

Create a light inner conf class in DFSClient
--------------------------------------------

Key: HDFS-2092
URL: https://issues.apache.org/jira/browse/HDFS-2092
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Fix For: 0.23.0
Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch

At present, DFSClient stores a reference to the configuration object. Since these configuration objects can be pretty big at times, they can bloat processes that hold multiple DFSClient objects, such as the TaskTracker. This is an attempt to remove the reference to the conf object in DFSClient. This patch creates a light inner conf class and copies the required keys from the Configuration object.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
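To make the idea in the issue description concrete, here is a minimal sketch of a "light inner conf" that copies only the values a client needs and then drops the large configuration. It is not the code from the attached patches. A plain `Map<String, String>` stands in for Hadoop's `Configuration`, and the class names, key names, defaults, and the `pig.script` entry are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a Map stands in for Hadoop's Configuration,
// and all names, keys, and defaults here are assumptions.
final class LightConfSketch {
    static final String REPLICATION_KEY = "dfs.replication";
    static final String BLOCK_SIZE_KEY = "dfs.block.size";
    static final String SOCKET_TIMEOUT_KEY = "dfs.socket.timeout";

    /** Small immutable snapshot of only the settings the client reads. */
    static final class Conf {
        final int replication;
        final long blockSize;
        final int socketTimeoutMs;

        Conf(Map<String, String> full) {
            // Copy primitive values out; no reference to 'full' is stored.
            replication = Integer.parseInt(full.getOrDefault(REPLICATION_KEY, "3"));
            blockSize = Long.parseLong(full.getOrDefault(BLOCK_SIZE_KEY, "67108864"));
            socketTimeoutMs = Integer.parseInt(full.getOrDefault(SOCKET_TIMEOUT_KEY, "60000"));
        }
    }

    /** Hypothetical client that keeps the snapshot instead of the full conf. */
    static final class Client {
        private final Conf conf;

        Client(Map<String, String> full) {
            this.conf = new Conf(full);
        }

        int replication() { return conf.replication; }
        long blockSize() { return conf.blockSize; }
        int socketTimeoutMs() { return conf.socketTimeoutMs; }
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(REPLICATION_KEY, "2");
        jobConf.put("pig.script", "...a large embedded script...");

        Client client = new Client(jobConf);
        jobConf = null; // the client no longer keeps the large map reachable

        System.out.println(client.replication() + " " + client.blockSize());
    }
}
```

The trade-off in this sketch is that values are fixed at construction time: later changes to the original configuration are not seen by the client, and any key the client needs must be copied explicitly.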
[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054264#comment-13054264 ]

Bharath Mundlapudi commented on HDFS-2092:
------------------------------------------

We are not concerned about the task attempt. The problem here is the Task Tracker's availability. The way conf was designed has its own benefits, but it also comes with some disadvantages. What if a task attempt runs for a day or more? This is not uncommon in our clusters.

Again, I am listing a couple of issues:
1. With UGI, conf will be created per user in TT. (Security folks?)
2. PIG or any other job can store arbitrary data. The Hadoop framework should be able to deal with it as far as it can.
3. Last but not least, the API should not hold on to the client's data.

As every job is different, workloads can be different too. So one can't see or hear of all the problems.
[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054285#comment-13054285 ]

Aaron T. Myers commented on HDFS-2092:
--------------------------------------

bq. We are not concerned about the task attempt. The problem here is for Task Tracker's availability.

Have you actually experienced TTs crashing because conf objects were too large? Or cases where conf objects were taking up a substantial portion of the available heap space?

bq. The way conf was designed has its own benefits. At the same time it comes with some disadvantages. What if a task attempt can run for a day or more? This is not uncommon in our clusters.

I would conjecture that such a task attempt is likely using many MBs or GBs of memory for the actual work it's doing. Is this patch, which saves a few hundred KBs at the extreme end, really going to move the needle?

bq. 1. With UGI, conf will be created per user in TT. (Security folks?)

But presumably only for every user who is concurrently running a task attempt on that TT, so not that many, right? Unless I'm missing something, which is certainly possible.

bq. 2. PIG or any other job can store arbitrary data. Hadoop framework should be able to deal with it as far as it can.

No disagreement there.

bq. 3. Last but not least, API should not hold on to client's data.

I see no principled reason the DFSClient should not hold on to the client's data in the form of the conf object. If this is actually negatively impacting performance or availability, then we should certainly fix that, but you haven't demonstrated that yet.

bq. As every job is different so can workloads can be different. So one can't see or hear all the problems.

Certainly, but we can validate this issue with some testing. Can you please describe what you did to gather these measurements? What exactly are they measuring?

My issue here is that this change is being done purely as an optimization, but it's unclear to me that negative issues exist without this patch, or that this patch necessarily addresses those issues. If you can demonstrate those, I'll shut up immediately. :)
[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054421#comment-13054421 ]

Hudson commented on HDFS-2092:
------------------------------

Integrated in Hadoop-Hdfs-trunk #705 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/705/])
HDFS-2092. Remove some object references to Configuration in DFSClient. Contributed by Bharath Mundlapudi

szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1139097
Files :
* /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* /hadoop/common/trunk/hdfs/CHANGES.txt
* /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/DFSInputStream.java
* /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/DFSClient.java
[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054632#comment-13054632 ]

Bharath Mundlapudi commented on HDFS-2092:
------------------------------------------

Todd,

Thanks for the reasons. When we say client, it can be anything, like the TT/JT, which has TIP/JIP. You are right, a client's TIP/JIP can hold references to JobConf. But then the scope of the reference is decided by the client. And yes, eventually we also need to fix the FS cache you are referring to, if there are any leaks.
[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054224#comment-13054224 ]

Aaron T. Myers commented on HDFS-2092:
--------------------------------------

If I read that right, we're talking about a change that at the 99th percentile saves at most 386 KB? I'm skeptical that those modest savings warrant this change.

Also, how exactly were these gains measured? In what unit can we expect these memory savings, i.e. per TT? Per DFSClient instance?
[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054245#comment-13054245 ]

Bharath Mundlapudi commented on HDFS-2092:
------------------------------------------

Hi Aaron,

That was just a sample of measurements for one day. We should care about the MAX in this case. Also, going forward, PIG 0.9 will store a lot of metadata in the conf, and one can also embed the PIG script itself in the conf. This can potentially blow up the TT.

We can measure the approximate size of a conf from the job.xml file in the job history location. Since one can store anything in the job conf, we should be careful with references to this object: we should not hold them for a long duration.
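
The measurement described above (approximating conf size from the job XML files in the job history location, then looking at the maximum and a high percentile) could be sketched roughly as follows. This is an assumption-laden illustration, not the procedure actually used in this thread: the directory is supplied by the caller, any file ending in `.xml` is counted, the nearest-rank percentile definition is an arbitrary choice, and the class and method names are hypothetical. On-disk XML size is only a rough proxy for the heap a conf object occupies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Illustrative sketch: surveys on-disk sizes of job conf XML files.
final class ConfSizeSurvey {

    /** Returns the sizes in bytes of all *.xml files under dir, sorted ascending. */
    static List<Long> xmlSizes(Path dir) throws IOException {
        List<Long> sizes = new ArrayList<>();
        try (Stream<Path> files = Files.walk(dir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(p) && p.getFileName().toString().endsWith(".xml")) {
                    sizes.add(Files.size(p));
                }
            }
        }
        Collections.sort(sizes);
        return sizes;
    }

    /** Nearest-rank percentile of an ascending-sorted, non-empty list. */
    static long percentile(List<Long> sorted, int pct) {
        if (sorted.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(pct * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory holding the job conf XML files (caller-supplied).
        List<Long> sizes = xmlSizes(Paths.get(args[0]));
        if (sizes.isEmpty()) {
            System.out.println("no .xml files found");
            return;
        }
        System.out.println("files: " + sizes.size());
        System.out.println("p99 bytes: " + percentile(sizes, 99));
        System.out.println("max bytes: " + sizes.get(sizes.size() - 1));
    }
}
```

Reporting both the 99th percentile and the maximum matters for this discussion, since the argument for the patch rests on the worst case rather than the typical conf size.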