[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388072#comment-15388072 ]

Sangjin Lee commented on HADOOP-13070:

One other aspect that needs to be addressed (and that hasn't been spelled out) is resource loading. The POC here doesn't cover it. The call patterns for resource loading are a bit more varied, as there are 3 distinct entry points:
- {{ClassLoader.getResource()}}
- {{ClassLoader.getResourceAsStream()}}
- {{ClassLoader.getResources()}}

I also find that the existing {{ApplicationClassLoader}} implementation doesn't cover {{ClassLoader.getResources()}}. :)

> classloading isolation improvements for cleaner and stricter dependencies
>
> Key: HADOOP-13070
> URL: https://issues.apache.org/jira/browse/HADOOP-13070
> Project: Hadoop Common
> Issue Type: Improvement
> Components: util
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Priority: Critical
> Attachments: HADOOP-13070.poc.01.patch, Test.java, TestDriver.java, classloading-improvements-ideas-v.3.pdf, classloading-improvements-ideas.pdf, classloading-improvements-ideas.v.2.pdf, lib.jar
>
> Related to HADOOP-11656, we would like to make a number of improvements in
> terms of classloading isolation so that user-code can run safely without
> worrying about dependency collisions with the Hadoop dependencies.
> By the same token, it should raise the quality of the user code and its
> specified classpath so that users get clear signals if they specify incorrect
> classpaths.
> This will contain a proposal that will include several improvements, some of
> which may not be backward compatible. As such, it should be targeted to the
> next major revision of Hadoop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
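The three resource-loading entry points listed in the comment above can be illustrated with a minimal sketch. It assumes a hypothetical {{ChildFirstResourceLoader}} class that consults its own URLs before its parent; this is not the POC patch and not Hadoop's {{ApplicationClassLoader}}, only an illustration of why each entry point may need its own override.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the class name and lookup order are assumptions.
class ChildFirstResourceLoader extends URLClassLoader {

    ChildFirstResourceLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    // Entry point 1: look in this loader's own URLs before delegating.
    @Override
    public URL getResource(String name) {
        URL own = findResource(name);
        return own != null ? own : super.getResource(name);
    }

    // Entry point 2: routed through getResource() so the same ordering applies.
    @Override
    public InputStream getResourceAsStream(String name) {
        URL url = getResource(name);
        try {
            return url != null ? url.openStream() : null;
        } catch (IOException e) {
            return null;
        }
    }

    // Entry point 3: own matches first, then the parent's, without duplicates.
    // Simplification: with a null parent, bootstrap resources are not consulted.
    @Override
    public Enumeration<URL> getResources(String name) throws IOException {
        Map<String, URL> ordered = new LinkedHashMap<>();
        for (URL url : Collections.list(findResources(name))) {
            ordered.putIfAbsent(url.toExternalForm(), url);
        }
        ClassLoader parent = getParent();
        if (parent != null) {
            for (URL url : Collections.list(parent.getResources(name))) {
                ordered.putIfAbsent(url.toExternalForm(), url);
            }
        }
        return Collections.enumeration(ordered.values());
    }
}
```

If only {{getResource()}} were overridden, callers of {{getResources()}} would still see the parent-first ordering, which is the kind of gap the comment points out.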
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371456#comment-15371456 ]

Sean Busbey commented on HADOOP-13070:

that sounds like a decent first pass sanity check.
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368807#comment-15368807 ]

Sangjin Lee commented on HADOOP-13070:

Until the JDK gives us an API for obtaining the caller class, this might be our best hope. One thing we could do is check whether the frames we're skipping belong to {{java.lang.Class}} or {{java.lang.ClassLoader}}. However, I'm not sure that's any more reliable than counting the number of frames we're skipping...
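The check suggested above could look roughly like the following sketch. The class and method names ({{FrameSanityCheck}}, {{skippedFramesLookSane}}, {{callerAt}}) are hypothetical and not taken from the POC; the sketch only assumes that the stack is available as a {{StackTraceElement[]}}, for example from {{new Throwable().getStackTrace()}}.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a sanity check on the frames that get skipped.
class FrameSanityCheck {
    // Frames we expect to sit between the custom loader and the real caller.
    private static final Set<String> EXPECTED =
        new HashSet<>(Arrays.asList("java.lang.Class", "java.lang.ClassLoader"));

    // True if the 'skip' frames starting at index 'from' all belong to
    // java.lang.Class or java.lang.ClassLoader and a caller frame follows them.
    static boolean skippedFramesLookSane(StackTraceElement[] frames, int from, int skip) {
        if (from < 0 || skip < 0 || from + skip >= frames.length) {
            return false;
        }
        for (int i = from; i < from + skip; i++) {
            if (!EXPECTED.contains(frames[i].getClassName())) {
                return false;
            }
        }
        return true;
    }

    // Class name of the presumed caller, or null if the stack shape is unexpected.
    static String callerAt(StackTraceElement[] frames, int from, int skip) {
        return skippedFramesLookSane(frames, from, skip)
            ? frames[from + skip].getClassName()
            : null;
    }
}
```

A null result would be the signal to fall back to a less strict mode rather than trust a frame count that no longer matches the JDK's call stack. Note that loaders which override {{loadClass()}} appear under their own class names, so a real check would likely need a broader allow-list.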
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368600#comment-15368600 ]

Sean Busbey commented on HADOOP-13070:

I played with the POC for a while today. +1 on moving forward. I'm curious about what, if anything, we can do for runtime sanity checks on the call stack shape.
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328733#comment-15328733 ]

Sangjin Lee commented on HADOOP-13070:

No worries. Thanks!
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328224#comment-15328224 ]

Ravi Prakash commented on HADOOP-13070:

I just wanted to clarify that I am still excited by this idea and looking forward to the implementation. We can always build in feature flags that enable / disable this and try it to see what works. Thanks for putting in the effort, Sangjin!
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323708#comment-15323708 ]

Sangjin Lee commented on HADOOP-13070:

Thanks for the comments [~busbey] and [~ste...@apache.org].

{quote}
Ugh. this gives me all kinds of bad feels, though I think I might agree. I know Steve Loughran has strong feelings on the maintenance burden of this kind of custom classloader work, so let's ping him early.
{quote}
I completely agree that this is a bit of a bitter pill to swallow. But it is also rather clear to me that something like that is needed to pull off the stricter isolation (not letting users see parent classes). I worked out a quick working prototype of the idea recently, and I'll share it soon.

{quote}
If we go down this path, how concerned are we going to be with maintaining cross-JVM compatibility (vs falling back to some kind of "no isolation" approach)?
{quote}
That is definitely a concern. This is basically latching onto what {{ClassLoader.getCallerClassLoader()}} does. It relies on the shape of the call stack to determine what the (non-JDK) "calling" class is, which in turn binds the JDK implementations to keeping that shape uniform across different classloading patterns. But if the JDK ever changes that frame depth, we would likely have to update ours to match it.

{quote}
custom classloaders tend to cause problems downstream, both maintenance and use. That doesn't mean they don't solve some problems: it's just they are dogs to work with.
{quote}
Having dabbled a fair amount in classloaders, I am fully aware of the pain that can happen for implementors and users of custom classloaders. Again, if the JDK's classloading weren't so leaky to begin with (via things like the TCCL), it could have been simpler. The problem with custom classloaders is that when one breaks, it breaks in a pretty painful way, and it is hard to work around.

That said, the current application classloader in hadoop works quite well for the most part, with relatively few issues (for disclosure, our company pretty much enables it by default), and whenever an issue arose I was able to fix it in a fairly straightforward manner. So I would consider the current "working" level reasonably high.
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323198#comment-15323198 ]

Steve Loughran commented on HADOOP-13070:

correct. I'm not vetoing it. I'm just warning that it's really hard to get right
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323058#comment-15323058 ]

Sean Busbey commented on HADOOP-13070:

that sounds like a -0 rather than a -1?
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323042#comment-15323042 ]

Steve Loughran commented on HADOOP-13070:

custom classloaders tend to cause problems downstream, both maintenance and use. That doesn't mean they don't solve some problems: it's just they are dogs to work with. Little dogs that seem like cute little puppies, which eventually become big ugly beasts that you have to take for 3-hour walks every day, that eat everything in the kitchen, use your bed as a toilet and, due to their habit of biting small children, are something you are scared of yourself.
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322828#comment-15322828 ]

Sean Busbey commented on HADOOP-13070:

+1 on removing Configuration's dedicated classloader. That simplification helps limit our pain to the kind that java folks expect to have with TCCL.

{quote}
We need to explore an option that can let you determine the calling class and only block a user calling class to load a parent class (rule #4). We might be able to accomplish this by trying to determine the calling class and its classloader from the stack trace. This is something that the JDK's ClassLoader does (via a non-public JDK-internal method), and we may be able to implement something similar.
{quote}

Ugh. this gives me all kinds of bad feels, though I think I might agree. I know [~ste...@apache.org] has strong feelings on the maintenance burden of this kind of custom classloader work, so let's ping him early.

If we go down this path, how concerned are we going to be with maintaining cross-JVM compatibility (vs falling back to some kind of "no isolation" approach)?

If we're at this point, is just shading every 3rd party dependency we use easier (barring the usual non-relocatable bits)? That would also prevent downstream folks from relying on them without a very clear at-your-own-risk step.
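For illustration only, here is a sketch of determining the calling class and its classloader from the stack, as the quoted proposal describes. It assumes Java 9 or later, where {{java.lang.StackWalker}} is available; the thread itself targets earlier JDKs, so this is not what the proposal or the POC does. The class name {{CallerLookup}} and the filtering rule are hypothetical.

```java
import java.util.Optional;

// Hypothetical sketch; requires Java 9+ for StackWalker.
class CallerLookup {
    private static final StackWalker WALKER =
        StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    // Returns the first class on the stack that is neither this helper,
    // java.lang.Class, nor a ClassLoader (sub)class.
    static Optional<Class<?>> findCallingClass() {
        return WALKER.walk(frames -> frames
            .map(StackWalker.StackFrame::getDeclaringClass)
            .filter(c -> c != CallerLookup.class
                && c != Class.class
                && !ClassLoader.class.isAssignableFrom(c))
            .findFirst());
    }
}
```

A loader could then compare the caller's {{getClassLoader()}} with the user classloader to decide whether rule #4 applies. Because the walk filters by class rather than by frame count, it does not depend on a fixed frame depth.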
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305081#comment-15305081 ]

Sangjin Lee commented on HADOOP-13070:

bq. I sure do wish Java 9 had something that would make it easier but I didn't see anything.

There is jigsaw (in java 9), but then there is always jigsaw. :)
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305075#comment-15305075 ]

Sangjin Lee commented on HADOOP-13070:

Thanks for the comments [~raviprak]! To answer your questions...

{quote}
ApplicationClassLoader seems like it's only being used by MR. I grepped in Tez and Spark source, and didn't find any instances. Even if we were to do this only for MR, it would be incredibly valuable. I feel it would also set a precedent / pattern that other frameworks can then leverage.
{quote}
If you meant that the only usage is within hadoop itself, I believe that's correct. Within hadoop, there are 3 usages today: MR task class isolation, hadoop run jar class isolation, and more recently the NM aux service class isolation. Since {{ApplicationClassLoader}} is part of the public API, other frameworks can use it.

bq. If we were to focus on MR, do you know what are the common problematic conflicting dependencies?

Unfortunately there are many to choose from, and quite a few of the well-known ones fall into the problem category. Some of the more famous ones include guava and jackson, to name a couple.

But isolating class spaces has more benefits than simply preventing collisions. Since we're afraid of breaking users, hadoop has been very slow/conservative in upgrading any libraries it uses. As a result, we're stuck in the stone age for many of the libraries we use. Isolation would give hadoop more freedom to upgrade its dependencies without worrying about impacting users. That is of course provided that the isolation mode becomes the default, which may still be some time away.

{quote}
One alternative approach would be to start 2 JVMs for each MR-task: an MR-framework JVM and an MR-task JVM. We would do all MR-framework specific work in the MR-framework JVM and send raw Map-Reduce input key-value pairs over a socket and read output key value pairs over a socket from the MR-task JVM. The MR specific code running in the MR-task JVM would then be minimal and only needs to read over the socket and call the user code.
{quote}
That is an interesting idea to solve this problem. I still worry about its performance implications. Also, it still would not eliminate the problem entirely. As you pointed out, even in that separate process you still need a minimal amount of hadoop code, which then pulls in the needed dependencies.
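As background for the class isolation discussed in this comment, the following is a minimal sketch of a child-first classloader. It is not the actual {{ApplicationClassLoader}} code; the class name {{ChildFirstClassLoader}} and the {{parentFirstPrefixes}} parameter are assumptions made for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical sketch of child-first delegation; not Hadoop's implementation.
class ChildFirstClassLoader extends URLClassLoader {
    private final List<String> parentFirstPrefixes;

    ChildFirstClassLoader(URL[] urls, ClassLoader parent, List<String> parentFirstPrefixes) {
        super(urls, parent);
        this.parentFirstPrefixes = parentFirstPrefixes;
    }

    // JDK classes and explicitly listed prefixes always come from the parent.
    boolean isParentFirst(String className) {
        if (className.startsWith("java.")) {
            return true;
        }
        for (String prefix : parentFirstPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    c = findClass(name); // user classpath first
                } catch (ClassNotFoundException e) {
                    // not on the user classpath; fall back to the parent below
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // standard parent-first delegation
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

With a user jar containing its own guava on the URL list, a request for a {{com.google.common}} class would resolve from that jar rather than from the framework's copy, while anything under a listed prefix such as {{org.apache.hadoop.}} would still come from the parent.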
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305006#comment-15305006 ]

Sean Busbey commented on HADOOP-13070:

thanks for all the work so far [~sjlee0]! I'm planning to catch up on this work over the weekend.
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304825#comment-15304825 ]

Ravi Prakash commented on HADOOP-13070:

Hi Sangjin! Thanks for taking this up. I look forward to all your improvements.

{{ApplicationClassLoader}} seems like it's only being used by MR. I grepped in Tez and Spark source, and didn't find any instances. Even if we were to do this only for MR, it would be incredibly valuable. I feel it would also set a precedent / pattern that other frameworks can then leverage.

If we were to focus on MR, do you know what are the common problematic conflicting dependencies?

One alternative approach would be to start 2 JVMs for each MR-task: an MR-framework JVM and an MR-task JVM. We would do all MR-framework specific work in the MR-framework JVM and send raw Map-Reduce input key-value pairs over a socket and read output key value pairs over a socket from the MR-task JVM. The MR specific code running in the MR-task JVM would then be minimal and only needs to read over the socket and call the user code. I know protobuf (required for serialization / deserialization) is often the conflicting library, so it would be no help in that case. (We could still shade this minimal set of libraries, although I personally dislike shading a lot.)

I sure do wish Java 9 had something that would make it easier but I didn't see anything.
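To make the two-JVM idea a little more concrete, here is a rough sketch of how raw key-value pairs might be framed on the stream between the two processes. The comment does not specify a wire format; the length-prefixed layout and the {{KeyValueFraming}} class are assumptions for illustration, using only JDK stream classes so that no serialization library is needed on the task side.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: each record is [int keyLen][key bytes][int valueLen][value bytes].
class KeyValueFraming {

    // Writes one key-value pair to the stream.
    static void writePair(DataOutputStream out, byte[] key, byte[] value) throws IOException {
        out.writeInt(key.length);
        out.write(key);
        out.writeInt(value.length);
        out.write(value);
    }

    // Reads one key-value pair; index 0 is the key, index 1 is the value.
    static byte[][] readPair(DataInputStream in) throws IOException {
        byte[] key = new byte[in.readInt()];
        in.readFully(key);
        byte[] value = new byte[in.readInt()];
        in.readFully(value);
        return new byte[][] { key, value };
    }
}
```

In the proposed setup the streams would wrap a socket's input and output streams; the framework JVM would call {{writePair}} for each input record and the task JVM would call {{readPair}} before invoking user code.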
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303494#comment-15303494 ]

Sangjin Lee commented on HADOOP-13070:

I would greatly appreciate feedback on the proposal or thoughts and suggestions in general. Thanks!
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279034#comment-15279034 ]

Sangjin Lee commented on HADOOP-13070:

Got you. Thanks. I am comfortable with 3.0 being defined as primarily the Java 8 release. Once we have something ready (along with HADOOP-11656), it would be good to get this in afterwards.
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278858#comment-15278858 ]

Steve Loughran commented on HADOOP-13070:

When I said "flipping the maven switch" I meant "switching the build to being Java 8+ only".
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264496#comment-15264496 ]

Sangjin Lee commented on HADOOP-13070:

Will do. Thanks!
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264464#comment-15264464 ]

Sean Busbey commented on HADOOP-13070:

[~andrew.wang] and I were discussing just earlier this week getting an approach like this in place as a first step, to see if it would suffice for HADOOP-11656. Super glad to see more folks interested.

On the launcher side, I'm planning to pick HADOOP-11804 back up in the next few weeks.

Sangjin, do let me know if there's anything I can help with on this, whether it's reviews or early work testing out approaches.
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264412#comment-15264412 ]

Sangjin Lee commented on HADOOP-13070:

Hi [~ste...@apache.org], I'm not quite sure what you meant by flipping the maven switch. Could you kindly elaborate?

This is a companion to HADOOP-11656, but this addresses more of a container-like situation (e.g. isolating user code in MR tasks, etc.). There are some long-standing improvements we can make that will make everyone's life easier but which may break some backward compatibility. I'm thinking of changing some APIs and replacing configs, etc.

I'm not sure if this should target Hadoop 3 (I'm not entirely clear where we stand with Hadoop 3 at the moment). My hope is it would be in the first release where breaking backward compatibility is allowed.
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263774#comment-15263774 ]

Steve Loughran commented on HADOOP-13070:

Sangjin: is this targeting Hadoop 3? If so, we should just flip the maven switch, say Java 8+, and focus on that.
[jira] [Commented] (HADOOP-13070) classloading isolation improvements for cleaner and stricter dependencies
[ https://issues.apache.org/jira/browse/HADOOP-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263269#comment-15263269 ]

Sangjin Lee commented on HADOOP-13070:

I had an offline meeting with Sean and Andrew a while back. Filing an issue belatedly to make some progress.

The basic premise is that we have the {{ApplicationClassLoader}}, which gets us fairly close to where we want to be without introducing a whole lot of complication for little additional benefit. We should strengthen it and make it stricter to close the gap. And as part of the process, we can revisit and correct some of the pain points in classpath isolation by making some backward-incompatible changes.

I'll post a proposal some time soon. Here are the key ideas that I have at this point (in no priority order):
- "fix/deprecate/remove" {{Configuration.setClassLoader()}}: it encourages an anti-pattern that allows unsafe sharing and overwriting of classloaders
- make {{ApplicationClassLoader}} stricter: completely separate the user classpath from the system classpath, so that it doesn't fall back to the parent if a user class is not found in the user classpath
- update {{ApplicationClassLoader}} to be current with the Java 8 {{ClassLoader}} implementation (e.g. classloading lock, etc.)
- improve the system class override mechanism in {{ApplicationClassLoader}}

There may be more...
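As an illustration of the "stricter" and "classloading lock" ideas in the comment above, here is a minimal sketch of a classloader that never falls back to its parent for user classes. It is not the existing {{ApplicationClassLoader}} and not the eventual proposal: the class name {{StrictUserClassLoader}}, the prefix-list notion of a "system class", and the helper {{canLoad}} are all assumptions made for the example. The only JDK pieces relied on are {{URLClassLoader}}, {{ClassLoader.registerAsParallelCapable()}} and {{getClassLoadingLock()}}.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

/** Hypothetical strict loader: user classes come only from the user classpath. */
final class StrictUserClassLoader extends URLClassLoader {
    static {
        // Opt in to per-class-name locking instead of locking the whole loader.
        ClassLoader.registerAsParallelCapable();
    }

    private final ClassLoader systemLoader;
    private final List<String> systemPrefixes;

    StrictUserClassLoader(URL[] userClasspath, ClassLoader systemLoader,
                          List<String> systemPrefixes) {
        // A null parent means the superclass never consults the application classpath.
        super(userClasspath, null);
        this.systemLoader = systemLoader;
        this.systemPrefixes = systemPrefixes;
    }

    /** Assumed rule: a class is a "system class" if its name matches a configured prefix. */
    private boolean isSystemClass(String name) {
        for (String prefix : systemPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isSystemClass(name)) {
                    // System classes are always delegated, never taken from user jars.
                    c = systemLoader.loadClass(name);
                } else {
                    // Strict: search the user classpath only. A miss throws
                    // ClassNotFoundException instead of falling back to the parent.
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Small helper for experiments: reports whether a loader can load a class. */
    static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

With an empty user classpath and {{"java."}} as the only system prefix, such a loader would resolve {{java.lang.String}} through the system loader but fail to load any class that is present only on the application classpath. That failure is the kind of clear signal about an incomplete user classpath that the issue description asks for.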