[ https://issues.apache.org/jira/browse/MAPREDUCE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126257#comment-13126257 ]
Luke Lu commented on MAPREDUCE-2858:
------------------------------------

bq. Bobby: as part of this I have been looking for a good pure Java streaming HTML parser

I personally wouldn't use a generic HTML parser, even though tagsoup can handle some forms of invalid HTML, as the goals of these parsers are different from ours. We're dealing with potentially adversarial content here. I'd use an enum-based state machine that doesn't allocate memory (with Strings) for element content, attribute values, etc. The goal is to minimize false negatives and never abort scanning, even if documents are "illegal" HTML, as many browsers have their own particular ways of handling invalid HTML.

You can start with tagsoup if it's easier. I can provide a further patch to improve the scanner when I'm done with the process here.

> MRv2 WebApp Security
> --------------------
>
>                 Key: MAPREDUCE-2858
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2858
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2, security
>    Affects Versions: 0.23.0
>            Reporter: Luke Lu
>            Assignee: Luke Lu
>            Priority: Blocker
>             Fix For: 0.23.0
>
>
> In MRv2, while the system servers (ResourceManager (RM), NodeManager (NM) and
> NameNode (NN)) run as "trusted" system users, the application masters (AM)
> run as the users who submit the application. While this offers great
> flexibility to run multiple versions of mapreduce frameworks (including their
> UI) on the same Hadoop cluster, it has significant implications for the
> security of webapps (please do not discuss company-specific vulnerabilities
> here).
> Requirements:
> # Secure authentication for AM (for app/job level ACLs).
> # Webapp security should be optional via site configuration.
> # Support existing pluggable single sign on mechanisms.
> # Should not require per app/user configuration for deployment.
> # Should not require special site-wide DNS configuration for deployment.
> This is the top jira for webapp security.
> A design doc/notes on threat modeling and countermeasures will be posted on
> the wiki.
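
The enum-based state machine suggested in the comment above could look roughly like the following. This is only a minimal sketch, assuming the scanner's sole job is to flag `<script` start tags; the class name `ScriptTagScanner`, its states, and the `count` helper are hypothetical and are not taken from any MAPREDUCE-2858 patch. It consumes one char at a time, builds no Strings for tag names or attribute values, and never throws on malformed input.

```java
// Hypothetical illustration only; not part of the Hadoop code base.
final class ScriptTagScanner {
    // Scanner states. Comments and CDATA are deliberately not modeled:
    // "<!--" falls back to TEXT, so a tag inside a comment is still flagged
    // (a false positive is preferred over a false negative).
    enum State { TEXT, TAG_OPEN, TAG_NAME, IN_TAG, QUOTED }

    private static final char[] TARGET = {'s', 'c', 'r', 'i', 'p', 't'};

    private State state = State.TEXT;
    private int matched = -1; // chars of TARGET matched so far, -1 after a mismatch
    private char quote;       // quote char that opened the current attribute value
    private int hits;         // number of <script start tags seen

    // Feed one character; any input is accepted, nothing is ever rejected.
    void feed(char c) {
        switch (state) {
            case TEXT:
                if (c == '<') state = State.TAG_OPEN;
                break;
            case TAG_OPEN:
                if (Character.isLetter(c)) {
                    state = State.TAG_NAME;
                    matched = 0;
                    matchNameChar(c);
                } else if (c == '/') {
                    state = State.IN_TAG;   // end tag: skip to '>'
                } else if (c != '<') {
                    state = State.TEXT;     // e.g. "< 3" is plain text
                }                           // "<<" stays in TAG_OPEN
                break;
            case TAG_NAME:
                if (Character.isLetterOrDigit(c)) {
                    matchNameChar(c);
                } else {
                    endName();
                    state = State.IN_TAG;
                    inTag(c);
                }
                break;
            case IN_TAG:
                inTag(c);
                break;
            case QUOTED:
                if (c == quote) state = State.IN_TAG;
                break;
        }
    }

    // Call at end of input so an unterminated "<script" is still counted.
    int finish() {
        if (state == State.TAG_NAME) endName();
        return hits;
    }

    private void matchNameChar(char c) {
        if (matched >= 0 && matched < TARGET.length
                && Character.toLowerCase(c) == TARGET[matched]) {
            matched++;
        } else {
            matched = -1;
        }
    }

    private void endName() {
        if (matched == TARGET.length) hits++;
        matched = -1;
    }

    private void inTag(char c) {
        if (c == '>') {
            state = State.TEXT;
        } else if (c == '"' || c == '\'') {
            quote = c;
            state = State.QUOTED;
        }
    }

    // Convenience wrapper for scanning a whole buffer.
    static int count(CharSequence in) {
        ScriptTagScanner s = new ScriptTagScanner();
        for (int i = 0; i < in.length(); i++) s.feed(in.charAt(i));
        return s.finish();
    }
}
```

Keeping the match position as an index into a fixed `char[]` is what avoids per-tag allocation; a real scanner would presumably track more tag names, event-handler attributes, and URL schemes in the same way.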