[ https://issues.apache.org/jira/browse/MAPREDUCE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126257#comment-13126257 ]

Luke Lu commented on MAPREDUCE-2858:
------------------------------------

bq. Bobby: as part of this I have been looking for a good pure Java streaming 
HTML parser

I personally wouldn't use a generic HTML parser, even though tagsoup can handle 
some forms of invalid HTML, because the goals of those parsers differ from our 
purpose. We're dealing with potentially adversarial content here. I'd use an 
enum-based state machine that doesn't allocate memory (with Strings) for 
element content, attribute values, etc. The goal is to minimize false 
negatives and never abort scanning, even when documents are "illegal" HTML, as many 
browsers have their own particular ways of handling invalid HTML.
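A minimal sketch of the kind of scanner described above, assuming (hypothetically) that "suspicious" means a `<script>` start tag or an `on*` event-handler attribute. The class `HtmlScanner`, the method `countSuspicious`, and that definition of "suspicious" are illustrative only, not code from this issue, and the states cover only a small subset of real HTML tokenization. The sketch walks the input one char at a time through an enum of states. It tracks partial matches with integer counters instead of copying tag names or attribute values into Strings, and it treats unexpected characters as state changes rather than errors, so scanning never aborts.

```java
/**
 * Hypothetical sketch: enum-based scanner that never aborts on malformed
 * markup and never copies names or values into Strings.
 */
public final class HtmlScanner {
  private enum State { TEXT, TAG_OPEN, TAG_NAME, SKIP_TO_GT, IN_TAG,
    ATTR_NAME, BEFORE_VALUE, VALUE_QUOTED, VALUE_UNQUOTED }

  private static final char[] SCRIPT = {'s', 'c', 'r', 'i', 'p', 't'};
  private static final char[] ON = {'o', 'n'};

  /** Extends a case-insensitive prefix match of word by c; -1 means no match. */
  private static int match(char[] word, int idx, char c) {
    return (idx >= 0 && idx < word.length
        && Character.toLowerCase(c) == word[idx]) ? idx + 1 : -1;
  }

  /** Counts <script> start tags and on* attributes in a single pass. */
  public static int countSuspicious(CharSequence html) {
    State state = State.TEXT;
    int hits = 0;
    int nameIdx = -1; // chars of "script" matched so far in the tag name
    int onIdx = -1;   // chars of "on" matched at the start of the attribute name
    int attrLen = 0;  // length of the current attribute name
    char quote = 0;   // quote character that will close the current value
    for (int i = 0; i < html.length(); i++) {
      char c = html.charAt(i);
      boolean ws = Character.isWhitespace(c);
      switch (state) {
        case TEXT:
          if (c == '<') state = State.TAG_OPEN;
          break;
        case TAG_OPEN:
          if (Character.isLetter(c)) {
            state = State.TAG_NAME;
            nameIdx = match(SCRIPT, 0, c);
          } else if (c == '/' || c == '!' || c == '?') {
            state = State.SKIP_TO_GT; // end tags, comments, doctypes
          } else if (c != '<') {
            state = State.TEXT;       // a stray '<' is treated as text
          }
          break;
        case TAG_NAME:
          if (ws || c == '/' || c == '>') {
            if (nameIdx == SCRIPT.length) hits++;
            state = (c == '>') ? State.TEXT : State.IN_TAG;
          } else {
            nameIdx = match(SCRIPT, nameIdx, c);
          }
          break;
        case SKIP_TO_GT:
          if (c == '>') state = State.TEXT;
          break;
        case IN_TAG:
          if (c == '>') {
            state = State.TEXT;
          } else if (c == '=') {
            state = State.BEFORE_VALUE; // e.g. "onclick = go()"
          } else if (!ws && c != '/') {
            state = State.ATTR_NAME;
            attrLen = 1;
            onIdx = match(ON, 0, c);
          }
          break;
        case ATTR_NAME:
          if (ws || c == '/' || c == '>' || c == '=') {
            if (onIdx == ON.length && attrLen > ON.length) hits++;
            state = (c == '>') ? State.TEXT
                : (c == '=') ? State.BEFORE_VALUE : State.IN_TAG;
          } else if (++attrLen <= ON.length) {
            onIdx = match(ON, onIdx, c);
          }
          break;
        case BEFORE_VALUE:
          if (c == '"' || c == '\'') {
            quote = c;
            state = State.VALUE_QUOTED;
          } else if (c == '>') {
            state = State.TEXT;
          } else if (!ws) {
            state = State.VALUE_UNQUOTED;
          }
          break;
        case VALUE_QUOTED:
          if (c == quote) state = State.IN_TAG; // value chars are skipped, not copied
          break;
        case VALUE_UNQUOTED:
          if (ws) state = State.IN_TAG;
          else if (c == '>') state = State.TEXT;
          break;
      }
    }
    // Truncated input: still report a construct that matched before EOF.
    if (state == State.TAG_NAME && nameIdx == SCRIPT.length) hits++;
    if (state == State.ATTR_NAME && onIdx == ON.length && attrLen > ON.length) hits++;
    return hits;
  }
}
```

For example, `HtmlScanner.countSuspicious("<a title=\"x > y\" onclick='go()'>")` returns 1. The `>` inside the quoted value does not end the tag, so `onclick` is still seen. A truncated `"<<<script"` is also reported rather than rejected, which errs toward false positives in keeping with the goal of minimizing false negatives.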

You can start with tagsoup if it's easier. I can provide a further patch to 
improve the scanner when I'm done with the process here.
                
> MRv2 WebApp Security
> --------------------
>
>                 Key: MAPREDUCE-2858
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2858
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2, security
>    Affects Versions: 0.23.0
>            Reporter: Luke Lu
>            Assignee: Luke Lu
>            Priority: Blocker
>             Fix For: 0.23.0
>
>
> In MRv2, while the system servers (ResourceManager (RM), NodeManager (NM) and 
> NameNode (NN)) run as "trusted"
> system users, the application masters (AM) run as users who submit the 
> application. While this offers great flexibility
> to run multiple versions of mapreduce frameworks (including their UI) on the 
> same Hadoop cluster, it has significant
> implications for the security of webapps (Please do not discuss company 
> specific vulnerabilities here).
> Requirements:
> # Secure authentication for AM (for app/job level ACLs).
> # Webapp security should be optional via site configuration.
> # Support existing pluggable single sign on mechanisms.
> # Should not require per app/user configuration for deployment.
> # Should not require special site-wide DNS configuration for deployment.
> This is the top-level JIRA for webapp security. A design doc/notes on threat modeling 
> and countermeasures will be posted on the wiki.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
