New revision...

I have incorporated additions from Mike and added a [DEFAULT] tag to those
items that should be considered for Secure by Default settings.
I am hoping that we can close down on the actual lists shortly and move to
discussing the meta points on how/when to require the completion of the
checklists and whether and how they should be included as docs for the
feature moving forward.

Some comments that I have gotten offline have included concern that
targeting merge requests would only capture a subset of new features and
may actually affect the decision to use branches or not. This is certainly
something that we wouldn't want to do. At the same time, we don't want to
be so intrusive in the development cycle as to bog down patches that just
fix bugs.

At any rate, let's close down on the checklists here first.

Thanks!

*Tech Preview Security Audit*
For features that are being merged without full security model coverage,
there needs to be a baseline of assurances that they do not introduce new
attack vectors in deployments that are from actual releases or even just
built from trunk.

*1. UIs*

1.1. Are there new UIs added with this merge?
1.2. Are they enabled/accessible by default?
1.3. Are they hosted in existing processes or as part of a new
process/server?
1.4. If new process/server, is it launched by default?

*2. APIs*

2.1. Are there new REST APIs added with this merge?
2.2. Are they enabled by default?
2.3. Are there RPC based APIs added with this merge?
2.4. Are they enabled by default?

*3. Secure Clusters*

3.1. Is this feature disabled completely in secure deployments?
3.2. If not, is there some justification as to why it should be available?

*4. CVEs*

4.1. Have all dependencies introduced by this merge been checked for known
issues?


------------------------------------------------------------


*GA Readiness Security Audit*
At this point, we are merging full or partial security model
implementations.
Let's inventory what is covered by the model at this point and whether
future merges are required to complete it.

*1. UIs*

1.1. What sort of validation is being done on any accepted user input?
[DEFAULT] (pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting [DEFAULT]
  1.2.2. cross site request forgery [DEFAULT]
  1.2.3. click jacking (X-Frame-Options) [DEFAULT]
  1.2.4 If using cookies, is the secure flag for cookies turned on?
[DEFAULT]
  1.2.5 If using cookies, is the HTTPOnly flag turned on? [DEFAULT]
1.3. What sort of authentication is required for access to the UIs?
[DEFAULT]
  1.3.1. Kerberos
    1.3.1.1. has TGT renewal been accounted for
    1.3.1.2. SPNEGO support?
    1.3.1.3. Delegation token?
  1.3.2. Proxy User ACL?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data and/or related
processes? [DEFAULT]
1.5. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
  1.5.1 If so, how is it validated before persistence? [DEFAULT]
1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
1.7. Is there TLS/SSL support? [DEFAULT]
  1.7.1 Is it possible to configure TLS protocols and cipher suites?
  1.7.2 Is it possible to configure support for HTTP Strict Transport
Security (HSTS)?
1.8 Are accesses to the UIs audited? ("User X logged into Y from IP address
Z", etc) [DEFAULT]
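
As an illustration of items 1.2.3, 1.2.4, 1.2.5 and 1.7.2, here is a minimal sketch of what "on by default" response headers and cookie flags could look like. The class and method names (UiSecurityDefaults, defaultHeaders, sessionCookie) are hypothetical and not part of any Hadoop API; the header values are one reasonable choice, not a mandated policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; names are illustrative only. */
final class UiSecurityDefaults {

    /** Response headers a UI could send unless an admin opts out. */
    static Map<String, String> defaultHeaders(boolean tlsEnabled) {
        Map<String, String> headers = new LinkedHashMap<>();
        // 1.2.3: refuse framing by other origins to mitigate click jacking.
        headers.put("X-Frame-Options", "SAMEORIGIN");
        if (tlsEnabled) {
            // 1.7.2: HSTS is only meaningful when the UI is served over TLS.
            headers.put("Strict-Transport-Security", "max-age=31536000");
        }
        return headers;
    }

    /** Builds a Set-Cookie header value with the flags from 1.2.4 and 1.2.5. */
    static String sessionCookie(String name, String value, boolean tlsEnabled) {
        StringBuilder sb = new StringBuilder(name).append('=').append(value);
        // HttpOnly keeps the cookie out of reach of page scripts.
        sb.append("; Path=/; HttpOnly");
        if (tlsEnabled) {
            // Secure tells the browser to send the cookie over HTTPS only.
            sb.append("; Secure");
        }
        return sb.toString();
    }
}
```

A servlet filter or equivalent would copy these values onto every response; that wiring is omitted here because it depends on the web framework in use.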

*2. REST APIs*

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS) [DEFAULT]
  2.2.2. cross site request forgery (CSRF) [DEFAULT]
  2.2.3. XML External Entity (XXE) [DEFAULT]
2.3. What is being used for authentication - Hadoop Auth Module? [DEFAULT]
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing processes?
2.5. Is there TLS/SSL support? [DEFAULT]
  2.5.1 Is it possible to configure TLS protocols and cipher suites?
  2.5.2 Is it possible to configure support for HTTP Strict Transport
Security (HSTS)? [DEFAULT]
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. What authorization enforcement points are there within the REST APIs?
2.8 Are accesses to the REST APIs audited? ("User X accessed resource Y
from IP address Z", etc) [DEFAULT]
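
For item 2.2.3, a reviewer might look for something like the following wherever a REST endpoint parses XML. This is a sketch using only the JDK's built-in JAXP parser; the class name SafeXml is hypothetical, and a given endpoint may need a different parser API (SAX, StAX) with the equivalent settings.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper showing an XXE-hardened DOM parse. */
final class SafeXml {

    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE declarations outright removes the entity
        // mechanism that XXE attacks rely on.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

With this configuration, any request body containing a DOCTYPE makes parse() throw a SAXException instead of resolving the external entity.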

*3. Encryption*

3.1. Is there any support for encryption of persisted data?
3.2. If so, are KMS and the hadoop key command used for key management?
3.3. KMS interaction with Proxy Users?
3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
in any other area of computer science. Standard cryptographic libraries should
always be used. Does this work attempt to create an encryption scheme or
protocol? Does it have a "novel" or "unique" use of normal crypto?  There
be dragons. Even normal-looking use of cryptography must be carefully
reviewed.
3.5 If you need random bits for a security purpose, such as for a session
token or a cryptographic key, you need a cryptographically approved place
to acquire said bits. Use the SecureRandom class. [DEFAULT]
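
To make item 3.5 concrete, a minimal sketch of generating a session token with SecureRandom follows. The class name SessionTokens and the 32-byte length are assumptions for illustration; the point is only that security-relevant randomness comes from java.security.SecureRandom rather than java.util.Random or Math.random().

```java
import java.security.SecureRandom;
import java.util.Base64;

/** Hypothetical helper; the token length is an illustrative choice. */
final class SessionTokens {
    // SecureRandom is thread-safe, so one shared instance is sufficient.
    private static final SecureRandom RNG = new SecureRandom();

    /** Returns 32 random bytes encoded as URL-safe Base64 without padding. */
    static String newToken() {
        byte[] bytes = new byte[32];
        RNG.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```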

*4. Configuration*

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning to credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out command execution, etc?
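
For item 4.3 (and 1.5.1 on the UI side), one pattern a reviewer might look for is sketched below: a configured value is checked against a strict allowlist and then passed as a separate argument, never concatenated into a shell string. The class name ContainerLaunch, the docker arguments, and the image-name pattern are all assumptions for illustration; the real validation rule is feature specific.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper; the allowlist pattern is an illustrative assumption. */
final class ContainerLaunch {
    // Must start with a lowercase letter or digit, which also rules out
    // values such as "--privileged" being smuggled in as an option.
    private static final Pattern IMAGE =
        Pattern.compile("[a-z0-9][a-z0-9._/-]{0,127}(:[A-Za-z0-9._-]{1,64})?");

    /** Returns an argument vector suitable for ProcessBuilder. */
    static List<String> dockerRunCommand(String image) {
        if (image == null || !IMAGE.matcher(image).matches()) {
            // The rejected value is deliberately left out of the message.
            throw new IllegalArgumentException("rejected container image name");
        }
        // An argument list avoids shell interpretation of metacharacters.
        return Arrays.asList("docker", "run", "--rm", image);
    }
}
```

The resulting list would be handed to new ProcessBuilder(list) rather than to a "sh -c" invocation, so characters such as ";" or "$(...)" in configuration are never interpreted by a shell.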

*5. HA*

5.1. Are there provisions for HA?
5.2. Are there any single points of failure?

*6. CVEs*

Dependencies need to have been checked for known issues before we merge.
We don't, however, want to list any CVEs that have been fixed but not
released yet.

6.1. All dependencies checked for CVEs?

*7. Log Messages*

Do not write secrets or data into log files. This sounds obvious, but
mistakes happen.

7.1 Do not log passwords, keys, security-related tokens, or any sensitive
configuration item.
7.2 Do not log any user-supplied data, ever. Not even snippets of user
data, such as “I had an error parsing this line of text: xxxx” where the
xxxx’s are user data. You never know, it might contain secrets like credit
card numbers.
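
One way to reduce the chance of a 7.1 mistake is to route configuration values through a redaction step before they reach a log statement. The sketch below is an assumption-laden illustration: the class name LogRedactor and the list of name fragments are hypothetical, and a real implementation would need a list agreed on by the project. It errs on the side of redacting too much.

```java
import java.util.Locale;

/** Hypothetical helper; the marker list is illustrative, not exhaustive. */
final class LogRedactor {
    private static final String[] SENSITIVE = {"password", "secret", "token", "key"};

    /** Formats name=value for logging, masking values of sensitive-looking names. */
    static String forLog(String name, String value) {
        String lower = name.toLowerCase(Locale.ROOT);
        for (String marker : SENSITIVE) {
            if (lower.contains(marker)) {
                return name + "=<redacted>";
            }
        }
        return name + "=" + value;
    }
}
```

This only helps with configuration items; it does nothing for 7.2, where the safer course is to log a position or an error code rather than the user-supplied text itself.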

*8. Secure By Default*

Strive to be secure by default. This means that products should ship in a
secure state, and only by human tuning be put into an insecure state.
Exhibit A here is the MongoDB ransomware fiasco, where the
insecure-by-default MongoDB installation resulted in completely open
instances of MongoDB on the open internet.  Attackers removed or encrypted
the data and left ransom notes behind. We don't want that sort of notoriety
for Hadoop. Granted, it's not always possible to turn on all security
features: for example you have to have a KDC set up in order to enable
Kerberos.

8.1 Are there settings or configurations that can be shipped in a
default-secure state?
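
As one small example for 8.1, the idea raised in this thread of not binding to internet-routable addresses by default could be checked roughly as follows. This is a sketch only: the class name BindPolicy is hypothetical, the caller is assumed to pass an IP literal rather than a hostname, and an admin override flag would still be needed.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical default policy: bind only to loopback or private addresses. */
final class BindPolicy {

    /** Returns true if the IP literal is loopback or in a private IPv4 range. */
    static boolean allowedByDefault(String ipLiteral) {
        try {
            // For an IP literal, getByName only validates the format.
            InetAddress addr = InetAddress.getByName(ipLiteral);
            // isSiteLocalAddress covers 10/8, 172.16/12 and 192.168/16 for IPv4.
            return addr.isLoopbackAddress() || addr.isSiteLocalAddress();
        } catch (UnknownHostException e) {
            // Anything that cannot be parsed is not bound to by default.
            return false;
        }
    }
}
```

Note that the wildcard address 0.0.0.0 is neither loopback nor site-local, so under this sketch listening on all interfaces would also require explicit admin action.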


On Tue, Oct 31, 2017 at 10:36 AM, larry mccay <lmc...@apache.org> wrote:

> Thanks for the examples, Mike.
>
> I think some of those should actually just be added to the checklist in
> other places as they are best practices.
> Which raises an interesting point that some of those items can be enabled
> by default and maybe indicating so throughout the list makes sense.
>
> Then we can ask for a description of any other Secure by Default
> considerations at the end.
>
> I will work on a new revision this morning.
>
>
> On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <myo...@cloudera.com>
> wrote:
>
>> #8 is a great topic - given that Hadoop is insecure by default.
>>> Actual movement to Secure by Default would be a challenge both
>>> technically (given the need for kerberos) and discussion-wise.
>>> Asking whether you have considered any settings or configurations that
>>> can be secure by default is an interesting idea.
>>>
>>> Can you provide an example though?
>>>
>>
>> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
>> etc.  But here are some ideas:
>>
>> - Default to only listen for network traffic on the loopback interface.
>> The admin would have to take specific action to listen on a non-loopback
>> address. Hence secure by default. I've known web servers that ship like
>> this. The counter argument to this is that this is a "useless by default"
>> setting for a distributed system... which does have some validity.
>> - A more constrained version of the above is to not bind to any network
>> interface that has an internet-routable ip address. (That is, not in the
>> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
>> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
>> that's obviously headed towards the open internet.  Sure this isn't
>> perfect, but it would catch some cases. The admin could provide a specific
>> flag to override.  (I got this one from discussion with the Kudu folks.)
>> - The examples don't have to be big. Another example would be... if using
>> TLS, and if the certificate authority used to sign the certificate is in
>> the default certificate store, turn on HSTS automatically.
>> - Always turn off TLSv1 and TLSv1.1
>> - Forbid single-DES and RC4 encryption algorithms
>>
>> You get the idea.
>> -Mike
>>
>>
>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <myo...@cloudera.com>
>>> wrote:
>>>
>>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lmc...@apache.org> wrote:
>>>>
>>>>> New Revision...
>>>>>
>>>>
>>>> These lists are wonderful. I appreciate the split between the Tech
>>>> Preview and the GA Readiness lists, with the emphasis on the former being
>>>> "don't enable by default" or at least "don't enable if security is on".  I
>>>> don't have any comments on that part.
>>>>
>>>> Additions inline below. If some of the additions are items covered by
>>>> existing frameworks that any code would use, please forgive my ignorance.
>>>> Also, my points aren't as succinct as yours. Feel free to reword.
>>>>
>>>> *GA Readiness Security Audit*
>>>>> At this point, we are merging full or partial security model
>>>>> implementations.
>>>>> Let's inventory what is covered by the model at this point and whether
>>>>> there are future merges required to be full.
>>>>>
>>>>> *1. UIs*
>>>>>
>>>>> 1.1. What sort of validation is being done on any accepted user input?
>>>>> (pointers to code would be appreciated)
>>>>> 1.2. What explicit protections have been built in for (pointers to
>>>>> code would be appreciated):
>>>>>   1.2.1. cross site scripting
>>>>>   1.2.2. cross site request forgery
>>>>>   1.2.3. click jacking (X-Frame-Options)
>>>>>
>>>>
>>>> 1.2.4 If using cookies, is the secure flag for cookies
>>>> <https://www.owasp.org/index.php/SecureFlag> turned on?
>>>>
>>>>
>>>>> 1.3. What sort of authentication is required for access to the UIs?
>>>>>   1.3.1. Kerberos
>>>>>     1.3.1.1. has TGT renewal been accounted for
>>>>>     1.3.1.2. SPNEGO support?
>>>>>     1.3.1.3. Delegation token?
>>>>>   1.3.2. Proxy User ACL?
>>>>> 1.4. What authorization is available for determining who can access
>>>>> what capabilities of the UIs for either viewing, modifying data and/or
>>>>> related processes?
>>>>> 1.5. Is there any input that will ultimately be persisted in
>>>>> configuration for executing shell commands or processes?
>>>>> 1.6. Do the UIs support the trusted proxy pattern with doas
>>>>> impersonation?
>>>>> 1.7. Is there TLS/SSL support?
>>>>>
>>>>
>>>> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
>>>> 1.7.2 Is it possible to configure support for HTTP Strict Transport
>>>> Security
>>>> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
>>>> (HSTS)?
>>>> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP
>>>> address Z", etc)
>>>>
>>>>
>>>>> *2. REST APIs*
>>>>>
>>>>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>>>>> impersonation capabilities?
>>>>> 2.2. What explicit protections have been built in for:
>>>>>   2.2.1. cross site scripting (XSS)
>>>>>   2.2.2. cross site request forgery (CSRF)
>>>>>   2.2.3. XML External Entity (XXE)
>>>>> 2.3. What is being used for authentication - Hadoop Auth Module?
>>>>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>>>>> endpoints) or are they part of existing processes?
>>>>> 2.5. Is there TLS/SSL support?
>>>>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>>>>> APIs?
>>>>> 2.7. What authorization enforcement points are there within the REST
>>>>> APIs?
>>>>>
>>>>
>>>> The TLS and audit comments above apply here, too.
>>>>
>>>>
>>>>> *3. Encryption*
>>>>>
>>>>> 3.1. Is there any support for encryption of persisted data?
>>>>> 3.2. If so, is KMS and the hadoop key command used for key management?
>>>>> 3.3. KMS interaction with Proxy Users?
>>>>>
>>>>
>>>> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto
>>>> than any other in computer science. Standard cryptographic libraries should
>>>> always be used. Does this work attempt to create an encryption scheme or
>>>> protocol? Does it have a "novel" or "unique" use of normal crypto?  There
>>>> be dragons. Even normal-looking use of cryptography must be carefully
>>>> reviewed.
>>>> 3.5 If you need random bits for a security purpose, such as for a
>>>> session token or a cryptographic key, you need a cryptographically approved
>>>> place to acquire said bits. Use the SecureRandom class.
>>>>
>>>> *4. Configuration*
>>>>>
>>>>> 4.1. Are there any passwords or secrets being added to configuration?
>>>>> 4.2. If so, are they accessed via Configuration.getPassword() to allow
>>>>> for provisioning to credential providers?
>>>>> 4.3. Are there any settings that are used to launch docker containers
>>>>> or shell out command execution, etc?
>>>>>
>>>>
>>>> +1. So good.
>>>>
>>>>
>>>>> *5. HA*
>>>>>
>>>>> 5.1. Are there provisions for HA?
>>>>> 5.2. Are there any single point of failures?
>>>>>
>>>>> *6. CVEs*
>>>>>
>>>>> Dependencies need to have been checked for known issues before we
>>>>> merge.
>>>>> We don't however want to list any CVEs that have been fixed but not
>>>>> released yet.
>>>>>
>>>>> 6.1. All dependencies checked for CVEs?
>>>>>
>>>>
>>>> Big +1 for this, too.
>>>>
>>>> 7. Log Messages
>>>>
>>>> Do not write secrets or data into log files. This sounds obvious, but
>>>> mistakes happen.
>>>>
>>>> 7.1 Do not log passwords, keys, security-related tokens, or any
>>>> sensitive configuration item.
>>>> 7.2 Do not log any user-supplied data, ever. Not even snippets of user
>>>> data, such as “I had an error parsing this line of text: xxxx” where the
>>>> xxxx’s are user data. You never know, it might contain secrets like credit
>>>> card numbers.
>>>>
>>>> 8. Secure By Default
>>>>
>>>> Strive to be *secure by default*. This means that products should ship
>>>> in a secure state, and only by human tuning be put into an insecure state.
>>>> Exhibit A here is the MongoDB ransomware fiasco
>>>> <https://krebsonsecurity.com/tag/mongodb/>, where the
>>>> insecure-by-default MongoDB installation resulted in completely open
>>>> instances of mongodb on the open internet.  Attackers removed or encrypted
>>>> the data and left ransom notes behind. We don't want that sort of notoriety
>>>> for hadoop. Granted, it's not always possible to turn on all security
>>>> features: for example you have to have a KDC set up in order to enable
>>>> Kerberos.
>>>>
>>>> 8.1 Are there settings or configurations that can be shipped in a
>>>> default-secure state?
>>>>
>>>>
>>>> Thanks again for putting this list together!
>>>> -Mike
>>>>
>>>>
>>>>
>>>
>>
>
