Re: [Bug-wget] Overly permissive hostname matching

2014-03-21 Thread Tim Ruehsen
On Thursday 20 March 2014 23:11:31 Daniel Stenberg wrote:
 On Thu, 20 Mar 2014, Tim Rühsen wrote:
  I broke out the public suffix code together and created a first go (really
  very quick, distcheck fails - couldn't figure out this evening).
  
  https://github.com/rockdaboot/libpsl
 
 Ok, I'll be the first to rain on the parade. Sorry but it seems fit to do
 this early.
 
 You do realize that with a *GPL license on the thing, you won't get adopted
 by OpenSSL, curl and possibly others?

I knew you were the first to bring this up ;-)

 I can't prevent you of course and the decision is yours to make, but I'd
 prefer a BSD style license as then I could really consider basing future
 enhancements of curl on this effort.

I don't care much about the license in this case, since we are talking about 
something pretty simple. I just concentrated on the code that evening and 
simply copied the license clauses...

I would like to see a consensus here just to avoid further discussions about 
licenses (there have been far too many).

Ángel González already 'voted' for a MIT license.
Daniel, you prefer a BSD license.
I am OK with either one.
Any other votes ?

Tim



Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Tim Rühsen
Am Mittwoch, 19. März 2014, 10:59:05 schrieb Daniel Kahn Gillmor:
 I'm imagining a C library API that has a public suffix list context
 object that can do efficient lookups (however we define the lookups),
 and the library would bundle a pre-compiled context, based on the
 currently-known public suffix list.
 
 something like:
 
 ---
 struct psl_ctx;
 typedef struct psl_ctx * psl_ctx_t;
 const psl_ctx_t psl_builtin;
 
 psl_ctx_t psl_new_ctx_from_filename(const char* filename);
 psl_ctx_t psl_new_ctx_from_fd(int fd);
 void psl_free_ctx(psl_ctx_t ctx);
 
 /*
   query forms, very rough draft -- do we need both?
   need to consider memory allocation responsibilities and
   DNS internationalization/canonicalization issues
 */
 
 const char* psl_get_public_suffix(const psl_ctx_t, const char* domain);
 const char* psl_get_registered_domain(const psl_ctx_t, const char* d);
 ---

I broke out the public suffix code together and created a first go (really very 
quick, distcheck fails - couldn't figure out this evening).

https://github.com/rockdaboot/libpsl

The first step was a psl_is_tld() function.
There is a test case for some major things (wildcards, exceptions).

I hope there will be some interest and some contributions...

Tim




Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Jeffrey Walton
On Thu, Mar 20, 2014 at 5:52 PM, Tim Rühsen tim.rueh...@gmx.de wrote:
 Am Mittwoch, 19. März 2014, 10:59:05 schrieb Daniel Kahn Gillmor:
 I'm imagining a C library API that has a public suffix list context
 object that can do efficient lookups (however we define the lookups),
 and the library would bundle a pre-compiled context, based on the
 currently-known public suffix list.

 something like:

 ---
 struct psl_ctx;
 typedef struct psl_ctx * psl_ctx_t;
 const psl_ctx_t psl_builtin;

 psl_ctx_t psl_new_ctx_from_filename(const char* filename);
 psl_ctx_t psl_new_ctx_from_fd(int fd);
 void psl_free_ctx(psl_ctx_t ctx);

 /*
   query forms, very rough draft -- do we need both?
   need to consider memory allocation responsibilities and
   DNS internationalization/canonicalization issues
 */

 const char* psl_get_public_suffix(const psl_ctx_t, const char* domain);
 const char* psl_get_registered_domain(const psl_ctx_t, const char* d);
 ---

 I broke out the public suffix code together and created a first go (really
 very quick, distcheck fails - couldn't figure out this evening).

 https://github.com/rockdaboot/libpsl

 The first step was a psl_is_tld() function.
 There is a test case for some major things (wildcards, exceptions).

 I hope there will be some interest and some contributions...
Yes, I'd be interested. Especially since Angel pointed out failures in
my use of the PSL (the close-open failures are troubling to me).

I had a sidebar with one of the OpenSSL devs because OpenSSL is
cutting in hostname matching in version 1.0.2.

He shared a link to an IETF working group on the subject:
https://www.ietf.org/mailman/listinfo/dbound.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Daniel Stenberg

On Thu, 20 Mar 2014, Tim Rühsen wrote:

I broke out the public suffix code together and created a first go (really 
very quick, distcheck fails - couldn't figure out this evening).


https://github.com/rockdaboot/libpsl


Ok, I'll be the first to rain on the parade. Sorry but it seems fit to do this 
early.


You do realize that with a *GPL license on the thing, you won't get adopted by 
OpenSSL, curl and possibly others?


I can't prevent you of course and the decision is yours to make, but I'd 
prefer a BSD style license as then I could really consider basing future 
enhancements of curl on this effort.


--

 / daniel.haxx.se

Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Jeffrey Walton
On Thu, Mar 20, 2014 at 6:11 PM, Daniel Stenberg dan...@haxx.se wrote:
 On Thu, 20 Mar 2014, Tim Rühsen wrote:

 I broke out the public suffix code together and created a first go (really
 very quick, distcheck fails - couldn't figure out this evening).

 https://github.com/rockdaboot/libpsl


 Ok, I'll be the first to rain on the parade. Sorry but it seems fit to do
 this early.

 You do realize that with a *GPL license on the thing, you won't get adopted
 by OpenSSL, curl and possibly others?
+1

 I can't prevent you of course and the decision is yours to make, but I'd
 prefer a BSD style license as then I could really consider basing future
 enhancements of curl on this effort.
Does GNU have a permissive license? I know permissive does not meet
all of Dr. Stallman's goals, but it will allow GNU more intellectual
property in the arena.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Ángel González

On 20/03/14 23:11, Daniel Stenberg wrote:
You do realize that with a *GPL license on the thing, you won't get 
adopted by OpenSSL, curl and possibly others?


I can't prevent you of course and the decision is yours to make, but 
I'd prefer a BSD style license as then I could really consider basing 
future enhancements of curl on this effort.
FWIW, I have to agree, I would have opted for a MIT license for this if 
I were writing it. The FSF would probably strongly oppose, though :)






Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Ángel González

On 20/03/14 22:52, Tim Rühsen wrote:

I broke out the public suffix code together and created a first go (really very
quick, distcheck fails - couldn't figure out this evening).

https://github.com/rockdaboot/libpsl

The first step was a psl_is_tld() function.
There is a test case for some major things (wildcards, exceptions).


So, your public api seems to be this:

psl_ctx_t *psl_load_file(const char *fname);

void psl_free(psl_ctx_t **psl)

int psl_is_tld(const psl_ctx_t *psl, const char *domain)

First, I wouldn't call the function is_tld(), and not just because TLDs
simply won't have any dot inside (just extract the last label in a DNS
name), but because there are more ambiguous cases. I would name it
is_public(), defining it as “one domain under which anyone* can register
a subdomain”.

Additionally, I think there should be a function to extract the public
suffix from a given domain.

Both functions should take a flags argument. The immediate use I foresee
is to choose whether private registries should be taken into account or
not. (A private registry is a domain used by the public but not owned by
a registry; dyndns.org and blogspot.com are examples of that.)

* “anyone” understood as a random person unaffiliated with the owner of
the parent domain, notwithstanding any condition that such a person is
required to fulfill in order to register it (such as residing in a given
region or having paid certain fees).

PS: It's funny to see RFC 1591 (1994) talking about TLDs and saying «It
is extremely unlikely that any other TLDs will be created.»
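
The two functions suggested above, each taking a flags argument, could look
roughly like the following C sketch. Everything here is an assumption made
for illustration: the pslx_ names, the PSLX_INCLUDE_PRIVATE flag and the
four-entry table are hypothetical and are not part of libpsl. Wildcard and
exception rules are left out to keep the example short.

```c
#include <string.h>

/* Hypothetical flag: also treat private registries as public suffixes. */
#define PSLX_INCLUDE_PRIVATE 0x1

struct pslx_rule {
    const char *name;
    int is_private;   /* 1 = private registry, 0 = registry-operated */
};

/* Toy table; a real list would hold thousands of entries. */
static const struct pslx_rule pslx_table[] = {
    { "com", 0 },
    { "org", 0 },
    { "dyndns.org", 1 },
    { "blogspot.com", 1 },
};

/* Return a pointer into `domain` at the start of its public suffix.
   Falls back to the last label when no table entry matches. */
const char *pslx_public_suffix(const char *domain, int flags)
{
    size_t dl = strlen(domain), best = 0, i;
    const char *last_dot;

    for (i = 0; i < sizeof pslx_table / sizeof pslx_table[0]; i++) {
        size_t tl = strlen(pslx_table[i].name);

        if (pslx_table[i].is_private && !(flags & PSLX_INCLUDE_PRIVATE))
            continue;               /* caller asked to ignore private entries */
        if (tl > dl || strcmp(domain + dl - tl, pslx_table[i].name) != 0)
            continue;               /* not a textual suffix */
        if (tl != dl && domain[dl - tl - 1] != '.')
            continue;               /* suffix must start at a label boundary */
        if (tl > best)
            best = tl;              /* longest matching entry wins */
    }
    if (best)
        return domain + dl - best;

    last_dot = strrchr(domain, '.');
    return last_dot ? last_dot + 1 : domain;
}

/* Return 1 if the whole of `domain` is itself a public suffix. */
int pslx_is_public(const char *domain, int flags)
{
    return pslx_public_suffix(domain, flags) == domain;
}
```

Returning a pointer into the caller's string sidesteps the question of who
owns the memory, which was one of the open points in the draft API earlier
in the thread.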





Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Ángel González

On 20/03/14 23:16, Jeffrey Walton wrote:



I can't prevent you of course and the decision is yours to make, but I'd
prefer a BSD style license as then I could really consider basing future
enhancements of curl on this effort.

Does GNU have a permissive license? I know permissive does not meet
all of Dr. Stallman's goals, but it will allow GNU more intellectual
property in the arena.

Jeff
The LGPL would be an option. I don't see how intellectual property is
related.

The license and the copyright owner seem quite orthogonal to me.





Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Jeffrey Walton
On Thu, Mar 20, 2014 at 7:11 PM, Ángel González keis...@gmail.com wrote:
 On 20/03/14 23:16, Jeffrey Walton wrote:


 I can't prevent you of course and the decision is yours to make, but I'd
 prefer a BSD style license as then I could really consider basing future
 enhancements of curl on this effort.

 Does GNU have a permissive license? I know permissive does not meet
 all of Dr. Stallman's goals, but it will allow GNU more intellectual
 property in the arena.

 The LGPL would be an option. I don't see how intellectual property is
 related.
 The license and the copyright owner seem quite orthogonal to me.
Isn't copyright assigned to GNU or FSF?

(Sorry, I don't know. I'm not a lawyer, so my solution is usually to
avoid GPL code altogether).

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Daniel Stenberg

On Fri, 21 Mar 2014, Ángel González wrote:


The LGPL would be an option.


Not for curl though and probably not to other BSD/MIT licensed projects...

--

 / daniel.haxx.se

Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Ángel González

On 21/03/14 00:21, Daniel Stenberg wrote:

On Fri, 21 Mar 2014, Ángel González wrote:


The LGPL would be an option.


Not for curl though and probably not to other BSD/MIT licensed 
projects...



That's a good point.



Jeff wrote:

Isn't copyright assigned to GNU or FSF?

No. By licensing something under GPL you don't assign any copyright to FSF.*

However, in some GNU projects, and unlike most (all?) other free projects,
the FSF additionally requests to be assigned the copyright of the work, in
order to be in a better position for enforcing its free license.
https://www.gnu.org/licenses/why-assign.html

* Actually, if it's licensed with an "or later version" clause, as the
license gatekeepers they have an extra right to change it, but I wouldn't.



(Sorry, I don't know. I'm not a lawyer, so my solution is usually to
avoid GPL code altogether).
That's a solution. Although it's a sad result from usage of a license 
intended to preserve freedoms.



Cheers




Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Jeffrey Walton
On Thu, Mar 20, 2014 at 8:12 PM, Ángel González keis...@gmail.com wrote:
 On 21/03/14 00:21, Daniel Stenberg wrote:

 ...
 (Sorry, I don't know. I'm not a lawyer, so my solution is usually to
 avoid GPL code altogether).

 That's a solution. Although it's a sad result from usage of a license
 intended to preserve freedoms.
For what it's worth, I agree with you.

I can't afford lawyers on retainer to untangle things or to defend a
suit. Hence I would be happy to use a permissive GPL license
and assign any IP to GNU or FSF.

I don't take the position due to philosophy or perceived moral high
ground. It's simply economic for me. Anyone who has not experienced the
economics of a technology lawsuit is in for a shock.

In the past, I spent 10,000's on a lawyer in a technology case. I'm
not going through that again, unless I'm independently wealthy. (I
think I had the moral high ground since I was suing a chronic spammer
who harassed me for nearly 15 years. I tried for years to get off the
lists, and finally had to resort to the courts).

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Stenberg

On Tue, 18 Mar 2014, Ángel González wrote:


Daniel, how does cURL check correctness of the certificate hostname suffix?


It insists on at least two dots. So yes, *.apple will cause problems for us 
too.


I view the public suffix list as one of the worst kludges in networking 
history and while I understand why it is necessary, it is next to impossible 
to actually use sensibly in lots of environments.


--

 / daniel.haxx.se

Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Stenberg

On Wed, 19 Mar 2014, Daniel Kahn Gillmor wrote:

It insists on at least two dots. So yes, *.apple will cause problems for 
us too.


There are also errors in the opposite direction: it sounds like curl will 
accept a cert for *.co.uk, right?


Exactly, due to the lack of public suffix awareness! :-(
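
The "at least two dots" rule mentioned above can be sketched in C as
follows. This is only an illustration of the heuristic as described in this
thread, not curl's actual code; the function name is hypothetical.

```c
#include <string.h>

/* Count the dots in a certificate name pattern such as "*.example.com". */
static int count_dots(const char *pattern)
{
    int dots = 0;

    for (; *pattern; pattern++)
        if (*pattern == '.')
            dots++;
    return dots;
}

/* Hypothetical helper: return 1 if the pattern passes the two-dot rule.
   Names without a leading "*." are not wildcards and always pass. */
int wildcard_passes_two_dot_rule(const char *pattern)
{
    if (strncmp(pattern, "*.", 2) != 0)
        return 1;
    return count_dots(pattern) >= 2;
}
```

With this rule "*.com" and "*.apple" are both refused, while "*.co.uk" is
accepted because it has two dots, which is exactly the pair of errors
discussed above.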

--

 / daniel.haxx.se



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Kahn Gillmor
On 03/19/2014 06:19 AM, Tim Ruehsen wrote:
 As a programmer, I want to have control. E.g. the option to load from a 
 different file, or to switch off loading. Why ? e.g. for testing purposes, or 
 simply imagine a swiss army knife client for experts - maybe they want to 
 have control via CLI args. Or you are in a controlled environment and simply 
 don't want to waste CPU cycles when downloading a single file from a trusted 
 server. Just some examples.
 And then, clients like Wget would like to have access, at least for checking 
 cookies.

i understand, and i think we're probably not disagreeing -- you want the
ability to control it; i want sane defaults so that people who don't
touch it get sensible behavior.

 I just took a quick look but I am not sure about the API (i did not have this 
 'aha' effect). But what I don't like is the dependency on PHP which is used
 to 'compile' the PSL before the C functions can use it. While the idea of
 compilation/preprocessing is a good one, it should at least be optional.

pre-compilation/preprocessing is probably a reasonable performance
optimization for heavy use; we might even want a C library to embed a
precompiled version of the most recent known list at time of
compilation, so that it can be used with no initialization step or when
no file is available.  I don't think depending on php for the
pre-compilation step is a problem; that's just an additional
build-dependency, same as (for example) bison or cmake or python for
other C projects.  (though i confess i'd rather work with pretty much
any language other than PHP in general)

I agree that we probably want the library to support the generic case of
reading the PSL from a file, though.

I'm imagining a C library API that has a public suffix list context
object that can do efficient lookups (however we define the lookups),
and the library would bundle a pre-compiled context, based on the
currently-known public suffix list.

something like:

---
struct psl_ctx;
typedef struct psl_ctx * psl_ctx_t;
const psl_ctx_t psl_builtin;

psl_ctx_t psl_new_ctx_from_filename(const char* filename);
psl_ctx_t psl_new_ctx_from_fd(int fd);
void psl_free_ctx(psl_ctx_t ctx);

/*
  query forms, very rough draft -- do we need both?
  need to consider memory allocation responsibilities and
  DNS internationalization/canonicalization issues
*/

const char* psl_get_public_suffix(const psl_ctx_t, const char* domain);
const char* psl_get_registered_domain(const psl_ctx_t, const char* d);
---

 the folks it's me ;-)

Hi the folks :)  (and thanks for your work on mget!)

 I already thought of splitting libmget into several smaller libraries, like 
 libmget-common, libmget-cookies, libmget-psl ... whatever is needed.
 
 What exactly do you think of ? What can I do to make Debian packaging easy ?

hm, it looks like libmget isn't in debian at all right now.  I'm swamped
with packaging work, and i'm not prepared to review something as
full-featured as libmget itself.  if you could break out the
publicsuffix code so that it was a distinct project from mget, but
provided the API that met the needs of libmget-cookies, that would be
the simplest thing for me to review and package;  we could run any
proposed API by Nikos to make sure it meets the needs of GnuTLS as well,
if you think it's a good idea to push this verification into the TLS
stack itself.

thanks for thinking about this,

--dkg





Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 10:59 AM, Daniel Kahn Gillmor
d...@fifthhorseman.net wrote:
 On 03/19/2014 06:19 AM, Tim Ruehsen wrote:
 As a programmer, I want to have control. E.g. the option to load from a
 different file, or to switch off loading. Why ? e.g. for testing purposes, or
 simply imagine a swiss army knife client for experts - maybe they want to
 have control via CLI args. Or you are in a controlled environment and simply
 don't want to waste CPU cycles when downloading a single file from a trusted
 server. Just some examples.
 And then, clients like Wget would like to have access, at least for checking
 cookies.

 i understand, and i think we're probably not disagreeing -- you want the
 ability to control it; i want sane defaults so that people who don't
 touch it get sensible behavior.

 I just took a quick look but I am not sure about the API (i did not have this
 'aha' effect). But what I don't like is the dependency on PHP which is used
 to 'compile' the PSL before the C functions can use it. While the idea of
 compilation/preprocessing is a good one, it should at least be optional.

 pre-compilation/preprocessing is probably a reasonable performance
 optimization for heavy use; we might even want a C library to embed a
 precompiled version of the most recent known list at time of
 compilation, so that it can be used with no initialization step or when
 no file is available.
This may help with seeding thoughts for an implementation. I'm
fortunate because I work in C++.

I have a 'precooked' list with com, mil, ...  ak.us, co.uk,
etc. One entry for each line.

There can be multiple dots. For example, sekikawa.niigata.jp.

I read the list into a vector, sort it in n*log(n), and then get
log(n) lookups for the lifetime of the program. I pay the cost of the
sort because I make frequent lookups.

When I match names with wild cards, I take a DNS name like
*.example.com. I change it to example.com, and see if it's banned. It's
a simple algorithm but it's effective.
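
The same idea (sort once, then do log(n) lookups, stripping the leading
"*." before searching) can be sketched in plain C with qsort() and
bsearch(). The five-entry array and the function name are stand-ins chosen
for the example; this is not the C++ code described above.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the precooked list, deliberately unsorted. */
static const char *suffixes[] = {
    "com", "co.uk", "mil", "ak.us", "sekikawa.niigata.jp",
};
#define N_SUFFIXES (sizeof suffixes / sizeof suffixes[0])

static int suffixes_sorted = 0;

/* Comparator for arrays of C strings, usable by qsort() and bsearch(). */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: return 1 if the name (with any leading "*."
   removed) appears in the list, 0 otherwise. */
int is_banned_wildcard(const char *pattern)
{
    const char *name = pattern;

    if (!suffixes_sorted) {
        /* Pay the n*log(n) sort once, on first use. */
        qsort(suffixes, N_SUFFIXES, sizeof suffixes[0], cmp_str);
        suffixes_sorted = 1;
    }
    if (strncmp(name, "*.", 2) == 0)
        name += 2;                  /* "*.example.com" -> "example.com" */

    return bsearch(&name, suffixes, N_SUFFIXES,
                   sizeof suffixes[0], cmp_str) != NULL;
}
```

This sketch treats every entry as a plain name, so it does not handle the
wildcard ("*.") and exception ("!") rules of the real list.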

I embed the list in my executable with GNU's assembler (*.S file). It's
essentially a string with both a length and a NULL terminator:

;; eff_tld_list.S
.section .rodata

;; Mozilla's Effective TLD list
.global eff_tld_list
.type   eff_tld_list, @object
.align  8
eff_tld_list:
eff_tld_list_start:
.incbin "res/eff_tld_list.lst"
eff_tld_list_end:
.byte 0

;; The string's size (if needed)
.global eff_tld_list_size
.type   eff_tld_list_size, @object
.align  4
eff_tld_list_size:
.int    eff_tld_list_end - eff_tld_list_start

Below is the script I use to fetch Mozilla's list.

Jeff

**

#! /bin/bash

MOZILLA_LIST=eff_tld_list.lst

wget "http://publicsuffix.org/list/effective_tld_names.dat" -O "$MOZILLA_LIST"

# Remove comments
sed '/^\/\//d' "$MOZILLA_LIST" > temp-1.txt
mv temp-1.txt "$MOZILLA_LIST"

# Remove empty lines
sed '/^$/d' "$MOZILLA_LIST" > temp-2.txt
mv temp-2.txt "$MOZILLA_LIST"

# Remove lines that begin with !
sed 's/^!//g' "$MOZILLA_LIST" > temp-3.txt
mv temp-3.txt "$MOZILLA_LIST"

# Remove lines that begin with *.
sed 's/^\*\.//g' "$MOZILLA_LIST" > temp-4.txt
mv temp-4.txt "$MOZILLA_LIST"

# Pre-sort it
cat "$MOZILLA_LIST" | sort > temp-8.txt
mv temp-8.txt "$MOZILLA_LIST"

# Copy it to resources
cp "$MOZILLA_LIST" ../res



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Stenberg

On Wed, 19 Mar 2014, Jeffrey Walton wrote:


# Remove lines that begin with !


That sounds wrong:

  A rule may begin with a ! (exclamation mark). If it does, it is labelled
  as a exception rule and then treated as if the exclamation mark is not 
present.

--

 / daniel.haxx.se



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 11:37 AM, Jeffrey Walton noloa...@gmail.com wrote:
 On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg dan...@haxx.se wrote:
 On Wed, 19 Mar 2014, Jeffrey Walton wrote:

 # Remove lines that begin with !


 That sounds wrong:

   A rule may begin with a ! (exclamation mark). If it does, it is labelled
   as a exception rule and then treated as if the exclamation mark is not
 present.
 Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)

 Anyway, I'll try to find the meaning of that bang. I seem to recall I
 could not find the meaning of it in the past.
After reading that again, I don't mean to sound rude. Sorry about
that. Thanks for pointing it out.

And it does bring up a good point: the data structure needs two things:
(1) a name, and (2) a flag for white/black. White is white listed
while black is black listed.

The API needs (at minimum): (1) take a name, and (2) return
white/black/no entry. Everything else is just frills.
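
A minimal C sketch of that shape, a name plus a flag per entry and a lookup
that returns one of three results, might look like this. The type and
function names are hypothetical, and the three-entry table is a toy (the
real list does contain "*.ck" together with the exception "!www.ck", which
is why www.ck serves as the white-listed example).

```c
#include <string.h>

enum toy_psl_verdict {
    PSL_NO_ENTRY,   /* name is not in the list at all */
    PSL_BLACK,      /* name is a public suffix: refuse wildcards for it */
    PSL_WHITE       /* name is an exception: explicitly allowed */
};

struct toy_psl_entry {
    const char *name;
    enum toy_psl_verdict verdict;
};

/* Toy data; a real table would be generated from the published list. */
static const struct toy_psl_entry toy_psl_entries[] = {
    { "com",    PSL_BLACK },
    { "co.uk",  PSL_BLACK },
    { "www.ck", PSL_WHITE },
};

/* Take a name, return white/black/no entry. Linear scan for brevity. */
enum toy_psl_verdict toy_psl_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof toy_psl_entries / sizeof toy_psl_entries[0]; i++)
        if (strcmp(toy_psl_entries[i].name, name) == 0)
            return toy_psl_entries[i].verdict;
    return PSL_NO_ENTRY;
}
```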

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg dan...@haxx.se wrote:
 On Wed, 19 Mar 2014, Jeffrey Walton wrote:

 # Remove lines that begin with !


 That sounds wrong:

   A rule may begin with a ! (exclamation mark). If it does, it is labelled
   as a exception rule and then treated as if the exclamation mark is not
 present.
Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)

Anyway, I'll try to find the meaning of that bang. I seem to recall I
could not find the meaning of it in the past.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 11:45 AM, Jeffrey Walton noloa...@gmail.com wrote:
 On Wed, Mar 19, 2014 at 11:37 AM, Jeffrey Walton noloa...@gmail.com wrote:
 On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg dan...@haxx.se wrote:
 On Wed, 19 Mar 2014, Jeffrey Walton wrote:

 # Remove lines that begin with !


 That sounds wrong:

   A rule may begin with a ! (exclamation mark). If it does, it is labelled
   as a exception rule and then treated as if the exclamation mark is not
 present.
 Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)

 Anyway, I'll try to find the meaning of that bang. I seem to recall I
 could not find the meaning of it in the past.
 After reading that again, I don't mean to sound rude. Sorry about
 that. Thanks for pointing it out.

 And it does bring up a good point: the data structure needs two things:
 (1) a name, and (2) a flag for white/black. White is white listed
 while black is black listed.

 The API needs (at minimum): (1) take a name, and (2) return
 white/black/no entry. Everything else is just frills.

Something else you may want in the API is a way to determine how,
exactly, a name matched if it's a failure. So it might be useful to
report whether the match was a Generic Top Level Domain (gTLD), a
Country Code Top Level Domain (ccTLD), or an Effective Top Level Domain
from a PSL.

I mention it because my code has some diagnostics in debug builds that
log the info.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Kahn Gillmor
On 03/19/2014 11:55 AM, Jeffrey Walton wrote:
 Also, be careful of where you are pulling the list from. I got burned
 by pulling a list that was not being updated
 (https://bugzilla.mozilla.org/show_bug.cgi?id=968064).

i've been similarly burned before too, but i settled on the mxr address
i just posted after trying a few other places.

 The Mozilla folks state the canonical list is at
 http://publicsuffix.org/list/effective_tld_names.dat. See Comment 11
 at https://bugzilla.mozilla.org/show_bug.cgi?id=968064#c11.

i just followed up there to point out that the canonical location for
the data needs to have some form of cryptographic integrity mechanism.
thanks for pointing that out.

--dkg





Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Ángel González

On 19/03/14 16:37, Jeffrey Walton wrote:

On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg dan...@haxx.se wrote:

On Wed, 19 Mar 2014, Jeffrey Walton wrote:


# Remove lines that begin with !


That sounds wrong:

   A rule may begin with a ! (exclamation mark). If it does, it is labelled
   as a exception rule and then treated as if the exclamation mark is not
present.

Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)

Anyway, I'll try to find the meaning of that bang. I seem to recall I
could not find the meaning of it in the past.

Jeff
It excludes a hostname from a previous matching rule. See
http://publicsuffix.org/list/#list-format
Currently, there doesn't seem to be any exclusion with a wildcard, so
right now all lines beginning with '!' are equivalent to “accept this
hostname”.

Also note that by removing the *. from the beginning of the lines*,
you are accepting more hosts than you should, such as a certificate for
*.com.bd (represented as *.bd in the PSL), which should have been
rejected.



* Your script comments are wrong btw, since you're not removing full lines.
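
To make the wildcard and exception handling concrete, here is a small C
sketch of the matching rules from the list-format page: a "*." rule covers
one extra label, a "!" rule overrides any other match, and otherwise the
rule with the most labels wins. The six-entry rule set and all function
names are assumptions for illustration; case folding and IDN handling are
ignored.

```c
#include <string.h>

/* Toy rule set in PSL syntax; the real list is far longer. */
static const char *rules[] = {
    "com", "uk", "co.uk", "*.bd", "*.ck", "!www.ck",
};

/* Number of labels in a dotted name ("a.b.c" has 3). */
static int label_count(const char *s)
{
    int n = 1;

    for (; *s; s++)
        if (*s == '.')
            n++;
    return n;
}

/* Return 1 if domain equals tail or ends in "." + tail. With need_extra
   set, an exact match is not enough: one more label must precede tail. */
static int ends_with_labels(const char *domain, const char *tail,
                            int need_extra)
{
    size_t dl = strlen(domain), tl = strlen(tail);

    if (dl == tl)
        return !need_extra && strcmp(domain, tail) == 0;
    if (dl < tl + 2)
        return 0;
    return domain[dl - tl - 1] == '.' && strcmp(domain + dl - tl, tail) == 0;
}

/* Number of labels that make up the public suffix of domain. */
int public_suffix_labels(const char *domain)
{
    int best = 1;   /* default rule "*": the last label alone */
    size_t i;

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        const char *rule = rules[i];
        int n = 0;

        if (rule[0] == '!') {
            /* Exception rule prevails; drop its leftmost label. */
            if (ends_with_labels(domain, rule + 1, 0))
                return label_count(rule + 1) - 1;
        } else if (rule[0] == '*' && rule[1] == '.') {
            if (ends_with_labels(domain, rule + 2, 1))
                n = label_count(rule + 2) + 1;
        } else if (ends_with_labels(domain, rule, 0)) {
            n = label_count(rule);
        }
        if (n > best)
            best = n;
    }
    return best;
}

/* Return 1 if the whole domain is a public suffix. */
int is_public_suffix(const char *domain)
{
    return label_count(domain) <= public_suffix_labels(domain);
}

/* Return 1 if a wildcard certificate name would cover a public suffix. */
int wildcard_too_broad(const char *pattern)
{
    return strncmp(pattern, "*.", 2) == 0 && is_public_suffix(pattern + 2);
}
```

Under these toy rules a certificate for "*.com.bd" is refused because
com.bd is covered by "*.bd", while "*.www.ck" is allowed because of the
"!www.ck" exception.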




Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 3:03 PM, Ángel González keis...@gmail.com wrote:
 On 19/03/14 16:37, Jeffrey Walton wrote:

 ...
 Also note that by removing the *. from the beginning of the lines*, you
 are accepting more hosts than
 you should, such as a certificate for *.com.bd (represented as *.bd in the
 PSL) which should have been
 rejected.
Oh, that's bad. I'll have to check that. Thanks.

Jeff



[Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Jeffrey Walton
I believe wget has a security flaw in its certificate hostname matching code.

In the attached server certificate, the hostname is provided via a
Subject Alt Name (SAN). The only SAN entry is a DNS name for *.com.
Also attached is the default CA, which was used to sign the server's
certificate.

Effectively, wget accepts a single certificate for the gTLD of .COM.
That's probably bad. If a CA is compromised, then the compromised CA
could issue a super certificate and cover the entire top level
domain space.

I suspect wget also accepts certificates for .COM's friends, like
.NET, .ORG, .MIL, etc.

It's probably not limited to gTLDs. Mozilla maintains a list of
effective TLDs at https://wiki.mozilla.org/Public_Suffix_List. The
1600+ effective TLDs are probably accepted, too.

Attached are the certificates, keys, and commands to set up a test rig
with OpenSSL's s_server. The certificates are issued for example.com,
and require a modification to /etc/hosts to make things work as
(un)expected.

Jeffrey Walton
Baltimore, MD, US


hostname-verification.tar.gz
Description: GNU Zip compressed data


Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Daniel Kahn Gillmor
Hi Jeffrey--

On 03/18/2014 01:43 AM, Jeffrey Walton wrote:
 I believe wget has a security flaw in its certificate hostname matching code.
 
 In the attached server certificate, the hostname is provided via a
 Subject Alt Name (SAN). The only SAN entry is a DNS name for *.com.
 Also attached is the default CA, which was used to sign the server's
 certificate.

thanks for raising this concern.

Have you tested this certificate and CA with other HTTPS clients (like
browsers?)

Section 11.1.3 of the CA/Browser Forum's baseline requirements for CAs
says that compliant CAs MUST NOT issue wildcard certs for an entire
registry-controlled zone or public suffix unless the applicant proves
its rightful control of the entire Domain Namespace:

https://cabforum.org/wp-content/uploads/Baseline_Requirements_V1_1_6.pdf

So arguably, it is the responsibility of the CA, not the responsibility
of the relying party, to determine what certs are legitimate.

Put another way: should every TLS client library embed the public suffix
list?  how often should they update it?  What if a certificate is issued
by a trusted CA that *does* match part of the public suffix list
(perhaps because the CA has determined that the application has rightful
control over the entire zone)?

--dkg





Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Ángel González
I don't think wget should be checking correct hostname scope of the
certificate. I mean, it'd be ok to have some general rule, such as that no
one can use a certificate for *.whatever or *. [1], but embedding the
Public Suffix List seems overkill.
And the implementation should probably be performed at openssl/gnutls level.

If an attacker was able to get a CA-signed certificate for *.com (even
though browsers reject that), he is very likely to have also been able to
create a certificate for the domain you are browsing or directly a sub-CA.

Daniel, how does cURL check correctness of the certificate hostname suffix?

1- And even then, we might end up with a new TLD (e.g. *.apple) where it
turns out to be correct.



Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Tim Rühsen
Hi Jeffrey,

thanks for pointing this out.

BTW, to reproduce the issue I used a GnuTLS compiled/linked version of Wget:

$ wget -d --ca-certificate=ca-rsa-cert.pem --private-key=ca-rsa-key-plain.pem 
https://example.com:8443
2014-03-18 21:48:04 (1.88 GB/s) - Read error at byte 5116 (The TLS connection 
was non-properly terminated.).Retrying.

There seems to be a problem in Wget 1.15 (on Debian SID)...


But apart from that, Wget uses the hostname checking facility of the GnuTLS 
library (or of OpenSSL library if appropriately compiled). And I saw you 
already addressed bug-gnutls, which seems the right way to go.

IMHO, the Public Suffix List (PSL) should not only be used to verify cookies
but also be used for certificate hostname checking.

Libraries such as GnuTLS should offer an API for this kind of checking; best
would be having the PSL as a separate file, maintained by the distribution
maintainers (or the user, if he wants to do it). The SSL library should
load/unload the PSL under the application's control.

Maybe it would be a good idea to provide a separate PSL library that could be 
used by SSL libraries for hostname checking and HTTP(S) clients for cookie 
verification.

If of any interest, there is already some LGPLed code at
  https://github.com/rockdaboot/mget/blob/master/libmget/cookie.c
There are also some unit test routines in the project.

Regards, Tim




Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Daniel Kahn Gillmor
On 03/18/2014 05:31 PM, Tim Rühsen wrote:
 $ wget -d --ca-certificate=ca-rsa-cert.pem --private-key=ca-rsa-key-plain.pem 
 https://example.com:8443
 2014-03-18 21:48:04 (1.88 GB/s) - Read error at byte 5116 (The TLS connection 
 was non-properly terminated.).Retrying.
 
 There seems to be a problem in Wget 1.15 (on Debian SID)...

hm, i'll try to take a look at this.

 But apart from that, Wget uses the hostname checking facility of the GnuTLS 
 library (or of OpenSSL library if appropriately compiled). And I saw you 
 already addressed bug-gnutls, which seems the right way to go.
 
 IMHO, the Public Suffix List (PSL) should not only be used to verify cookies
 but also be used for certificate hostname checking.
 
 Libraries such as GnuTLS should offer an API for this kind of checking; best
 would be having the PSL as a separate file, maintained by the distribution
 maintainers (or the user, if he wants to do it). The SSL library should
 load/unload the PSL under the application's control.

that sounds really fiddly to me -- you want the application to know why
the TLS stack needs to know about the public suffix list, and to be able
to control it appropriately?

I think we need good sensible defaults, and a locally-cached,
frequently-updated copy of the public suffix list; then if we really
really want the application to be able to control the use of an
alternate suffix list we can provide an API for that, but i can't
imagine we'd want to require the application to specify anything (even
asking the application to load the default local PSL seems like too much
to expect from most apps that just want to layer in some TLS).

 Maybe it would be a good idea to provide a separate PSL library that could be 
 used by SSL libraries for hostname checking and HTTP(S) clients for cookie 
 verification.

I maintain publicsuffix in debian, and i try to help on the gnutls side
of things too (both upstream and a little bit of kibitzing about the
debian packaging).

debian has php, python, perl, and haskell bindings for the public suffix
list, but i don't think anyone has packaged a C library for it.

I've got discussion in my mailbox that i haven't processed in ages with
Florian Sager about packaging regdom-libs [0], though, and the library
looks like it's been revived a bit since i gave up on it last [1].  Do
you think this C interface would be a useful one or would you expect a
different API?

[0] http://www.dkim-reputation.org/regdom-libs/
[1] https://bugs.debian.org/683881

 If of any interest, there is already some LGPLed code at
   https://github.com/rockdaboot/mget/blob/master/libmget/cookie.c
 There are also some unit test routines in the project.

hm, do you know if the libmget folks are willing to break that code out
separately?  linking to all of libmget doesn't sound like a good idea,
and it would be a shame to have to maintain separate copies of this
codebase.

--dkg


