Re: [Bug-wget] Overly permissive hostname matching

2014-03-21 Thread Ángel González

On 18/03/14 16:00, Jeffrey Walton wrote:
> > What if a certificate is issued by a trusted CA that *does*
> > match part of the public suffix list (perhaps because the
> > CA has determined that the application has rightful
> > control over the entire zone)?
>
> In practice we know four things. First, no one authority controls the
> entire domain space in a gTLD. So it's really a non sequitur. We might
> inadvertently see it in cases like Diginotar, but that's a negative
> case and not a typical use case. However, we should expect these
> corner cases on occasion.
>
> Second, anyone claiming such is probably trying to subvert the secure
> channel. (...)


I realised that there is a problem with private registries when trying to
apply the PSL to certificates.

There are two kinds of private registries in the PSL: full-delegation
registries (you have complete control of the domain) and content-delegation
ones. In the latter case, they are public suffixes since users can place
arbitrary content there, but the servers are under the control of a single
org, and thus they can (and do) use a wildcard certificate for their domain.
See for instance blogspot.com.

It is easy to exclude the private registries, but the list makes no
distinction between the two kinds. I think we should ask Mozilla to split
that section in two.

*

As a separate comment, I discovered that although wildcards are not
restricted in the PSL description, in practice they appear only at the
beginning, and in fact the Mozilla implementation only supports that. This
simplifies the matching.
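
A minimal sketch of why a leading-only wildcard keeps matching simple, assuming lowercase ASCII names without a trailing dot. The function name rule_matches is hypothetical and is not taken from libpsl or Mozilla's code; it only checks whether a name is exactly the suffix a single rule describes.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any real library): returns 1 if
 * `domain` is exactly the public suffix described by `rule`.
 * A wildcard may only appear as a leading "*." label.
 */
int rule_matches(const char *rule, const char *domain)
{
    assert(rule != NULL && domain != NULL);

    if (strncmp(rule, "*.", 2) == 0) {
        const char *suffix = rule + 2;          /* "bd" for "*.bd" */
        const char *dot = strchr(domain, '.');  /* end of first label */

        /* The wildcard stands for exactly one non-empty label. */
        if (dot == NULL || dot == domain)
            return 0;
        return strcmp(dot + 1, suffix) == 0;
    }

    /* No wildcard: plain string comparison is enough. */
    return strcmp(rule, domain) == 0;
}
```

Because the wildcard can only be the first label, the check reduces to skipping one label of the name and comparing the remainder; no general pattern matching is needed.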




Re: [Bug-wget] Overly permissive hostname matching

2014-03-21 Thread Tim Ruehsen
On Thursday 20 March 2014 23:43:08 Ángel González wrote:
> On 20/03/14 22:52, Tim Rühsen wrote:
> > I broke out the public suffix code together and created a first go (really
> > very quick, distcheck fails - couldn't figure out this evening).
> > 
> > https://github.com/rockdaboot/libpsl
> > 
> > The first step was a psl_is_tld() function.
> > There is a test case for some major things (wildcards, exceptions).
> 
> So, your public api seems to be this:
> 
> psl_ctx_t *psl_load_file(const char *fname);
>
> void psl_free(psl_ctx_t **psl);
>
> int psl_is_tld(const psl_ctx_t *psl, const char *domain);
>
> First, I wouldn't call the function is_tld(), not just because TLDs
> simply won't have any dot inside (just extract the last label in a DNS
> name), but since there are more ambiguous cases. I would name it
> is_public(), defining it as “one domain under which anyone* can register
> a subdomain”.
>
> Additionally, I think there should be a function to extract the public
> suffix from a given domain.
>
> Both functions should take a flags argument. The immediate use I foresee
> is to choose whether private registries should be taken into account or
> not. (A private registry is a domain used for the public but not owned by
> a registry; dyndns.org and blogspot.com are examples of that.)
>
> * "anyone" understood as a random person unaffiliated with the owner of
> the parent domain, notwithstanding any condition that such "anyone" is
> required to fulfill in order to register it (such as residing in a given
> region or having paid certain fees).
>
> PS: It's funny to see 1994 rfc1591 talking about TLDs and saying «It is
> extremely unlikely that any other TLDs will be created.»

Thanks for your feedback.

Maybe you could just open issues (or even better, fork the repo, make your 
changes and create pull requests). That is much easier to maintain because it 
wastes time if I have to keep in mind the contents of the discussion here 
and/or to look it all up again when I find time for coding.

I agree with changing the function name and I agree that a function to extract 
the public suffix from a given domain is useful.

Is there anybody with time to brush up the autoconf stuff (just go through it,
fix the warnings from ./autogen.sh, fix 'make distcheck')?

What about API docs - would Doxygen be overkill?

Tim




Re: [Bug-wget] Overly permissive hostname matching

2014-03-21 Thread Tim Ruehsen
On Thursday 20 March 2014 17:58:05 Jeffrey Walton wrote:
> On Thu, Mar 20, 2014 at 5:52 PM, Tim Rühsen  wrote:
> I had a sidebar with one of the OpenSSL devs because OpenSSL is
> cutting in hostname matching in version 1.0.2.
> 
> He shared a link to an IETF working group on the subject:
> https://www.ietf.org/mailman/listinfo/dbound.

Thanks for sharing the link.

Interesting. I guess everybody agrees that the PSL is not a long-term
solution. I'll have a deeper look at DBOUND in the next few days.

Tim




Re: [Bug-wget] Overly permissive hostname matching

2014-03-21 Thread Tim Ruehsen
On Thursday 20 March 2014 23:11:31 Daniel Stenberg wrote:
> On Thu, 20 Mar 2014, Tim Rühsen wrote:
> > I broke out the public suffix code together and created a first go (really
> > very quick, distcheck fails - couldn't figure out this evening).
> > 
> > https://github.com/rockdaboot/libpsl
> 
> Ok, I'll be the first to rain on the parade. Sorry but it seems fit to do
> this early.
> 
> You do realize that with a *GPL license on the thing, you won't get adopted
> by OpenSSL, curl and possibly others?

I knew you would be the first to bring this up ;-)

> I can't prevent you of course and the decision is yours to make, but I'd
> prefer a BSD style license as then I could really consider basing future
> enhancements of curl on this effort.

I don't care much about the license in this case, since we are talking about 
something pretty simple. I just concentrated on the code that evening and 
simply copied the license clauses...

I would like to see a consensus here just to avoid further discussions about 
licenses (there have been far too many).

Ángel González already 'voted' for an MIT license.
Daniel, you prefer a BSD license.
I am OK with either one.
Any other votes ?

Tim



Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Jeffrey Walton
On Thu, Mar 20, 2014 at 8:12 PM, Ángel González  wrote:
> On 21/03/14 00:21, Daniel Stenberg wrote:
>>
>> ...
>> (Sorry, I don't know. I'm not a lawyer, so my solution is usually to
>> avoid GPL code all together).
>
> That's a solution. Although it's a sad result from usage of a license
> intended to preserve freedoms.
For what it's worth, I agree with you.

I can't afford lawyers on retainer to untangle things or to defend a
suit. Hence I would be happy to use a permissive GPL license
and assign any IP to GNU or FSF.

I don't take the position due to philosophy or perceived moral high
ground. It's simply economics for me. Anyone who has not experienced the
economics of a technology lawsuit is in for a shock.

In the past, I spent tens of thousands on a lawyer in a technology case.
I'm not going through that again unless I'm independently wealthy. (I
think I had the moral high ground since I was suing a chronic spammer
who harassed me for nearly 15 years. I tried for years to get off the
lists, and finally had to resort to the courts).

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Ángel González

On 21/03/14 00:21, Daniel Stenberg wrote:
> On Fri, 21 Mar 2014, Ángel González wrote:
>
>> The LGPL would be an option.
>
> Not for curl though and probably not to other BSD/MIT licensed
> projects...

That's a good point.

Jeff wrote:
> Isn't copyright assigned to GNU or FSF?

No. By licensing something under GPL you don't assign any copyright to FSF.*

However, in some GNU projects, and unlike most (all?) other free
projects, the FSF additionally requests to be assigned the copyright of the
work, in order to be in a better position for enforcing its free license.
https://www.gnu.org/licenses/why-assign.html

* Actually, if it's licensed with an "or later version", as the license
gatekeepers, they have an extra right to change it, but I wouldn't.

> (Sorry, I don't know. I'm not a lawyer, so my solution is usually to
> avoid GPL code all together).

That's a solution. Although it's a sad result from usage of a license
intended to preserve freedoms.



Cheers




Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Daniel Stenberg

On Fri, 21 Mar 2014, Ángel González wrote:

> The LGPL would be an option.

Not for curl though and probably not to other BSD/MIT licensed projects...

--

 / daniel.haxx.se

Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Jeffrey Walton
On Thu, Mar 20, 2014 at 7:11 PM, Ángel González  wrote:
> On 20/03/14 23:16, Jeffrey Walton wrote:
>>
>>
>>> I can't prevent you of course and the decision is yours to make, but I'd
>>> prefer a BSD style license as then I could really consider basing future
>>> enhancements of curl on this effort.
>>
>> Does GNU have a permissive license? I know permissive does not meet
>> all of Dr. Stallman's goals, but it will allow GNU more intellectual
>> property in the arena.
>>
> The LGPL would be an option. I don't see how intellectual property is
> related.
> The license and the copyright owner seem quite orthogonal to me.
Isn't copyright assigned to GNU or FSF?

(Sorry, I don't know. I'm not a lawyer, so my solution is usually to
avoid GPL code all together).

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Ángel González

On 20/03/14 23:16, Jeffrey Walton wrote:
>> I can't prevent you of course and the decision is yours to make, but I'd
>> prefer a BSD style license as then I could really consider basing future
>> enhancements of curl on this effort.
>
> Does GNU have a permissive license? I know permissive does not meet
> all of Dr. Stallman's goals, but it will allow GNU more intellectual
> property in the arena.
>
> Jeff

The LGPL would be an option. I don't see how intellectual property is
related.
The license and the copyright owner seem quite orthogonal to me.





Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Ángel González

On 20/03/14 22:52, Tim Rühsen wrote:

> I broke out the public suffix code together and created a first go (really very
> quick, distcheck fails - couldn't figure out this evening).
>
> https://github.com/rockdaboot/libpsl
>
> The first step was a psl_is_tld() function.
> There is a test case for some major things (wildcards, exceptions).


So, your public API seems to be this:

psl_ctx_t *psl_load_file(const char *fname);

void psl_free(psl_ctx_t **psl);

int psl_is_tld(const psl_ctx_t *psl, const char *domain);

First, I wouldn't call the function is_tld(), not just because TLDs
simply won't have any dot inside (just extract the last label in a DNS
name), but since there are more ambiguous cases. I would name it
is_public(), defining it as “one domain under which anyone* can register
a subdomain”.

Additionally, I think there should be a function to extract the public
suffix from a given domain.

Both functions should take a flags argument. The immediate use I foresee
is to choose whether private registries should be taken into account or
not. (A private registry is a domain used for the public but not owned by
a registry; dyndns.org and blogspot.com are examples of that.)

* "anyone" understood as a random person unaffiliated with the owner of
the parent domain, notwithstanding any condition that such "anyone" is
required to fulfill in order to register it (such as residing in a given
region or having paid certain fees).

PS: It's funny to see 1994 rfc1591 talking about TLDs and saying «It is
extremely unlikely that any other TLDs will be created.»
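
A rough sketch of what the two suggested functions with a flags argument might look like. Everything here is an assumption made for illustration: the names demo_is_public, demo_public_suffix and DEMO_PSL_INCLUDE_PRIVATE are hypothetical, the context argument is left out, and the tiny hard-coded table stands in for a loaded list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: also treat private registries as public suffixes. */
#define DEMO_PSL_INCLUDE_PRIVATE 0x1

struct demo_entry {
    const char *suffix;
    int is_private;   /* 1 for entries such as blogspot.com */
};

/* Tiny stand-in for a loaded list; not real PSL data handling. */
static const struct demo_entry demo_list[] = {
    { "com", 0 }, { "org", 0 }, { "uk", 0 }, { "co.uk", 0 },
    { "blogspot.com", 1 }, { "dyndns.org", 1 },
};

/* Returns 1 if `domain` itself is a public suffix under the given flags. */
int demo_is_public(const char *domain, int flags)
{
    size_t i;

    assert(domain != NULL);
    for (i = 0; i < sizeof demo_list / sizeof demo_list[0]; i++) {
        if (strcmp(demo_list[i].suffix, domain) != 0)
            continue;
        if (demo_list[i].is_private && !(flags & DEMO_PSL_INCLUDE_PRIVATE))
            continue;
        return 1;
    }
    return 0;
}

/* Returns the longest public suffix of `domain` (a pointer into it) or NULL. */
const char *demo_public_suffix(const char *domain, int flags)
{
    const char *p = domain;

    assert(domain != NULL);
    while (p != NULL && *p != '\0') {
        if (demo_is_public(p, flags))
            return p;          /* first hit is the longest candidate */
        p = strchr(p, '.');
        if (p != NULL)
            p++;               /* drop the leftmost label and retry */
    }
    return NULL;
}
```

With no flags, foo.blogspot.com has the suffix com; with the private flag set, the suffix becomes blogspot.com, which is the distinction the flags argument is meant to expose.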





Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Ángel González

On 20/03/14 23:11, Daniel Stenberg wrote:
> You do realize that with a *GPL license on the thing, you won't get
> adopted by OpenSSL, curl and possibly others?
>
> I can't prevent you of course and the decision is yours to make, but
> I'd prefer a BSD style license as then I could really consider basing
> future enhancements of curl on this effort.

FWIW, I have to agree; I would have opted for an MIT license for this if
I were writing it. The FSF would probably strongly oppose, though :)






Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Jeffrey Walton
On Thu, Mar 20, 2014 at 6:11 PM, Daniel Stenberg  wrote:
> On Thu, 20 Mar 2014, Tim Rühsen wrote:
>
>> I broke out the public suffix code together and created a first go (really
>> very quick, distcheck fails - couldn't figure out this evening).
>>
>> https://github.com/rockdaboot/libpsl
>
>
> Ok, I'll be the first to rain on the parade. Sorry but it seems fit to do
> this early.
>
> You do realize that with a *GPL license on the thing, you won't get adopted
> by OpenSSL, curl and possibly others?
+1

> I can't prevent you of course and the decision is yours to make, but I'd
> prefer a BSD style license as then I could really consider basing future
> enhancements of curl on this effort.
Does GNU have a permissive license? I know permissive does not meet
all of Dr. Stallman's goals, but it will allow GNU more intellectual
property in the arena.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Daniel Stenberg

On Thu, 20 Mar 2014, Tim Rühsen wrote:

> I broke out the public suffix code together and created a first go (really
> very quick, distcheck fails - couldn't figure out this evening).
>
> https://github.com/rockdaboot/libpsl


Ok, I'll be the first to rain on the parade. Sorry but it seems fit to do this 
early.


You do realize that with a *GPL license on the thing, you won't get adopted by 
OpenSSL, curl and possibly others?


I can't prevent you of course and the decision is yours to make, but I'd 
prefer a BSD style license as then I could really consider basing future 
enhancements of curl on this effort.


--

 / daniel.haxx.se

Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Jeffrey Walton
On Thu, Mar 20, 2014 at 5:52 PM, Tim Rühsen  wrote:
> Am Mittwoch, 19. März 2014, 10:59:05 schrieb Daniel Kahn Gillmor:
>> I'm imagining a C library API that has a public suffix list context
>> object that can do efficient lookups (however we define the lookups),
>> and the library would bundle a pre-compiled context, based on the
>> currently-known public suffix list.
>>
>> something like:
>>
>> ---
>> struct psl_ctx;
>> typedef struct psl_ctx * psl_ctx_t;
>> const psl_ctx_t psl_builtin;
>>
>> psl_ctx_t psl_new_ctx_from_filename(const char* filename);
>> psl_ctx_t psl_new_ctx_from_fd(int fd);
>> void psl_free_ctx(psl_ctx_t ctx);
>>
>> /*
>>   query forms, very rough draft -- do we need both?
>>   need to consider memory allocation responsibilities and
>>   DNS internationalization/canonicalization issues
>> */
>>
>> const char* psl_get_public_suffix(const psl_ctx_t, const char* domain);
>> const char* psl_get_registered_domain(const psl_ctx_t, const char* d);
>> ---
>
> I broke out the public suffix code together and created a first go (really
> very quick, distcheck fails - couldn't figure out this evening).
>
> https://github.com/rockdaboot/libpsl
>
> The first step was a psl_is_tld() function.
> There is a test case for some major things (wildcards, exceptions).
>
> I hope there will be some interest and some contributions...
Yes, I'd be interested. Especially since Angel pointed out failures in
my use of the PSL (the close-open failures are troubling to me).

I had a sidebar with one of the OpenSSL devs because OpenSSL is
cutting in hostname matching in version 1.0.2.

He shared a link to an IETF working group on the subject:
https://www.ietf.org/mailman/listinfo/dbound.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-20 Thread Tim Rühsen
Am Mittwoch, 19. März 2014, 10:59:05 schrieb Daniel Kahn Gillmor:
> I'm imagining a C library API that has a public suffix list context
> object that can do efficient lookups (however we define the lookups),
> and the library would bundle a pre-compiled context, based on the
> currently-known public suffix list.
> 
> something like:
> 
> ---
> struct psl_ctx;
> typedef struct psl_ctx * psl_ctx_t;
> const psl_ctx_t psl_builtin;
> 
> psl_ctx_t psl_new_ctx_from_filename(const char* filename);
> psl_ctx_t psl_new_ctx_from_fd(int fd);
> void psl_free_ctx(psl_ctx_t ctx);
> 
> /*
>   query forms, very rough draft -- do we need both?
>   need to consider memory allocation responsibilities and
>   DNS internationalization/canonicalization issues
> */
> 
> const char* psl_get_public_suffix(const psl_ctx_t, const char* domain);
> const char* psl_get_registered_domain(const psl_ctx_t, const char* d);
> ---

I broke out the public suffix code together and created a first go (really very 
quick, distcheck fails - couldn't figure out this evening).

https://github.com/rockdaboot/libpsl

The first step was a psl_is_tld() function.
There is a test case for some major things (wildcards, exceptions).

I hope there will be some interest and some contributions...

Tim




Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 3:03 PM, Ángel González  wrote:
> On 19/03/14 16:37, Jeffrey Walton wrote:
>>
>> ...
> Also note that by removing the "*." from the beginning of the lines*, you
> are accepting more hosts than you should, such as a certificate for
> *.com.bd (represented as *.bd in the PSL) which should have been
> rejected.
Oh, that's bad. I'll have to check that. Thanks.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Ángel González

On 19/03/14 16:37, Jeffrey Walton wrote:
> On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg  wrote:
>> On Wed, 19 Mar 2014, Jeffrey Walton wrote:
>>
>>> # Remove lines that begin with "!"
>>
>> That sounds wrong:
>>
>>    A rule may begin with a "!" (exclamation mark). If it does, it is labelled
>>    as a "exception rule" and then treated as if the exclamation mark is not
>>    present.
>
> Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)
>
> Anyway, I'll try to find the meaning of that bang. I seem to recall I
> could not find the meaning of it in the past.
>
> Jeff

It excludes a hostname from a previous matching rule. See
http://publicsuffix.org/list/#list-format
Currently, there doesn't seem to be any exclusion with a wildcard, so
right now all lines beginning with '!' are equivalent to "accept this
hostname".

Also note that by removing the "*." from the beginning of the lines*,
you are accepting more hosts than you should, such as a certificate for
*.com.bd (represented as *.bd in the PSL) which should have been
rejected.

* Your script comments are wrong btw, since you're not removing full lines.
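
One way to avoid losing this information is to record what kind of rule each line was instead of stripping the prefix. The sketch below is only an illustration under that assumption; the type and function names (rule_kind, psl_rule, parse_rule) are hypothetical and do not come from any existing library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of one line of the list. */
enum rule_kind { RULE_PLAIN, RULE_WILDCARD, RULE_EXCEPTION };

struct psl_rule {
    enum rule_kind kind;
    const char *name;   /* points into the input, past any "!" or "*." */
};

/* Classifies a non-comment, non-empty line without discarding its prefix. */
struct psl_rule parse_rule(const char *line)
{
    struct psl_rule r;

    assert(line != NULL);
    if (line[0] == '!') {
        r.kind = RULE_EXCEPTION;   /* carves a name out of a wider rule */
        r.name = line + 1;
    } else if (strncmp(line, "*.", 2) == 0) {
        r.kind = RULE_WILDCARD;    /* every direct child is a suffix */
        r.name = line + 2;
    } else {
        r.kind = RULE_PLAIN;
        r.name = line;
    }
    return r;
}
```

A lookup that keeps the kind can then treat "*.bd" as covering com.bd, which a list reduced to the bare string "bd" cannot express.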




Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Kahn Gillmor
On 03/19/2014 11:55 AM, Jeffrey Walton wrote:
> Also, be careful of where you are pulling the list from. I got burned
> by pulling a list that was not being updated
> (https://bugzilla.mozilla.org/show_bug.cgi?id=968064).

i've been similarly burned before too, but i settled on the mxr address
i just posted after trying a few other places.

> The Mozilla folks state the canonical list is at
> http://publicsuffix.org/list/effective_tld_names.dat. See Comment 11
> at https://bugzilla.mozilla.org/show_bug.cgi?id=968064#c11.

i just followed up there to point out that the canonical location for
the data needs to have some form of cryptographic integrity mechanism.
thanks for pointing that out.

--dkg





Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 11:45 AM, Jeffrey Walton  wrote:
> On Wed, Mar 19, 2014 at 11:37 AM, Jeffrey Walton  wrote:
>> On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg  wrote:
>>> On Wed, 19 Mar 2014, Jeffrey Walton wrote:
>>>
>>>> # Remove lines that begin with "!"
>>>
>>>
>>> That sounds wrong:
>>>
>>>   A rule may begin with a "!" (exclamation mark). If it does, it is labelled
>>>   as a "exception rule" and then treated as if the exclamation mark is not
>>> present.
>> Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)
>>
>> Anyway, I'll try to find the meaning of that bang. I seem to recall I
>> could not find the meaning of it in the past.
> After reading that again, I don't mean to sound rude. Sorry about
> that. Thanks for pointing it out.
>
> And it does bring up a good point: the data structure needs two things:
> (1) a name, and (2) a flag for white/black. White is white listed
> while black is black listed.
>
> The API needs (at minimum): (1) take a name, and (2) return
> white/black/no entry. Everything else is just frills.
>
Something else you may want in the API is a way to determine how,
exactly, a name matched if it's a failure. So it might be useful to
report whether the match was a Generic Top Level Domain (gTLD), a Country
Code Top Level Domain (ccTLD), or an Effective Top Level Domain from a PSL.

I mention it because my code has some diagnostics in debug builds that
log the info.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg  wrote:
> On Wed, 19 Mar 2014, Jeffrey Walton wrote:
>
>> # Remove lines that begin with "!"
>
>
> That sounds wrong:
>
>   A rule may begin with a "!" (exclamation mark). If it does, it is labelled
>   as a "exception rule" and then treated as if the exclamation mark is not
> present.
Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)

Anyway, I'll try to find the meaning of that bang. I seem to recall I
could not find the meaning of it in the past.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 11:38 AM, Daniel Kahn Gillmor
 wrote:
> On 03/19/2014 11:26 AM, Jeffrey Walton wrote:
>
>> wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST
>
> I recommend using the following HTTPS URL instead, so that you have some
> level of cryptographic verification of the data before loading it:
>
> https://hg.mozilla.org/mozilla-central/raw-file/tip/netwerk/dns/effective_tld_names.dat
>
> (this is what i use to update the debian publicsuffix package)
Also, be careful of where you are pulling the list from. I got burned
by pulling a list that was not being updated
(https://bugzilla.mozilla.org/show_bug.cgi?id=968064).

The Mozilla folks state the canonical list is at
http://publicsuffix.org/list/effective_tld_names.dat. See Comment 11
at https://bugzilla.mozilla.org/show_bug.cgi?id=968064#c11.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 11:37 AM, Jeffrey Walton  wrote:
> On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg  wrote:
>> On Wed, 19 Mar 2014, Jeffrey Walton wrote:
>>
>>> # Remove lines that begin with "!"
>>
>>
>> That sounds wrong:
>>
>>   A rule may begin with a "!" (exclamation mark). If it does, it is labelled
>>   as a "exception rule" and then treated as if the exclamation mark is not
>> present.
> Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)
>
> Anyway, I'll try to find the meaning of that bang. I seem to recall I
> could not find the meaning of it in the past.
After reading that again, I don't mean to sound rude. Sorry about
that. Thanks for pointing it out.

And it does bring up a good point: the data structure needs two things:
(1) a name, and (2) a flag for white/black. White is white listed
while black is black listed.

The API needs (at minimum): (1) take a name, and (2) return
white/black/no entry. Everything else is just frills.
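
A minimal sketch of that structure and lookup, with hypothetical names (list_result, list_entry, list_lookup) and a three-entry table used purely as sample data. A linear scan is used only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three possible answers of the lookup. */
enum list_result { LIST_NO_ENTRY, LIST_WHITE, LIST_BLACK };

/* One entry: a name plus its white/black flag. */
struct list_entry {
    const char *name;
    enum list_result flag;   /* LIST_WHITE or LIST_BLACK */
};

/* Sample data only: two banned suffixes and one exception. */
static const struct list_entry demo_entries[] = {
    { "com",    LIST_BLACK },
    { "co.uk",  LIST_BLACK },
    { "www.ck", LIST_WHITE },
};

/* Takes a name and returns white, black, or no entry. */
enum list_result list_lookup(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof demo_entries / sizeof demo_entries[0]; i++) {
        if (strcmp(demo_entries[i].name, name) == 0)
            return demo_entries[i].flag;
    }
    return LIST_NO_ENTRY;
}
```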

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 11:42 AM, Jeffrey Walton  wrote:
> On Wed, Mar 19, 2014 at 11:38 AM, Daniel Kahn Gillmor
>  wrote:
>> On 03/19/2014 11:26 AM, Jeffrey Walton wrote:
>>
>>> wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST
>>
>> I recommend using the following HTTPS URL instead, so that you have some
>> level of cryptographic verification of the data before loading it:
>>
>> https://hg.mozilla.org/mozilla-central/raw-file/tip/netwerk/dns/effective_tld_names.dat
>>
>> (this is what i use to update the debian publicsuffix package)
> Ah, good point. I did not even notice that.
OK, here's the reason for that:

$ ./eff_tld_names.sh
--2014-03-19 12:46:20--  https://publicsuffix.org/list/effective_tld_names.dat
Resolving publicsuffix.org (publicsuffix.org)... 63.245.217.181
Connecting to publicsuffix.org (publicsuffix.org)|63.245.217.181|:443... failed: Connection refused.

(It sucks getting old. I should have remembered that there's no HTTPS
access to that list).

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 11:38 AM, Daniel Kahn Gillmor
 wrote:
> On 03/19/2014 11:26 AM, Jeffrey Walton wrote:
>
>> wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST
>
> I recommend using the following HTTPS URL instead, so that you have some
> level of cryptographic verification of the data before loading it:
>
> https://hg.mozilla.org/mozilla-central/raw-file/tip/netwerk/dns/effective_tld_names.dat
>
> (this is what i use to update the debian publicsuffix package)
Ah, good point. I did not even notice that.

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Kahn Gillmor
On 03/19/2014 11:26 AM, Jeffrey Walton wrote:

> wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST

I recommend using the following HTTPS URL instead, so that you have some
level of cryptographic verification of the data before loading it:

https://hg.mozilla.org/mozilla-central/raw-file/tip/netwerk/dns/effective_tld_names.dat

(this is what i use to update the debian publicsuffix package)

--dkg





Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Stenberg

On Wed, 19 Mar 2014, Jeffrey Walton wrote:


> # Remove lines that begin with "!"

That sounds wrong:

  A rule may begin with a "!" (exclamation mark). If it does, it is labelled
  as a "exception rule" and then treated as if the exclamation mark is not
  present.

--

 / daniel.haxx.se



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Jeffrey Walton
On Wed, Mar 19, 2014 at 10:59 AM, Daniel Kahn Gillmor
 wrote:
> On 03/19/2014 06:19 AM, Tim Ruehsen wrote:
>> As a programmer, I want to have control. E.g. the option to load from a
>> different file, or to switch off loading. Why ? e.g. for testing purposes, or
>> simply imagine a "swiss army knife" client for experts - maybe they want to
>> have control via CLI args. Or you are in a controlled environment and simply
>> don't want to waste CPU cycles when downloading a single file from a trusted
>> server. Just some examples.
>> And then, clients like Wget would like to have access, at least for checking
>> cookies.
>
> i understand, and i think we're probably not disagreeing -- you want the
> ability to control it; i want sane defaults so that people who don't
> touch it get sensible behavior.
>
>> I just took a quick look but I am not sure about the API (i did not have this
>> 'aha' effect). But what I don't like is the dependency on PHP which is used
>> to 'compile' the PSL before the C functions can use it. While the idea of
>> compilation/preprocessing is a good one, it should at least be optional.
>
> pre-compilation/preprocessing is probably a reasonable performance
> optimization for heavy use; we might even want a C library to embed a
> precompiled version of the most recent known list at time of
> compilation, so that it can be used with no initialization step or when
> no file is available.
This may help with seeding thoughts for an implementation. I'm
fortunate because I work in C++.

I have a 'precooked' list with "com", "mil", ... "ak.us", "co.uk",
etc. One entry per line.

There can be multiple dots. For example, "sekikawa.niigata.jp".

I read the list into a vector, sort it in O(n log n), and then get
O(log n) lookups for the lifetime of the program. I pay the cost of the
sort because I make frequent lookups.

When I match names with wildcards, I take a DNS name like
*.example.com. I change it to example.com, and see if it's banned. It's
a simple algorithm but it's effective.
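
The description above is for C++, but the same idea can be sketched in C with qsort and bsearch from the standard library. This is an illustration of the approach as described, not the author's code; the function names suffix_list_sort and suffix_list_banned are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for arrays of C strings, usable by qsort and bsearch. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort once, O(n log n), so later lookups can use binary search. */
void suffix_list_sort(const char **list, size_t n)
{
    assert(list != NULL);
    qsort(list, n, sizeof *list, cmp_str);
}

/*
 * Returns 1 if `name`, after dropping a leading "*.", is in the sorted
 * list. Each lookup is O(log n).
 */
int suffix_list_banned(const char *const *sorted, size_t n, const char *name)
{
    assert(sorted != NULL && name != NULL);
    if (strncmp(name, "*.", 2) == 0)
        name += 2;   /* "*.example.com" becomes "example.com" */
    return bsearch(&name, sorted, n, sizeof *sorted, cmp_str) != NULL;
}
```

With a list containing "co.uk", a certificate name of *.co.uk is reported as banned while *.example.com is not. As with the approach it mirrors, wildcard entries in the list itself are not modelled here.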

I embed the list in my executable with GNU's assembler (*.S file). It's
essentially a string with both a length and a NUL terminator:

;; eff_tld_list.S
.section .rodata

;; Mozilla's Effective TLD list
.global eff_tld_list
.type   eff_tld_list, @object
.align  8
eff_tld_list:
eff_tld_list_start:
.incbin "res/eff_tld_list.lst"
eff_tld_list_end:
.byte 0

;; The string's size (if needed)
.global eff_tld_list_size
.type   eff_tld_list_size, @object
.align  4
eff_tld_list_size:
.int eff_tld_list_end - eff_tld_list_start
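
On the C side, a blob embedded this way is just a NUL-terminated, newline-separated string. The sketch below shows one way it might be walked; the extern declarations in the comment are an assumption about how the symbols would be declared, and count_entries is a hypothetical helper that works on any such string.

```c
#include <assert.h>
#include <stddef.h>

/*
 * With the object built from eff_tld_list.S linked in, the symbols could
 * be declared roughly like this (assumption: .int matches a C int):
 *
 *     extern const char eff_tld_list[];
 *     extern const int  eff_tld_list_size;
 */

/* Counts the non-empty, newline-separated entries in a NUL-terminated blob. */
size_t count_entries(const char *blob)
{
    size_t count = 0;
    int in_entry = 0;

    assert(blob != NULL);
    for (; *blob != '\0'; blob++) {
        if (*blob == '\n') {
            in_entry = 0;          /* the current entry ends here */
        } else if (!in_entry) {
            in_entry = 1;          /* first character of a new entry */
            count++;
        }
    }
    return count;
}
```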

Below is the script I use to fetch Mozilla's list.

Jeff

**

#! /bin/bash

MOZILLA_LIST=eff_tld_list.lst

wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST

# Remove comments
sed "/^\/\//d" $MOZILLA_LIST > temp-1.txt
mv temp-1.txt $MOZILLA_LIST

# Remove empty lines
sed "/^$/d" $MOZILLA_LIST > temp-2.txt
mv temp-2.txt $MOZILLA_LIST

# Remove lines that begin with "!"
sed "s/^!//g" $MOZILLA_LIST > temp-3.txt
mv temp-3.txt $MOZILLA_LIST

# Remove lines that begin with "*."
sed "s/^\*\.//g" $MOZILLA_LIST > temp-4.txt
mv temp-4.txt $MOZILLA_LIST

# Pre-sort it
cat $MOZILLA_LIST | sort > temp-8.txt
mv temp-8.txt $MOZILLA_LIST

# Copy it to resources
cp $MOZILLA_LIST ../res



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Kahn Gillmor
On 03/19/2014 06:19 AM, Tim Ruehsen wrote:
> As a programmer, I want to have control. E.g. the option to load from a 
> different file, or to switch off loading. Why ? e.g. for testing purposes, or 
> simply imagine a "swiss army knife" client for experts - maybe they want to 
> have control via CLI args. Or you are in a controlled environment and simply 
> don't want to waste CPU cycles when downloading a single file from a trusted 
> server. Just some examples.
> And then, clients like Wget would like to have access, at least for checking
> cookies.

i understand, and i think we're probably not disagreeing -- you want the
ability to control it; i want sane defaults so that people who don't
touch it get sensible behavior.

> I just took a quick look but I am not sure about the API (i did not have this
> 'aha' effect). But what I don't like is the dependency on PHP which is used
> to 'compile' the PSL before the C functions can use it. While the idea of
> compilation/preprocessing is a good one, it should at least be optional.

pre-compilation/preprocessing is probably a reasonable performance
optimization for heavy use; we might even want a C library to embed a
precompiled version of the most recent known list at time of
compilation, so that it can be used with no initialization step or when
no file is available.  I don't think depending on php for the
pre-compilation step is a problem; that's just an additional
build-dependency, same as (for example) bison or cmake or python for
other C projects.  (though i confess i'd rather work with pretty much
any language other than PHP in general)

I agree that we probably want the library to support the generic case of
reading the PSL from a file, though.

I'm imagining a C library API that has a public suffix list context
object that can do efficient lookups (however we define the lookups),
and the library would bundle a pre-compiled context, based on the
currently-known public suffix list.

something like:

---
struct psl_ctx;
typedef struct psl_ctx * psl_ctx_t;
const psl_ctx_t psl_builtin;

psl_ctx_t psl_new_ctx_from_filename(const char* filename);
psl_ctx_t psl_new_ctx_from_fd(int fd);
void psl_free_ctx(psl_ctx_t ctx);

/*
  query forms, very rough draft -- do we need both?
  need to consider memory allocation responsibilities and
  DNS internationalization/canonicalization issues
*/

const char* psl_get_public_suffix(const psl_ctx_t ctx, const char* domain);
const char* psl_get_registered_domain(const psl_ctx_t ctx, const char* domain);
---
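
To make the two query forms a bit more concrete, here is a minimal,
self-contained sketch of what they might compute. It is not the libmget code
and not an existing library: the names toy_rules, toy_public_suffix and
toy_registered_domain are made up for illustration, the rule list is a tiny
hand-picked sample, and wildcard/exception rules as well as IDN
canonicalization are ignored. It assumes lowercase ASCII input and returns
pointers into the caller's string, which is only one possible answer to the
memory allocation question.

```c
#include <stddef.h>
#include <string.h>

/* Tiny hand-picked sample; the real list has far more rules. */
static const char *const toy_rules[] = { "com", "uk", "co.uk", "blogspot.com" };

static int toy_is_rule(const char *s)
{
    size_t i;
    for (i = 0; i < sizeof toy_rules / sizeof toy_rules[0]; i++)
        if (strcmp(toy_rules[i], s) == 0)
            return 1;
    return 0;
}

/* Longest suffix of `domain` (cut at a label boundary) that is a rule;
   NULL if no rule matches. */
const char *toy_public_suffix(const char *domain)
{
    const char *p = domain;
    while (p != NULL && *p != '\0') {
        if (toy_is_rule(p))
            return p;        /* first hit from the left is the longest */
        p = strchr(p, '.');
        if (p != NULL)
            p++;             /* skip the dot, try the next shorter suffix */
    }
    return NULL;
}

/* Public suffix plus one more label; NULL if `domain` is itself a
   public suffix or no rule matches. */
const char *toy_registered_domain(const char *domain)
{
    const char *suffix = toy_public_suffix(domain);
    const char *p;

    if (suffix == NULL || suffix == domain)
        return NULL;
    p = suffix - 1;          /* the dot in front of the suffix */
    while (p > domain && p[-1] != '.')
        p--;                 /* walk back to the start of that label */
    return p;
}
```

With this sample list, "www.example.co.uk" has the public suffix "co.uk" and
the registered domain "example.co.uk", while "co.uk" on its own has no
registered domain.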

> "the folks" it's me ;-)

Hi "the folks" :)  (and thanks for your work on mget!)

> I already thought of splitting libmget into several smaller libraries, like 
> libmget-common, libmget-cookies, libmget-psl ... whatever is needed.
> 
> What exactly do you think of ? What can I do to make Debian packaging easy ?

hm, it looks like libmget isn't in debian at all right now.  I'm swamped
with packaging work, and i'm not prepared to review something as
full-featured as libmget itself.  if you could break out the
publicsuffix code so that it was a distinct project from mget, but
provided the API that met the needs of libmget-cookies, that would be
the simplest thing for me to review and package;  we could run any
proposed API by Nikos to make sure it meets the needs of GnuTLS as well,
if you think it's a good idea to push this verification into the TLS
stack itself.

thanks for thinking about this,

--dkg





Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Stenberg

On Wed, 19 Mar 2014, Daniel Kahn Gillmor wrote:

>> It insists on at least two dots. So yes, "*.apple" will cause problems for
>> us too.
>
> There are also errors in the opposite direction: it sounds like curl will
> accept a cert for *.co.uk, right?


Exactly, due to the lack of public suffix awareness! :-(

--

 / daniel.haxx.se



Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Kahn Gillmor
On 03/19/2014 10:38 AM, Daniel Stenberg wrote:
> On Tue, 18 Mar 2014, Ángel González wrote:
> 
>> Daniel, how does cURL check correctness of the certificate hostname
>> suffix?
> 
> It insists on at least two dots. So yes, "*.apple" will cause problems
> for us too.

There are also errors in the opposite direction: it sounds like curl
will accept a cert for *.co.uk, right?

> I view the public suffix list as one of the worst kludges in networking
> history and while I understand why it is necessary, it is next to
> impossible to actually use sensibly in lots of environments.

I agree that the PSL is a horrible kludge; i'm not sure what other
solutions are possible though, until the DNS gets some way to specify
public registries itself (e.g. the DBOUND discussion going on in the IETF).

In the meantime, we need to figure something out, though :/

--dkg





Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Daniel Stenberg

On Tue, 18 Mar 2014, Ángel González wrote:


> Daniel, how does cURL check correctness of the certificate hostname suffix?


It insists on at least two dots. So yes, "*.apple" will cause problems for us 
too.
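
Purely as an illustration, a dot-counting check of that kind could look like
the sketch below. This is not curl's actual code; pattern_has_two_dots is a
hypothetical helper written only to show why the heuristic errs in both
directions.

```c
#include <string.h>

/* Hypothetical helper, not curl's code: accept a wildcard pattern only
   if it contains at least two dots. */
int pattern_has_two_dots(const char *pattern)
{
    const char *first = strchr(pattern, '.');
    return first != NULL && strchr(first + 1, '.') != NULL;
}
```

Under this rule "*.apple" is refused because it has a single dot, while
"*.co.uk" passes because it has two, exactly like "*.example.com".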


I view the public suffix list as one of the worst kludges in networking 
history and while I understand why it is necessary, it is next to impossible 
to actually use sensibly in lots of environments.


--

 / daniel.haxx.se

Re: [Bug-wget] Overly permissive hostname matching

2014-03-19 Thread Tim Ruehsen
On Tuesday 18 March 2014 20:05:07 Daniel Kahn Gillmor wrote:
> On 03/18/2014 05:31 PM, Tim Rühsen wrote:
> > IMHO, the Public Suffix List (PSL) should not only be used to verify
> > cookies but also be used for certificate hostname checking.
> > 
> > Libraries such as GnuTLS should offer an API for this kind of checking; best
> > would be having the PSL as a separate file, maintained by the
> > distribution maintainers (or the user, if he wants to do it). The SSL
> > library should load/unload the PSL under the application's control.
> 
> that sounds really fiddly to me -- you want the application to know why
> the TLS stack needs to know about the public suffix list, and to be able
> to control it appropriately?
> 
> I think we need good sensible defaults, and a locally-cached,
> frequently-updated copy of the public suffix list; then if we really
> really want the application to be able to control the use of an
> alternate suffix list we can provide an API for that, but i can't
> imagine we'd want to require the application to specify anything (even
> asking the application to load the default local PSL seems like too much
> to expect from most apps that just want "to layer in some TLS").

As a programmer, I want to have control. E.g. the option to load from a 
different file, or to switch off loading. Why ? e.g. for testing purposes, or 
simply imagine a "swiss army knife" client for experts - maybe they want to 
have control via CLI args. Or you are in a controlled environment and simply 
don't want to waste CPU cycles when downloading a single file from a trusted 
server. Just some examples.
And then, clients like Wget would like to have access, at least for checking
cookies.

> > Maybe it would be a good idea to provide a separate PSL library that could
> > be used by SSL libraries for hostname checking and HTTP(S) clients for
> > cookie verification.
> 
> I maintain publicsuffix in debian, and i try to help on the gnutls side
> of things too (both upstream and a little bit of kibitzing about the
> debian packaging).
> 
> debian has php, python, perl, and haskell bindings for the public suffix
> list, but i don't think anyone has packaged a C library for it.
> 
> I've got discussion in my mailbox that i haven't processed in ages with
> Florian Sager about packaging regdom-libs [0], though, and the library
> looks like it's been revived a bit since i gave up on it last [1].  Do
> you think this C interface would be a useful one or would you expect a
> different API?

I just took a quick look but I am not sure about the API (i did not have this 
'aha' effect). But what I don't like is the dependency on PHP which is used to 
'compile' the PSL before the C functions can use it. While the idea of 
compilation/preprocessing is a good one, it should at least be optional.

> [0] http://www.dkim-reputation.org/regdom-libs/
> [1] https://bugs.debian.org/683881
> 
> > If of any interest, there is already some LGPLed code at
> > 
> >   https://github.com/rockdaboot/mget/blob/master/libmget/cookie.c
> > 
> > There are also some unit test routines in the project.
> 
> hm, do you know if the libmget folks are willing to break that code out
> separately?  linking to all of libmget doesn't sound like a good idea,
> and it would be a shame to have to maintain separate copies of this
> codebase.

"the folks" it's me ;-)
I already thought of splitting libmget into several smaller libraries, like 
libmget-common, libmget-cookies, libmget-psl ... whatever is needed.

What exactly do you think of ? What can I do to make Debian packaging easy ?

Tim




Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Daniel Kahn Gillmor
On 03/18/2014 05:31 PM, Tim Rühsen wrote:
> $ wget -d --ca-certificate=ca-rsa-cert.pem --private-key=ca-rsa-key-plain.pem 
> https://example.com:8443
> 2014-03-18 21:48:04 (1.88 GB/s) - Read error at byte 5116 (The TLS connection 
> was non-properly terminated.).Retrying.
> 
> There seems to be a problem in Wget 1.15 (on Debian SID)...

hm, i'll try to take a look at this.

> But apart from that, Wget uses the hostname checking facility of the GnuTLS
> library (or of the OpenSSL library if appropriately compiled). And I saw you
> already addressed bug-gnutls, which seems the right way to go.
>
> IMHO, the Public Suffix List (PSL) should not only be used to verify cookies
> but also be used for certificate hostname checking.
>
> Libraries such as GnuTLS should offer an API for this kind of checking; best
> would be having the PSL as a separate file, maintained by the distribution
> maintainers (or the user, if he wants to do it). The SSL library should
> load/unload the PSL under the application's control.

that sounds really fiddly to me -- you want the application to know why
the TLS stack needs to know about the public suffix list, and to be able
to control it appropriately?

I think we need good sensible defaults, and a locally-cached,
frequently-updated copy of the public suffix list; then if we really
really want the application to be able to control the use of an
alternate suffix list we can provide an API for that, but i can't
imagine we'd want to require the application to specify anything (even
asking the application to load the default local PSL seems like too much
to expect from most apps that just want "to layer in some TLS").

> Maybe it would be a good idea to provide a separate PSL library that could be 
> used by SSL libraries for hostname checking and HTTP(S) clients for cookie 
> verification.

I maintain publicsuffix in debian, and i try to help on the gnutls side
of things too (both upstream and a little bit of kibitzing about the
debian packaging).

debian has php, python, perl, and haskell bindings for the public suffix
list, but i don't think anyone has packaged a C library for it.

I've got discussion in my mailbox that i haven't processed in ages with
Florian Sager about packaging regdom-libs [0], though, and the library
looks like it's been revived a bit since i gave up on it last [1].  Do
you think this C interface would be a useful one or would you expect a
different API?

[0] http://www.dkim-reputation.org/regdom-libs/
[1] https://bugs.debian.org/683881

> If of any interest, there is already some LGPLed code at
>   https://github.com/rockdaboot/mget/blob/master/libmget/cookie.c
> There are also some unit test routines in the project.

hm, do you know if the libmget folks are willing to break that code out
separately?  linking to all of libmget doesn't sound like a good idea,
and it would be a shame to have to maintain separate copies of this
codebase.

--dkg





Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Jeffrey Walton
Hi Tim,

On Tue, Mar 18, 2014 at 5:31 PM, Tim Rühsen wrote:
> ...
> BTW, to reproduce the issue I used a GnuTLS compiled/linked version of Wget:
>
> $ wget -d --ca-certificate=ca-rsa-cert.pem --private-key=ca-rsa-key-plain.pem
> https://example.com:8443
> 2014-03-18 21:48:04 (1.88 GB/s) - Read error at byte 5116 (The TLS connection
> was non-properly terminated.).Retrying.
>
> There seems to be a problem in Wget 1.15 (on Debian SID)...
Confirmed on wheezy. I thought it was my OpenSSL server.

> But apart from that, Wget uses the hostname checking facility of the GnuTLS
> library (or of the OpenSSL library if appropriately compiled).
OpenSSL won't have hostname checking until 1.0.2. See the CHANGELOG at
https://www.openssl.org/news/changelog.html.

(Mentioned in case you thought wget was performing it via OpenSSL).

> IMHO, the Public Suffix List (PSL) should not only be used to verify cookies
> but also be used for certificate hostname checking.
+1

Jeff



Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Tim Rühsen
Hi Jeffrey,

thanks for pointing this out.

BTW, to reproduce the issue I used a GnuTLS compiled/linked version of Wget:

$ wget -d --ca-certificate=ca-rsa-cert.pem --private-key=ca-rsa-key-plain.pem 
https://example.com:8443
2014-03-18 21:48:04 (1.88 GB/s) - Read error at byte 5116 (The TLS connection 
was non-properly terminated.).Retrying.

There seems to be a problem in Wget 1.15 (on Debian SID)...


But apart from that, Wget uses the hostname checking facility of the GnuTLS
library (or of the OpenSSL library if appropriately compiled). And I saw you
already addressed bug-gnutls, which seems the right way to go.

IMHO, the Public Suffix List (PSL) should not only be used to verify cookies
but also be used for certificate hostname checking.

Libraries such as GnuTLS should offer an API for this kind of checking; best
would be having the PSL as a separate file, maintained by the distribution
maintainers (or the user, if he wants to do it). The SSL library should
load/unload the PSL under the application's control.

Maybe it would be a good idea to provide a separate PSL library that could be 
used by SSL libraries for hostname checking and HTTP(S) clients for cookie 
verification.
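
As a rough sketch of what using the PSL for certificate hostname checking
could mean: refuse any wildcard whose remainder is itself a public suffix.
The code below is an assumption-laden illustration, not Wget, GnuTLS or
libmget code. The function name wildcard_spans_public_suffix and the array
sample_suffixes are made up, the list is a tiny hand-picked sample, and the
pattern is assumed to be lowercase ASCII.

```c
#include <stddef.h>
#include <string.h>

/* Tiny hand-picked sample standing in for the real list. */
static const char *const sample_suffixes[] = { "com", "net", "org", "co.uk" };

/* Returns 1 if `pattern` is "*.<suffix>" with <suffix> on the sample list,
   i.e. a wildcard that would span an entire public suffix. */
int wildcard_spans_public_suffix(const char *pattern)
{
    size_t i;

    if (strncmp(pattern, "*.", 2) != 0)
        return 0;            /* no wildcard in the left-most label */
    for (i = 0; i < sizeof sample_suffixes / sizeof sample_suffixes[0]; i++)
        if (strcmp(pattern + 2, sample_suffixes[i]) == 0)
            return 1;
    return 0;
}
```

A caller would reject the certificate name when this returns 1, so "*.com"
and "*.co.uk" are refused while "*.example.com" is left to the normal
wildcard matching.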

If of any interest, there is already some LGPLed code at
  https://github.com/rockdaboot/mget/blob/master/libmget/cookie.c
There are also some unit test routines in the project.

Regards, Tim




Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Ángel González
I don't think wget should be checking the correct hostname scope of the
certificate. I mean, it'd be ok to have some general rule such as "no one can
use a certificate for *.whatever or *." [1], but embedding the Public Suffix
List seems overkill. And the implementation should probably be performed at
the openssl/gnutls level.

If an attacker were able to get a CA-signed certificate for *.com (even
though browsers reject that), he is very likely to have also been able to
create a certificate for the domain you are browsing, or directly a sub-CA.

Daniel, how does cURL check correctness of the certificate hostname suffix?

[1] And even then, we might end up with a new TLD (e.g. *.apple) where it
turns out to be correct.




Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Daniel Stenberg

On Tue, 18 Mar 2014, Darshit Shah wrote:

> I'll try and set up a test case as soon as I can using the materials
> provided by you. It would be even more helpful if someone could pitch in
> with more help since: 1. This is not my domain and I don't understand it
> much. 2. I'm keeping really busy with my real life work and GSoC right now.


While in this area, you may want to fix a few other problems with the wget 
pattern match function that I believe exist as well:


 1 - it allows wildcard matches of IP addresses against the CN field
     ("*.168.0.1")

 2 - it allows multiple '*' in the pattern

 3 - it allows the '*' to be somewhere other than first in a wildcard

See RFC 6125 sections 6.4.3 and 7.2 for helpful hints on the two latter details.
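
A minimal sketch of the three checks, under stated assumptions: this is not
wget's code, the names looks_like_ip and wildcard_usable are hypothetical,
and the IP test is deliberately crude (a real implementation would more
likely parse the address properly). It is also stricter than RFC 6125
requires, since it only allows '*' as the complete left-most label. It only
decides whether a wildcard pattern may be used at all; the actual matching
of the remaining labels is left out.

```c
#include <string.h>

/* Crude test: IPv6 literals contain ':'; IPv4 literals are digits and dots. */
static int looks_like_ip(const char *host)
{
    if (strchr(host, ':') != NULL)
        return 1;
    return *host != '\0' && host[strspn(host, "0123456789.")] == '\0';
}

/* Returns 1 if `pattern` is a wildcard that may be tried against `host`:
   the host is not an IP address, the pattern starts with "*.", and it
   contains no further '*'. */
int wildcard_usable(const char *pattern, const char *host)
{
    if (looks_like_ip(host))
        return 0;                    /* problem 1 */
    if (pattern[0] != '*' || pattern[1] != '.')
        return 0;                    /* problem 3 */
    if (strchr(pattern + 1, '*') != NULL)
        return 0;                    /* problem 2 */
    return 1;
}
```

With these rules "*.168.0.1" is never applied to the host 192.168.0.1, and
patterns such as "*.*.example.com" or "www.*.com" are refused outright.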

--

 / daniel.haxx.se



Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Daniel Kahn Gillmor
Hi Jeffrey--

On 03/18/2014 01:43 AM, Jeffrey Walton wrote:
> I believe wget has a security flaw in its certificate hostname matching code.
> 
> In the attached server certificate, the hostname is provided via a
> Subject Alt Name (SAN). The only SAN entry is a DNS name for "*.com".
> Also attached is the default CA, which was used to sign the server's
> certificate.

thanks for raising this concern.

Have you tested this certificate and CA with other HTTPS clients (like
browsers?)

Section 11.1.3 of the CA/Browser Forum's baseline requirements says that
compliant CAs MUST NOT issue wildcard certs for an entire
registry-controlled zone or public suffix "unless the applicant proves
its rightful control of the entire Domain Namespace":

https://cabforum.org/wp-content/uploads/Baseline_Requirements_V1_1_6.pdf

So arguably, it is the responsibility of the CA, not the responsibility
of the relying party, to determine what certs are legitimate.

Put another way: should every TLS client library embed the public suffix
list?  how often should they update it?  What if a certificate is issued
by a trusted CA that *does* match part of the public suffix list
(perhaps because the CA has determined that the applicant has rightful
control over the entire zone)?

--dkg





Re: [Bug-wget] Overly permissive hostname matching

2014-03-18 Thread Darshit Shah
Hi Jeffrey,

Thanks for pointing this out!
I am no expert in security or SSL for that matter. However, this does
seem like a huge security flaw.

I'll try and set up a test case as soon as I can using the materials
provided by you. It would be even more helpful if someone could pitch
in with more help since: 1. This is not my domain and I don't
understand it much. 2. I'm keeping really busy with my real life work
and GSoC right now.

The new test suite can implement an HTTPS server, so it shouldn't be
too difficult to set this up.

On Tue, Mar 18, 2014 at 6:43 AM, Jeffrey Walton wrote:
> I believe wget has a security flaw in its certificate hostname matching code.
>
> In the attached server certificate, the hostname is provided via a
> Subject Alt Name (SAN). The only SAN entry is a DNS name for "*.com".
> Also attached is the default CA, which was used to sign the server's
> certificate.
>
> Effectively, wget accepts a single certificate for the gTLD of .COM.
> That's probably bad. If a CA is compromised, then the compromised CA
> could issue a "super certificate" and cover the entire top level
> domain space.
>
> I suspect wget also accepts certificates for .COM's friends, like
> .NET, .ORG, .MIL, etc.
>
> It's probably not limited to gTLDs. Mozilla maintains a list of
> effective TLDs at https://wiki.mozilla.org/Public_Suffix_List. The
> 1600+ effective TLDs are probably accepted, too.
>
> Attached are the certificates, keys, and commands to set up a test rig
> with OpenSSL's s_server. The certificates are issued for example.com,
> and require a modification to /etc/hosts to make things work as
> (un)expected.
>
> Jeffrey Walton
> Baltimore, MD, US



-- 
Thanking You,
Darshit Shah