Re: [Bug-wget] Overly permissive hostname matching
On 18/03/14 16:00, Jeffrey Walton wrote:
>> What if a certificate is issued by a trusted CA that *does* match part
>> of the public suffix list (perhaps because the CA has determined that
>> the applicant has rightful control over the entire zone)?
> In practice we know four things. First, no one authority controls the
> entire domain space in a gTLD. So it's really a non sequitur. We might
> inadvertently see it in cases like Diginotar, but that's a negative case
> and not a typical use case. However, we should expect these corner cases
> on occasion.
>
> Second, anyone claiming such is probably trying to subvert the secure
> channel.
> (...)

I realised that there is a problem with private registries when trying to apply the PSL to certificates. There are two kinds of private registries in the PSL: full-delegation registries (you have complete control of the domain) and content-delegation ones. In the latter case, they are public suffixes since users can place arbitrary content there, but the servers are under the control of a single organisation, which can (and does) use a wildcard certificate for its domain. See for instance blogspot.com.

It is easy to exclude the private registries altogether, but the list makes no distinction between the two kinds. I think we should ask Mozilla to split that section in two.

* As a separate comment, I discovered that although the PSL description does not restrict where wildcards may appear, in practice they appear only at the beginning, and in fact Mozilla's implementation only supports that. This simplifies the matching.
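The observation that a wildcard only ever appears as the leftmost label can be sketched in C roughly as follows. This is a minimal illustration, not code from wget or from any PSL library: the helper name `name_is_suffix_for_rule` is hypothetical, and it assumes lower-case ASCII names without a trailing dot.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: is `name` exactly a public suffix described by the
 * single rule `rule`?  Assumes a wildcard, if present, is the leftmost
 * label of the rule (e.g. "*.bd"). */
bool name_is_suffix_for_rule(const char *rule, const char *name)
{
    if (strncmp(rule, "*.", 2) == 0) {
        const char *tail = rule + 1;          /* ".bd" for the rule "*.bd" */
        size_t nlen = strlen(name);
        size_t tlen = strlen(tail);

        if (nlen <= tlen)
            return false;                     /* no label left for the "*" */
        if (strcmp(name + (nlen - tlen), tail) != 0)
            return false;                     /* name does not end in ".bd" */
        /* The wildcard stands for exactly one label, so the part in
         * front of the tail must not contain a dot. */
        return memchr(name, '.', nlen - tlen) == NULL;
    }
    return strcmp(rule, name) == 0;           /* literal rule */
}
```

With the wildcard pinned to the front, the check reduces to one suffix comparison plus a scan for a dot; no general pattern matcher is needed.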
Re: [Bug-wget] Overly permissive hostname matching
On Thursday 20 March 2014 23:43:08 Ángel González wrote:
> On 20/03/14 22:52, Tim Rühsen wrote:
> > I broke out the public suffix code together and created a first go
> > (really very quick, distcheck fails - couldn't figure out this evening).
> >
> > https://github.com/rockdaboot/libpsl
> >
> > The first step was a psl_is_tld() function.
> > There is a test case for some major things (wildcards, exceptions).
>
> So, your public api seems to be this:
>
> psl_ctx_t *psl_load_file(const char *fname);
> void psl_free(psl_ctx_t **psl);
> int psl_is_tld(const psl_ctx_t *psl, const char *domain);
>
> First, I wouldn't call the function is_tld(), not just because TLDs
> simply won't have any dot inside (just extract the last label in a DNS
> name), but because there are more ambiguous cases. I would name it
> is_public(), defining it as “one domain under which anyone* can register
> a subdomain”.
>
> Additionally, I think there should be a function to extract the public
> suffix from a given domain.
>
> Both functions should take a flags argument. The immediate use I foresee
> is to choose whether private registries should be taken into account or
> not. (A private registry is a domain used for the public but not owned
> by a registry; dyndns.org and blogspot.com are examples of that.)
>
> * "anyone" understood as a random person unaffiliated with the owner of
> the parent domain, notwithstanding any condition that such "anyone" is
> required to fulfill in order to register it (such as residing in a given
> region or having paid certain fees).
>
> PS: It's funny to see 1994 rfc1591 talking about TLDs and saying «It is
> extremely unlikely that any other TLDs will be created.»

Thanks for your feedback. Maybe you could just open issues (or even better, fork the repo, make your changes and create pull requests). That is much easier to maintain; it wastes time if I have to keep the contents of this discussion in mind and/or look it all up again when I find time for coding.

I agree with changing the function name and I agree that a function to extract the public suffix from a given domain is useful.

Is there anybody with time to brush up the autoconf stuff (just go through it, fix the warnings with ./autogen.sh, fix 'make distcheck')?

What about API docs - would Doxygen be overkill?

Tim
Re: [Bug-wget] Overly permissive hostname matching
On Thursday 20 March 2014 17:58:05 Jeffrey Walton wrote: > On Thu, Mar 20, 2014 at 5:52 PM, Tim Rühsen wrote: > I had a sidebar with one of the OpenSSL devs because OpenSSL is > cutting in hostname matching in version 1.0.2. > > He shared a link to a IETF working group on the subject: > https://www.ietf.org/mailman/listinfo/dbound. Thanks for sharing the link. Interesting and I guess everybody agrees, that the PSL is not for the long term. I'll have a deeper look in DBOUND the next days. Tim
Re: [Bug-wget] Overly permissive hostname matching
On Thursday 20 March 2014 23:11:31 Daniel Stenberg wrote: > On Thu, 20 Mar 2014, Tim Rühsen wrote: > > I broke out the public suffix code together and created a first go (really > > very quick, distcheck fails - couldn't figure out this evening). > > > > https://github.com/rockdaboot/libpsl > > Ok, I'll be the first to rain on the parade. Sorry but it seems fit to do > this early. > > You do realize that with a *GPL license on the thing, you won't get adopted > by OpenSSL, curl and possibly others? I knew you were the first to bring this up ;-) > I can't prevent you of course and the decision is yours to make, but I'd > prefer a BSD style license as then I could really consider basing future > enhancements of curl on this effort. I don't care much about the license in this case, since we are talking about something pretty simple. I just concentrated on the code that evening and simply copied the license clauses... I would like to see a consensus here just to avoid further discussions about licenses (there have been far too many). Ángel González already 'voted' for a MIT license. Daniel, you prefer a BSD license. I am OK with either one. Any other votes ? Tim
Re: [Bug-wget] Overly permissive hostname matching
On Thu, Mar 20, 2014 at 8:12 PM, Ángel González wrote:
> On 21/03/14 00:21, Daniel Stenberg wrote:
>>
>> ...
>> (Sorry, I don't know. I'm not a lawyer, so my solution is usually to
>> avoid GPL code all together).
>
> That's a solution. Although it's a sad result from usage of a license
> intended to preserve freedoms.

For what it's worth, I agree with you.

I can't afford lawyers on retainer to untangle things or to defend a suit. Hence I would be happy to use a permissive GPL license and assign any IP to GNU or FSF.

I don't take the position due to philosophy or perceived moral high ground. It's simply economics for me. Anyone who has not experienced the economics of a technology lawsuit is in for a shock. In the past, I spent tens of thousands on a lawyer in a technology case. I'm not going through that again unless I'm independently wealthy.

(I think I had the moral high ground since I was suing a chronic spammer who harassed me for nearly 15 years. I tried for years to get off the lists, and finally had to resort to the courts.)

Jeff
Re: [Bug-wget] Overly permissive hostname matching
On 21/03/14 00:21, Daniel Stenberg wrote: On Fri, 21 Mar 2014, Ángel González wrote: The LGPL would be an option. Not for curl though and probably not to other BSD/MIT licensed projects... That's a good point. Jeff wrote: Isn't copyright assigned to GNU or FSF? No. By licensing something under GPL you don't assign any copyright to FSF.* However, in some GNU projects, -and unlike most (all?) the rest of free projects- the FSF additionally requests to be assigned the copyright of the work, in order to be in a better position for enforcing its free license. https://www.gnu.org/licenses/why-assign.html * Actually, if it's licensed with an "or later version", as the license gatekeepers, they have an extra right to change it, but I wouldn't. (Sorry, I don't know. I'm not a lawyer, so my solution is usually to avoid GPL code all together). That's a solution. Although it's a sad result from usage of a license intended to preserve freedoms. Cheers
Re: [Bug-wget] Overly permissive hostname matching
On Fri, 21 Mar 2014, Ángel González wrote: The LGPL would be an option. Not for curl though and probably not to other BSD/MIT licensed projects... -- / daniel.haxx.se
Re: [Bug-wget] Overly permissive hostname matching
On Thu, Mar 20, 2014 at 7:11 PM, Ángel González wrote: > On 20/03/14 23:16, Jeffrey Walton wrote: >> >> >>> I can't prevent you of course and the decision is yours to make, but I'd >>> prefer a BSD style license as then I could really consider basing future >>> enhancements of curl on this effort. >> >> Does GNU have a permissive license? I know permissive does not meet >> all of Dr. Stallman's goals, but it will allow GNU more intellectual >> property in the arena. >> > The LGPL would be an option. I don't see why is intellectual property > related. > The license and the copyright owner seem quite orthogonal to me. Isn't copyright assigned to GNU or FSF? (Sorry, I don't know. I'm not a lawyer, so my solution is usually to avoid GPL code all together). Jeff
Re: [Bug-wget] Overly permissive hostname matching
On 20/03/14 23:16, Jeffrey Walton wrote: I can't prevent you of course and the decision is yours to make, but I'd prefer a BSD style license as then I could really consider basing future enhancements of curl on this effort. Does GNU have a permissive license? I know permissive does not meet all of Dr. Stallman's goals, but it will allow GNU more intellectual property in the arena. Jeff The LGPL would be an option. I don't see why is intellectual property related. The license and the copyright owner seem quite orthogonal to me.
Re: [Bug-wget] Overly permissive hostname matching
On 20/03/14 22:52, Tim Rühsen wrote:
> I broke out the public suffix code together and created a first go
> (really very quick, distcheck fails - couldn't figure out this evening).
>
> https://github.com/rockdaboot/libpsl
>
> The first step was a psl_is_tld() function.
> There is a test case for some major things (wildcards, exceptions).

So, your public api seems to be this:

psl_ctx_t *psl_load_file(const char *fname);
void psl_free(psl_ctx_t **psl);
int psl_is_tld(const psl_ctx_t *psl, const char *domain);

First, I wouldn't call the function is_tld(), not just because TLDs simply won't have any dot inside (just extract the last label in a DNS name), but because there are more ambiguous cases. I would name it is_public(), defining it as “one domain under which anyone* can register a subdomain”.

Additionally, I think there should be a function to extract the public suffix from a given domain.

Both functions should take a flags argument. The immediate use I foresee is to choose whether private registries should be taken into account or not. (A private registry is a domain used for the public but not owned by a registry; dyndns.org and blogspot.com are examples of that.)

* "anyone" understood as a random person unaffiliated with the owner of the parent domain, notwithstanding any condition that such "anyone" is required to fulfill in order to register it (such as residing in a given region or having paid certain fees).

PS: It's funny to see 1994 rfc1591 talking about TLDs and saying «It is extremely unlikely that any other TLDs will be created.»
Re: [Bug-wget] Overly permissive hostname matching
On 20/03/14 23:11, Daniel Stenberg wrote: You do realize that with a *GPL license on the thing, you won't get adopted by OpenSSL, curl and possibly others? I can't prevent you of course and the decision is yours to make, but I'd prefer a BSD style license as then I could really consider basing future enhancements of curl on this effort. FWIW, I have to agree, I would have opted for a MIT license for this if I were writing it. The FSF would probably strongly oppose, though :)
Re: [Bug-wget] Overly permissive hostname matching
On Thu, Mar 20, 2014 at 6:11 PM, Daniel Stenberg wrote: > On Thu, 20 Mar 2014, Tim Rühsen wrote: > >> I broke out the public suffix code together and created a first go (really >> very quick, distcheck fails - couldn't figure out this evening). >> >> https://github.com/rockdaboot/libpsl > > > Ok, I'll be the first to rain on the parade. Sorry but it seems fit to do > this early. > > You do realize that with a *GPL license on the thing, you won't get adopted > by OpenSSL, curl and possibly others? +1 > I can't prevent you of course and the decision is yours to make, but I'd > prefer a BSD style license as then I could really consider basing future > enhancements of curl on this effort. Does GNU have a permissive license? I know permissive does not meet all of Dr. Stallman's goals, but it will allow GNU more intellectual property in the arena. Jeff
Re: [Bug-wget] Overly permissive hostname matching
On Thu, 20 Mar 2014, Tim Rühsen wrote: I broke out the public suffix code together and created a first go (really very quick, distcheck fails - couldn't figure out this evening). https://github.com/rockdaboot/libpsl Ok, I'll be the first to rain on the parade. Sorry but it seems fit to do this early. You do realize that with a *GPL license on the thing, you won't get adopted by OpenSSL, curl and possibly others? I can't prevent you of course and the decision is yours to make, but I'd prefer a BSD style license as then I could really consider basing future enhancements of curl on this effort. -- / daniel.haxx.se
Re: [Bug-wget] Overly permissive hostname matching
On Thu, Mar 20, 2014 at 5:52 PM, Tim Rühsen wrote:
> On Wednesday, 19 March 2014, 10:59:05, Daniel Kahn Gillmor wrote:
>> I'm imagining a C library API that has a public suffix list context
>> object that can do efficient lookups (however we define the lookups),
>> and the library would bundle a pre-compiled context, based on the
>> currently-known public suffix list.
>>
>> something like:
>>
>> ---
>> struct psl_ctx;
>> typedef struct psl_ctx * psl_ctx_t;
>> const psl_ctx_t psl_builtin;
>>
>> psl_ctx_t psl_new_ctx_from_filename(const char* filename);
>> psl_ctx_t psl_new_ctx_from_fd(int fd);
>> void psl_free_ctx(psl_ctx_t ctx);
>>
>> /*
>> query forms, very rough draft -- do we need both?
>> need to consider memory allocation responsibilities and
>> DNS internationalization/canonicalization issues
>> */
>>
>> const char* psl_get_public_suffix(const psl_ctx_t, const char* domain);
>> const char* psl_get_registered_domain(const psl_ctx_t, const char* d);
>> ---
>
> I broke out the public suffix code together and created a first go
> (really very quick, distcheck fails - couldn't figure out this evening).
>
> https://github.com/rockdaboot/libpsl
>
> The first step was a psl_is_tld() function.
> There is a test case for some major things (wildcards, exceptions).
>
> I hope there will be some interest and some contributions...

Yes, I'd be interested. Especially since Angel pointed out failures in my use of the PSL (the close-open failures are troubling to me).

I had a sidebar with one of the OpenSSL devs because OpenSSL is cutting in hostname matching in version 1.0.2.

He shared a link to an IETF working group on the subject: https://www.ietf.org/mailman/listinfo/dbound.

Jeff
Re: [Bug-wget] Overly permissive hostname matching
On Wednesday, 19 March 2014, 10:59:05, Daniel Kahn Gillmor wrote:
> I'm imagining a C library API that has a public suffix list context
> object that can do efficient lookups (however we define the lookups),
> and the library would bundle a pre-compiled context, based on the
> currently-known public suffix list.
>
> something like:
>
> ---
> struct psl_ctx;
> typedef struct psl_ctx * psl_ctx_t;
> const psl_ctx_t psl_builtin;
>
> psl_ctx_t psl_new_ctx_from_filename(const char* filename);
> psl_ctx_t psl_new_ctx_from_fd(int fd);
> void psl_free_ctx(psl_ctx_t ctx);
>
> /*
> query forms, very rough draft -- do we need both?
> need to consider memory allocation responsibilities and
> DNS internationalization/canonicalization issues
> */
>
> const char* psl_get_public_suffix(const psl_ctx_t, const char* domain);
> const char* psl_get_registered_domain(const psl_ctx_t, const char* d);
> ---

I broke out the public suffix code together and created a first go (really very quick, distcheck fails - couldn't figure out this evening).

https://github.com/rockdaboot/libpsl

The first step was a psl_is_tld() function.
There is a test case for some major things (wildcards, exceptions).

I hope there will be some interest and some contributions...

Tim
Re: [Bug-wget] Overly permissive hostname matching
On Wed, Mar 19, 2014 at 3:03 PM, Ángel González wrote: > On 19/03/14 16:37, Jeffrey Walton wrote: >> >> ... > Also note that by removing the "*." from the beginning of the lines*, you > are acepting more hosts than > you should, such as a certificate for *.com.bd (represented as *.bd in the > PSL) which should have been > rejected. Oh, that's bad. I'll have to check that. Thanks. Jeff
Re: [Bug-wget] Overly permissive hostname matching
On 19/03/14 16:37, Jeffrey Walton wrote:
> On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg wrote:
>> On Wed, 19 Mar 2014, Jeffrey Walton wrote:
>>> # Remove lines that begin with "!"
>>
>> That sounds wrong:
>>
>> A rule may begin with a "!" (exclamation mark). If it does, it is
>> labelled as a "exception rule" and then treated as if the exclamation
>> mark is not present.
> Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)
>
> Anyway, I'll try to find the meaning of that bang. I seem to recall I
> could not find the meaning of it in the past.
>
> Jeff

It excludes a hostname from a previous matching rule. See http://publicsuffix.org/list/#list-format

Currently, there doesn't seem to be any exclusion with a wildcard, so right now all lines beginning with '!' are equivalent to "accept this hostname".

Also note that by removing the "*." from the beginning of the lines*, you are accepting more hosts than you should, such as a certificate for *.com.bd (represented as *.bd in the PSL), which should have been rejected.

* Your script comments are wrong btw, since you're not removing full lines.
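How exception and wildcard rules interact can be sketched in C as below. This is only an illustration under stated assumptions: the rule table is a tiny hand-picked subset rather than the real list, the names `demo_rules`, `demo_has_rule` and `demo_is_public_suffix` are hypothetical, input is assumed to be lower-case ASCII without a trailing dot, and the list's implicit default rule for unlisted top-level names is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative subset of rules, kept in the list's own syntax. */
static const char *const demo_rules[] = {
    "com", "uk", "co.uk", "*.bd", "*.ck", "!www.ck"
};

bool demo_has_rule(const char *rule)
{
    for (size_t i = 0; i < sizeof demo_rules / sizeof demo_rules[0]; i++)
        if (strcmp(demo_rules[i], rule) == 0)
            return true;
    return false;
}

/* Is `name` itself a public suffix according to demo_rules? */
bool demo_is_public_suffix(const char *name)
{
    char buf[256];

    /* 1. An exception rule ("!www.ck") overrides any other match. */
    int n = snprintf(buf, sizeof buf, "!%s", name);
    if (n < 0 || (size_t)n >= sizeof buf)
        return false;                 /* name too long for this sketch */
    if (demo_has_rule(buf))
        return false;

    /* 2. A literal rule such as "co.uk". */
    if (demo_has_rule(name))
        return true;

    /* 3. A wildcard rule: swap the leftmost label for "*". */
    const char *dot = strchr(name, '.');
    if (dot != NULL) {
        snprintf(buf, sizeof buf, "*%s", dot);   /* "com.bd" -> "*.bd" */
        return demo_has_rule(buf);
    }
    return false;
}
```

If the "*." prefix were stripped from the table, step 3 would never fire and "com.bd" would no longer be treated as a public suffix, which is the over-acceptance described above.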
Re: [Bug-wget] Overly permissive hostname matching
On 03/19/2014 11:55 AM, Jeffrey Walton wrote:
> Also, be careful of where you are pulling the list from. I got burned
> by pulling a list that was not being updated
> (https://bugzilla.mozilla.org/show_bug.cgi?id=968064).

i've been similarly burned before too, but i settled on the mxr address i just posted after trying a few other places.

> The Mozilla folks state the canonical list is at
> http://publicsuffix.org/list/effective_tld_names.dat. See Comment 11
> at https://bugzilla.mozilla.org/show_bug.cgi?id=968064#c11.

i just followed up there to point out that the canonical location for the data needs to have some form of cryptographic integrity mechanism. thanks for pointing that out.

--dkg
Re: [Bug-wget] Overly permissive hostname matching
On Wed, Mar 19, 2014 at 11:45 AM, Jeffrey Walton wrote:
> On Wed, Mar 19, 2014 at 11:37 AM, Jeffrey Walton wrote:
>> On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg wrote:
>>> On Wed, 19 Mar 2014, Jeffrey Walton wrote:
>>> # Remove lines that begin with "!"
>>>
>>> That sounds wrong:
>>>
>>> A rule may begin with a "!" (exclamation mark). If it does, it is labelled
>>> as a "exception rule" and then treated as if the exclamation mark is not
>>> present.
>> Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)
>>
>> Anyway, I'll try to find the meaning of that bang. I seem to recall I
>> could not find the meaning of it in the past.
> After reading that again, I don't mean to sound rude. Sorry about
> that. Thanks for pointing it out.
>
> And it does bring up a good point: the data structure needs two things:
> (1) a name, and (2) a flag for white/black. White is white listed
> while black is black listed.
>
> The API needs (at minimum): (1) take a name, and (2) return
> white/black/no entry. Everything else is just frills.

Something else you may want in the API is a way to determine how, exactly, a name matched if it's a failure. So it might be useful to report whether the match was a Generic Top Level Domain (gTLD), a Country Code Top Level Domain (ccTLD), or an Effective Top Level Domain from a PSL. I mention it because my code has some diagnostics in debug builds that log the info.

Jeff
Re: [Bug-wget] Overly permissive hostname matching
On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg wrote: > On Wed, 19 Mar 2014, Jeffrey Walton wrote: > >> # Remove lines that begin with "!" > > > That sounds wrong: > > A rule may begin with a "!" (exclamation mark). If it does, it is labelled > as a "exception rule" and then treated as if the exclamation mark is not > present. Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :) Anyway, I'll try to find the meaning of that bang. I seem to recall I could not find the meaning of it in the past. Jeff
Re: [Bug-wget] Overly permissive hostname matching
On Wed, Mar 19, 2014 at 11:38 AM, Daniel Kahn Gillmor wrote:
> On 03/19/2014 11:26 AM, Jeffrey Walton wrote:
>
>> wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST
>
> I recommend using the following HTTPS URL instead, so that you have some
> level of cryptographic verification of the data before loading it:
>
> https://hg.mozilla.org/mozilla-central/raw-file/tip/netwerk/dns/effective_tld_names.dat
>
> (this is what i use to update the debian publicsuffix package)

Also, be careful of where you are pulling the list from. I got burned by pulling a list that was not being updated (https://bugzilla.mozilla.org/show_bug.cgi?id=968064).

The Mozilla folks state the canonical list is at http://publicsuffix.org/list/effective_tld_names.dat. See Comment 11 at https://bugzilla.mozilla.org/show_bug.cgi?id=968064#c11.

Jeff
Re: [Bug-wget] Overly permissive hostname matching
On Wed, Mar 19, 2014 at 11:37 AM, Jeffrey Walton wrote:
> On Wed, Mar 19, 2014 at 11:30 AM, Daniel Stenberg wrote:
>> On Wed, 19 Mar 2014, Jeffrey Walton wrote:
>>
>>> # Remove lines that begin with "!"
>>
>> That sounds wrong:
>>
>> A rule may begin with a "!" (exclamation mark). If it does, it is labelled
>> as a "exception rule" and then treated as if the exclamation mark is not
>> present.
> Oh well. I'm too aggressive on the ban. I'd rather fail closed than open :)
>
> Anyway, I'll try to find the meaning of that bang. I seem to recall I
> could not find the meaning of it in the past.

After reading that again, I don't mean to sound rude. Sorry about that. Thanks for pointing it out.

And it does bring up a good point: the data structure needs two things: (1) a name, and (2) a flag for white/black. White is white listed while black is black listed.

The API needs (at minimum): (1) take a name, and (2) return white/black/no entry. Everything else is just frills.

Jeff
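The two-field entry and the three-way answer described above might look like this in C. It is a sketch only: the type and function names (`demo_verdict`, `demo_entry`, `demo_lookup`) and the three table entries are hypothetical, chosen to show one black-listed suffix and one white-listed exception.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical three-way answer: black, white, or not in the table. */
typedef enum { DEMO_NO_ENTRY, DEMO_BLACK, DEMO_WHITE } demo_verdict;

/* One table entry: (1) a name and (2) a white/black flag. */
struct demo_entry {
    const char  *name;
    demo_verdict flag;
};

/* Illustrative entries; "www.ck" stands for the exception rule "!www.ck". */
static const struct demo_entry demo_table[] = {
    { "com",    DEMO_BLACK },
    { "co.uk",  DEMO_BLACK },
    { "www.ck", DEMO_WHITE },
};

/* Take a name, return white/black/no entry. */
demo_verdict demo_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
        if (strcmp(demo_table[i].name, name) == 0)
            return demo_table[i].flag;
    return DEMO_NO_ENTRY;
}
```

Keeping "no entry" distinct from "white" lets a caller decide for itself whether an unknown name should fail open or closed.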
Re: [Bug-wget] Overly permissive hostname matching
On Wed, Mar 19, 2014 at 11:42 AM, Jeffrey Walton wrote:
> On Wed, Mar 19, 2014 at 11:38 AM, Daniel Kahn Gillmor wrote:
>> On 03/19/2014 11:26 AM, Jeffrey Walton wrote:
>>
>>> wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST
>>
>> I recommend using the following HTTPS URL instead, so that you have some
>> level of cryptographic verification of the data before loading it:
>>
>> https://hg.mozilla.org/mozilla-central/raw-file/tip/netwerk/dns/effective_tld_names.dat
>>
>> (this is what i use to update the debian publicsuffix package)
> Ah, good point. I did not even notice that.

OK, here's the reason for that:

$ ./eff_tld_names.sh
--2014-03-19 12:46:20-- https://publicsuffix.org/list/effective_tld_names.dat
Resolving publicsuffix.org (publicsuffix.org)... 63.245.217.181
Connecting to publicsuffix.org (publicsuffix.org)|63.245.217.181|:443... failed: Connection refused.

(It sucks getting old. I should have remembered that there's no HTTPS access to that list.)

Jeff
Re: [Bug-wget] Overly permissive hostname matching
On Wed, Mar 19, 2014 at 11:38 AM, Daniel Kahn Gillmor wrote:
> On 03/19/2014 11:26 AM, Jeffrey Walton wrote:
>
>> wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST
>
> I recommend using the following HTTPS URL instead, so that you have some
> level of cryptographic verification of the data before loading it:
>
> https://hg.mozilla.org/mozilla-central/raw-file/tip/netwerk/dns/effective_tld_names.dat
>
> (this is what i use to update the debian publicsuffix package)

Ah, good point. I did not even notice that.

Jeff
Re: [Bug-wget] Overly permissive hostname matching
On 03/19/2014 11:26 AM, Jeffrey Walton wrote:
> wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST

I recommend using the following HTTPS URL instead, so that you have some level of cryptographic verification of the data before loading it:

https://hg.mozilla.org/mozilla-central/raw-file/tip/netwerk/dns/effective_tld_names.dat

(this is what i use to update the debian publicsuffix package)

--dkg
Re: [Bug-wget] Overly permissive hostname matching
On Wed, 19 Mar 2014, Jeffrey Walton wrote: # Remove lines that begin with "!" That sounds wrong: A rule may begin with a "!" (exclamation mark). If it does, it is labelled as a "exception rule" and then treated as if the exclamation mark is not present. -- / daniel.haxx.se
Re: [Bug-wget] Overly permissive hostname matching
On Wed, Mar 19, 2014 at 10:59 AM, Daniel Kahn Gillmor wrote:
> On 03/19/2014 06:19 AM, Tim Ruehsen wrote:
>> As a programmer, I want to have control. E.g. the option to load from a
>> different file, or to switch off loading. Why ? e.g. for testing purposes, or
>> simply imagine a "swiss army knife" client for experts - maybe they want to
>> have control via CLI args. Or you are in a controlled environment and simply
>> don't want to waste CPU cycles when downloading a single file from a trusted
>> server. Just some examples.
>> And then, clients like Wget would like to have access, at least for checking
>> cookies.
>
> i understand, and i think we're probably not disagreeing -- you want the
> ability to control it; i want sane defaults so that people who don't
> touch it get sensible behavior.
>
>> I just took a quick look but I am not sure about the API (i did not have this
>> 'aha' effect). But what I don't like is the dependency on PHP which is used
>> to 'compile' the PSL before the C functions can use it. While the idea of
>> compilation/preprocessing is a good one, it should at least be optional.
>
> pre-compilation/preprocessing is probably a reasonable performance
> optimization for heavy use; we might even want a C library to embed a
> precompiled version of the most recent known list at time of
> compilation, so that it can be used with no initialization step or when
> no file is available.

This may help with seeding thoughts for an implementation. I'm fortunate because I work in C++.

I have a 'precooked' list with "com", "mil", ... "ak.us", "co.uk", etc. One entry per line. There can be multiple dots. For example, "sekikawa.niigata.jp".

I read the list into a vector, sort it in n*log(n), and then get log(n) lookups for the lifetime of the program. I pay the cost of the sort because I make frequent lookups.

When I match names with wild cards, I take a DNS name like *.example.com. I change it to example.com, and see if it's banned. It's a simple algorithm but it's effective.

I embed the list in my executable with GNU's assembler (*.S file). It's essentially a string with both a length and a NULL terminator:

    /* eff_tld_list.S */

    .section .rodata

    /* Mozilla's Effective TLD list */
    .global eff_tld_list
    .type eff_tld_list, @object
    .align 8
    eff_tld_list:
    eff_tld_list_start:
        .incbin "res/eff_tld_list.lst"
    eff_tld_list_end:
        .byte 0

    /* The string's size (if needed) */
    .global eff_tld_list_size
    .type eff_tld_list_size, @object
    .align 4
    eff_tld_list_size:
        .int eff_tld_list_end - eff_tld_list_start

Below is the script I use to fetch Mozilla's list.

Jeff

**

    #! /bin/bash

    MOZILLA_LIST=eff_tld_list.lst

    wget "http://publicsuffix.org/list/effective_tld_names.dat" -O $MOZILLA_LIST

    # Remove comments
    sed "/^\/\//d" $MOZILLA_LIST > temp-1.txt
    mv temp-1.txt $MOZILLA_LIST

    # Remove empty lines
    sed "/^$/d" $MOZILLA_LIST > temp-2.txt
    mv temp-2.txt $MOZILLA_LIST

    # Remove lines that begin with "!"
    sed "s/^!//g" $MOZILLA_LIST > temp-3.txt
    mv temp-3.txt $MOZILLA_LIST

    # Remove lines that begin with "*."
    sed "s/^\*\.//g" $MOZILLA_LIST > temp-4.txt
    mv temp-4.txt $MOZILLA_LIST

    # Pre-sort it
    cat $MOZILLA_LIST | sort > temp-8.txt
    mv temp-8.txt $MOZILLA_LIST

    # Copy it to resources
    cp $MOZILLA_LIST ../res
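The sort-once, search-many approach described above is written in C++ in the original setup; a rough C equivalent using the standard `qsort` and `bsearch` could look like this. The table contents and the names `demo_suffixes`, `demo_sort_suffixes`, `demo_is_banned` and `demo_wildcard_is_banned` are assumptions made for the sketch, and names are assumed to be lower-case ASCII.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative entries only; the real list would be loaded at startup. */
static const char *demo_suffixes[] = {
    "com", "mil", "co.uk", "ak.us", "sekikawa.niigata.jp"
};
#define DEMO_COUNT (sizeof demo_suffixes / sizeof demo_suffixes[0])

/* Comparison callback shared by qsort() and bsearch(). */
static int demo_cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Pay the n*log(n) cost once ... */
void demo_sort_suffixes(void)
{
    qsort(demo_suffixes, DEMO_COUNT, sizeof demo_suffixes[0], demo_cmp);
}

/* ... then every lookup is log(n). Requires demo_sort_suffixes() first. */
bool demo_is_banned(const char *name)
{
    return bsearch(&name, demo_suffixes, DEMO_COUNT,
                   sizeof demo_suffixes[0], demo_cmp) != NULL;
}

/* Turn "*.example.com" into "example.com" and look that up. */
bool demo_wildcard_is_banned(const char *pattern)
{
    if (strncmp(pattern, "*.", 2) == 0)
        pattern += 2;
    return demo_is_banned(pattern);
}
```

Because the table here holds only literal names, it inherits the limitation raised elsewhere in the thread: rules that were wildcards or exceptions in the source list lose their meaning once the prefixes are stripped.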
Re: [Bug-wget] Overly permissive hostname matching
On 03/19/2014 06:19 AM, Tim Ruehsen wrote:
> As a programmer, I want to have control. E.g. the option to load from a
> different file, or to switch off loading. Why ? e.g. for testing purposes, or
> simply imagine a "swiss army knife" client for experts - maybe they want to
> have control via CLI args. Or you are in a controlled environment and simply
> don't want to waste CPU cycles when downloading a single file from a trusted
> server. Just some examples.
> And then, clients like Wget would like to have access, at least for checking
> cookies.

i understand, and i think we're probably not disagreeing -- you want the ability to control it; i want sane defaults so that people who don't touch it get sensible behavior.

> I just took a quick look but I am not sure about the API (i did not have this
> 'aha' effect). But what I don't like is the dependency on PHP which is used
> to 'compile' the PSL before the C functions can use it. While the idea of
> compilation/preprocessing is a good one, it should at least be optional.

pre-compilation/preprocessing is probably a reasonable performance optimization for heavy use; we might even want a C library to embed a precompiled version of the most recent known list at time of compilation, so that it can be used with no initialization step or when no file is available.

I don't think depending on php for the pre-compilation step is a problem; that's just an additional build-dependency, same as (for example) bison or cmake or python for other C projects. (though i confess i'd rather work with pretty much any language other than PHP in general)

I agree that we probably want the library to support the generic case of reading the PSL from a file, though.

I'm imagining a C library API that has a public suffix list context object that can do efficient lookups (however we define the lookups), and the library would bundle a pre-compiled context, based on the currently-known public suffix list.

something like:

---
struct psl_ctx;
typedef struct psl_ctx * psl_ctx_t;
const psl_ctx_t psl_builtin;

psl_ctx_t psl_new_ctx_from_filename(const char* filename);
psl_ctx_t psl_new_ctx_from_fd(int fd);
void psl_free_ctx(psl_ctx_t ctx);

/*
query forms, very rough draft -- do we need both?
need to consider memory allocation responsibilities and
DNS internationalization/canonicalization issues
*/

const char* psl_get_public_suffix(const psl_ctx_t, const char* domain);
const char* psl_get_registered_domain(const psl_ctx_t, const char* d);
---

> "the folks" it's me ;-)

Hi "the folks" :) (and thanks for your work on mget!)

> I already thought of splitting libmget into several smaller libraries, like
> libmget-common, libmget-cookies, libmget-psl ... whatever is needed.
>
> What exactly do you think of ? What can I do to make Debian packaging easy ?

hm, it looks like libmget isn't in debian at all right now. I'm swamped with packaging work, and i'm not prepared to review something as full-featured as libmget itself.

if you could break out the publicsuffix code so that it was a distinct project from mget, but provided the API that met the needs of libmget-cookies, that would be the simplest thing for me to review and package; we could run any proposed API by Nikos to make sure it meets the needs of GnuTLS as well, if you think it's a good idea to push this verification into the TLS stack itself.

thanks for thinking about this,

--dkg
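One possible answer to the memory-allocation question in the draft above is for both query forms to return a pointer into the caller's own string, so nothing is allocated or freed. The C sketch below illustrates that idea only; it is not the proposed library. The names `demo_public_suffix` and `demo_registered_domain` and the three-entry table are hypothetical, wildcard and exception rules are ignored, and input is assumed to be a well-formed lower-case ASCII name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative literal suffixes only (no wildcard or exception rules). */
static const char *const demo_list[] = { "com", "uk", "co.uk" };

static bool demo_is_listed(const char *name)
{
    for (size_t i = 0; i < sizeof demo_list / sizeof demo_list[0]; i++)
        if (strcmp(demo_list[i], name) == 0)
            return true;
    return false;
}

/* Longest listed suffix of `domain`, as a pointer into `domain`,
 * or NULL if no listed suffix matches. */
const char *demo_public_suffix(const char *domain)
{
    const char *p = domain;
    while (p != NULL && *p != '\0') {
        if (demo_is_listed(p))
            return p;               /* first hit from the left is the longest */
        p = strchr(p, '.');
        if (p != NULL)
            p++;                    /* drop the leftmost label and retry */
    }
    return NULL;
}

/* The public suffix plus one more label, as a pointer into `domain`,
 * or NULL if `domain` has no listed suffix or is itself a suffix. */
const char *demo_registered_domain(const char *domain)
{
    const char *suffix = demo_public_suffix(domain);
    if (suffix == NULL || suffix == domain)
        return NULL;
    const char *p = suffix - 1;     /* the dot in front of the suffix */
    while (p > domain && p[-1] != '.')
        p--;                        /* walk back to the start of the label */
    return p;
}
```

For "www.example.co.uk" the first function points at "co.uk" and the second at "example.co.uk". Internationalized names would need canonicalization before reaching either function, which this sketch does not attempt.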
Re: [Bug-wget] Overly permissive hostname matching
On Wed, 19 Mar 2014, Daniel Kahn Gillmor wrote:

>> It insists on at least two dots. So yes, "*.apple" will cause problems for us too.
>
> There are also errors in the opposite direction: it sounds like curl will accept a cert for *.co.uk, right?

Exactly, due to the lack of public suffix awareness! :-(

-- 
/ daniel.haxx.se
Re: [Bug-wget] Overly permissive hostname matching
On 03/19/2014 10:38 AM, Daniel Stenberg wrote:
> On Tue, 18 Mar 2014, Ángel González wrote:
>
>> Daniel, how does cURL check correctness of the certificate hostname
>> suffix?
>
> It insists on at least two dots. So yes, "*.apple" will cause problems
> for us too.

There are also errors in the opposite direction: it sounds like curl will accept a cert for *.co.uk, right?

> I view the public suffix list as one of the worst kludges in networking
> history and while I understand why it is necessary, it is next to
> impossible to actually use sensibly in lots of environments.

I agree that the PSL is a horrible kludge; i'm not sure what other solutions are possible though, until the DNS gets some way to specify public registries itself (e.g. the DBOUND discussion going on in the IETF). In the meantime, we need to figure something out, though :/

--dkg
Re: [Bug-wget] Overly permissive hostname matching
On Tue, 18 Mar 2014, Ángel González wrote:

> Daniel, how does cURL check correctness of the certificate hostname suffix?

It insists on at least two dots. So yes, "*.apple" will cause problems for us too.

I view the public suffix list as one of the worst kludges in networking history and while I understand why it is necessary, it is next to impossible to actually use sensibly in lots of environments.

-- 
/ daniel.haxx.se
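The "at least two dots" rule described above can be sketched in a few lines. This is not curl's actual code; `pattern_has_two_dots` is a hypothetical name, and the sketch only illustrates why such a heuristic behaves the way the thread describes: it rejects "*.com" and "*.apple", yet lets "*.co.uk" through.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not taken from curl: accept a wildcard pattern only
   if it contains at least two dots. */
static bool pattern_has_two_dots(const char *pattern)
{
    size_t dots = 0;

    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p == '.')
            dots++;
    }
    return dots >= 2;
}
```

Counting dots says nothing about who controls the zone, which is why a single-label TLD is refused while a multi-label public suffix such as "co.uk" is not.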
Re: [Bug-wget] Overly permissive hostname matching
On Tuesday 18 March 2014 20:05:07 Daniel Kahn Gillmor wrote: > On 03/18/2014 05:31 PM, Tim Rühsen wrote: > > IHMO, the Public Suffix List (PSL) should not only be used to verify > > cookies but also be used for certificate hostname checking. > > > > Libraries as GnuTLS should offer an API for this kind of checking, best > > would be having the PSL as a separate file, maintained by the > > distribution maintainers (or the user, if he wants to to it). The SSL > > library should load/unload the PSL under the applications control. > > that sounds really fiddly to me -- you want the application to know why > the TLS stack needs to know about the public suffix list, and to be able > to control it appropriately? > > I think we need good sensible defaults, and a locally-cached, > frequently-updated copy of the public suffix list; then if we really > really want the application to be able to control the use of an > alternate suffix list we can provide an API for that, but i can't > imagine we'd want to require the application to specify anything (even > asking the application to load the default local PSL seems like too much > to expect from most apps that just want "to layer in some TLS"). As a programmer, I want to have control. E.g. the option to load from a different file, or to switch off loading. Why ? e.g. for testing purposes, or simply imagine a "swiss army knife" client for experts - maybe they want to have control via CLI args. Or you are in a controlled environment and simply don't want to waste CPU cycles when downloading a single file from a trusted server. Just some examples. And than, clients like Wget would like to have access, at least for checking cookies. > > Maybe it would be a good idea to provide a separate PSL library that could > > be used by SSL libraries for hostname checking and HTTP(S) clients for > > cookie verification. 
> > I maintain publicsuffix in debian, and i try to help on the gnutls side > of things too (both upstream and a little bit of kibbitzing about the > debian packaging). > > debian has php, python, perl, and haskell bindings for the public suffix > list, but i don't think anyone has packaged a C library for it. > > I've got discussion in my mailbox that i haven't processed in ages with > Florian Sager about packaging regdom-libs [0], though, and the library > looks like it's been revived a bit since i gave up on it last [1]. Do > you think this C interface would be a useful one or would you expect a > different API? I just took a quick look but I am not sure about the API (i did not have this 'aha' effect). But what I don't like is the dependency on PHP which is used to 'compile' the PSL before the C functions can use it. While the idea of compilation/preprocessing is a good one, it should at least be optional. > [0] http://www.dkim-reputation.org/regdom-libs/ > [1] https://bugs.debian.org/683881 > > > If of any interest, there is already some LGPLed code at > > > > https://github.com/rockdaboot/mget/blob/master/libmget/cookie.c > > > > There are also some unit test routines in the project. > > hm, do you know if the libmget folks are willing to break that code out > separately? linking to all of libmget doesn't sound like a good idea, > and it would be a shame to have to maintain separate copies of this > codebase. "the folks" it's me ;-) I already thought of splitting libmget into several smaller libraries, like libmget-common, libmget-cookies, libmget-psl ... whatever is needed. What exactly do you think of ? What can I do to make Debian packaging easy ? Tim
Re: [Bug-wget] Overly permissive hostname matching
On 03/18/2014 05:31 PM, Tim Rühsen wrote: > $ wget -d --ca-certificate=ca-rsa-cert.pem --private-key=ca-rsa-key-plain.pem > https://example.com:8443 > 2014-03-18 21:48:04 (1.88 GB/s) - Read error at byte 5116 (The TLS connection > was non-properly terminated.).Retrying. > > There seems to be a problem in Wget 1.15 (on Debian SID)... hm, i'll try to take a look at this. > But despite from that, Wget uses the hostname checking facility of the GnuTLS > library (or of OpenSSL library if appropriately compiled). And I saw you > already addressed bug-gnutls, which seems the right way to go. > > IHMO, the Public Suffix List (PSL) should not only be used to verify cookies > but > also be used for certificate hostname checking. > > Libraries as GnuTLS should offer an API for this kind of checking, best would > be having the PSL as a separate file, maintained by the distribution > maintainers (or the user, if he wants to to it). The SSL library should > load/unload the PSL under the applications control. that sounds really fiddly to me -- you want the application to know why the TLS stack needs to know about the public suffix list, and to be able to control it appropriately? I think we need good sensible defaults, and a locally-cached, frequently-updated copy of the public suffix list; then if we really really want the application to be able to control the use of an alternate suffix list we can provide an API for that, but i can't imagine we'd want to require the application to specify anything (even asking the application to load the default local PSL seems like too much to expect from most apps that just want "to layer in some TLS"). > Maybe it would be a good idea to provide a separate PSL library that could be > used by SSL libraries for hostname checking and HTTP(S) clients for cookie > verification. I maintain publicsuffix in debian, and i try to help on the gnutls side of things too (both upstream and a little bit of kibbitzing about the debian packaging). 
debian has php, python, perl, and haskell bindings for the public suffix list, but i don't think anyone has packaged a C library for it. I've got discussion in my mailbox that i haven't processed in ages with Florian Sager about packaging regdom-libs [0], though, and the library looks like it's been revived a bit since i gave up on it last [1]. Do you think this C interface would be a useful one or would you expect a different API? [0] http://www.dkim-reputation.org/regdom-libs/ [1] https://bugs.debian.org/683881 > If of any interest, there is already some LGPLed code at > https://github.com/rockdaboot/mget/blob/master/libmget/cookie.c > There are also some unit test routines in the project. hm, do you know if the libmget folks are willing to break that code out separately? linking to all of libmget doesn't sound like a good idea, and it would be a shame to have to maintain separate copies of this codebase. --dkg signature.asc Description: OpenPGP digital signature
Re: [Bug-wget] Overly permissive hostname matching
Hi Tim,

On Tue, Mar 18, 2014 at 5:31 PM, Tim Rühsen wrote:
> ...
> BTW, to reproduce the issue I used a GnuTLS compiled/linked version of Wget:
>
> $ wget -d --ca-certificate=ca-rsa-cert.pem --private-key=ca-rsa-key-plain.pem
> https://example.com:8443
> 2014-03-18 21:48:04 (1.88 GB/s) - Read error at byte 5116 (The TLS connection
> was non-properly terminated.).Retrying.
>
> There seems to be a problem in Wget 1.15 (on Debian SID)...

Confirmed on wheezy. I thought it was my OpenSSL server.

> But despite from that, Wget uses the hostname checking facility of the GnuTLS
> library (or of OpenSSL library if appropriately compiled).

OpenSSL won't have hostname checking until 1.0.2. See the CHANGELOG at https://www.openssl.org/news/changelog.html. (Mentioned in case you thought wget was performing it via OpenSSL).

> IHMO, the Public Suffix List (PSL) should not only be used to verify cookies
> but also be used for certificate hostname checking.

+1

Jeff
Re: [Bug-wget] Overly permissive hostname matching
Hi Jeffrey,

thanks for pointing this out.

BTW, to reproduce the issue I used a GnuTLS compiled/linked version of Wget:

$ wget -d --ca-certificate=ca-rsa-cert.pem --private-key=ca-rsa-key-plain.pem https://example.com:8443
2014-03-18 21:48:04 (1.88 GB/s) - Read error at byte 5116 (The TLS connection was non-properly terminated.).Retrying.

There seems to be a problem in Wget 1.15 (on Debian SID)...

But apart from that, Wget uses the hostname checking facility of the GnuTLS library (or of the OpenSSL library if compiled accordingly). And I saw you already addressed bug-gnutls, which seems the right way to go.

IMHO, the Public Suffix List (PSL) should not only be used to verify cookies but also for certificate hostname checking.

Libraries such as GnuTLS should offer an API for this kind of checking. Best would be to have the PSL as a separate file, maintained by the distribution maintainers (or the user, if he wants to do it). The SSL library should load/unload the PSL under the application's control.

Maybe it would be a good idea to provide a separate PSL library that could be used by SSL libraries for hostname checking and HTTP(S) clients for cookie verification.

If of any interest, there is already some LGPLed code at

https://github.com/rockdaboot/mget/blob/master/libmget/cookie.c

There are also some unit test routines in the project.

Regards, Tim
Re: [Bug-wget] Overly permissive hostname matching
I don't think wget should be checking the correct hostname scope of the certificate. I mean, it'd be ok to have some general rule such as "no one can use a certificate for *.whatever or *." [1], but embedding the Public Suffix List seems overkill. And the implementation should probably be done at the openssl/gnutls level.

If an attacker was able to get a CA-signed certificate for *.com (even though browsers reject that), he is very likely to have also been able to create a certificate for the domain you are browsing, or directly a sub-CA.

Daniel, how does cURL check correctness of the certificate hostname suffix?

[1] And even then, we might end up with a new TLD (e.g. *.apple) where it turns out to be correct.
Re: [Bug-wget] Overly permissive hostname matching
On Tue, 18 Mar 2014, Darshit Shah wrote:

> I'll try and set up a test case as soon as I can using the materials provided by you. It would be even more helpful if someone could pitch in with more help since:
> 1. This is not my domain and I don't understand it much.
> 2. I'm keeping really busy with my real life work and GSoC right now.

While in this area, you may want to fix a few other problems that I believe exist in the wget pattern match function as well:

1 - it allows wildcard matches of IP addresses against the CN field ("*.168.0.1")
2 - it allows multiple '*' in the pattern
3 - it allows the '*' to be elsewhere than first in a wildcard

See rfc6125 sections 6.4.3 and 7.2 for helpful hints on the two latter details.

-- 
/ daniel.haxx.se
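The three checks listed above can be sketched as a small matcher. This is not wget's code: `is_ip_literal` and `wildcard_host_match` are hypothetical names, the sketch assumes a POSIX system (`inet_pton`, `strcasecmp`), and it takes the strict reading in which '*' must be the entire leftmost label.

```c
#include <arpa/inet.h>    /* inet_pton (POSIX) */
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>      /* strcasecmp (POSIX) */
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* True if host parses as an IPv4 or IPv6 address literal. */
static bool is_ip_literal(const char *host)
{
    unsigned char buf[16];   /* large enough for an IPv6 address */

    return inet_pton(AF_INET, host, buf) == 1 ||
           inet_pton(AF_INET6, host, buf) == 1;
}

/* Hypothetical matcher (assumption, not wget's implementation). */
static bool wildcard_host_match(const char *pattern, const char *host)
{
    const char *star = strchr(pattern, '*');

    if (star == NULL)                          /* no wildcard: plain compare */
        return strcasecmp(pattern, host) == 0;

    if (is_ip_literal(host))                   /* 1: never wildcard-match IPs */
        return false;
    if (strchr(star + 1, '*') != NULL)         /* 2: more than one '*' */
        return false;
    if (star != pattern || pattern[1] != '.')  /* 3: '*' must be the whole first label */
        return false;

    const char *rest = strchr(host, '.');
    if (rest == NULL || rest == host)          /* host needs a non-empty first label */
        return false;

    /* The wildcard stands for exactly one label; the rest must match. */
    return strcasecmp(pattern + 1, rest) == 0;
}
```

Under these assumptions "*.example.com" matches "www.example.com" but not "a.b.example.com", and "*.168.0.1" never matches "192.168.0.1". The sketch does not address patterns such as "*.com"; that needs public suffix knowledge, which is the separate question discussed elsewhere in this thread.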
Re: [Bug-wget] Overly permissive hostname matching
Hi Jeffrey--

On 03/18/2014 01:43 AM, Jeffrey Walton wrote:
> I believe wget has a security flaw in its certificate hostname matching code.
>
> In the attached server certificate, the hostname is provided via a
> Subject Alt Name (SAN). The only SAN entry is a DNS name for "*.com".
> Also attached is the default CA, which was used to sign the server's
> certificate.

thanks for raising this concern. Have you tested this certificate and CA with other HTTPS clients (like browsers?)

Section 11.1.3 of the CA/Browser Forum's baseline requirements for CAs says that compliant CAs MUST NOT issue wildcard certs for an entire registry-controlled zone or public suffix "unless the applicant proves its rightful control of the entire Domain Namespace":

https://cabforum.org/wp-content/uploads/Baseline_Requirements_V1_1_6.pdf

So arguably, it is the responsibility of the CA, not the responsibility of the relying party, to determine what certs are legitimate.

Put another way: should every TLS client library embed the public suffix list? how often should they update it? What if a certificate is issued by a trusted CA that *does* match part of the public suffix list (perhaps because the CA has determined that the applicant has rightful control over the entire zone)?

--dkg
Re: [Bug-wget] Overly permissive hostname matching
Hi Jeffrey, Thanks for pointing this out! I am no expert in security or SSL for that matter. However, this does seem like a huge security flaw. I'll try and set up a test case as soon as I can using the materials provided by you. It would be even more helpful if someone could pitch in with more help since: 1. This is not my domain and I don't understand it much. 2. I'm keeping really busy with my real life work and GSoC right now. The new test suite can implement a HTTPS Server, so it shouldn't be too difficult to set this up. On Tue, Mar 18, 2014 at 6:43 AM, Jeffrey Walton wrote: > I believe wget has a security flaw in its certificate hostname matching code. > > In the attached server certificate, the hostname is provided via a > Subject Alt Name (SAN). The only SAN entry is a DNS name for "*.com". > Also attached is the default CA, which was used to sign the server's > certificate. > > Effectively, wget accepts a single certificate for the gTLD of .COM. > That's probably bad. If a CA is compromised, then the compromised CA > could issue a "super certificate" and cover the entire top level > domain space. > > I suspect wget also accepts certificates for .COM's friends, like > .NET, .ORG, .MIL, etc. > > Its probably not limited to gTLDs. Mozilla maintains a list of > effective TLDs at https://wiki.mozilla.org/Public_Suffix_List. The > 1600+ effective TLDs are probably accepted, too. > > Attached are the certificates, keys, and commands to set up a test rig > with OpenSSL's s_server. The certificates are issued for example.com, > and require a modification to /etc/hosts to make things work as > (un)expected. > > Jeffrey Walton > Baltimore, MD, US -- Thanking You, Darshit Shah