On 03/21/2014 04:54 PM, Ángel González wrote:
> On 21/03/14 21:13, Daniel Kahn Gillmor wrote:
>> i've just pushed some cleanup suggestions here:
>>
>>  https://github.com/rockdaboot/libpsl/pull/1
>>
>> i see you've pulled them already, thanks!
>>
>> i've got three more conceptual issues which warrant discussion, rather
>> than a patch, though.  If there's a better place to have this discussion
>> than this mailing list, i'm happy to move to it, please let me know
>> where.
>>
>> psl_is_tld() semantics
>> ----------------------
>>
>> the way i see it, we know what it means for psl_is_tld() to return
>> "true" -- but "false" could mean either:
>>
>>  (A) "this zone is subordinate to a TLD" (as example.com is to com)
>> or
>>  (B) "this zone is superior to a TLD" (as uk is to co.uk).  Note that
>> "uk" is not a public suffix.
> Hmm, actually uk is a public suffix, since not matching anything
> explicitly in the list, it will be caught by the implicit
> last-resource rule '*'.
>
> Also, what would you do with a domain such as his.name?
> It is both inferior to a public suffix (.name) and superior
> (forgot.his.name).

hm, the same problem is present for amazonaws.com; it is superior to
s3.amazonaws.com (and 32 other public suffixes), and subordinate to .com

> I think it should have a different return code, though.

can you propose a specific API?  the devil is in the details.

>>  https://www.gnu.org/software/libidn/
> I would expect the input in punycode and optionally in utf-8. This means
> a preprocessing step from the original list is needed.

This implies that people wouldn't be able to use effective_tld_names.dat
as distributed, right?  I can see this working for OS-level
distributions (I can preprocess effective_tld_names.dat when
distributing it in publicsuffix for debian), but for regular users it
sounds terrible.

> If we are handed an i18n domain, punycode them with libidn if we are
> linked to it, else return an error.

How do you propose we determine that we're handed an i18n domain if
we're not linked to libidn?  just check for any byte other than
printable ascii?  should we do the same thing for psl_load_file()?

If we implement something like psl_get_private_zone(), what form should
the returned name be?

> It is disgusting to do a roundtrip utf-8 -> punycode -> utf-8 for
> extracting the base domain, though.

that does sound ugly.

>> malformed inputs
>> ----------------
>>
>> What should the library do with malformed inputs?  i'm thinking about
>> super-long strings, strings starting with more than one dot, or with
>> multiple dots adjacent to each other, strings that don't match whatever
>> encoding we're expecting users to send, etc.
>
> Return an error.

I'm asking what API we think is reasonable for handling errors here.  Do
we need to distinguish between the malformed input error and the kind of
error we might get by calling psl_get_private_zone("uk")?  what makes
sense for callers?

	--dkg
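
To make the last two questions a bit more concrete, here is a rough
sketch of one possible answer, not a proposal for the actual libpsl API.
It separates "malformed" from "not plain ASCII" so that a caller can
tell the two apart.  Every name in it (sketch_status, sketch_classify,
the SKETCH_* constants) is made up for illustration, the 253-byte limit
is applied to raw bytes as a crude guard only, and a byte >= 0x80 is
merely taken as a hint that the name needs IDNA processing; the sketch
does not validate UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; none of these names exist in libpsl. */
typedef enum {
    SKETCH_OK = 0,       /* printable ASCII, structurally well-formed */
    SKETCH_NEEDS_IDNA,   /* well-formed, but has a byte >= 0x80 */
    SKETCH_MALFORMED     /* NULL, empty, too long, empty label, ... */
} sketch_status;

#define SKETCH_MAX_NAME 253  /* assumed limit, counted in raw bytes */

static sketch_status sketch_classify(const char *name)
{
    size_t len, i, label_len = 0, labels = 0;
    int non_ascii = 0;

    if (name == NULL)
        return SKETCH_MALFORMED;
    len = strlen(name);
    if (len == 0 || len > SKETCH_MAX_NAME)
        return SKETCH_MALFORMED;

    /* assumption: tolerate exactly one leading dot (".example.com") */
    i = (name[0] == '.') ? 1 : 0;

    for (; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        if (c == '.') {
            if (label_len == 0)       /* ".." or a second leading dot */
                return SKETCH_MALFORMED;
            label_len = 0;
        } else if (c <= 0x20 || c == 0x7f) {
            return SKETCH_MALFORMED;  /* control character or space */
        } else {
            if (c >= 0x80)
                non_ascii = 1;
            if (label_len++ == 0)     /* first byte of a new label */
                labels++;
        }
    }
    if (labels == 0)
        return SKETCH_MALFORMED;      /* nothing but a single dot */
    return non_ascii ? SKETCH_NEEDS_IDNA : SKETCH_OK;
}
```

With something along these lines, a build without libidn could reject
SKETCH_NEEDS_IDNA input up front, and a lookup such as
psl_get_private_zone("uk") could report "no private zone" through a
separate code instead of sharing one with malformed input.  Whether
callers actually want that distinction is still the open question.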