PyPI already ensures case-insensitive uniqueness and considers - and _ the same for uniqueness checks.
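A minimal sketch of that kind of uniqueness check follows. It assumes that only letter case and the - / _ distinction are folded. It is an illustration, not PyPI's actual code, and `uniqueness_key` and `is_name_taken` are hypothetical helper names:

```python
def uniqueness_key(name: str) -> str:
    # Hypothetical helper: treat '_' and '-' as the same character, then fold case.
    return name.replace("_", "-").lower()


def is_name_taken(candidate: str, registered: list[str]) -> bool:
    # Hypothetical helper: a new name collides if any registered name has the same key.
    key = uniqueness_key(candidate)
    return any(uniqueness_key(existing) == key for existing in registered)


registered = ["Django", "zope.interface", "python-dateutil"]
print(is_name_taken("django", registered))           # True
print(is_name_taken("Python_DateUtil", registered))  # True
print(is_name_taken("zope-interface", registered))   # False: '.' is not folded in this sketch
```

Folding to a single key and comparing keys means the index never has to compare every pair of spellings. Storing the key next to each registered name would make the check a single lookup.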
On May 15, 2013, at 12:08 PM, James Carpenter <[email protected]> wrote:
> While you're at it, you might consider not allowing variation in case and dash
> vs. underscore when specifying a dependency. A project should have only one
> concrete name, without fuzziness. A fuzzy match should result in a match
> failure. Fuzzy matches for a manual search are a different thing.
>
>
> On Wed, May 15, 2013 at 9:31 AM, Daniel Holth <[email protected]> wrote:
>> How to avoid confusables.
>>
>> These scripts are recommended for use in identifiers:
>> http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts
>>
>> This report details a confusables detection algorithm:
>> http://www.unicode.org/reports/tr39/#Confusable_Detection
>>
>> And ICU implements it:
>> http://www.icu-project.org/apiref/icu4c/uspoof_8h.html (see also
>> PyICU).
>>
>> The package index would enforce uniqueness of the "skeleton" of each
>> registered package, which is just an internal normalization based on
>> confusability. If skeleton(identifier1) == skeleton(identifier2), then
>> identifier1 and identifier2 are confusable.
>>
>> The tooling could get away with a simpler rule like
>> re.sub(r"[^\w\d.]+", "_", distribution, flags=re.UNICODE)
>>
>> As a bonus to including the world, this should be able to prevent
>> people from exchanging zeroes for capital O.
>>
>> On Wed, May 15, 2013 at 7:17 AM, Eric V. Smith <[email protected]> wrote:
>> > On 05/15/2013 07:10 AM, Donald Stufft wrote:
>> >>>>> Anyone want to run a scan over the PyPI package set to see
>> >>>>> how many packages would cause problems for a "[a-zA-Z0-9_.-]"
>> >>>>> only filter?
>> >>>>
>> >>>> See my previous email where I did queries against my local DB.
>> >>>> It's 225 total projects that wouldn't be allowed.
>> >>>
>> >>> Can you send the list of those projects?
>> >>>
>> >>> Eric.
>> >>>
>> >>
>> >> Here you go: https://gist.github.com/dstufft/5583225 (I used a Python
>> >> one-liner and the PyPI API, so others can reproduce it easily if they
>> >> wish).
>> >
>> > Perfect. Thanks.
>> >
>> > It looks like space causes most of the issues. I'm not sure how
>> > "Twisted Flow >= 1.0" would be expected to parse.
>> >
>> > Eric.
>> >
>> >
>> > _______________________________________________
>> > Distutils-SIG maillist - [email protected]
>> > http://mail.python.org/mailman/listinfo/distutils-sig
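The rules discussed in the quoted thread can be sketched the same way. This is an illustration under stated assumptions, not code from PyPI or ICU. All helper names here are hypothetical:

- `passes_filter` applies the "[a-zA-Z0-9_.-]" only filter.
- `tooling_key` applies the simpler re.sub rule, with flags passed by keyword.
- `skeleton` only mimics the outline of the TR39 algorithm (NFD, map each character, NFD again). It uses a tiny hypothetical CONFUSABLES table instead of the real Unicode confusables data.

```python
import re
import unicodedata

# The "[a-zA-Z0-9_.-]" only filter: ASCII letters, digits, '_', '.', '-'.
ALLOWED = re.compile(r"[a-zA-Z0-9_.-]+")


def passes_filter(name: str) -> bool:
    # fullmatch, so every character of the name must be allowed.
    return ALLOWED.fullmatch(name) is not None


def tooling_key(distribution: str) -> str:
    # Collapse each run of characters that are neither word characters nor '.'
    # into a single '_'. flags must be a keyword argument: the fourth
    # positional parameter of re.sub is count, not flags.
    return re.sub(r"[^\w\d.]+", "_", distribution, flags=re.UNICODE)


# Hypothetical, deliberately tiny confusables table (NOT the TR39 data).
CONFUSABLES = {"0": "O", "1": "l"}


def skeleton(identifier: str) -> str:
    # Toy version of the TR39 outline: NFD, map each character, NFD again.
    decomposed = unicodedata.normalize("NFD", identifier)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(id1: str, id2: str) -> bool:
    # Two identifiers are treated as confusable when their skeletons match.
    return skeleton(id1) == skeleton(id2)


print(passes_filter("Twisted Flow"))  # False: the space is not allowed
print(tooling_key("Twisted Flow"))    # Twisted_Flow
print(confusable("F00", "FOO"))       # True: zeroes map to capital O
```

A real index would rely on the full confusables data, for example through ICU's spoof checker (PyICU), rather than a hand-written table. The regex rule alone does not catch the zero versus capital O case. Only a skeleton comparison does that.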
