[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-09-18 Thread Greg Price


Greg Price  added the comment:

Thanks Benjamin for reviewing and merging this series!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-09-12 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-09-12 Thread Benjamin Peterson


Change by Benjamin Peterson :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-09-12 Thread Benjamin Peterson


Benjamin Peterson  added the comment:


New changeset a65678c5c90002c5e40fa82746de07e6217df625 by Benjamin Peterson 
(Greg Price) in branch 'master':
bpo-37760: Convert from length-18 lists to a dataclass, in makeunicodedata. 
(GH-15265)
https://github.com/python/cpython/commit/a65678c5c90002c5e40fa82746de07e6217df625


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-20 Thread Greg Price


Greg Price  added the comment:

(A bit easy to miss in the way this thread gets displayed, so to highlight in a 
comment: GH-15265 is up, following the 5 other patches which have now all been 
merged.  That's the one that replaces the length-18 tuples with a dataclass.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-14 Thread Benjamin Peterson


Benjamin Peterson  added the comment:

On Wed, Aug 14, 2019, at 03:25, STINNER Victor wrote:
> 
> STINNER Victor  added the comment:
> 
> > From my perspective, the main problem with using type annotations is that 
> > there's nothing checking them in CI.
> 
> Even if unchecked, type annotations can serve as builtin documentation, 
> as docstrings (even when docstrings are not checked ;-)).

Yes, but no one has the expectation that docstrings are automatically verified 
in any way. :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-14 Thread Benjamin Peterson


Benjamin Peterson  added the comment:


New changeset 3e4498d35c34aeaf4a9c3d57509b0d3277048ac6 by Benjamin Peterson 
(Greg Price) in branch 'master':
bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (GH-15128)
https://github.com/python/cpython/commit/3e4498d35c34aeaf4a9c3d57509b0d3277048ac6


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-14 Thread STINNER Victor


STINNER Victor  added the comment:

> From my perspective, the main problem with using type annotations is that 
> there's nothing checking them in CI.

Even if unchecked, type annotations can serve as builtin documentation, as 
docstrings (even when docstrings are not checked ;-)).

Well, I don't have a strong opinion on type annotations in the Python stdlib. 
But here we are talking about a tool which is not installed, so I don't think 
that it matters much. It's not really part of the "stdlib".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Greg Price


Change by Greg Price :


--
pull_requests: +14985
pull_request: https://github.com/python/cpython/pull/15265

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Greg Price


Greg Price  added the comment:

> From my perspective, the main problem with using type annotations is that 
> there's nothing checking them in CI.

Yeah, fair concern. In fact I think I'm on video (from PyCon 2018) warning 
everyone not to do that in their codebases, because what you really don't want 
is a bunch of annotations that have gradually accumulated falsehoods as the 
code has changed around them.

Still, I think from "some annotations + no checking" the good equilibrium to 
land in "some annotations + checking", not "no annotations + no checking". (I 
do mean "some" -- I don't predict we'll ever go sweep all over adding them.) 
And I think the highest-probability way to get there is to let them continue to 
accumulate where people occasionally add them in new/revised code... because 
that holds a door open for someone to step up to start checking them, and then 
to do the work to make that part of CI. (That someone might even be me! But I 
can think of plenty of other likely folks to do it.)

If we accumulated quite a lot of them and nobody had yet stepped up to make 
checking happen, I'd worry.  But with the smattering we currently have, I think 
that point is far off.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Benjamin Peterson


Benjamin Peterson  added the comment:


New changeset c03e698c344dfc557555b6b07a3ee2702e45f6ee by Benjamin Peterson 
(Greg Price) in branch 'master':
bpo-37760: Factor out standard range-expanding logic in makeunicodedata. 
(GH-15248)
https://github.com/python/cpython/commit/c03e698c344dfc557555b6b07a3ee2702e45f6ee


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Benjamin Peterson


Benjamin Peterson  added the comment:

>From my perspective, the main problem with using type annotations is that 
>there's nothing checking them in CI.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Greg Price


Greg Price  added the comment:

> This is good. But the title mentioned dataclasses, and they are 3.7+.

Ahh, sorry, I think now I understand you. :-)

Indeed, when I switch to the branch with that change 
(https://github.com/gnprice/cpython/commit/2b4aec4dd -- it comes after the 
patch that's GH-15248, so I haven't yet sent it as a PR), then `python3.6 
Tools/unicode/makeunicodedata.py` no longer works.

I think this is fine. Most of all that's because this always works:

./python Tools/unicode/makeunicodedata.py

Anyone who's going to be running that script will want to build a `./python` 
right afterward, in order to at least run the tests. So it doesn't seem like 
much trouble to do the build first and then run the script (and then a quick 
rebuild for the handful of changed files), if indeed the person doesn't already 
have a `./python` lying around.

In fact `./python` is exactly what I used most of the time to run this script 
when I was developing these changes, simply because it seemed like the natural 
thing to do.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

> I just checked and `python3.6 Tools/unicode/makeunicodedata.py` works fine, 
> both at master and with GH-15248.

This is good. But the title mentioned dataclasses, and they are 3.7+.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Greg Price


Greg Price  added the comment:

> What is the minimal Python version for developing CPython? The system Python 
> 3 on current Ubuntu LTS (18.04) is 3.6, so I think it should not be larger.

Ah, I think my previous message had an ambiguous parse: the earliest that 
*uses* of the typing module appeared in the stdlib was 3.7. The typing module 
has been around longer than that.

I just checked and `python3.6 Tools/unicode/makeunicodedata.py` works fine, 
both at master and with GH-15248.

I think it would be OK for doing development on CPython to require the latest 
minor version (i.e. 3.7) -- after all, if you're doing development, you're 
already building it, so you can always get a newer version than your system 
provides if needed.  But happily the question is moot here, so I guess the 
place to discuss that further would be a new thread.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

What is the minimal Python version for developing CPython? The system Python 3 
on current Ubuntu LTS (18.04) is 3.6, so I think it should not be larger.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Greg Price


Greg Price  added the comment:

> BTW: Since when do we use type annotations in Python's stdlib ?

Hmm, interesting question!

At a quick grep, it's in a handful of places in the stdlib: asyncio, functools, 
importlib. The earliest it appeared was in 3.7.0a4.

It's in more places in the test suite, which I think is a closer parallel to 
this maintainer script in Tools/.

The typing module itself is in the stdlib, so I don't see any obstacle to using 
it more widely. I imagine the main reason it doesn't appear more widely already 
is simply that it's new, and most of the stdlib is quite stable.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Greg Price


Greg Price  added the comment:

> I like to run pyflakes time to time on the Python code base. Please avoid 
> "import *" since it prevents pyflakes (and other code analyzers) to find bugs.

Ah fair enough, thanks!

Pushed that change to the next/current PR, GH-15248.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

BTW: Since when do we use type annotations in Python's stdlib ?

--
nosy: +lemburg

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread STINNER Victor


STINNER Victor  added the comment:

"from typing import *"

I like to run pyflakes time to time on the Python code base. Please avoid 
"import *" since it prevents pyflakes (and other code analyzers) to find bugs. 
Example:

$ pyflakes Tools/unicode/makeunicodedata.py
Tools/unicode/makeunicodedata.py:35: 'from typing import *' used; unable to 
detect undefined names
Tools/unicode/makeunicodedata.py:921: 'Iterator' may be undefined, or defined 
from star imports: typing
Tools/unicode/makeunicodedata.py:921: 'List' may be undefined, or defined from 
star imports: typing
Tools/unicode/makeunicodedata.py:921: 'Iterator' may be undefined, or defined 
from star imports: typing
...

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread STINNER Victor


STINNER Victor  added the comment:

$ find -name "*db.h"
./Modules/unicodedata_db.h
./Modules/unicodename_db.h
./Objects/unicodetype_db.h

For .gitattributes, I would prefer to use unicode*_db.h pattern rather than "a 
generic" *_db.h, to avoid masking other potential file called "something_db.h" 
in the future in GitHub. But it's not really an issue right now ;-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Greg Price


Change by Greg Price :


--
pull_requests: +14969
pull_request: https://github.com/python/cpython/pull/15248

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread miss-islington


miss-islington  added the comment:


New changeset b02e148a0d6e7a11df93a09ea5f4e1b0ad9b77b8 by Miss Islington (bot) 
in branch '3.8':
bpo-37760: Mark all generated Unicode data headers as generated. (GH-15171)
https://github.com/python/cpython/commit/b02e148a0d6e7a11df93a09ea5f4e1b0ad9b77b8


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread miss-islington


miss-islington  added the comment:


New changeset c2b9d9f202e4a99fc0800b7a0f0944ac4c2382e3 by Miss Islington (bot) 
in branch '3.8':
bpo-37760: Mark all generated Unicode data headers as generated. (GH-15171)
https://github.com/python/cpython/commit/c2b9d9f202e4a99fc0800b7a0f0944ac4c2382e3


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-13 Thread Benjamin Peterson


Benjamin Peterson  added the comment:


New changeset 99d208efed97e02d813e8166925b998bbd0d3993 by Benjamin Peterson 
(Greg Price) in branch 'master':
bpo-37760: Constant-fold some old options in makeunicodedata. (GH-15129)
https://github.com/python/cpython/commit/99d208efed97e02d813e8166925b998bbd0d3993


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-12 Thread miss-islington


Change by miss-islington :


--
pull_requests: +14967
pull_request: https://github.com/python/cpython/pull/15246

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-12 Thread miss-islington


miss-islington  added the comment:


New changeset 4e3dfcc4b987e683476a1b16456e57d3c9f581cb by Miss Islington (bot) 
(Greg Price) in branch 'master':
bpo-37760: Mark all generated Unicode data headers as generated. (GH-15171)
https://github.com/python/cpython/commit/4e3dfcc4b987e683476a1b16456e57d3c9f581cb


--
nosy: +miss-islington

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-12 Thread miss-islington


Change by miss-islington :


--
pull_requests: +14966
pull_request: https://github.com/python/cpython/pull/15245

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-12 Thread Benjamin Peterson


Benjamin Peterson  added the comment:


New changeset ef2af1ad44be0542a47270d5173a0b920c3a450d by Benjamin Peterson 
(Greg Price) in branch 'master':
bpo-37760: Factor out the basic UCD parsing logic of makeunicodedata. (GH-15130)
https://github.com/python/cpython/commit/ef2af1ad44be0542a47270d5173a0b920c3a450d


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-12 Thread Benjamin Peterson


Benjamin Peterson  added the comment:

Thanks for working this. In your interested in doing some more hacking on 
Unicode data, there's #32771.

--
nosy: +benjamin.peterson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-07 Thread Greg Price


Change by Greg Price :


--
pull_requests: +14903
pull_request: https://github.com/python/cpython/pull/15171

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-04 Thread Greg Price


Greg Price  added the comment:

Just posted three PRs:
 * GH-15128 and GH-15129 are both quite small
 * GH-15130 is the first of two patches factoring out common parsing logic.

Two remaining patches go on top of GH-15130.  Here are drafts, in case they're 
helpful for reference:
* Patch 2/2 factoring out common parsing logic: 
https://github.com/gnprice/cpython/commit/0a32a7111
* Patch converting the big tuples to a dataclass: 
https://github.com/gnprice/cpython/commit/6d8103bbc

I figure they may be easiest to review after the PR they depend on is merged, 
so my plan is to send PRs for them each in turn.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-04 Thread Greg Price


Change by Greg Price :


--
pull_requests: +14870
pull_request: https://github.com/python/cpython/pull/15130

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-04 Thread Greg Price


Change by Greg Price :


--
pull_requests: +14869
pull_request: https://github.com/python/cpython/pull/15129

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-04 Thread Greg Price


Change by Greg Price :


--
keywords: +patch
pull_requests: +14868
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/15128

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

2019-08-04 Thread Greg Price


New submission from Greg Price :

I spent some time yesterday on #18236, and I have a patch for it.

Most of that work happens in the script Tools/unicode/makeunicode.py , and 
along the way I made several changes there that I found made it somewhat nicer 
to work on, and I think will help other people reading that script too.  I'd 
like to try to merge those improvements first.

The main changes are:

 * As the script has grown over the years, it's gained many copies and 
reimplementations of logic to parse the standard format of the Unicode 
character database.  I factored those out into a single place, which makes the 
parsing code shorter and the interesting parts stand out more easily.

 * The main per-character record type in the script's data structures is a 
length-18 tuple.  Using the magic of dataclasses, I converted this so that e.g. 
the code says `record.numeric_value` instead of `record[8]`.

There's no radical restructuring or rewrite here; this script has served us 
well.  I've kept these changes focused where there's a high ratio of value, in 
future ease of development, to cost, in a reviewer's effort as well as mine.

I'll send PRs of my changes shortly.

--
components: Unicode
messages: 349020
nosy: Greg Price, ezio.melotti, vstinner
priority: normal
severity: normal
status: open
title: Refactor makeunicodedata.py: dedupe parsing, use dataclass
type: enhancement
versions: Python 3.9

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com