[Python-announce] Python 3.12.0 alpha 1 released.

2022-10-24 Thread Thomas Wouters
As Pablo released Python 3.11.0 final earlier today, now it's my turn to
release Python 3.12.0 alpha 1.


*This is an early developer preview of Python 3.12*
Major new features of the 3.12 series, compared to 3.11

Python 3.12 is still in development. This release, 3.12.0a1, is the first of
seven planned alpha releases.

Alpha releases are intended to make it easier to test the current state of
new features and bug fixes and to test the release process.

During the alpha phase, features may be added up until the start of the
beta phase (2023-05-08) and, if necessary, may be modified or deleted up
until the release candidate phase (2023-07-31). Please keep in mind that
this is a preview release and its use is *not* recommended for production
environments.

Many new features for Python 3.12 are still being planned and written.
Among the major new features and changes so far:

   - The deprecated `wstr` and `wstr_length` members of the C
   implementation of unicode objects were removed, per PEP 623.
   - In the `unittest` module, a number of long-deprecated methods and
   classes were removed. (They had been deprecated since Python 3.1 or 3.2.)
   - The deprecated `smtpd` module has been removed.
   - A number of other old, broken and deprecated functions, classes and
   methods have been removed.
   - (Hey, **fellow core developer,** if a feature you find important
   is missing from this list, let Thomas know.)

The next pre-release of Python 3.12 will be 3.12.0a2, currently scheduled
for 2022-11-14.

More resources

   - Online Documentation
   - PEP 693, the 3.12 Release Schedule
   - Report bugs at https://github.com/python/cpython/issues.
   - Help fund Python and its community at
   https://www.python.org/psf/donations/.


And now for something completely different

This is Not the Poem that I Had Hoped to Write


This is not the poem that I had hoped to write
when I sat at my desk and the page was white.
You see, there were other words that I’d had in mind,
yet this is what I leave behind.

I thought it was a poem to eradicate war;
one of such power, it would heal all the sores
of a world torn apart by conflict and schism.
But it isn’t.

Lovers, I’d imagined, would quote from it daily,
Mothers would sing it to soothe crying babies.
And whole generations would be given new hope.
Nope.

I had grand aspirations. Believe me, I tried.
Humanity examined with lessons applied.
But the right words escaped me; so often they do.
Have these in lieu.

Brian Bilston 

Enjoy the new releases

Thanks to all of the many volunteers who help make Python Development and
these releases possible! Please consider supporting our efforts by
volunteering yourself or through organization contributions to the Python
Software Foundation.

Regards from dusky California,

Your release team,
Thomas Wouters @Yhg1s
Ned Deily @nad
Steve Dower @steve.dower

-- 
Thomas Wouters 
___
Python-announce-list mailing list -- python-announce-list@python.org
To unsubscribe send an email to python-announce-list-le...@python.org
https://mail.python.org/mailman3/lists/python-announce-list.python.org/
Member address: arch...@mail-archive.com


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Chris Angelico
On Tue, 25 Oct 2022 at 09:34, Peter J. Holzer  wrote:
> > One thing I find quite interesting, though, is the way that browsers
> > *differ* in the face of bad nesting of tags. Recently I was struggling
> > to figure out a problem with an HTML form, and eventually found that
> > there was a spurious <form> tag way up higher in the page. Forms don't
> > nest, so that's invalid, but different browsers had slightly different
> > ways of showing it.
>
> Yeah, mismatched form tags can have weird effects. I don't remember the
> details but I scratched my head over that one more than once.
>

Yeah. I think my weirdest issue was one time when I inadvertently had
a <dialog> element (with a form inside it) inside something else with
a form (because the </form> was missing). Neither "dialog inside main"
nor "form in dialog separate from form in main" is a problem, and
even "oops, missed a closing form tag" isn't that big a deal, but put
them all together, and you end up with a bizarre situation where
Firefox 91 behaves one way and Chrome (some-version) behaves another
way.

That was a fun day. Remember, folks, even if you think you ran the W3C
validator on your code recently, it can still be worth checking. Just
in case.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Peter J. Holzer
On 2022-10-25 06:56:58 +1100, Chris Angelico wrote:
> On Tue, 25 Oct 2022 at 04:22, Peter J. Holzer  wrote:
> > There may be several reasons:
> >
> > * Historically, some browsers differed in which end tags were actually
> >   optional. Since (AFAIK) no mainstream browser ever implemented a real
> >   SGML parser (they were always "tag soup" parsers with lots of ad-hoc
> >   rules) this sometimes even changed within the same browser depending
> > >   on context (e.g. a simple table might work but nested tables wouldn't).
> >   So people started to use end-tags defensively.
> > * XHTML was for some time popular and it doesn't have any optional tags.
> >   So people got into the habit of always using end tags and writing
> >   empty tags as .
> > * Aesthetics: Always writing the end tags is more consistent and may
> >   look more balanced.
> > * Cargo-cult: People saw other people do that and copied the habit
> >   without thinking about it.
> >
> >
> > > Are you saying that it's better to omit them all?
> >
> > If you want to conserve keystrokes :-)
> >
> > I think it doesn't matter. Both are valid.
> >
> > > More importantly: Would you omit all the </p> closing tags you can, or
> > > would you include them?
> >
> > I usually write them.
> 
> Interesting. So which of the above reasons is yours?

Mostly the third one at this point, I think. The first one has gone away
for me with HTML5. The second one still lingers at the back of
my brain, but I've gotten rid of the habit of writing , so I'm
recovering ;-). But I still like my code to be nice and tidy, and
whether my sense of tidiness was influenced by XML or not, if the end
tags are missing it looks off, somehow.

(That said, I do sometimes leave them off to reduce visual clutter.)


> One thing I find quite interesting, though, is the way that browsers
> *differ* in the face of bad nesting of tags. Recently I was struggling
> to figure out a problem with an HTML form, and eventually found that
> there was a spurious <form> tag way up higher in the page. Forms don't
> nest, so that's invalid, but different browsers had slightly different
> ways of showing it.

Yeah, mismatched form tags can have weird effects. I don't remember the
details but I scratched my head over that one more than once.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"



[Python-announce] [RELEASE] Python 3.11 final (3.11.0) is available

2022-10-24 Thread Pablo Galindo Salgado
Python 3.11 is finally released. In the CPython release team, we have put a
lot of effort into making 3.11 the best version of Python possible. Better
tracebacks, faster Python, exception groups and except*, typing
improvements and much more. Get it here:

https://www.python.org/downloads/release/python-3110/

## This is the stable release of Python 3.11.0

Python 3.11.0 is the newest major release of the Python programming
language, and it contains many new features and optimizations.

# Major new features of the 3.11 series, compared to 3.10

Some of the major new features and changes in Python 3.11 are:

## General changes

* [PEP 657](https://www.python.org/dev/peps/pep-0657/) -- Include
Fine-Grained Error Locations in Tracebacks
* [PEP 654](https://www.python.org/dev/peps/pep-0654/) -- Exception Groups
and `except*`
* [PEP 680](https://www.python.org/dev/peps/pep-0680/) -- tomllib: Support
for Parsing TOML in the Standard Library
* [gh-90908](https://github.com/python/cpython/issues/90908) -- Introduce
task groups to asyncio
* [gh-34627](https://github.com/python/cpython/issues/34627/) -- Atomic
grouping (`(?>...)`) and possessive quantifiers (`*+, ++, ?+, {m,n}+`) are
now supported in regular expressions.
* The [Faster CPython Project](https://github.com/faster-cpython/) is
already yielding some exciting results. Python 3.11 is between 10% and 60%
faster than Python 3.10. On average, we measured a 1.22x speedup on the
standard benchmark suite. See [Faster CPython](https://docs.python.org/3.11/whatsnew/3.11.html#faster-cpython)
for details.

## Typing and typing language changes

* [PEP 673](https://www.python.org/dev/peps/pep-0673/) --  Self Type
* [PEP 646](https://www.python.org/dev/peps/pep-0646/) -- Variadic Generics
* [PEP 675](https://www.python.org/dev/peps/pep-0675/) -- Arbitrary Literal
String Type
* [PEP 655](https://www.python.org/dev/peps/pep-0655/) -- Marking
individual TypedDict items as required or potentially-missing
* [PEP 681](https://www.python.org/dev/peps/pep-0681/) -- Data Class
Transforms

# More resources

* [Online Documentation](https://docs.python.org/3.11/)
* [PEP 664](https://www.python.org/dev/peps/pep-0664/), 3.11 Release
Schedule
* Report bugs at [https://github.com/python/cpython/issues](https://github.com/python/cpython/issues).
* [Help fund Python and its community](https://www.python.org/psf/donations/).

# And now for something completely different

When a spherical non-rotating body of a critical radius collapses under its
own gravitation under general relativity, theory suggests it will collapse
to a single point. This is not the case with a rotating black hole (a Kerr
black hole). With a fluid rotating body, its distribution of mass is not
spherical (it shows an equatorial bulge), and it has angular momentum.
Since a point cannot support rotation or angular momentum in classical
physics (general relativity being a classical theory), the minimal shape of
the singularity that can support these properties is instead a ring with
zero thickness but non-zero radius, and this is referred to as a
ringularity or Kerr singularity.

This kind of singularity has the following peculiar property. The spacetime
allows a geodesic curve (describing the movement of observers and photons
in spacetime) to pass through the center of this ring singularity. The
region beyond permits closed time-like curves. Since the trajectory of
observers and particles in general relativity are described by time-like
curves, it is possible for observers in this region to return to their
past. This interior solution is not likely to be physical and is considered
a purely mathematical artefact.

There are some other interesting free-fall trajectories. For example, there
is a point in the axis of symmetry that has the property that if an
observer is below this point, the pull from the singularity will force the
observer to pass through the middle of the ring singularity to the region
with closed time-like curves and it will experience repulsive gravity that
will push it back to the original region, but then it will experience the
pull from the singularity again and will repeat this process forever. This
is, of course, only if the extreme gravity doesn’t destroy the observer
first.

# We hope you enjoy the new releases!

Thanks to all of the many volunteers who help make Python Development and
these releases possible! Please consider supporting our efforts by
volunteering yourself or through organization contributions to the Python
Software Foundation.

https://www.python.org/psf/

If you have any questions, please reach out to me or another member of the
release team :)

Your friendly release team,

Ned Deily @nad https://discuss.python.org/u/nad
Steve Dower @steve.dower https://discuss.python.org/u/steve.dower
Pablo Galindo Salgado @pablogsal https://discuss.python.org/u/pablogsal

Re: Ref-strings in logging messages (was: Performance issue with CPython 3.10 + Cython)

2022-10-24 Thread Barry Scott


> On 8 Oct 2022, at 11:50, Weatherby,Gerard  wrote:
> 
> Logging does support passing a callable, if indirectly. It only calls __str__
> on the object passed if debugging is enabled.
>
> import logging
>
> class Defer:
>
>     def __init__(self, fn):
>         self.fn = fn
>
>     def __str__(self):
>         # Runs only when the record is actually formatted.
>         return self.fn()
>
> def some_expensive_function():
>     return "hello"
>
> logging.basicConfig()
> logging.debug(Defer(some_expensive_function))

Oh, what a clever hack. It took a few minutes of code reading to see why this works.
You are exploiting the str(self.msg) call in LogRecord.getMessage().

```
def getMessage(self):
    """
    Return the message for this LogRecord.

    Return the message for this LogRecord after merging any user-supplied
    arguments with the message.
    """
    msg = str(self.msg)
    if self.args:
        msg = msg % self.args
    return msg
```
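
A minimal sketch of how the deferral can be checked. The logger name "defer_demo", the in-memory stream and the `calls` list are illustrative assumptions, not something from the thread. The idea is that the wrapped function should run only when a handler actually formats the record:

```python
import io
import logging


class Defer:
    """Wrap a zero-argument callable; it runs only when str() is applied."""

    def __init__(self, fn):
        self.fn = fn

    def __str__(self):
        return self.fn()


calls = []  # records each time the expensive function really runs


def some_expensive_function():
    calls.append("ran")
    return "hello"


# Hypothetical demo logger writing to an in-memory stream.
stream = io.StringIO()
logger = logging.getLogger("defer_demo")
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

logger.setLevel(logging.INFO)
logger.debug(Defer(some_expensive_function))  # filtered out: never formatted
calls_while_disabled = len(calls)

logger.setLevel(logging.DEBUG)
logger.debug(Defer(some_expensive_function))  # formatted: str() calls fn
calls_while_enabled = len(calls)
```

With the level at INFO the record is dropped before any formatting happens, so the callable never runs. At DEBUG it runs once, when the handler formats the record.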

Barry


>
> From: Python-list on behalf of Barry <ba...@barrys-emacs.org>
> Date: Friday, October 7, 2022 at 1:30 PM
> To: MRAB <pyt...@mrabarnett.plus.com>
> Cc: python-list@python.org
> Subject: Re: Ref-strings in logging messages (was: Performance issue with
> CPython 3.10 + Cython)
>
> > On 7 Oct 2022, at 18:16, MRAB  wrote:
> >
> > On 2022-10-07 16:45, Skip Montanaro wrote:
> >>> On Fri, Oct 7, 2022 at 9:42 AM Andreas Ames wrote:
> >>> 1. The culprit was me. As lazy as I am, I have used f-strings all over the
> >>> place in calls to `logging.logger.debug()` and friends, evaluating all
> >>> arguments regardless of whether the logger was enabled or not.
> >>>
> >> I thought there was some discussion about whether and how to efficiently
> >> admit f-strings to the logging package. I'm guessing that's not gone
> >> anywhere (yet).
> > Letting you pass in a callable to call might help, because then you could
> > use a lambda.
> 
> Yep, that’s the obvious way to avoid expensive log data generation.
> Would need logging module to support that use case.
> 
> Barry
> 


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Chris Angelico
On Tue, 25 Oct 2022 at 04:22, Peter J. Holzer  wrote:
> There may be several reasons:
>
> * Historically, some browsers differed in which end tags were actually
>   optional. Since (AFAIK) no mainstream browser ever implemented a real
>   SGML parser (they were always "tag soup" parsers with lots of ad-hoc
>   rules) this sometimes even changed within the same browser depending
>   on context (e.g. a simple table might work but nested tables wouldn't).
>   So people started to use end-tags defensively.
> * XHTML was for some time popular and it doesn't have any optional tags.
>   So people got into the habit of always using end tags and writing
>   empty tags as .
> * Aesthetics: Always writing the end tags is more consistent and may
>   look more balanced.
> * Cargo-cult: People saw other people do that and copied the habit
>   without thinking about it.
>
>
> > Are you saying that it's better to omit them all?
>
> If you want to conserve keystrokes :-)
>
> I think it doesn't matter. Both are valid.
>
> > More importantly: Would you omit all the </p> closing tags you can, or
> > would you include them?
>
> I usually write them.

Interesting. So which of the above reasons is yours? Personally, I do
it for a slightly different reason: Many end tags are *situationally*
optional, and it's much easier to debug code when changing, inserting,
or removing something leaves everything else as it was than when doing
so affects the implicit closing tags.

> I also indent the contents of an element, so I
> would write your example as:
>
> <html>
>   <body>
>     Hello, world!
>     <p>
>       Paragraph 2
>     </p>
>     <p>
>       Hey look, a third paragraph!
>     </p>
>   </body>
> </html>
>
> (As you can see I would also include the body tags to make that element
> explicit. I would normally also add a bit of boilerplate (especially a
> head with a charset and viewport definition), but I omit them here since
> they would change the parse tree)
>

Yeah - any REAL page would want quite a bit more (very few pages these days
manage without a style sheet, and it seems that hardly any survive
without importing a few gigabytes of JavaScript, but that's not
mandatory), but even in ancient pages, there's still a well-defined parse
structure for every tag sequence.

One thing I find quite interesting, though, is the way that browsers
*differ* in the face of bad nesting of tags. Recently I was struggling
to figure out a problem with an HTML form, and eventually found that
there was a spurious <form> tag way up higher in the page. Forms don't
nest, so that's invalid, but different browsers had slightly different
ways of showing it. (Obviously the W3C Validator was the most helpful
tool here, since it reports it as an error rather than constructing
any sort of DOM tree.)

ChrisA


Re: Are Floating Point Numbers still a Can of Worms?

2022-10-24 Thread Dennis Lee Bieber
On Mon, 24 Oct 2022 14:52:28 +, "Schachner, Joseph (US)"
 declaimed the following:

>Floating point will always be a can of worms, as long as people expect it to
>represent real numbers with more precision than float has.  Usually this is
>not an issue, but sometimes it is.  And, although this example does not
>exhibit subtractive cancellation, that is the surest way to end up with less
>precision than the two values you subtracted.  And if you try to add up lots
>of values, if your sum grows large enough, tiny values will not change it
>anymore, even if there are many of them - there are simple algorithms to
>avoid this effect.  But all of this is because float has limited precision.
>

Might I suggest this to those affected...
https://www.amazon.com/Real-Computing-Made-Engineering-Calculations/dp/0486442217/

(Wow -- they want a fortune for the original hard-cover, which I own)
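
The effect described in the quoted paragraph (a large running sum swallowing many tiny addends) and two of the "simple algorithms" that avoid it can be sketched in a few lines. This is only an illustration: the values are picked for the demo, `kahan_sum` is a hypothetical helper implementing compensated (Kahan) summation, and `math.fsum` is the standard-library function that tracks the lost low-order bits for you.

```python
import math

# One large value followed by many small ones.
values = [1e16] + [1.0] * 1000

# Naive left-to-right summation: each 1.0 is rounded away.
naive = 0.0
for v in values:
    naive += v


def kahan_sum(xs):
    """Compensated summation: carry the rounding error along in `comp`."""
    total = 0.0
    comp = 0.0
    for x in xs:
        y = x - comp            # re-apply the error lost in earlier steps
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was just lost
        total = t
    return total


compensated = kahan_sum(values)
exact = math.fsum(values)
```

Adjacent doubles near 1e16 are 2.0 apart, so adding a single 1.0 rounds straight back to the previous total; the naive loop therefore never moves, while the other two results include all 1000 small addends.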


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/


python 3.11

2022-10-24 Thread jschwar
Is Python 3.11 still being released today?

Just wondering. Not sure when during the day this is done.

Thanks.



Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Roel Schroeven

Jon Ribbens via Python-list wrote on 24/10/2022 at 19:01:

> On 2022-10-24, Chris Angelico wrote:
> > On Tue, 25 Oct 2022 at 02:45, Jon Ribbens via Python-list wrote:
> >> Adding in the omitted , , , , and 
> >> would make no difference and there's no particular reason to recommend
> >> doing so as far as I'm aware.
> >
> > And yet most people do it. Why?
>
> They agree with Tim Peters that "Explicit is better than implicit",
> I suppose? ;-)

I don't write all that much HTML, but when I do, I include those tags
largely for that reason indeed. We don't write HTML just for the
browser, we also write it for the web developer. And I think it's easier
for the web developer when the different sections are clearly
distinguished, and what better way to do it than use their tags.

> > More importantly: Would you omit all the </p> closing tags you can, or
> > would you include them?
>
> It would depend on how much content was inside them I guess.
> Something like:
>
>   <ul>
>     <li>First item
>     <li>Second item
>     <li>Third item
>   </ul>
>
> is very easy to understand, but if each item was many lines long then it
> may be less confusing to explicitly close - not least for indentation
> purposes.

I mostly include closing tags, if for no other reason than that I have
the impression that editors generally work better (i.e. get things like
indentation and syntax highlighting right) that way.


--
"Je ne suis pas d’accord avec ce que vous dites, mais je me battrai jusqu’à
la mort pour que vous ayez le droit de le dire."
-- Attribué à Voltaire
"I disapprove of what you say, but I will defend to the death your right to
say it."
-- Attributed to Voltaire
"Ik ben het niet eens met wat je zegt, maar ik zal je recht om het te zeggen
tot de dood toe verdedigen"
-- Toegeschreven aan Voltaire


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Peter J. Holzer
On 2022-10-25 03:09:33 +1100, Chris Angelico wrote:
> On Tue, 25 Oct 2022 at 02:45, Jon Ribbens via Python-list
>  wrote:
> > On 2022-10-24, Chris Angelico  wrote:
> > > On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer  wrote:
> > >> Yes, I got that. What I wanted to say was that this is indeed a bug in
> > >> html.parser and not an error (or sloppiness, as you called it) in the
> > >> input or ambiguity in the HTML standard.
> > >
> > > I described the HTML as "sloppy" for a number of reasons, but I was of
> > > the understanding that it's generally recommended to have the closing
> > > tags. Not that it matters much.
> >
> > Some elements don't need close tags, or even open tags. Unless you're
> > using XHTML you don't need them and indeed for the case of void tags
> > (e.g. , ) you must not include the close tags.
> 
> Yep, I'm aware of void tags, but I'm talking about the container tags
> - in this case,  and  - which, in a lot of older HTML pages,
> are treated as "separator" tags. Consider this content:
> 
> <html>
> Hello, world!
> <p>
> Paragraph 2
> <p>
> Hey look, a third paragraph!
> </html>
> 
> Stick a doctype onto that and it should be valid HTML5, but as it is,
> it's the exact sort of thing that was quite common in the 90s.
> 
> The <p> tag is not a void tag, but according to the spec, it's legal
> to omit the </p> if the element is followed directly by another <p>
> element (or any of a specific set of others), or if there is no
> further content.

Right. The parser knows the structure of an HTML document, which tags
are optional and which elements can be inside of which other elements.
For SGML-based HTML versions (2.0 to 4.01) this is formally described by
the DTD.

So when parsing your file, an HTML parser would work like this:

<html> - Yup, I expect an HTML element here:
    HTML
Hello, world! - #PCDATA? Not allowed as a child of HTML. There must
    be a HEAD and a BODY, both of which have optional start tags.
    HEAD can't contain #PCDATA either, so we must be inside of BODY
    and HEAD was empty:
    HTML
      ├─ HEAD
      └─ BODY
           └─ #PCDATA: Hello, world!
<p> - Allowed in BODY, so just add that:
    HTML
      ├─ HEAD
      └─ BODY
           ├─ #PCDATA: Hello, world!
           └─ P
Paragraph 2 - #PCDATA is allowed in P, so add it as a child:
    HTML
      ├─ HEAD
      └─ BODY
           ├─ #PCDATA: Hello, world!
           └─ P
                └─ #PCDATA: Paragraph 2
<p> - Not allowed inside of P, so that implicitly closes the
    previous P element and we go up one level:
    HTML
      ├─ HEAD
      └─ BODY
           ├─ #PCDATA: Hello, world!
           ├─ P
           │    └─ #PCDATA: Paragraph 2
           └─ P
Hey look, a third paragraph! - Same as above:
    HTML
      ├─ HEAD
      └─ BODY
           ├─ #PCDATA: Hello, world!
           ├─ P
           │    └─ #PCDATA: Paragraph 2
           └─ P
                └─ #PCDATA: Hey look, a third paragraph!
</html> - The end tags of P and BODY are optional, so the end of
    HTML closes them implicitly, and we have our final parse tree
    (unchanged from the last step):
    HTML
      ├─ HEAD
      └─ BODY
           ├─ #PCDATA: Hello, world!
           ├─ P
           │    └─ #PCDATA: Paragraph 2
           └─ P
                └─ #PCDATA: Hey look, a third paragraph!

For a human, the <p> tags might feel like separators here. But
syntactically they aren't - they start a new element. Note especially
that "Hello, world!" is not part of a P element but a direct child of
BODY (which may or may not be intended by the author).
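
The implicit-close rule from the walkthrough above can be sketched on top of html.parser, which only reports the tags that are physically present and leaves any tree building to the caller. This is a toy under loud assumptions: the class name `ToyTreeBuilder` and the tuple-based tree are made up for illustration, and it handles only this one document shape (optional HEAD and BODY, a P closed by the next P), nothing like the full HTML parsing rules.

```python
from html.parser import HTMLParser


class ToyTreeBuilder(HTMLParser):
    """Build (name, children) tuples for the tiny document discussed above."""

    def __init__(self):
        super().__init__()
        self.body = ("BODY", [])
        # HEAD and BODY have optional start tags, so create them up front.
        self.root = ("HTML", [("HEAD", []), self.body])
        self.open_p = None  # the P element currently accepting text, if any

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # P may not contain P: a new <p> implicitly closes the old one.
            self.open_p = ("P", [])
            self.body[1].append(self.open_p)

    def handle_endtag(self, tag):
        if tag in ("p", "body", "html"):
            # Any of these end tags closes an open P element.
            self.open_p = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            parent = self.open_p if self.open_p is not None else self.body
            parent[1].append("#PCDATA: " + text)


source = """<html>
Hello, world!
<p>
Paragraph 2
<p>
Hey look, a third paragraph!
</html>
"""

builder = ToyTreeBuilder()
builder.feed(source)
builder.close()
tree = builder.root
```

The resulting `tree` has "Hello, world!" as a direct child of BODY and each later text run inside its own P, matching the final parse tree in the walkthrough.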

> 
> > Adding in the omitted , , , , and 
> > would make no difference and there's no particular reason to recommend
> > doing so as far as I'm aware.
> 
> And yet most people do it. Why?

There may be several reasons:

* Historically, some browsers differed in which end tags were actually
  optional. Since (AFAIK) no mainstream browser ever implemented a real
  SGML parser (they were always "tag soup" parsers with lots of ad-hoc
  rules) this sometimes even changed within the same browser depending
  on context (e.g. a simple table might work but nested tables wouldn't).
  So people started to use end-tags defensively.
* XHTML was for some time popular and it doesn't have any optional tags.
  So people got into the habit of always using end tags and writing
  empty tags as .
* Aesthetics: Always writing the end tags is more consistent and may
  look more balanced.
* Cargo-cult: People saw other people do that and copied the habit
  without thinking about it.


> Are you saying that it's better to omit them all?

If you want to conserve keystrokes :-)

I think it doesn't matter. Both are valid.

> More importantly: Would you omit all the </p> closing tags you can, or
> would you include them?

I usually write them. I also indent the contents of an 

Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Jon Ribbens via Python-list
On 2022-10-24, Chris Angelico  wrote:
> On Tue, 25 Oct 2022 at 02:45, Jon Ribbens via Python-list
> wrote:
>>
>> On 2022-10-24, Chris Angelico  wrote:
>> > On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer  wrote:
>> >> Yes, I got that. What I wanted to say was that this is indeed a bug in
>> >> html.parser and not an error (or sloppiness, as you called it) in the
>> >> input or ambiguity in the HTML standard.
>> >
>> > I described the HTML as "sloppy" for a number of reasons, but I was of
>> > the understanding that it's generally recommended to have the closing
>> > tags. Not that it matters much.
>>
>> Some elements don't need close tags, or even open tags. Unless you're
>> using XHTML you don't need them and indeed for the case of void tags
>> (e.g. , ) you must not include the close tags.
>
> Yep, I'm aware of void tags, but I'm talking about the container tags
> - in this case,  and  - which, in a lot of older HTML pages,
> are treated as "separator" tags.

Yes, hence why I went on to talk about container tags.

> Consider this content:
>
> <html>
> Hello, world!
> <p>
> Paragraph 2
> <p>
> Hey look, a third paragraph!
> </html>
>
> Stick a doctype onto that and it should be valid HTML5,

Nope, it's missing a .

>> Adding in the omitted , , , , and 
>> would make no difference and there's no particular reason to recommend
>> doing so as far as I'm aware.
>
> And yet most people do it. Why?

They agree with Tim Peters that "Explicit is better than implicit",
I suppose? ;-)

> Are you saying that it's better to omit them all?

No, I'm saying neither option is necessarily better than the other.

> More importantly: Would you omit all the  closing tags you can, or
> would you include them?

It would depend on how much content was inside them I guess.
Something like:

  
First item
Second item
Third item
  

is very easy to understand, but if each item was many lines long then it
may be less confusing to explicitly close - not least for indentation
purposes.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: A trivial question that I don't know - document a function/method

2022-10-24 Thread Mats Wichmann

On 10/23/22 14:20, Paulo da Silva wrote:

Às 21:58 de 22/10/22, Paulo da Silva escreveu:

Hi all!

What is the correct way, if any, of documenting a function/method?



Thank you all for the, valuable as usual, suggestions.
I am now able to make my choices.

Paulo


It also matters whether you expect the docstring to stand on its own, or 
to be processed by a doc-generation tool (like Sphinx).  In the former 
case, make it look nice in a way that suits you. In the latter case, use 
the reStructuredText conventions that exist (there are at least three 
common styles) for the document processor.  While these styles are 
intended to also be very human-readable, they may not be exactly how you 
wanted to format things for visual display when looking directly at 
code, so it's worth thinking about this up front.


See for example:

https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html
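
As a rough illustration of two of those styles (a sketch only; the function
names `scale` and `scale_rst` are made up for this example), here is the
same docstring written once in the Google style that Napoleon understands
and once with plain reStructuredText field lists:

```python
import inspect


def scale(value: float, factor: float = 2.0) -> float:
    """Multiply a value by a factor (Google style, parsed by Napoleon).

    Args:
        value: The number to scale.
        factor: The multiplier to apply.

    Returns:
        The product of ``value`` and ``factor``.
    """
    return value * factor


def scale_rst(value: float, factor: float = 2.0) -> float:
    """Multiply a value by a factor (plain reStructuredText field lists).

    :param value: The number to scale.
    :param factor: The multiplier to apply.
    :returns: The product of ``value`` and ``factor``.
    """
    return value * factor


# inspect.getdoc() strips the common indentation, which is close to what
# help() shows when the docstring has to stand on its own.
print(inspect.getdoc(scale))
print(inspect.getdoc(scale_rst))
```

Both read acceptably as plain text; which one to pick mostly depends on
whether a doc generator will process them.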



--
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Chris Angelico
On Tue, 25 Oct 2022 at 02:45, Jon Ribbens via Python-list
 wrote:
>
> On 2022-10-24, Chris Angelico  wrote:
> > On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer  wrote:
> >> Yes, I got that. What I wanted to say was that this is indeed a bug in
> >> html.parser and not an error (or sloppyness, as you called it) in the
> >> input or ambiguity in the HTML standard.
> >
> > I described the HTML as "sloppy" for a number of reasons, but I was of
> > the understanding that it's generally recommended to have the closing
> > tags. Not that it matters much.
>
> Some elements don't need close tags, or even open tags. Unless you're
> using XHTML you don't need them and indeed for the case of void tags
> (e.g. , ) you must not include the close tags.

Yep, I'm aware of void tags, but I'm talking about the container tags
- in this case,  and  - which, in a lot of older HTML pages,
are treated as "separator" tags. Consider this content:


Hello, world!

Paragraph 2

Hey look, a third paragraph!


Stick a doctype onto that and it should be valid HTML5, but as it is,
it's the exact sort of thing that was quite common in the 90s. (I'm
not sure when lowercase tags became more popular, but in any case (pun
intended), that won't affect validity.)

The <p> tag is not a void tag, but according to the spec, it's legal
to omit the </p> if the element is followed directly by another <p>
element (or any of a specific set of others), or if there is no
further content.

> Adding in the omitted , , , , and 
> would make no difference and there's no particular reason to recommend
> doing so as far as I'm aware.

And yet most people do it. Why? Are you saying that it's better to
omit them all?

More importantly: Would you omit all the  closing tags you can, or
would you include them?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Jon Ribbens via Python-list
On 2022-10-24, Chris Angelico  wrote:
> On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer  wrote:
>> Yes, I got that. What I wanted to say was that this is indeed a bug in
>> html.parser and not an error (or sloppyness, as you called it) in the
>> input or ambiguity in the HTML standard.
>
> I described the HTML as "sloppy" for a number of reasons, but I was of
> the understanding that it's generally recommended to have the closing
> tags. Not that it matters much.

Some elements don't need close tags, or even open tags. Unless you're
using XHTML you don't need them and indeed for the case of void tags
(e.g. , ) you must not include the close tags.

A minimal HTML file might look like this:


Minimal HTML file
Minimal HTML fileThis is a minimal HTML file.

which would be parsed into this:



  

Minimal HTML file
  
  

  Minimal HTML file
  This is a minimal HTML file.

  


Adding in the omitted , , , , and 
would make no difference and there's no particular reason to recommend
doing so as far as I'm aware.
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Are Floating Point Numbers still a Can of Worms?

2022-10-24 Thread Schachner, Joseph (US)
Floating point will always be a can of worms, as long as people expect it to
represent real numbers with more precision than float has.  Usually this is not
an issue, but sometimes it is.  And, although this example does not exhibit
subtractive cancellation, that is the surest way to end up with less precision
than the two values you subtracted.  And if you try to add up lots of values, if
your sum grows large enough, tiny values will not change it anymore, even if
there are many of them - there are simple algorithms to avoid this effect.
But all of this is because float has limited precision.
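
To make the summation point concrete, here is a small sketch (not taken
from the original post). It compares a plain left-to-right loop with Kahan
compensated summation and with the standard library's `math.fsum`. The
helper names `naive_sum` and `kahan_sum` are made up for this example, and
the plain loop is written out by hand because newer Python versions may
already compensate inside the built-in `sum()`.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # rounding error accumulated by the previous step
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was really added; the difference to
        # `adjusted` is the rounding error, which is corrected next round.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


# One huge value followed by many tiny ones: each 1.0 is below the spacing
# of doubles near 1e16, so the plain loop never moves off 1e16.
values = [1e16] + [1.0] * 1000

print(naive_sum(values))
print(kahan_sum(values))
print(math.fsum(values))
```

In everyday code `math.fsum` is probably the simplest choice; the Kahan
loop is shown only to illustrate how little extra work the idea needs.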

--- Joseph S.


Teledyne Confidential; Commercially Sensitive Business Data

-Original Message-
From: Pieter van Oostrum  
Sent: Sunday, October 23, 2022 10:25 AM
To: python-list@python.org
Subject: Re: Are Floating Point Numbers still a Can of Worms?

Mostowski Collapse  writes:

> I also get:
>
> Python 3.11.0rc1 (main, Aug 8 2022, 11:30:54)
 2.718281828459045**0.8618974796837966
> 2.367649
>
> Nice try, but isn't this one the more correct?
>
> ?- X is 2.718281828459045**0.8618974796837966.
> X = 2.36764897.
>

That's probably the accuracy of the underlying C implementation of the exp 
function.

In [25]: exp(0.8618974796837966)
Out[25]: 2.367649

But even your answer can be improved:

Maxima:

(%i1) fpprec:30$

(%i2) bfloat(2.718281828459045b0)^bfloat(.8618974796837966b0);
(%o2)  2.367648983187397393143b0

but:

(%i7) bfloat(%e)^bfloat(.8618974796837966b0);
(%o7)  2.36764900085638369695b0
surprisingly closer to Python's answer.

but 2.718281828459045 isn't e. Close but no cigar.

(%i10) bfloat(2.718281828459045b0) - bfloat(%e);
(%o10)   - 2.35360287471352802147785151603b-16

Fricas:

(1) -> 2.718281828459045^0.8618974796837966 

   (1)  2.367648_98319

(2) -> exp(0.8618974796837966)

   (2)  2.367649_00086

-- 
Pieter van Oostrum 
www: http://pieter.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python For TinyML

2022-10-24 Thread Kisakye Moses
On Sunday, October 23, 2022 at 8:23:26 PM UTC+3, rbowman wrote:
> On Sun, 23 Oct 2022 08:46:10 -0700 (PDT), Kisakye Moses wrote: 
> 
> > Hello am a (M) and glad that I've joined this group. 
> > Any help in python for TinyML, i will honored
> https://tinynet.autoai.org/en/latest/

Thank you so much
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: score function in linear regression model

2022-10-24 Thread Reto
On Sun, Oct 23, 2022 at 05:11:10AM -0700, Fatemeh Heydari wrote:
> model.score(X,Y)

That will basically check how good your model is.

It takes a bunch of X values with known values, which you provide in Y,
and compares the output of model.predict(X) with the Y's and gives you
some metrics as to how good that performed.

In the case of linear regression that would be R^2, the coefficient of
determination of the prediction.
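
For illustration, here is a dependency-free sketch of that metric. It
assumes the model in question is a scikit-learn style regressor, whose
score() is documented to return R^2; the helper name `r_squared` is made
up for this example.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: how far the predictions are from the truth.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far the truth is from its own mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]      # predictions that match exactly
mean_only = [2.5, 2.5, 2.5, 2.5]    # always predicting the mean of y_true

print(r_squared(y_true, perfect))    # 1.0
print(r_squared(y_true, mean_only))  # 0.0
```

So 1.0 means a perfect fit, 0.0 means no better than always predicting
the mean, and the value can go negative for a model that does worse.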

Cheers,
Reto
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Chris Angelico
On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer  wrote:
>
> On 2022-10-24 21:56:13 +1100, Chris Angelico wrote:
> > On Mon, 24 Oct 2022 at 21:33, Peter J. Holzer  wrote:
> > > Ron has already noted that the lxml and html5 parser do the right thing,
> > > so just for the record:
> > >
> > > The HTML fragment above is well-formed and contains a number of li
> > > elements at the same level directly below the ol element, not lots of
> > > nested li elements. The end tag of the li element is optional (except in
> > > XHTML) and li elements don't nest.
> >
> > That's correct. However, parsing it with html.parser and then
> > reconstituting it as shown in the example code results in all the
> > </li> tags coming up right before the </ol>, indicating that the <li>
> > tags were parsed as deeply nested rather than as siblings.
>
> Yes, I got that. What I wanted to say was that this is indeed a bug in
> html.parser and not an error (or sloppyness, as you called it) in the
> input or ambiguity in the HTML standard.

I described the HTML as "sloppy" for a number of reasons, but I was of
the understanding that it's generally recommended to have the closing
tags. Not that it matters much.

> > which html5lib seems to be doing fine. Whether
> > it has other issues, I don't know, but I guess I'll find out
>
> The link somebody posted mentions that it's "very slow". Which may or
> may not be a problem when you have to parse 9000 files. But if it does
> implement HTML5 correctly, it should parse any file the same as a modern
> browser does (maybe excluding quirks mode).
>

Yeah. TBH I think the two-hour run time is primarily dominated by
network delays, not parsing time, but if I had a service where people
could upload HTML to be parsed, that might affect throughput.

For the record, if anyone else is considering html5lib: It is likely
"fast enough", even if not fast. Give it a try.

(And I know what slow parsing feels like. Parsing a ~100MB file with a
decently-fast grammar-based lexer takes a good while. Parsing the same
content after it's been converted to JSON? Fast.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: A trivial question that I don't know - document a function/method

2022-10-24 Thread Schachner, Joseph (US)
I head a small software team much of whose output is Python.   I would 
gratefully accept any of the formats you show below.  My preference is #1.

--- Joseph S.


Teledyne Confidential; Commercially Sensitive Business Data

-Original Message-
From: Paulo da Silva  
Sent: Saturday, October 22, 2022 4:58 PM
To: python-list@python.org
Subject: A trivial question that I don't know - document a function/method

Hi all!

What is the correct way, if any, of documenting a function/method?

1.
def foo(a,b):
    """ A description.
    a: Whatever 1
    b: Whatever 2
    """
    ...

2.
def foo(a,b):
    """ A description.
    a -- Whatever 1
    b -- Whatever 2
    """
    ...

3.
def foo(a,b):
    """ A description.
    @param a: Whatever 1
    @param b: Whatever 2
    """
    ...

4.
def foo(a,b):
    """ A description.
    :param a: Whatever 1
    :param b: Whatever 2
    """
    ...

5.
Any other ...

Any comments/suggestions are welcome.
Thanks.
Paulo

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Peter J. Holzer
On 2022-10-24 21:56:13 +1100, Chris Angelico wrote:
> On Mon, 24 Oct 2022 at 21:33, Peter J. Holzer  wrote:
> > Ron has already noted that the lxml and html5 parser do the right thing,
> > so just for the record:
> >
> > The HTML fragment above is well-formed and contains a number of li
> > elements at the same level directly below the ol element, not lots of
> > nested li elements. The end tag of the li element is optional (except in
> > XHTML) and li elements don't nest.
> 
> That's correct. However, parsing it with html.parser and then
> reconstituting it as shown in the example code results in all the
> </li> tags coming up right before the </ol>, indicating that the <li>
> tags were parsed as deeply nested rather than as siblings.

Yes, I got that. What I wanted to say was that this is indeed a bug in
html.parser and not an error (or sloppyness, as you called it) in the
input or ambiguity in the HTML standard.


> In order to get a successful parse out of this, I need something which
> sees them as siblings,

Right, but Roel (correct name this time) had already posted that lxml
and html5lib parse this correctly, so I saw no need to belabour that
point.

> which html5lib seems to be doing fine. Whether
> it has other issues, I don't know, but I guess I'll find out

The link somebody posted mentions that it's "very slow". Which may or
may not be a problem when you have to parse 9000 files. But if it does
implement HTML5 correctly, it should parse any file the same as a modern
browser does (maybe excluding quirks mode).

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


[Python-announce] PyCon Tanzania 2022 - Call for Speakers, Presentations, Hackathons and Workshops - 1st Reminder

2022-10-24 Thread Noah .
Dear Python Community,

We hope that you are all well and that you have been busy
working on various awesome Python code bases. It's almost that time of the
year, and we would like to engage the community for the fourth-ever Python
Conference, which is planned to take place from *4th - 7th December 2022 in
the beautiful and magnificent Island of ZANZIBAR.*

*PyCon Tanzania* is seeking keynote speakers and instructors to contribute
to the Python Conference Program! *We are looking for speakers who would:*

- Offer a keynote talk on an appropriate technical topic;
- Offer a Technical Tutorial or Hackathon on an appropriate Python
topic;

*Topics must be relevant to the Python Language and Open Source Software:*

   - Python in Education
   - Python in Statistical Research
   - Python in Scientific Research
   - Python Machine Learning
   - Python & Artificial Intelligence
   - Open Source Software
   - Python  & Cyber Security
   - Python Gaming Development
   - Cloud Computing & Virtualisation
   - Ideas on improving diversity and inclusiveness
   - Python Functional programming etc
   - Python and IoT

*SUBMIT YOUR PRESENTATION / WORKSHOP / HACKATHON / TUTORIAL BEFORE 5th Nov
2022 to:* sp...@pycon.or.tz

Regards,
PyCon Tanzania 2022
Program Committee
http://www.pycon.or.tz/
___
Python-announce-list mailing list -- python-announce-list@python.org
To unsubscribe send an email to python-announce-list-le...@python.org
https://mail.python.org/mailman3/lists/python-announce-list.python.org/
Member address: arch...@mail-archive.com


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Chris Angelico
On Mon, 24 Oct 2022 at 21:33, Peter J. Holzer  wrote:
> Ron has already noted that the lxml and html5 parser do the right thing,
> so just for the record:
>
> The HTML fragment above is well-formed and contains a number of li
> elements at the same level directly below the ol element, not lots of
> nested li elements. The end tag of the li element is optional (except in
> XHTML) and li elements don't nest.

That's correct. However, parsing it with html.parser and then
reconstituting it as shown in the example code results in all the
</li> tags coming up right before the </ol>, indicating that the <li>
tags were parsed as deeply nested rather than as siblings.
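
A small stdlib-only sketch of why this can happen: `html.parser.HTMLParser`
acts as a tokenizer that reports the tags actually present in the input,
so an omitted </li> never shows up as an end-tag event. The class name
`EventLogger` is made up for this example, and this is not a description
of how Beautiful Soup's tree builder works internally.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record the start/end tag events that html.parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = EventLogger()
logger.feed("<ol><li>one<li>two<li>three</ol>")
logger.close()

# Only the tags that are literally in the input appear; no ("end", "li").
print(logger.events)
```

Anything layered on top of these events has to supply the "an <li> closes
the previous <li>" rule itself, otherwise each item ends up inside the one
before it.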

In order to get a successful parse out of this, I need something which
sees them as siblings, which html5lib seems to be doing fine. Whether
it has other issues, I don't know, but I guess I'll find out - it's
currently running on the live site and taking several hours (due to
network delays and the server being slow, so I don't really want to
parallelize and overload the thing).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Peter J. Holzer
On 2022-10-24 12:32:11 +0200, Peter J. Holzer wrote:
> Ron has already noted that the lxml and html5 parser do the right thing,
  ^^^
  Oops, sorry. That was Roel.

hp



-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Peter J. Holzer
On 2022-10-24 13:29:13 +1100, Chris Angelico wrote:
> Parsing ancient HTML files is something Beautiful Soup is normally
> great at. But I've run into a small problem, caused by this sort of
> sloppy HTML:
> 
> from bs4 import BeautifulSoup
> # See: https://gsarchive.net/gilbert/plays/princess/tennyson/tenniv.htm
> blob = b"""
> 
> 'THERE sinks the nebulous star we call the Sun,
> If that hypothesis of theirs be sound,'
[...]
> Stirring a sudden transport rose and fell.
> 
> """
> soup = BeautifulSoup(blob, "html.parser")
> print(soup)
> 
> 
> On this small snippet, it works acceptably, but puts a large number of
> </li> tags immediately before the </ol>.

Ron has already noted that the lxml and html5 parser do the right thing,
so just for the record:

The HTML fragment above is well-formed and contains a number of li
elements at the same level directly below the ol element, not lots of
nested li elements. The end tag of the li element is optional (except in
XHTML) and li elements don't nest.
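
To illustrate the "li elements don't nest" rule, here is a much simplified
sketch (not the full HTML5 tree-construction algorithm, and not what any
particular parser does internally). It tracks open elements on a stack and
closes an open li whenever a new li starts in the same list. The names
`ListDepth` and `max_list_depth` are made up for this example.

```python
from html.parser import HTMLParser


class ListDepth(HTMLParser):
    """Track the open-element stack, closing an open <li> implicitly."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Walk down from the top of the stack: an open <li> in the
            # current list is closed first; an enclosing <ol>/<ul> stops
            # the search so items of outer lists stay open.
            for i in range(len(self.stack) - 1, -1, -1):
                if self.stack[i] == "li":
                    del self.stack[i:]
                    break
                if self.stack[i] in ("ol", "ul"):
                    break
        self.stack.append(tag)
        self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop everything up to and including the most recent match.
            last = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[last:]


def max_list_depth(html_text):
    parser = ListDepth()
    parser.feed(html_text)
    parser.close()
    return parser.max_depth


print(max_list_depth("<ol><li>one<li>two<li>three</ol>"))
```

With the implied end tags applied, the three items are siblings and the
stack never grows beyond ol > li, however many items the list has.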

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: need help

2022-10-24 Thread Peter J. Holzer
On 2022-10-24 01:02:24 +, rbowman wrote:
> On Mon, 24 Oct 2022 10:02:10 +1100, Cameron Simpson wrote:
> > I'd say GMail are rudely dropping traffic to port 2525. Maybe try just
> > 25,
> > the normal SMTP port?
> 
> 2525 is an alternative to 587, the standard TLS port.

Port 587 is not the standard TLS port. Port 587 is the standard mail
submission port (i.e., a MUA should use port 587 when sending mail; MTAs
should use port 25 when relaying mails to other MTAs).

Traffic on both port 25 and 587 starts in plain text. The server can
indicate that it supports TLS and the client can then send a STARTTLS
command to start a TLS session.

If you want to start the connection with TLS, you can (usually) use port
465. Like 587, this is only intended for mail submission, not mail
transport.
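
A minimal sketch of the two submission styles with the standard library's
smtplib. The host, user and password arguments are placeholders to be
supplied by the caller, and the function names are made up for this
example; whether a given provider accepts either port is an assumption to
check against its documentation.

```python
import smtplib
import ssl
from email.message import EmailMessage

SUBMISSION_STARTTLS_PORT = 587      # starts in plain text, then STARTTLS
SUBMISSION_IMPLICIT_TLS_PORT = 465  # TLS from the first byte


def build_message(sender, recipient, subject, body):
    """Assemble a simple text message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_with_starttls(host, user, password, message):
    """Connect in plain text on 587, then upgrade with STARTTLS."""
    context = ssl.create_default_context()
    with smtplib.SMTP(host, SUBMISSION_STARTTLS_PORT, timeout=30) as smtp:
        smtp.starttls(context=context)
        smtp.login(user, password)
        smtp.send_message(message)


def send_with_implicit_tls(host, user, password, message):
    """Open the connection with TLS already in place on 465."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(host, SUBMISSION_IMPLICIT_TLS_PORT,
                          timeout=30, context=context) as smtp:
        smtp.login(user, password)
        smtp.send_message(message)
```

Neither function is called here, since both need a real server and
credentials.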

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Roel Schroeven

(Oops, accidentally only sent to Chris instead of to the list)

Op 24/10/2022 om 10:02 schreef Chris Angelico:
> On Mon, 24 Oct 2022 at 18:43, Roel Schroeven wrote:
>
> > Using html5lib (install package html5lib) instead of html.parser seems
> > to do the trick: it inserts </li> right before the next <li>, and one
> > before the closing </ol>. On my system the same happens when I don't
> > specify a parser, but IIRC that's a bit fragile because other systems
> > can choose different parsers if you don't explicitly specify one.
>
> Ah, cool. Thanks. I'm not entirely sure of the various advantages and
> disadvantages of the different parsers; is there a tabulation
> anywhere, or at least a list of recommendations on choosing a suitable
> parser?

There's a bit of information here:
https://beautiful-soup-4.readthedocs.io/en/latest/#installing-a-parser

Not much but maybe it can be helpful.

> I'm dealing with a HUGE mess of different coding standards, all the
> way from 1990s-level stuff (images for indentation, tables for
> formatting, and ) up through HTML4 (a good few
> of the pages have at least some  tags and declare their
> encodings, mostly ISO-8859-1 or similar), to fairly modern HTML5.
> There's even a couple of pages that use frames - yes, the old style
> with a  block in case the browser can't handle it. I went
> with html.parser on the expectation that it'd give the best "across
> all standards" results, but I'll give html5lib a try and see if it
> does better.
>
> Would rather not try to use different parsers for different files, but
> if necessary, I'll figure something out.
>
> (For reference, this is roughly 9000 HTML files that have to be
> parsed. Doing things by hand is basically not an option.)

I'd give lxml a try too. Maybe try to preprocess the HTML using
html-tidy (https://www.html-tidy.org/), that might actually do a pretty
good job of getting rid of all kinds of historical inconsistencies.
Somehow checking if any solution works for thousands of input files will
always be a pain, I'm afraid.


--
"I've come up with a set of rules that describe our reactions to technologies:
1. Anything that is in the world when you’re born is normal and ordinary and is
   just a natural part of the way the world works.
2. Anything that's invented between when you’re fifteen and thirty-five is new
   and exciting and revolutionary and you can probably get a career in it.
3. Anything invented after you're thirty-five is against the natural order of 
things."
-- Douglas Adams, The Salmon of Doubt

--
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Roel Schroeven

Op 24/10/2022 om 9:42 schreef Roel Schroeven:
> Using html5lib (install package html5lib) instead of html.parser seems
> to do the trick: it inserts </li> right before the next <li>, and one
> before the closing </ol>. On my system the same happens when I don't
> specify a parser, but IIRC that's a bit fragile because other systems
> can choose different parsers if you don't explicitly specify one.


Just now I noticed: when I don't specify a parser, BeautifulSoup emits a 
warning with the parser it selected. In one of my venv's it's html5lib, 
in another it's lxml. Both seem to get a correct result.


--

"I love science, and it pains me to think that so many are terrified
of the subject or feel that choosing science means you cannot also
choose compassion, or the arts, or be awed by nature. Science is not
meant to cure us of mystery, but to reinvent and reinvigorate it."
-- Robert Sapolsky

--
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Chris Angelico
On Mon, 24 Oct 2022 at 18:43, Roel Schroeven  wrote:
>
> Op 24/10/2022 om 4:29 schreef Chris Angelico:
> > Parsing ancient HTML files is something Beautiful Soup is normally
> > great at. But I've run into a small problem, caused by this sort of
> > sloppy HTML:
> >
> > from bs4 import BeautifulSoup
> > # See: https://gsarchive.net/gilbert/plays/princess/tennyson/tenniv.htm
> > blob = b"""
> > 
> > 'THERE sinks the nebulous star we call the Sun,
> > If that hypothesis of theirs be sound,'
> > Said Ida;' let us down and rest:' and we
> > Down from the lean and wrinkled precipices,
> > By every coppice-feather'd chasm and cleft,
> > Dropt thro' the ambrosial gloom to where below
> > No bigger than a glow-worm shone the tent
> > Lamp-lit from the inner. Once she lean'd on me,
> > Descending; once or twice she lent her hand,
> > And blissful palpitations in the blood,
> > Stirring a sudden transport rose and fell.
> > 
> > """
> > soup = BeautifulSoup(blob, "html.parser")
> > print(soup)
> >
> >
> > On this small snippet, it works acceptably, but puts a large number of
> > </li> tags immediately before the </ol>. On the original file (see
> > link if you want to try it), this blows right through the default
> > recursion limit, due to the crazy number of "nested" list items.
> >
> > Is there a way to tell BS4 on parse that these <li> elements end at
> > the next <li>, rather than waiting for the final </ol>? This would
> > make tidier output, and also eliminate most of the recursion levels.
> >
> Using html5lib (install package html5lib) instead of html.parser seems
> to do the trick: it inserts </li> right before the next <li>, and one
> before the closing </ol>. On my system the same happens when I don't
> specify a parser, but IIRC that's a bit fragile because other systems
> can choose different parsers if you don't explicitly specify one.
>

Ah, cool. Thanks. I'm not entirely sure of the various advantages and
disadvantages of the different parsers; is there a tabulation
anywhere, or at least a list of recommendations on choosing a suitable
parser?

I'm dealing with a HUGE mess of different coding standards, all the
way from 1990s-level stuff (images for indentation, tables for
formatting, and ) up through HTML4 (a good few
of the pages have at least some  tags and declare their
encodings, mostly ISO-8859-1 or similar), to fairly modern HTML5.
There's even a couple of pages that use frames - yes, the old style
with a  block in case the browser can't handle it. I went
with html.parser on the expectation that it'd give the best "across
all standards" results, but I'll give html5lib a try and see if it
does better.

Would rather not try to use different parsers for different files, but
if necessary, I'll figure something out.

(For reference, this is roughly 9000 HTML files that have to be
parsed. Doing things by hand is basically not an option.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Roel Schroeven

Op 24/10/2022 om 4:29 schreef Chris Angelico:

> Parsing ancient HTML files is something Beautiful Soup is normally
> great at. But I've run into a small problem, caused by this sort of
> sloppy HTML:
>
> from bs4 import BeautifulSoup
> # See: https://gsarchive.net/gilbert/plays/princess/tennyson/tenniv.htm
> blob = b"""
>
> 'THERE sinks the nebulous star we call the Sun,
> If that hypothesis of theirs be sound,'
> Said Ida;' let us down and rest:' and we
> Down from the lean and wrinkled precipices,
> By every coppice-feather'd chasm and cleft,
> Dropt thro' the ambrosial gloom to where below
> No bigger than a glow-worm shone the tent
> Lamp-lit from the inner. Once she lean'd on me,
> Descending; once or twice she lent her hand,
> And blissful palpitations in the blood,
> Stirring a sudden transport rose and fell.
>
> """
> soup = BeautifulSoup(blob, "html.parser")
> print(soup)
>
>
> On this small snippet, it works acceptably, but puts a large number of
> </li> tags immediately before the </ol>. On the original file (see
> link if you want to try it), this blows right through the default
> recursion limit, due to the crazy number of "nested" list items.
>
> Is there a way to tell BS4 on parse that these <li> elements end at
> the next <li>, rather than waiting for the final </ol>? This would
> make tidier output, and also eliminate most of the recursion levels.

Using html5lib (install package html5lib) instead of html.parser seems
to do the trick: it inserts </li> right before the next <li>, and one
before the closing </ol>. On my system the same happens when I don't
specify a parser, but IIRC that's a bit fragile because other systems
can choose different parsers if you don't explicitly specify one.


--
"I love science, and it pains me to think that so many are terrified
of the subject or feel that choosing science means you cannot also
choose compassion, or the arts, or be awed by nature. Science is not
meant to cure us of mystery, but to reinvent and reinvigorate it."
-- Robert Sapolsky

--
https://mail.python.org/mailman/listinfo/python-list


Re: Typing: Is there a "cast operator"?

2022-10-24 Thread Thomas Passin

On 10/23/2022 11:14 PM, Dan Stromberg wrote:
> On Sun, Oct 23, 2022 at 2:11 PM Paulo da Silva <
> p_d_a_s_i_l_v_a...@nonetnoaddress.pt> wrote:
>
> > Hello!
> >
> > I am in the process of "typing" some of my scripts.
> > Using it should help a lot to avoid some errors.
> > But this is new for me and I'm facing some problems.
> >
> > Let's say I have the following code (please don't look at the program
> > content):
> >
> > f=None  # mypy naturally assumes Optional[int] because later, at open,
> >         # it is assigned an int.
> > ..
> > if f is None:
> >  f=os.open(...
> > ..
> > if f is not None:
> >  os.write(f, ...)
> > ..
> > if f is not None:
> >  os.close(f)
> >
> > When I use mypy, it claims
> > Argument 1 to "write" has incompatible type "Optional[int]"; expected "int"
> > Argument 1 to "close" has incompatible type "Optional[int]"; expected "int"
> >
> > How to solve this?
> > Is there a way to specify that when calling os.open f is an int only?
> >
> > I use None a lot to specify uninitialized vars.
>
> I've found that mypy understands simple assert statements.
>
> So if you:
> if f is not None:
>  assert f is not None
>  os.write(f, ...)
>
> You might be in good shape.

I'm not very familiar with anything but the simplest typing cases as 
yet, but mypy is happy with these two fragments.


if f:
    # f must be an int if it is not None, so we can cast it to int.
    os.write(int(f), ...)


Or something like this (substitute write() for print() as needed) -

from typing import Optional, Any

def f1(x:int)->Optional[int]:
    if x == 42:
        return x
    return None

def zprint(arg:Any):
    if type(arg) == int:
        print(arg)

y0 = f1(0)  # None
y42 = f1(42) # 42

zprint(y0)  # Prints nothing
zprint(y42) # Prints 42

Another possibility that mypy is happy with (and probably the simplest) 
- just declare g:int = None instead of g = None:


g: int = None
def yprint(arg: int):
    if arg:
        print(arg)
    else:
        print('arg is None')

yprint(g)  # Prints "arg is None"


And **please** let's not go doing this kind of redundant and inelegant 
construction:


if f is not None:
    assert f is not None
    os.write(f, ...)
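
For comparison, a minimal sketch of two other idioms that, as far as I
know, mypy accepts: annotating the variable as Optional[int] and letting
the checker narrow it, and typing.cast, which is the closest thing to the
"cast operator" in the subject line. The names `write_bytes` and `as_int`
are made up for this example.

```python
import os
import tempfile
from typing import Optional, cast


def write_bytes(path: str, data: bytes) -> int:
    """Write data to path with os-level calls; return the byte count."""
    fd: Optional[int] = None
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        # After the assignment, the checker treats fd as int here.
        return os.write(fd, data)
    finally:
        # An explicit "is not None" test narrows Optional[int] to int.
        if fd is not None:
            os.close(fd)


def as_int(value: Optional[int]) -> int:
    """Tell the type checker to trust us; nothing is checked at runtime."""
    return cast(int, value)


with tempfile.TemporaryDirectory() as tmp:
    written = write_bytes(os.path.join(tmp, "demo.bin"), b"hello")

print(written)
```

Note that cast() only silences the checker: as_int(None) still returns
None at runtime, so the explicit test is usually the safer choice.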
--
https://mail.python.org/mailman/listinfo/python-list