On Fri, 2023-02-24 at 19:33 +0100, Paul Gevers wrote:
> Hi Diane,
>
> On 23-02-2023 08:12, Diane Trout wrote:
> > the version of python3-xlrd 1.2.0-3 in unstable/testing is too old
> > to
> > be used with pandas 1.5.3. (See Bug #1031701).
>
> Do I understand correctly that this isn't an issue from the point of
> python3-xlrd and that only pandas is effected? While investigating
> for
> this reply I noticed src:pandas doesn't even have a dependency in any
> of
> its binaries.
It looks like the xlrd dependency was commented out because the Debian
version is too old, though apparently that was done 7 months ago.
https://salsa.debian.org/science-team/pandas/-/blob/main/debian/control#L45
Here's the pandas module that conditionally uses xlrd if it's
available.
https://salsa.debian.org/science-team/pandas/-/blob/main/pandas/io/excel/_xlrd.py
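For illustration, here is a rough sketch of that kind of "use it only if
it's present and new enough" check. This is not the actual pandas code;
the helper names are made up, and the (2, 0, 1) minimum in the usage note
is my assumption about what pandas 1.5.3 expects.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the first three numeric components, e.g. "1.2.0" -> (1, 2, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def dist_at_least(dist_name, minimum):
    """True if the distribution is installed and its version >= minimum."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Not installed at all: treat the optional feature as unavailable.
        return False
    return parse_version(installed) >= minimum
```

With these helpers, dist_at_least("xlrd", (2, 0, 1)) would be False on a
system that only has xlrd 1.2.0 installed.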
>
> > As it is a really common
> > workflow to use pandas to read excel files, it'd be nice if the
> > version
> > of xlrd in bookworm was compatible.
>
> As the maintainer of pandas, do you consider it an RC issue that
> pandas
> can't convert it? I guess not because you say "it'd be nice" and you
> don't even have the required dependency. How severe do you consider
> this
> issue for pandas? pandas has a quite extensive autopkgtest, doesn't
> it
> cover this use case? Apparently you knew this earlier, why do you
> bring
> this up now?
The issue is somewhere between a minor and a normal bug: it breaks a
small component of the library.
I wouldn't claim to be a maintainer of pandas; I feel Rebecca Palmer
has been doing the vast majority of the work keeping pandas updated in
Debian.
I started investigating this after my coworker ran into it while trying
to process an .xls file. When I looked, I saw that someone else had also
recently filed the same bug report.
>
> > Because of the freeze I wanted to check if it was appropriate to
> > upload
> > the new version,
>
> I'd hope that the "rules" are clear:
> https://release.debian.org/testing/freeze_policy.html#soft. You can
> contact the Release Team if you need further clarification.
>
> > and what kind of warning I should give to the other
> > developers.
>
> It depends. I'm worried about what you write below.
That's fair.
The counterargument is that xlrd's support for handling the XML-based
.xlsx files has been unsafe since Python 3.9, and for a while now the
recommendation has been to switch to another package, such as openpyxl,
to handle xlsx files.
(This is the xlrd release announcement thread. It mentions the removal
and then goes on to discuss the security issues.)
https://groups.google.com/g/python-excel/c/IRa8IWq_4zk/m/Af8-hrRnAgAJ
The reason the issue doesn't show up much is that .xls files are
deprecated by nearly everyone. It only shows up when you're reading old
data or files generated by old software.
The reason this is likely a minor issue is that there's a simple
workaround, which is to convert your xls file to an xlsx file.
Here's Pandas's discussion about deprecating xlrd for xlsx files.
https://github.com/pandas-dev/pandas/issues/28547
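To make the xls/xlsx split concrete, here is a minimal sketch of picking
the pandas reader engine by file extension. pick_engine and
read_spreadsheet are hypothetical helper names, not part of pandas; the
only pandas call used is read_excel with its engine argument.

```python
from pathlib import Path

# Assumed mapping: xlrd for legacy .xls files, openpyxl for .xlsx files.
ENGINE_BY_SUFFIX = {".xls": "xlrd", ".xlsx": "openpyxl"}


def pick_engine(path):
    """Return the engine name to pass to pandas.read_excel for this file."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported spreadsheet type: {suffix!r}") from None


def read_spreadsheet(path):
    """Read a spreadsheet into a DataFrame using the engine chosen above."""
    import pandas as pd  # imported lazily so pick_engine works without pandas

    return pd.read_excel(path, engine=pick_engine(path))
```

Reading an .xls file this way still needs a version of xlrd that pandas
accepts, which is the part that is broken at the moment.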
> > Here's the list of packages I found that have any relationship to
> > python-xlrd, if it looked like the autopkgtests actually tested
> > using
> > the xlrd library and what the level of declared dependency is.
> > (none
> > means the package lacks autopackage tests)
> >
> > > nemo | none | Recommends |
> > > odoo-14 | none | Depends |
> > > ofxstatement-plugins | none | Depends |
> > > psychopy | unlikely | Depends |
> > > python3-agateexcel | yes | Depends |
> > > python3-canmatrix | no | Recommends |
> > > python3-drslib | no | Recommends |
> > > python3-glue | yes | Depends |
> > > python3-pyspectral | probably | Suggests |
> > > python3-rows | unlikely | Recommends |
> > > python3-tablib | unlikely | Depends |
> > > visidata | none | Build-Depends |
> > > vistrails | none | Build-Depends |
> > > python-xrt | none | Build-Depends |
> > > pyutilib | none | Build-Depends |
>
> If I read everything correctly, it seems like you're too late with
> this
> change.
With a bit more wakefulness, I looked through the packages that have
any dependency on xlrd.
I think odoo-14 is the package most likely to have issues. They use
xlrd and seem to expect to be able to read and write xls & xlsx files
using xlrd. Needless to say, updating xlrd would then break the ability
to process xlsx files. Though of course the xlrd upstream thinks that's
unreliable, and I have no idea how important this feature is to them.
(the odoo repository also has tests, and someone could in theory write
autopkgtests for it)
I couldn't figure out what pyspectral is doing.
These packages ofxstatement-plugins, psychopy, python3-agateexcel,
python3-rows, python3-tablib, and visidata appear