On Fri, 2023-02-24 at 19:33 +0100, Paul Gevers wrote:
> Hi Diane,
>
> On 23-02-2023 08:12, Diane Trout wrote:
> > the version of python3-xlrd 1.2.0-3 in unstable/testing is too old to
> > be used with pandas 1.5.3. (See Bug #1031701).
>
> Do I understand correctly that this isn't an issue from the point of
> python3-xlrd and that only pandas is effected? While investigating for
> this reply I noticed src:pandas doesn't even have a dependency in any of
> its binaries.

It looks like the xlrd dependency was commented out because the Debian
version is too old, though apparently that was done 7 months ago.
https://salsa.debian.org/science-team/pandas/-/blob/main/debian/control#L45

Here's the pandas module that conditionally uses xlrd if it's available.
https://salsa.debian.org/science-team/pandas/-/blob/main/pandas/io/excel/_xlrd.py

> > As it is a really common workflow to use pandas to read excel files,
> > it'd be nice if the version of xlrd in bookworm was compatible.
>
> As the maintainer of pandas, do you consider it an RC issue that pandas
> can't convert it? I guess not because you say "it'd be nice" and you
> don't even have the required dependency. How severe do you consider this
> issue for pandas? pandas has a quite extensive autopkgtest, doesn't it
> cover this use case? Apparently you knew this earlier, why do you bring
> this up now?

The issue is somewhere between a minor and a normal bug; it breaks a
small component of the library.

I wouldn't claim to be a maintainer of pandas; I feel Rebecca Palmer has
been doing the vast majority of the work keeping pandas updated in
Debian.

I started investigating this after my coworker ran into it while trying
to process an .xls file, and when I looked, I saw someone else had also
recently filed the same bug report.

> > Because of the freeze I wanted to check if it was appropriate to
> > upload the new version,
>
> I'd hope that the "rules" are clear:
> https://release.debian.org/testing/freeze_policy.html#soft. You can
> contact the Release Team if you need further clarification.
>
> > and what kind of warning I should give to the other developers.
>
> It depends. I'm worried about what you write below.

That's fair. The counter argument is that xlrd's support for handling
the XML-based .xlsx files has been unsafe since Python 3.9, and for a
while now the recommendation has been to switch to another package like
openpyxl to handle xlsx files.
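
As an illustration only (not code from pandas or from any of the
packages discussed here), the split between the two readers can be
sketched as a lookup on the file extension. The helper name
pick_excel_engine and the mapping are hypothetical; "xlrd" and
"openpyxl" are engine names that pandas.read_excel accepts for its
engine argument.

```python
from pathlib import Path

# Hypothetical mapping for illustration: legacy binary workbooks go to
# xlrd, XML-based workbooks go to openpyxl.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
}


def pick_excel_engine(path):
    """Return the reader name to use for a spreadsheet path.

    The suffix comparison is case-insensitive. Unknown suffixes raise
    ValueError instead of silently guessing a reader.
    """
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no reader configured for {suffix!r} files") from None
```

Assuming pandas and the matching reader package are installed, a caller
could then write pandas.read_excel(path, engine=pick_excel_engine(path)).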

(The xlrd release announcement thread mentions the removal of xlsx
support, and then goes on to discuss the security issues.)
https://groups.google.com/g/python-excel/c/IRa8IWq_4zk/m/Af8-hrRnAgAJ

The reason the issue doesn't show up much is that .xls files are
deprecated by nearly everyone; it only shows up when you're reading old
data or files generated by old software.

The reason this is likely a minor issue is that there's a simple
workaround: convert your xls file to an xlsx file.

Here's Pandas's discussion about deprecating xlrd for xlsx files.
https://github.com/pandas-dev/pandas/issues/28547

> > Here's the list of packages I found that have any relationship to
> > python-xlrd, if it looked like the autopkgtests actually tested using
> > the xlrd library and what the level of declared dependency is. (none
> > means the package lacks autopackage tests)
> >
> > | nemo                 | none     | Recommends    |
> > | odoo-14              | none     | Depends       |
> > | ofxstatement-plugins | none     | Depends       |
> > | psychopy             | unlikely | Depends       |
> > | python3-agateexcel   | yes      | Depends       |
> > | python3-canmatrix    | no       | Recommends    |
> > | python3-drslib       | no       | Recommends    |
> > | python3-glue         | yes      | Depends       |
> > | python3-pyspectral   | probably | Suggests      |
> > | python3-rows         | unlikely | Recommends    |
> > | python3-tablib       | unlikely | Depends       |
> > | visidata             | none     | Build-Depends |
> > | vistrails            | none     | Build-Depends |
> > | python-xrt           | none     | Build-Depends |
> > | pyutilib             | none     | Build-Depends |
>
> If I read everything correctly, it seems like you're too late with this
> change.

With a bit more wakefulness, I looked through the packages that have any
dependency on xlrd.

I think odoo-14 is the package most likely to have issues. They use xlrd
and seem to expect to be able to read and write xls & xlsx files using
xlrd. Needless to say, updating xlrd would then break the ability to
process xlsx files.
Though of course the xlrd upstream thinks that's unreliable, and I have
no idea how important this feature is to them. (The odoo repository also
has tests, and someone could in theory write autopkgtests for it.)

I couldn't figure out what pyspectral is doing.

These packages (ofxstatement-plugins, psychopy, python3-agateexcel,
python3-rows, python3-tablib, and visidata) appear to also depend
on/recommend openpyxl, so they likely use xlrd for .xls files and
openpyxl for .xlsx files, as xlrd has been recommending.

python3-canmatrix uses a different package, python3-xlsxwriter, to deal
with xlsx files.
https://salsa.debian.org/python-team/packages/python-canmatrix/-/blob/debian/main/setup.py#L104

Nemo looks to only be using xlrd for older .xls files, and has a
different tool for the newer files. They seem to be using mimetypes and
use this block for .xlsx files:
https://salsa.debian.org/search?search=vnd.openxmlformats-officedocument.spreadsheetml.sheet&nav_source=navbar&project_id=17703&group_id=2992&search_code=true&repository_ref=master
and this block for .xls files:
https://salsa.debian.org/cinnamon-team/nemo/-/blob/master/search-helpers/mso-xls.nemo_search_helper

python3-drslib appears to be expecting to be used on .xls files.
(looking through)
https://sources.debian.org/src/drslib/0.3.1.p3-2/drslib/p_cmip5/init.py/

vistrails only lists xlrd as a build dependency, and its tests seem to
think it might work with both xls and xlsx files, but the test code in
the package seems to only test xls files.

And as an aside, I found that python-xrt probably should remove
python3-xlrd from its build dependencies, as the package doesn't seem to
use it.
https://codesearch.debian.net/search?q=package%3Apython-xrt+xlrd

Ultimately the argument that this is a relatively minor feature cuts
both ways: it suggests the risk of updating is relatively low, but also
that there's less reason to update.

Thank you for your time evaluating this request.

Diane