Re: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
On Thursday, September 30, 2021 at 5:20:04 AM UTC+8, Peter J. Holzer wrote:
> On 2021-09-29 01:22:03 -0700, hongy...@gmail.com wrote:
> > I tried to convert an xls file into csv with the following command, but failed:
> >
> > $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
> > XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
> >
> > The above testing file is located here [1].
> >
> > [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls
>
> Why is that file name .xls when it's obviously an HTML file?

Good catch! Thank you for pointing this out. This file is automatically exported from my university's teaching management system, and it was assigned the .xls extension by default.

HZ
--
https://mail.python.org/mailman/listinfo/python-list
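Since the export turned out to be HTML rather than a real spreadsheet, one option is to read the table markup directly and write CSV from that. The following is only a minimal sketch using the standard library; the names `TableRows` and `html_table_to_csv` are hypothetical, and it assumes a single flat table (no nested tables, no colspan/rowspan handling). The encoding of the exported file is not stated in the thread, so decoding the file into text is left to the caller.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None

    def _end_cell(self):
        # Close the current cell, collapsing runs of whitespace to one space.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()          # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()         # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag == "tr":
            self._end_row()


def html_table_to_csv(html_text):
    """Return the table cells found in html_text as CSV text."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Text outside table cells, including the leading blank lines that upset xlrd, is simply ignored by this parser, so no separate trimming step is needed here.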
Re: OT: AttributeError
On 30/09/21 7:28 am, dn wrote:
> Oh yes! The D2 kit - I kept those books for years...
> https://www.electronixandmore.com/adam/temp/6800trainer/mek6800d2.html

My 6800 system was nowhere near as fancy as that! It was the result of replacing the CPU in my homebrew Miniscamp.

--
Greg
Re: OT: AttributeError
Ah, Z80s (deep sigh). Those were the days! You could disassemble the entire CP/M operating system (including the BIOS), and still have many KB to play with! Real programmers don't need gigabytes!

On 29/09/2021 03:03, 2qdxy4rzwzuui...@potatochowder.com wrote:
> On 2021-09-29 at 09:21:34 +1000, Chris Angelico wrote:
> > On Wed, Sep 29, 2021 at 9:10 AM <2qdxy4rzwzuui...@potatochowder.com> wrote:
> > > On 2021-09-29 at 11:38:22 +1300, dn via Python-list wrote:
> > > > For those of us who remember/can compute in binary, octal, hex, or decimal as-needed:
> > > > Why do programmers confuse All Hallows'/Halloween for Christmas Day?
> > >
> > > That one is also very old. (Yes, I know the answer. No, I will not spoil it for those who might not.) What do I have to do to gain the insight necessary to have discovered that question and answer on my own?
> >
> > You'd have to be highly familiar with numbers in different notations, to the extent that you automatically read 65 and 0x41 as the same number ...
>
> I do that. And I have done that, with numbers that size, since the late 1970s (maybe the mid 1970s, for narrow definitions of "different"). There's at least one more [sideways, twisted] leap to the point that you even think of translating the names of those holidays into an arithmetic riddle.
>
> > ... Or, even better, to be able to read off a hex dump and see E8 03 and instantly read it as "1,000 little-endian".
>
> 59395 big endian. Warning, flamebait ahead: Who thinks in little endian? (I was raised on 6502s and 680XX CPUs; 8080s and Z80s always did things backwards.)
RE: XML Considered Harmful
I think that to make electricity comprehend, you need a room temperature superconductor. The Cooper Pairs took a while to comprehend but now ...

I think, seriously, we have established the problems with guessing that others are using the language in a way we assume.

So how many comprehensions does Python have?

[] - list comprehension
{} - dictionary OR set comprehension
() - generator expression

Tuples are incomprehensible and I wonder if any other comprehensions might make sense to add, albeit we may need new symbols.

-----Original Message-----
From: Python-list On Behalf Of Michael F. Stemper
Sent: Wednesday, September 29, 2021 9:04 AM
To: python-list@python.org
Subject: Re: XML Considered Harmful

On 28/09/2021 18.21, Greg Ewing wrote:
> On 29/09/21 4:37 am, Michael F. Stemper wrote:
> > I'm talking about something made from tons of iron and copper that is oil-filled and rotates at 1800 rpm.
>
> To avoid confusion, we should rename them "electricity comprehensions".

Hah!

--
Michael F. Stemper
If you take cranberries and stew them like applesauce they taste much more like prunes than rhubarb does.
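The bracket forms listed in the message above can be put side by side in a few lines. This is only an illustrative sketch; the sample data and variable names are made up for the example. It also shows the usual workaround for the missing tuple comprehension, which is to hand a generator expression to `tuple()`.

```python
words = ["iron", "copper", "oil"]

# [] - list comprehension: builds the whole list at once
lengths_list = [len(w) for w in words]

# {} with key: value - dict comprehension
lengths_dict = {w: len(w) for w in words}

# {} with single values - set comprehension
lengths_set = {len(w) for w in words}

# () - generator expression: lazy, produces values on demand
lengths_gen = (len(w) for w in words)

# There is no tuple comprehension; pass a generator expression to tuple()
lengths_tuple = tuple(len(w) for w in words)

# A generator can be consumed only once, here by sum()
total = sum(lengths_gen)
```

Parentheses were already taken by the generator expression, which is why a bare `(... for ...)` does not give a tuple.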
Re: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
On 2021-09-29 01:22:03 -0700, hongy...@gmail.com wrote:
> I tried to convert an xls file into csv with the following command, but failed:
>
> $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
> XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
>
> The above testing file is located here [1].
>
> [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls

Why is that file name .xls when it's obviously an HTML file?

hp

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
Story must make more sense than reality. -- Charles Stross, "Creative writing challenge!"
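The error message already hints at this diagnosis: the reader found line breaks where a binary header was expected. A quick way to check what a file really is, regardless of its extension, is to look at its first bytes. The sketch below is a rough heuristic, not a full format detector; `sniff_format` is a hypothetical helper name. It relies on two well-known signatures: legacy .xls files are normally OLE2 compound files starting with the bytes D0 CF 11 E0 A1 B1 1A E1, and .xlsx files are ZIP archives starting with "PK".

```python
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy .xls)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP container (.xlsx)
MARKUP_STARTS = (b"<html", b"<!doctype", b"<table", b"<?xml")


def sniff_format(path):
    """Guess a file's real format from its first bytes (hypothetical helper)."""
    head = Path(path).read_bytes()[:512]
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "zip/xlsx"
    # Ignore leading whitespace such as the \r\n\r\n seen in the error message.
    if head.lstrip().lower().startswith(MARKUP_STARTS):
        return "markup"
    return "unknown"
```

Called on a file like the one in this thread, a result of "markup" would suggest treating it as HTML rather than passing it to a spreadsheet reader.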
Re: OT: AttributeError
On 29/09/2021 19.16, Greg Ewing wrote:
> On 29/09/21 3:03 pm, 2qdxy4rzwzuui...@potatochowder.com wrote:
> > Who thinks in little endian? (I was raised on 6502s and 680XX CPUs; 8080s and Z80s always did things backwards.)
>
> The first CPU I wrote code for was a National SC/MP, which doesn't have an endianness, since it never deals with more than a byte at a time. The second was a 6800, which is big-endian. That's definitely more convenient when you're hand-assembling code! I can see the advantages of little-endian when you're implementing a CPU, though.

Oh yes! The D2 kit - I kept those books for years...
https://www.electronixandmore.com/adam/temp/6800trainer/mek6800d2.html

--
Regards,
=dn
Re: NUmpy
On 29.09.21 at 18:16, Jorge Conforte wrote:
> Hi,
>
> I have a netcdf file "uwnd_850_1981.nc" and I'm using the commands to read it:

Your code is incomplete:

> from numpy import dtype
> fileu ='uwnd_850_1981.nc'
> ncu = Dataset(fileu,'r')

Where is "Dataset" defined?

> uwnd=ncu.variables['uwnd'][:]
>
> and I had:
>
> :1: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
> Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
>
> I don't know why I get this message. My numpy version is 1.21.2.
>
> Please, how can I solve this?

First, it is only a warning, therefore it should still work. Second, the problem is not in the code that you posted. Possibly it is in the definition of "Dataset". Maybe the netCDF file contains boolean values and the package you use to read it should be updated?

Christian
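To find out which package actually triggers a warning like this, one common technique is to turn the warning into an exception temporarily, so that the traceback points at the offending line. The sketch below uses only the standard `warnings` module; `run_with_deprecations_as_errors` is a hypothetical helper, and `old_api` is a stand-in for the third-party call (in the thread that would presumably be the `Dataset(...)` read, whose origin is not shown in the post).

```python
import warnings


def run_with_deprecations_as_errors(func, *args, **kwargs):
    """Call func with DeprecationWarning escalated to an exception.

    The resulting traceback shows where the deprecated name is used.
    (Hypothetical helper, not part of any library.)
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


def old_api():
    # Stand-in for library code that touches a deprecated alias.
    warnings.warn("`np.bool` is a deprecated alias", DeprecationWarning, stacklevel=2)
    return 42


def run_quietly(func, *args, **kwargs):
    """Call func with DeprecationWarning silenced, once the cause is understood."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return func(*args, **kwargs)
```

If the traceback ends inside an installed package rather than in your own script, updating that package, as suggested above, is the likely remedy; silencing the warning only hides it.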
ANN: Wing Python IDE 8.0.4 has been released
Wing 8.0.4 adds Close Unmodified Others to the editor tab's context menu, documents using sitecustomize to automatically start debug, fixes the debugger on some Windows systems, improves icon rendering with some Windows scaling factors, and makes several other improvements.

Details: https://wingware.com/news/2021-09-28
Downloads: https://wingware.com/downloads

== About Wing ==

Wing is a light-weight but full-featured Python IDE designed specifically for Python, with powerful editing, code inspection, testing, and debugging capabilities. Wing's deep code analysis provides auto-completion, auto-editing, and refactoring that speed up development. Its top notch debugger works with any Python code, locally or on a remote host, container, or cluster. Wing also supports test-driven development, version control, UI color and layout customization, and includes extensive documentation and support.

Wing is available in three product levels: Wing Pro is the full-featured Python IDE for professional developers, Wing Personal is a free Python IDE for students and hobbyists (omits some features), and Wing 101 is a very simplified free Python IDE for beginners (omits many features).

Learn more at https://wingware.com/
Re: OT: AttributeError
On 2021-09-29 03:03, 2qdxy4rzwzuui...@potatochowder.com wrote:
> On 2021-09-29 at 09:21:34 +1000, Chris Angelico wrote:
> > On Wed, Sep 29, 2021 at 9:10 AM <2qdxy4rzwzuui...@potatochowder.com> wrote:
> > > On 2021-09-29 at 11:38:22 +1300, dn via Python-list wrote:
> > > > For those of us who remember/can compute in binary, octal, hex, or decimal as-needed:
> > > > Why do programmers confuse All Hallows'/Halloween for Christmas Day?
> > >
> > > That one is also very old. (Yes, I know the answer. No, I will not spoil it for those who might not.) What do I have to do to gain the insight necessary to have discovered that question and answer on my own?
> >
> > You'd have to be highly familiar with numbers in different notations, to the extent that you automatically read 65 and 0x41 as the same number ...
>
> I do that. And I have done that, with numbers that size, since the late 1970s (maybe the mid 1970s, for narrow definitions of "different"). There's at least one more [sideways, twisted] leap to the point that you even think of translating the names of those holidays into an arithmetic riddle.
>
> > ... Or, even better, to be able to read off a hex dump and see E8 03 and instantly read it as "1,000 little-endian".
>
> 59395 big endian. Warning, flamebait ahead: Who thinks in little endian? (I was raised on 6502s and 680XX CPUs; 8080s and Z80s always did things backwards.)

6502 is little-endian.
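For readers who do not read hex dumps at sight, the two interpretations of the bytes E8 03 discussed above can be checked with `int.from_bytes`. This is just a small illustration; the variable names are arbitrary.

```python
# The two bytes from the hex dump in the discussion above
dump = bytes.fromhex("E8 03")

# Little-endian: the first byte is the least significant one -> 0x03E8
little = int.from_bytes(dump, "little")

# Big-endian: the first byte is the most significant one -> 0xE803
big = int.from_bytes(dump, "big")

# 65 and 0x41 are the same number in different notations
same_number = (65 == 0x41)
```

Here `little` is 1000 and `big` is 59395 (0xE8 is 232, and 232 * 256 + 3 = 59395).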
NUmpy
Hi,

I have a netcdf file "uwnd_850_1981.nc" and I'm using the commands to read it:

from numpy import dtype
fileu ='uwnd_850_1981.nc'
ncu = Dataset(fileu,'r')
uwnd=ncu.variables['uwnd'][:]

and I had:

:1: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I don't know why I get this message. My numpy version is 1.21.2.

Please, how can I solve this?

Thanks,
Conrado
Re: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
On 29/09/2021 13.10, hongy...@gmail.com wrote:
> On Wednesday, September 29, 2021 at 5:40:58 PM UTC+8, J.O. Aho wrote:
> > On 29/09/2021 10.22, hongy...@gmail.com wrote:
> > > I tried to convert an xls file into csv with the following command, but failed:
> > >
> > > $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
> > > XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
> > >
> > > The above testing file is located here [1].
> > >
> > > [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls
> > >
> > > Any hints for fixing this problem?
> >
> > You need to delete the 13 first lines in the file
>
> Yes. After deleting the top 3 lines, the problem has been fixed.
>
> > or see to it that your code first trims the data before it starts to parse the XML.
>
> Yes. I really want to do this trick programmatically, but how do I do it without manually editing the file?

You could do something like loading the XML into a string (myxmlstr) and then find the first < in that string

xmlstart = myxmlstr.find('<')
xmlstr = myxmlstr[xmlstart:]

then use the xmlstr in the xml parser, admittedly not as convenient as loading the file directly into the xml parser.

I don't say this is the best way of doing it, I'm sure some Python wiz here would have a smarter solution.

--
//Aho
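The trimming idea above can be sketched end to end as follows. This is one possible reading of the suggestion, not a definitive solution; `strip_leading_junk` and `load_markup` are hypothetical helper names. The file is opened as plain bytes, so no spreadsheet library is involved and the "Expected BOF record" check never runs. The file's text encoding is not stated in the thread, which is why the sketch stays with bytes.

```python
def strip_leading_junk(raw):
    """Return raw with everything before the first '<' removed."""
    start = raw.find(b"<")
    if start == -1:
        # find() returns -1 when there is no '<'; slicing with -1 would
        # silently keep only the last byte, so fail loudly instead.
        raise ValueError("no markup found in data")
    return raw[start:]


def load_markup(path):
    """Read a file as bytes and drop the blank lines before the markup."""
    with open(path, "rb") as fh:
        return strip_leading_junk(fh.read())
```

The explicit check for -1 is the one detail worth adding to the two-line version in the message: without it, input that contains no `<` at all would be reduced to its final character instead of raising an error.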
Re: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
On Wednesday, September 29, 2021 at 8:12:08 PM UTC+8, J.O. Aho wrote:
> On 29/09/2021 13.10, hongy...@gmail.com wrote:
> > On Wednesday, September 29, 2021 at 5:40:58 PM UTC+8, J.O. Aho wrote:
> > > On 29/09/2021 10.22, hongy...@gmail.com wrote:
> > > > I tried to convert an xls file into csv with the following command, but failed:
> > > >
> > > > $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
> > > > XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
> > > >
> > > > The above testing file is located here [1].
> > > >
> > > > [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls
> > > >
> > > > Any hints for fixing this problem?
> > >
> > > You need to delete the 13 first lines in the file
> >
> > Yes. After deleting the top 3 lines, the problem has been fixed.
> >
> > > or see to it that your code first trims the data before it starts to parse the XML.
> >
> > Yes. I really want to do this trick programmatically, but how do I do it without manually editing the file?
>
> You could do something like loading the XML into a string (myxmlstr)

How to do this operation? As you have seen, the file refused to be loaded at all.

> and then find the first < in that string
>
> xmlstart = myxmlstr.find('<')
>
> xmlstr = myxmlstr[xmlstart:]
>
> then use the xmlstr in the xml parser, admittedly not as convenient as loading the file directly into the xml parser.
>
> I don't say this is the best way of doing it, I'm sure some Python wiz here would have a smarter solution.

Another very strange thing: I trimmed the first 3 lines in the original file and saved it into a new one named as 2021-2022-1-trimmed-top-3-lines.xls. [1] Then I read the file with the following python script named as pandas-excel.py:

--
import pandas as pd

excel_file='2021-2022-1-trimmed-top-3-lines.xls'
#print(pd.ExcelFile(excel_file).sheet_names)
newpd=pd.read_excel(excel_file, sheet_name='Sheet1')

for i in newpd.index:
    if i >1:
        for j in newpd.columns:
            if int(j.split()[1]) > 2:
                if not pd.isnull(newpd.loc[i][j]):
                    print(newpd.loc[i][j])
--

$ python pandas-excel.py | sort -u
汽车实用英语 [1-8]周 1-4节 38 汽车楼413基础电气实训室II 汽修1932
汽车车载网络系统的检测与修复 [1-12]周 1-4节 38 汽车楼416安全、舒适系统实训室 汽修1932

OTOH, I also tried to read the file with in2csv as follows:

$ in2csv --sheet Sheet1 2021-2022-1-trimmed-top-3-lines.xls 2>/dev/null |tr ',' '\n' | \
  sed -re '/^$/d' | sort -u | awk '{print length($0),$0}' | sort -k1n | tail -3 | cut -d ' ' -f2-
汽车实用英语 [1-8]周 1-4节 38 汽车楼413基础电气实训室II 汽修1932
智能网联汽车概论 [1-8]周 6-9节 45 汽车楼511汽车营销策划实训室 汽销1931
汽车车载网络系统的检测与修复 [1-12]周 1-4节 38 汽车楼416安全、舒适系统实训室 汽修1932

As you can see, the above two methods give different results. I'm very puzzled by this phenomenon. Any hints/tips/comments will be greatly appreciated.

[1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1-trimmed-top-3-lines.xls

Regards,
HZ
Re: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
On Wednesday, September 29, 2021 at 5:40:58 PM UTC+8, J.O. Aho wrote:
> On 29/09/2021 10.22, hongy...@gmail.com wrote:
> > I tried to convert an xls file into csv with the following command, but failed:
> >
> > $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
> > XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
> >
> > The above testing file is located here [1].
> >
> > [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls
> >
> > Any hints for fixing this problem?
>
> You need to delete the 13 first lines in the file

Yes. After deleting the top 3 lines, the problem has been fixed.

> or see to it that your code first trims the data before it starts to parse the XML.

Yes. I really want to do this trick programmatically, but how do I do it without manually editing the file?

HZ
Re: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
On 29/09/2021 10.22, hongy...@gmail.com wrote:
> I tried to convert an xls file into csv with the following command, but failed:
>
> $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
> XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
>
> The above testing file is located here [1].
>
> [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls
>
> Any hints for fixing this problem?

You need to delete the 13 first lines in the file or see to it that your code first trims the data before it starts to parse the XML.

--
//Aho
Re: XML Considered Harmful
On 28/09/2021 18.21, Greg Ewing wrote:
> On 29/09/21 4:37 am, Michael F. Stemper wrote:
> > I'm talking about something made from tons of iron and copper that is oil-filled and rotates at 1800 rpm.
>
> To avoid confusion, we should rename them "electricity comprehensions".

Hah!

--
Michael F. Stemper
If you take cranberries and stew them like applesauce they taste much more like prunes than rhubarb does.
Re: OT: AttributeError
On 29/09/21 3:03 pm, 2qdxy4rzwzuui...@potatochowder.com wrote:
> Who thinks in little endian? (I was raised on 6502s and 680XX CPUs; 8080s and Z80s always did things backwards.)

The first CPU I wrote code for was a National SC/MP, which doesn't have an endianness, since it never deals with more than a byte at a time. The second was a 6800, which is big-endian. That's definitely more convenient when you're hand-assembling code! I can see the advantages of little-endian when you're implementing a CPU, though.

--
Greg
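One reading of the hand-assembly remark above is that, on a big-endian machine, a 16-bit operand goes into memory in the same order in which you would write it in hex, whereas a little-endian machine wants the bytes swapped. The sketch below illustrates that with the standard `struct` module; the address 0x1234 is an arbitrary example value, not something from the thread.

```python
import struct

addr = 0x1234  # arbitrary 16-bit address used for illustration

# Big-endian (as on the 6800): bytes appear in written order, 12 34
big = struct.pack(">H", addr)

# Little-endian (as on the 6502, 8080 and Z80): low byte first, 34 12
little = struct.pack("<H", addr)

big_hex = big.hex()        # "1234"
little_hex = little.hex()  # "3412"
```

When keying in machine code by hand, the little-endian form means remembering to swap every address, which is presumably the inconvenience being referred to.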
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'
I tried to convert an xls file into csv with the following command, but failed:

$ in2csv --sheet 'Sheet1' 2021-2022-1.xls
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'

The above testing file is located here [1].

[1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls

Any hints for fixing this problem?

Regards,
HZ
Automated data testing, checking, validation, reporting for data assurance
There appear to be a few options for this. Has anyone tested and got experience with automated data testing, validation and reporting? Can anyone enlighten me?

Regards,
David