On Thursday, September 30, 2021 at 9:20:37 AM UTC+8, hongy...@gmail.com wrote:
> On Thursday, September 30, 2021 at 5:20:04 AM UTC+8, Peter J. Holzer wrote:
> > On 2021-09-29 01:22:03 -0700, hongy...@gmail.com wrote:
> > > I tried to convert a xls file into csv with the following command, but
> > > failed:
> > >
> > > $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
> > > XLRDError: Unsupported format, or corrupt file: Expected BOF record;
> > > found b'\r\n\r\n\r\n\r\n'
> > >
> > > The above testing file is located at here [1].
> > >
> > > [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls
> > Why is that file name .xls when it's obviously an HTML file?
> Good catch! Thank you for pointing this out. This file is automatically
> exported from my university's teaching management system, and it was assigned
> the .xls extension by default.
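For anyone who runs into the same XLRDError, a quick way to check what a file really is before picking a tool is to look at its first bytes. The sketch below is only a heuristic I would try, not something in2csv or xlrd does for you: the names sniff_format and sniff_file are made up for illustration, and the HTML test (looking for a few common tags near the start of the file) is my own assumption about what such exports contain. The two signatures are the OLE2 container used by legacy .xls workbooks and the ZIP header used by .xlsx.

```python
# Signature of the OLE2 compound file container used by legacy .xls workbooks.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature of a ZIP archive (.xlsx files are ZIP archives).
ZIP_MAGIC = b"PK\x03\x04"
# Tags taken as a hint that the content is HTML (an assumption, not a standard).
HTML_HINTS = (b"<!doctype", b"<html", b"<table")


def sniff_format(head: bytes) -> str:
    """Guess the real format from the leading bytes of a file (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Skip leading whitespace such as the b'\r\n\r\n\r\n\r\n' reported by xlrd,
    # then look for HTML tags case-insensitively.
    text = head.lstrip().lower()
    if any(hint in text for hint in HTML_HINTS):
        return "html"
    return "unknown"


def sniff_file(path: str, size: int = 2048) -> str:
    """Read the first `size` bytes of `path` and guess its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(size))
```

Calling sniff_file('2021-2022-1.xls') first would tell me whether to hand the file to in2csv or to an HTML parser, provided the heuristic matches what the export actually contains.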
Following the comment above, once I changed the extension to .html, the
following Python code does the trick:

import sys
import pandas as pd

if len(sys.argv) != 2:
    print('Usage: ' + sys.argv[0] + ' input-file')
    sys.exit(1)

# read_html returns a list of DataFrames, one per <table> in the file.
myhtml_pd = pd.read_html(sys.argv[1])
#In [25]: len(myhtml_pd)
#Out[25]: 3

# The data I need is in the third table.
table = myhtml_pd[2]
for i in table.index:
    if i > 0:  # skip the first row
        for j in table.columns:
            # skip the first two columns and any empty cells
            if j > 1 and not pd.isnull(table.loc[i, j]):
                print(table.loc[i, j])

HZ
-- 
https://mail.python.org/mailman/listinfo/python-list