On Thursday, September 30, 2021 at 9:20:37 AM UTC+8, [email protected] wrote:
> On Thursday, September 30, 2021 at 5:20:04 AM UTC+8, Peter J. Holzer wrote:
> > On 2021-09-29 01:22:03 -0700, [email protected] wrote:
> > > I tried to convert an xls file into csv with the following command, but
> > > it failed:
> > >
> > > $ in2csv --sheet 'Sheet1' 2021-2022-1.xls
> > > XLRDError: Unsupported format, or corrupt file: Expected BOF record;
> > > found b'\r\n\r\n\r\n\r\n'
> > >
> > > The test file is available here [1].
> > >
> > > [1] https://github.com/hongyi-zhao/temp/blob/master/2021-2022-1.xls
> > Why is that file name .xls when it's obviously an HTML file?
> Good catch! Thank you for pointing this out. This file is automatically
> exported from my university's teaching management system, and it was assigned
> the .xls extension by default.
Following that comment, I changed the file's extension to .html, and the
following Python code does the trick:

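import sys

import pandas as pd

if len(sys.argv) != 2:
    print('Usage: ' + sys.argv[0] + ' input-file')
    sys.exit(1)

# read_html returns one DataFrame per <table> found in the file.
myhtml_pd = pd.read_html(sys.argv[1])
#In [25]: len(myhtml_pd)
#Out[25]: 3

# Work on the third table: skip row 0 and columns 0-1,
# then print every remaining non-empty cell.
for i in myhtml_pd[2].index:
    if i > 0:
        for j in myhtml_pd[2].columns:
            if j > 1 and not pd.isnull(myhtml_pd[2].loc[i, j]):
                print(myhtml_pd[2].loc[i, j])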
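
Since the real problem was a file whose extension did not match its content,
it may help to check the first bytes before picking a reader. Below is a
minimal sketch of that idea, not a definitive implementation. The helper names
sniff_format and html_table_to_csv are hypothetical, the byte signatures are
the usual OLE2 and ZIP magic numbers, and the default table index of 2 is only
an assumption carried over from the script above.

```python
# Signatures of the two real Excel containers.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx is a ZIP archive


def sniff_format(head: bytes) -> str:
    """Guess the real format from the first bytes of a file (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # HTML exports often begin with blank lines, so search the whole
    # head instead of testing only its first bytes.
    text = head.lower()
    if b"<html" in text or b"<table" in text:
        return "html"
    return "unknown"


def html_table_to_csv(path: str, out_path: str, table_index: int = 2) -> None:
    """Write one table of an HTML file to CSV (hypothetical helper)."""
    # Imported here so that sniff_format works without pandas installed.
    import pandas as pd

    tables = pd.read_html(path)  # one DataFrame per <table>
    # Assumption: the table has no parsed header row, so none is written.
    tables[table_index].to_csv(out_path, index=False, header=False)
```

To use it, read the first couple of kilobytes with
open(path, 'rb').read(2048), pass them to sniff_format, and call
html_table_to_csv only when the result is 'html'. Unlike the loop above, this
writes the whole table, including row 0 and the first two columns.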
HZ