On 2019-03-26 19:55, Adam Funk wrote:
Hi,
I have a Python 3 (using 3.6.7) program that reads a TSV file, does
some churning with the data, and writes a TSV file out.
#v+
print('reading', options.input_file)
with open(options.input_file, 'r', encoding='utf-8-sig') as f:
    for line in f.readlines():
        row = line.split('\t')
        # DO STUFF WITH THE CELLS IN THE ROW
        # ...

print('writing', options.output_file)
with open(options.output_file, 'w', encoding='utf-8') as f:
    # MAKE THE HEADER list of str
    f.write('\t'.join(header) + '\n')
    for doc_id in sorted(all_ids):
        # CREATE A ROW list of str FOR EACH DOCUMENT ID
        f.write('\t'.join(row) + '\n')
#v-
I noticed that the file command on the output returns "UTF-8 Unicode
text, with very long lines, with LF, NEL line terminators".
I'd never come across NEL terminators until now, and I've never
(AFAIK) created a file with them before. Any idea why this is
happening?
(I tried changing the input encoding from 'utf-8-sig' to 'utf-8' but
got the same results with the output.)
Does the input contain any NEL? Do the strings that you write out
contain them?
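
One way to check is to scan the input for U+0085 (NEL) directly. The
following is only a rough sketch: find_nel is a made-up helper name, and
it assumes the file decodes cleanly with the encoding you already use.
Python's text-mode open() only treats '\n', '\r' and '\r\n' as line
endings, so a NEL inside a cell passes through readlines() and
split('\t') untouched and would end up in whatever you write out.

```python
NEL = '\x85'  # U+0085 NEXT LINE


def find_nel(path, encoding='utf-8-sig'):
    """Return (line_number, column) pairs for every U+0085 in the file.

    Hypothetical helper for illustration; line numbers are 1-based and
    columns are 0-based character offsets within the line.
    """
    hits = []
    # Default newline handling splits only on \n, \r and \r\n,
    # so any NEL stays inside the line it occurs in.
    with open(path, 'r', encoding=encoding) as f:
        for lineno, line in enumerate(f, start=1):
            col = line.find(NEL)
            while col != -1:
                hits.append((lineno, col))
                col = line.find(NEL, col + 1)
    return hits
```

Calling find_nel(options.input_file) and printing the result should show
whether the characters are already in the input. The same test,
NEL in cell, can be applied to each string in a row just before it is
written.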
--
https://mail.python.org/mailman/listinfo/python-list