On 09/02/2015 03:44, Skip Montanaro wrote:
I am trying to process a CSV file using Python 3.5 (CPython tip as of a
week or so ago). According to chardet[1], the file is encoded as utf-8:

 >>> s = open("data/meets-usms.csv", "rb").read()
 >>> len(s)
562272
 >>> import chardet
 >>> chardet.detect(s)
{'encoding': 'utf-8', 'confidence': 0.99}

so I created the reader like so:

         rdr = csv.DictReader(open(csvfile, encoding="utf-8"))

This seems to work. The rows are read and records added to a SQLite3
database. When I go into sqlite3, I get what looks to be raw utf-8 on
output:

% LANG=en_US.UTF-8 sqlite3 topten.db
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.
sqlite> select * from swimmeet where meetname like '%Barracuda%';
sqlite> select count(*) from swimmeet;
0
sqlite> select count(*) from swimmeet;
4171
sqlite> select meetname from swimmeet where meetname like
'%Barracuda%Patrick%';
Anderson Barracudas St. Patrick's Day Swim Meet
Anderson Barracuda Masters - 2010 St. Patrick’s Day Swim Meet
Anderson Barracuda Masters 2011 St. Patrick’s Day Swim Meet
Anderson Barracuda Masters St. Patrick's Day Meet
Anderson Barracuda Masters St. Patrick's Day Meet 2014
Anderson Barracuda Masters 2015 St. Patrick’s Day Swim Meet


How is meetname defined? Is it a varchar or nvarchar?

My only experience is with MS-SQL and C#, but reading from a utf-8 encoded file with a StreamReader set to utf-8 and then inserting into varchar fields gave me issues similar to what you are showing. After I changed the fields to nvarchar, it all started working as expected.
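
If it helps, here is a minimal sketch for checking how the column is declared and whether the text survives a round trip through Python's sqlite3 module. The sample rows, the in-memory database and the load_and_read_back helper are made up for illustration; only the swimmeet and meetname names come from your post.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the real CSV file. The second
# row contains U+2019 (right single quotation mark), like some of the
# meet names in the output above.
SAMPLE_CSV = (
    "meetname\n"
    "Anderson Barracudas St. Patrick's Day Swim Meet\n"
    "Anderson Barracuda Masters 2011 St. Patrick\u2019s Day Swim Meet\n"
)


def load_and_read_back(csv_text, column_type="varchar(100)"):
    """Insert the meetname column into an in-memory table and read it back.

    Returns (names, declared_types) so the stored strings and the declared
    column type can be inspected from Python rather than from a terminal.
    """
    conn = sqlite3.connect(":memory:")
    # column_type is formatted into the SQL only because it is a fixed,
    # trusted string in this sketch; never do this with user input.
    conn.execute("create table swimmeet (meetname {})".format(column_type))
    rdr = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "insert into swimmeet (meetname) values (?)",
        [(row["meetname"],) for row in rdr],
    )
    names = [r[0] for r in conn.execute(
        "select meetname from swimmeet order by rowid")]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    declared = [r[2] for r in conn.execute("pragma table_info(swimmeet)")]
    conn.close()
    return names, declared


if __name__ == "__main__":
    for col_type in ("varchar(100)", "nvarchar(100)"):
        names, declared = load_and_read_back(SAMPLE_CSV, col_type)
        print(declared, [ascii(n) for n in names])
```

Comparing the result for "varchar(100)" and "nvarchar(100)" should show whether the declared type makes any difference in SQLite. Printing with ascii() sidesteps the terminal's own encoding, so a \u2019 in the output means the character was stored intact.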



--
https://mail.python.org/mailman/listinfo/python-list
