Tiago Wright added the comment:
Attached is a .py file with 32 test cases for the Sniffer class: 18 that
fail and 14 that pass.
My hope is that these samples can be used to improve the delimiter
detection code.
-Tiago
--
Added file: http://bugs.python.org/file40149/testround8.py
Tiago Wright added the comment:
I've run the Sniffer against the same data set, but varied the size of the
sample given to the code. Feeding it more data actually seems to make the
results less accurate. Table attached.
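A rough way to repeat this kind of experiment on a single file is sketched below. The function name and the sample sizes are illustrative assumptions, not taken from the attached script; only `csv.Sniffer().sniff()` and `csv.Error` come from the standard library.

```python
import csv

def delimiters_by_sample_size(text, sizes=(256, 1024, 4096)):
    """Sniff growing prefixes of `text` and record what each one yields.

    Returns a dict mapping each sample size to the detected delimiter,
    or to None when the Sniffer could not decide.
    """
    sniffer = csv.Sniffer()
    results = {}
    for size in sizes:
        try:
            results[size] = sniffer.sniff(text[:size]).delimiter
        except csv.Error:
            # sniff() raises csv.Error when no delimiter can be determined.
            results[size] = None
    return results
```

If the values differ across sizes for the same file, the detection depends on how much of the file the Sniffer sees.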
On Thu, Aug 6, 2015 at 12:29 PM R. David Murray
Tiago Wright added the comment:
It seems the HTML file did not come through correctly. Trying a text
version, please view this in a monospace font:
| Sniffer
|
Human | , | ; | \t | \ | space|Except | : | ) |
c | e |
R. David Murray added the comment:
Yes, much better :)
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24787
___
Tiago Wright added the comment:
I apologize, it seems the text table got line wrapped. This time as a TXT
attachment.
-Tiago
On Thu, Aug 6, 2015 at 12:22 PM Tiago Wright rep...@bugs.python.org wrote:
Tiago Wright added the comment:
--
Added file:
R. David Murray added the comment:
Your best bet is to attach an ascii text file as an uploaded file.
Tiago Wright added the comment:
Table attached.
-Tiago
On Wed, Aug 5, 2015 at 8:14 PM Skip Montanaro rep...@bugs.python.org
wrote:
Skip Montanaro added the comment:
Tiago, sorry, but your last post with results is completely
unintelligible. Can you toss the table in a file and attach it
Skip Montanaro added the comment:
Tiago, sorry, but your last post with results is completely unintelligible. Can
you toss the table in a file and attach it instead?
Tiago Wright added the comment:
I've run the Sniffer against 1614 csv files on my computer and compared the
delimiter it detects to what I have set manually. Here are the results:
SnifferHuman,;\t\(blank)Error:)ceMpGrand TotalError rate,498 2
110 1 5122.7%; 1 10.0%\t3
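A comparison like this one could be tallied roughly as follows. This is only a sketch: the helper names and the (sample, expected delimiter) input format are assumptions, and it is not the script that produced the numbers above.

```python
import csv
from collections import Counter

def tally_detection(samples):
    """Count (expected, detected) delimiter pairs.

    `samples` is an iterable of (sample_text, expected_delimiter) pairs,
    where the expected delimiter is the one chosen by a human.
    """
    sniffer = csv.Sniffer()
    counts = Counter()
    for text, expected in samples:
        try:
            detected = sniffer.sniff(text).delimiter
        except csv.Error:
            detected = "Error"  # the Sniffer gave up on this sample
        counts[(expected, detected)] += 1
    return counts

def error_rate(counts, expected):
    """Fraction of samples with this expected delimiter that were misdetected."""
    total = sum(n for (exp, _), n in counts.items() if exp == expected)
    wrong = sum(n for (exp, det), n in counts.items()
                if exp == expected and det != exp)
    return wrong / total if total else 0.0
```

Each key of the returned Counter corresponds to one cell of a human-versus-Sniffer cross table, and `error_rate` gives the per-row error rate.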
Peter Otten added the comment:
The sniffer actually changes its mind in the fourth line:
Python 3.4.0 (default, Jun 19 2015, 14:20:21)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> csv.Sniffer().sniff("""\
... Invoice File,Credit Memo,Amount
Skip Montanaro added the comment:
I should have probably pointed out that the Sniffer class is the unloved
stepchild of the csv module. In my experience it is rarely necessary. You
either:
* Are reading CSV files which are about what Excel would produce with its
default settings
or
* Know
R. David Murray added the comment:
If you look at the algorithm it is doing some fancy things with metrics, but
does have a 'preferred delimiters' list that it checks. It is possible things
could be improved either by tweaking the threshold or by somehow giving added
weight to the metrics
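Short of changing the algorithm, a caller can already narrow the search with the `delimiters` argument of `sniff()`. The sketch below assumes the candidate set `",;\t"`, and the helper name is made up. The `preferred` attribute it prints is an undocumented detail of CPython's `csv.py`, so treat it as subject to change.

```python
import csv

# The list consulted when the frequency metrics leave several candidates.
# Undocumented attribute of the CPython implementation.
print(csv.Sniffer().preferred)

def sniff_restricted(sample, candidates=",;\t"):
    """Return the detected delimiter, limited to `candidates`, or None."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # No character from `candidates` was consistent enough.
        return None
```

With a restricted candidate set, a letter such as "M" can no longer be returned as the delimiter; the call either picks one of the candidates or fails.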
Tiago Wright added the comment:
I agree that the parameters are easily deduced for any one csv file after a
quick inspection. The reason I went searching for a good sniffer was that I
have ~2100 csv files of slightly different formats coming from different
sources. In some cases, a csv file is
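For a batch of files in mixed formats, one possible pattern is to sniff each file and fall back to a default dialect when sniffing fails. The sketch below is an assumption about how that might look, not the code used in this report; the function name, the sample size, the candidate set and the `csv.excel` fallback are all illustrative choices.

```python
import csv

def read_rows(path, sample_size=4096, candidates=",;\t|", fallback=csv.excel):
    """Read all rows of a CSV file, sniffing the dialect from its first bytes."""
    with open(path, newline="") as fh:
        sample = fh.read(sample_size)
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
        except csv.Error:
            # Sniffing failed: use the fallback dialect instead.
            dialect = fallback
        fh.seek(0)  # rewind so the reader sees the whole file
        return list(csv.reader(fh, dialect))
```

Files for which the fallback is used are worth logging separately, since they are the ones most likely to need manual inspection.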
New submission from Tiago Wright:
csv.Sniffer().sniff() guesses "M" as the delimiter of the first dataset below.
The same error occurs when the "," is replaced by "\t". However, it correctly
guesses "," for the second dataset.
---Dataset 1
Invoice File,Credit Memo,Amount
Skip Montanaro added the comment:
How are you calling the sniff() method? Note that it takes a sample of the CSV
file. For example, this works for me:
>>> import csv
>>> f = open("sniff1.csv")
>>> dialect = csv.Sniffer().sniff(next(open("sniff1.csv")))
>>> dialect.delimiter
','
>>> dialect.lineterminator
'\r\n'
where