Tiago Wright added the comment:
Attached is a .py file with 32 test cases for the Sniffer class: 18 that
fail and 14 that pass.
My hope is that these samples can be used to improve the delimiter
detection code.
-Tiago
--
Added file: http://bugs.python.org/file40149/testround8.py
Tiago Wright added the comment:
I've run the Sniffer against the same data set, but varied the size of the
sample given to the code. Feeding it more data actually seems to make the
results less accurate. Table attached.
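
A minimal sketch of this kind of experiment, assuming the sample is simply
the first N characters of a file's text. The function name sniff_at_sizes
and the default sizes are illustrative choices, not the script used to
produce the attached table.

```python
import csv


def sniff_at_sizes(text, sizes=(256, 1024, 4096)):
    """Return {sample_size: detected delimiter, or None if sniffing failed}."""
    results = {}
    for size in sizes:
        try:
            # Only the first `size` characters are handed to the Sniffer.
            results[size] = csv.Sniffer().sniff(text[:size]).delimiter
        except csv.Error:
            # csv.Sniffer raises csv.Error when it cannot determine a delimiter.
            results[size] = None
    return results
```

Running this over each file and comparing the values across sizes would show
whether a larger sample changes the guess for that file.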
On Thu, Aug 6, 2015 at 12:29 PM R. David Murray rep
Tiago Wright added the comment:
It seems the HTML file did not come through correctly. Trying a text
version, please view this in a monospace font:
| Sniffer
|
Human | , | ; | \t | \ | space|Except | : | ) |
c | e
Tiago Wright added the comment:
I apologize, it seems the text table got line wrapped. This time as a TXT
attachment.
-Tiago
On Thu, Aug 6, 2015 at 12:22 PM Tiago Wright rep...@bugs.python.org wrote:
Tiago Wright added the comment:
--
Added file: http://bugs.python.org/file40140
Tiago Wright added the comment:
Table attached.
-Tiago
On Wed, Aug 5, 2015 at 8:14 PM Skip Montanaro rep...@bugs.python.org
wrote:
Skip Montanaro added the comment:
Tiago, sorry, but your last post with results is completely
unintelligible. Can you toss the table in a file and attach
Tiago Wright added the comment:
I've run the Sniffer against 1614 csv files on my computer and compared the
delimiter it detects to what I have set manually. Here are the results:
SnifferHuman,;\t\(blank)Error:)ceMpGrand TotalError rate,498 2
110 1 5122.7%; 1 10.0%\t3
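
The comparison described above (the delimiter the Sniffer detects versus one
recorded by hand) could be tallied with something like the following sketch.
The names tally_detection and error_rate are hypothetical, and the input
format, a list of (sample text, human delimiter) pairs, is an assumption
rather than the layout of the original data set.

```python
import csv
from collections import Counter


def tally_detection(cases):
    """Count (human delimiter, sniffed delimiter) pairs over labeled samples.

    `cases` is an iterable of (sample_text, human_delimiter) tuples.
    A sample the Sniffer cannot classify is recorded as "Error".
    """
    counts = Counter()
    for text, human in cases:
        try:
            sniffed = csv.Sniffer().sniff(text).delimiter
        except csv.Error:
            sniffed = "Error"
        counts[(human, sniffed)] += 1
    return counts


def error_rate(counts):
    """Fraction of samples where the sniffed delimiter differs from the label."""
    total = sum(counts.values())
    wrong = sum(n for (human, sniffed), n in counts.items() if human != sniffed)
    return wrong / total if total else 0.0
```

Each key of the returned Counter corresponds to one cell of a human-versus-
Sniffer cross table, so the per-delimiter error rates can be derived from it.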
Tiago Wright added the comment:
I agree that the parameters are easily deduced for any one csv file after a
quick inspection. The reason I went searching for a good sniffer was that I
have ~2100 csv files of slightly different formats coming from different
sources. In some cases, a csv file
New submission from Tiago Wright:
csv.Sniffer().sniff() guesses "M" as the delimiter of the first dataset below.
The same error occurs when the "," is replaced by "\t". However, it correctly
guesses "," for the second dataset.
---Dataset 1
Invoice File,Credit Memo,Amount Claimed,Description