[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-07, Tiago Wright
Tiago Wright added the comment: Attached is a .py file with 32 test cases for the Sniffer class: 18 fail and 14 pass. My hope is that these samples can be used to improve the delimiter-detection code. -Tiago -- Added file: http://bugs.python.org/file40149/testround8.py
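The attached test cases aren't reproduced in this thread, but a harness of the same shape might pair each sample with the delimiter a human would pick and report the mismatches. The samples, the `CASES` dict, and `run_cases` below are hypothetical illustrations, not the contents of testround8.py.

```python
import csv

# Hypothetical (sample, expected delimiter) pairs; NOT the cases from testround8.py.
CASES = {
    "comma": ("a,b,c\n1,2,3\n4,5,6\n", ","),
    "semicolon": ("a;b\n1;2\n", ";"),
    "tab": ("a\tb\n1\t2\n", "\t"),
    # An unquoted decimal comma in one row makes the comma count vary per line.
    "decimal_comma": ("Memo,Amount\nMug,4,50\nMop,3\n", ","),
}


def run_cases(cases):
    """Return {name: (expected, got)} for every case the Sniffer gets wrong."""
    failures = {}
    for name, (sample, expected) in cases.items():
        try:
            got = csv.Sniffer().sniff(sample).delimiter
        except csv.Error as exc:
            # sniff() raises csv.Error when it cannot settle on a delimiter.
            got = f"error: {exc}"
        if got != expected:
            failures[name] = (expected, got)
    return failures


if __name__ == "__main__":
    failures = run_cases(CASES)
    print(f"{len(CASES) - len(failures)} pass, {len(failures)} fail")
    for name, (expected, got) in failures.items():
        print(f"  {name}: expected {expected!r}, got {got!r}")
```

Recording a `csv.Error` as a failure, rather than letting it propagate, keeps one bad sample from hiding the results for the rest.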

2015-08-06, Tiago Wright
Tiago Wright added the comment: I've run the Sniffer against the same data set, but varied the size of the sample given to it. Feeding it more data actually seems to make the results less accurate. Table attached. On Thu, Aug 6, 2015 at 12:29 PM R. David Murray rep
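A sketch of how such a sample-size sweep might be run: sniff increasing prefixes of the same text and record each guess. The helper `sniff_by_sample_size` is a hypothetical name, and trimming each prefix back to its last complete line is an assumption about how to cut samples; without it, a partial last line changes the per-line character counts the Sniffer relies on.

```python
import csv


def sniff_by_sample_size(text, sizes):
    """Map each prefix length to the delimiter guessed from that prefix.

    Hypothetical helper: prefixes shorter than the text are trimmed back to
    the last newline so that no partial line skews the frequency counts.
    """
    results = {}
    for size in sizes:
        chunk = text[:size]
        if len(chunk) < len(text):
            # Keep complete lines only; fall back to the raw chunk if there is no newline.
            chunk = chunk[: chunk.rfind("\n") + 1] or chunk
        try:
            results[size] = csv.Sniffer().sniff(chunk).delimiter
        except csv.Error:
            # No candidate delimiter qualified for this sample size.
            results[size] = None
    return results


if __name__ == "__main__":
    rows = ["id,name,city"] + [f"{i},user{i},Town{i % 3}" for i in range(50)]
    text = "\n".join(rows) + "\n"
    print(sniff_by_sample_size(text, [64, 256, 1024, len(text)]))
```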

2015-08-06, Tiago Wright
Tiago Wright added the comment: It seems the HTML file did not come through correctly. Here is a text version; please view it in a monospace font: | Sniffer | Human | , | ; | \t | \ | space | Except | : | ) | c | e

2015-08-06, Tiago Wright
Tiago Wright added the comment: I apologize; it seems the text table got line-wrapped. I'm sending it this time as a TXT attachment. -Tiago On Thu, Aug 6, 2015 at 12:22 PM Tiago Wright rep...@bugs.python.org wrote: Tiago Wright added the comment: -- Added file: http://bugs.python.org/file40140

2015-08-06, Tiago Wright
Tiago Wright added the comment: Table attached. -Tiago On Wed, Aug 5, 2015 at 8:14 PM Skip Montanaro rep...@bugs.python.org wrote: Skip Montanaro added the comment: Tiago, sorry, but your last post with results is completely unintelligible. Can you toss the table in a file and attach

2015-08-05, Tiago Wright
Tiago Wright added the comment: I've run the Sniffer against 1614 csv files on my computer and compared the delimiter it detects to what I have set manually. Here are the results: SnifferHuman,;\t\(blank)Error:)ceMpGrand TotalError rate,498 2 110 1 5122.7%; 1 10.0%\t3
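The results cross-tabulate the human-assigned delimiter against the Sniffer's guess, with a total and an error rate (wrong guesses divided by total) per row. A sketch of that bookkeeping, assuming the (human, sniffed) pairs have already been collected and that a failed sniff is recorded under the label `"Error"` (both assumptions), could look like this:

```python
from collections import Counter, defaultdict


def confusion_table(pairs):
    """Cross-tabulate (human_delimiter, sniffed_delimiter) pairs.

    Returns ({human: Counter(sniffed)}, {human: (total, error_rate)}).
    A failed sniff is assumed to be recorded as the string "Error".
    """
    table = defaultdict(Counter)
    for human, sniffed in pairs:
        table[human][sniffed] += 1
    summary = {}
    for human, row in table.items():
        total = sum(row.values())
        # Counter returns 0 for a missing key, so rows with no correct guess work too.
        wrong = total - row[human]
        summary[human] = (total, wrong / total)
    return dict(table), summary


if __name__ == "__main__":
    pairs = [(",", ",")] * 3 + [(",", "M"), ("\t", "\t"), ("\t", "Error")]
    table, summary = confusion_table(pairs)
    for human, (total, rate) in summary.items():
        print(f"{human!r}: {dict(table[human])} total={total} error rate={rate:.1%}")
```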

2015-08-04, Tiago Wright
Tiago Wright added the comment: I agree that the parameters are easily deduced for any one csv file after a quick inspection. The reason I went searching for a good sniffer was that I have ~2100 csv files of slightly different formats coming from different sources. In some cases, a csv file

2015-08-03, Tiago Wright
New submission from Tiago Wright: csv.Sniffer().sniff() guesses M as the delimiter for the first dataset below. The same error occurs when each , is replaced by \t. However, it correctly guesses , for the second dataset.
---Dataset 1
Invoice File,Credit Memo,Amount Claimed,Description
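As I read CPython's `csv.py`, when no quoted fields reveal the delimiter, `Sniffer.sniff()` falls back to counting how often each 7-bit ASCII character appears on each line, and it keeps characters whose count is the same on nearly every line. A letter can therefore win when the real delimiter's count varies between rows. The contrived sample below (hypothetical, not the reporter's Dataset 1) is built that way. The documented `delimiters` argument of `sniff()` restricts the candidate characters; the `sniff_delimiter` helper and its fallback default are assumptions for illustration.

```python
import csv

# Contrived sample, NOT the reporter's Dataset 1: the unquoted decimal comma
# in "4,50" gives row 2 an extra comma, while "M" appears exactly once per line.
SAMPLE = "Memo,Amount\nMug,4,50\nMop,3\n"


def sniff_delimiter(sample, candidates=",;\t|", default=","):
    """Hypothetical helper: sniff only among `candidates`, else use `default`."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when no candidate is consistent enough across the lines.
        return default


if __name__ == "__main__":
    # Unrestricted, the frequency heuristic can settle on a letter
    # (as far as I can tell, "M" for this sample).
    print(repr(csv.Sniffer().sniff(SAMPLE).delimiter))
    print(repr(sniff_delimiter(SAMPLE)))
    print(repr(sniff_delimiter("a;b\n1;2\n")))
```

Restricting the candidates alone doesn't rescue this particular sample: the comma still isn't consistent enough, so `sniff()` raises `csv.Error` and the fallback default supplies the answer.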