[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-07 Thread Tiago Wright
Tiago Wright added the comment: Attached is a .py file with 32 test cases for the Sniffer class: 18 fail and 14 pass. My hope is that these samples can be used to improve the delimiter detection code. -Tiago -- Added file: http://bugs.python.org/file40149/testround8.py

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-06 Thread Tiago Wright
Tiago Wright added the comment: I've run the Sniffer against the same data set, but varied the size of the sample given to the code. Feeding it more data actually seems to make the results less accurate. Table attached.
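One way to repeat this kind of experiment is to sniff progressively larger prefixes of the same text and record what comes back. This is only a minimal sketch: the helper name `sniff_at_sizes` and the sample text are made up for illustration and are not taken from the attached table.

```python
import csv


def sniff_at_sizes(text, sizes, delimiters=None):
    """Return {sample_size: guessed delimiter, or None if sniffing failed}."""
    results = {}
    for size in sizes:
        try:
            # Only the first `size` characters are shown to the sniffer.
            dialect = csv.Sniffer().sniff(text[:size], delimiters)
            results[size] = dialect.delimiter
        except csv.Error:
            # Sniffer raises csv.Error when it cannot settle on a delimiter.
            results[size] = None
    return results


if __name__ == "__main__":
    text = "a;b;c\n1;2;3\n4;5;6\n"  # illustrative data only
    print(sniff_at_sizes(text, [12, len(text)], delimiters=",;\t"))
```

Comparing the values in the returned dict across sizes shows whether a larger sample changes the guess for a given file.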

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-06 Thread Tiago Wright
Tiago Wright added the comment: It seems the HTML file did not come through correctly. Trying a text version, please view this in a monospace font: | Sniffer | Human | , | ; | \t | \ | space|Except | : | ) | c | e |

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-06 Thread R. David Murray
R. David Murray added the comment: Yes, much better :)

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-06 Thread Tiago Wright
Tiago Wright added the comment: I apologize, it seems the text table got line wrapped. This time as a TXT attachment. -Tiago

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-06 Thread R. David Murray
R. David Murray added the comment: Your best bet is to attach an ascii text file as an uploaded file.

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-06 Thread Tiago Wright
Tiago Wright added the comment: Table attached. -Tiago

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-05 Thread Skip Montanaro
Skip Montanaro added the comment: Tiago, sorry, but your last post with results is completely unintelligible. Can you toss the table in a file and attach it instead?

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-05 Thread Tiago Wright
Tiago Wright added the comment: I've run the Sniffer against 1614 csv files on my computer and compared the delimiter it detects to what I have set manually. Here are the results: SnifferHuman,;\t\(blank)Error:)ceMpGrand TotalError rate,498 2 110 1 5122.7%; 1 10.0%\t3
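A comparison like this can be tallied with a small script. The following is a rough sketch, not the script used for the numbers above; `tally_guesses` and `error_rate` are hypothetical helper names, and the input is assumed to be an iterable of (sample text, human-chosen delimiter) pairs.

```python
import csv
from collections import Counter


def tally_guesses(labelled_samples):
    """Count (human, sniffer) delimiter pairs over (sample_text, human_delimiter) items."""
    counts = Counter()
    for sample, expected in labelled_samples:
        try:
            guessed = csv.Sniffer().sniff(sample).delimiter
        except csv.Error:
            guessed = "Error"  # sniffing failed outright
        counts[(expected, guessed)] += 1
    return counts


def error_rate(counts):
    """Fraction of samples where the sniffer disagreed with the human label."""
    total = sum(counts.values())
    wrong = sum(n for (expected, guessed), n in counts.items() if expected != guessed)
    return wrong / total if total else 0.0
```

Each key of the returned `Counter` corresponds to one cell of a human-versus-sniffer table, so the off-diagonal keys are the misdetections.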

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-04 Thread Peter Otten
Peter Otten added the comment: The sniffer actually changes its mind in the fourth line:
Python 3.4.0 (default, Jun 19 2015, 14:20:21)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> csv.Sniffer().sniff("""\
... Invoice File,Credit Memo,Amount

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-04 Thread Skip Montanaro
Skip Montanaro added the comment: I should have probably pointed out that the Sniffer class is the unloved stepchild of the csv module. In my experience it is rarely necessary. You either:
* Are reading CSV files which are about what Excel would produce with its default settings, or
* Know

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-04 Thread R. David Murray
R. David Murray added the comment: If you look at the algorithm it is doing some fancy things with metrics, but does have a 'preferred delimiters' list that it checks. It is possible things could be improved either by tweaking the threshold or by somehow giving added weight to the metrics
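Independently of any change to the internals, a caller can already narrow the search: `sniff()` takes an optional `delimiters` string listing the only characters it may return. A minimal sketch, with a made-up sample and a hypothetical wrapper name `sniff_restricted`:

```python
import csv


def sniff_restricted(sample, candidates=",;\t"):
    """Guess the delimiter, considering only the characters in `candidates`."""
    # The second argument limits the sniffer to the listed characters,
    # so a letter such as "M" can never be returned.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


if __name__ == "__main__":
    sample = "name;qty;price\nwidget;2;3.50\ngadget;10;12.00\n"  # illustrative data
    print(repr(sniff_restricted(sample)))
```

If none of the candidates fits the sample, `sniff()` raises `csv.Error`, which the caller would need to handle.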

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-04 Thread Tiago Wright
Tiago Wright added the comment: I agree that the parameters are easily deduced for any one csv file after a quick inspection. The reason I went searching for a good sniffer was that I have ~2100 csv files of slightly different formats coming from different sources. In some cases, a csv file is

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-03 Thread Tiago Wright
New submission from Tiago Wright: csv.Sniffer().sniff() guesses M for the delimiter of the first dataset below. The same error occurs when the , is replaced by \t. However, it correctly guesses , for the second dataset. ---Dataset 1 Invoice File,Credit Memo,Amount

[issue24787] csv.Sniffer guesses M instead of \t or , as the delimiter

2015-08-03 Thread Skip Montanaro
Skip Montanaro added the comment: How are you calling the sniff() method? Note that it takes a sample of the CSV file. For example, this works for me:
>>> f = open("sniff1.csv")
>>> dialect = csv.Sniffer().sniff(next(open("sniff1.csv")))
>>> dialect.delimiter
','
>>> dialect.lineterminator
'\r\n'
where
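The usual pattern for sniffing from a sample is to read a prefix, sniff it, rewind, and then parse with the detected dialect. A minimal sketch, assuming a seekable text stream; the function name `read_with_sniffed_dialect` and the 1024-character default are illustrative choices, not part of the report:

```python
import csv
import io


def read_with_sniffed_dialect(stream, sample_size=1024):
    """Sniff the dialect from the start of `stream`, then parse the whole stream."""
    sample = stream.read(sample_size)      # sniff() wants text, not a file object
    dialect = csv.Sniffer().sniff(sample)  # may raise csv.Error if detection fails
    stream.seek(0)                         # rewind so the sample rows are parsed too
    return list(csv.reader(stream, dialect))


if __name__ == "__main__":
    rows = read_with_sniffed_dialect(io.StringIO("a,b,c\n1,2,3\n"))
    print(rows)
```

With a real file, the stream would typically come from `open(path, newline="")` rather than `io.StringIO`.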