Tilman Hausherr created TIKA-4278:
-------------------------------------

             Summary: TextAndCSVParser doesn't detect semicolon separated file
                 Key: TIKA-4278
                 URL: https://issues.apache.org/jira/browse/TIKA-4278
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 2.9.2
            Reporter: Tilman Hausherr


I ran the code from the attached SO issue and can confirm that it doesn't detect 
semicolon-separated files. The reason is this line in {{TextAndCSVParser.java}}:
{code:java}
private static final char[] DEFAULT_DELIMITERS = new char[]{',', '\t'};
{code}
This is later used by {{CSVSniffer}}. For some reason the other delimiters 
(pipe, colon and semicolon) aren't in that array, although they are in 
{{CHAR_TO_STRING_DELIMITER_MAP}}. I modified {{DEFAULT_DELIMITERS}} and now it 
works for semicolon.
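
To illustrate why the candidate list matters, here is a minimal, self-contained sketch. It is not Tika's actual {{CSVSniffer}} logic: the class {{DelimiterSniffSketch}}, the method {{guessDelimiter}} and the "same non-zero count on every line" heuristic are assumptions made up for this example. It only shows that a sniffer restricted to a list of candidates can never report a delimiter that is missing from that list.

```java
public class DelimiterSniffSketch {

    /**
     * Hypothetical heuristic (not Tika's): returns the candidate that occurs
     * the same non-zero number of times on every line, preferring the one
     * with the most occurrences. Returns null if no candidate qualifies.
     */
    public static Character guessDelimiter(String sample, char[] candidates) {
        String[] lines = sample.split("\\R");
        Character best = null;
        int bestCount = 0;
        for (char c : candidates) {
            int first = count(lines[0], c);
            boolean consistent = first > 0;
            for (String line : lines) {
                if (count(line, c) != first) {
                    consistent = false;
                    break;
                }
            }
            if (consistent && first > bestCount) {
                best = c;
                bestCount = first;
            }
        }
        return best;
    }

    /** Counts how often c occurs in line. */
    public static int count(String line, char c) {
        int n = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String sample = "name;city;age\nAnna;Berlin;31\nBen;Paris;27\n";
        // Candidates as in the current DEFAULT_DELIMITERS: comma and tab only.
        System.out.println(guessDelimiter(sample, new char[]{',', '\t'}));                // prints "null"
        // Candidates extended with pipe, colon and semicolon.
        System.out.println(guessDelimiter(sample, new char[]{',', '\t', '|', ':', ';'})); // prints ";"
    }
}
```

With only comma and tab as candidates the sketch finds nothing for a semicolon-separated sample; once the semicolon is added to the candidates, it is picked up.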

Can I change this by adding the missing delimiters, or is there a reason for 
leaving them out that I missed?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)