[ https://issues.apache.org/jira/browse/SANDBOX-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847419#action_12847419 ]

Peter Koszek commented on SANDBOX-263:
--------------------------------------

RFC 4180 defines the comma as the field separator.
Excel, by contrast, takes its separator from the local (regional) configuration.

The following approach can help to predict a field separator:

On Windows, read registry key "HKCU\Control Panel\International\sList".
On other systems, try to avoid a collision with the floating point separator 
like this:

         // The following idea is based on a comment from
         // http://www.experts-exchange.com/Programming/Languages/Q_24113673.html
         DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(Locale.getDefault());
         char decimalSeparator = dfs.getDecimalSeparator();
         char listSeparator = ',';
         if (decimalSeparator == listSeparator) {
             // If the floating point separator is a comma, use semi-colon to minimize encapsulation
             listSeparator = ';';
         }
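
For anyone who wants to try the heuristic above in isolation, here is a minimal self-contained sketch. The class and method names (ListSeparatorGuess, guessListSeparator) are hypothetical and not part of Commons CSV. The method takes the locale as a parameter instead of reading the default, so the result does not depend on the machine it runs on. It only guesses; it does not read the separator the user actually configured.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, not part of Commons CSV: guesses a list separator
// that does not collide with the locale's decimal separator.
final class ListSeparatorGuess {

    static char guessListSeparator(Locale locale) {
        char decimalSeparator = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        // Prefer the RFC 4180 comma; fall back to a semicolon when the comma
        // is already used as the decimal separator.
        return decimalSeparator == ',' ? ';' : ',';
    }

    public static void main(String[] args) {
        System.out.println(guessListSeparator(Locale.US));      // prints ,
        System.out.println(guessListSeparator(Locale.GERMANY)); // prints ;
    }
}
```

Passing Locale.getDefault() gives the same behaviour as the fragment above.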

CSV should be a standard; Excel is a specific application that uses the CSV 
standard in a special way.
I wouldn't expect a CSV framework to be able to simulate Excel exactly.
CSV-based formatting works with any separator character.
I expect a CSV framework to fully support the standard and to let me 
configure individual solutions.
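
To illustrate what I mean by a configurable separator, here is a rough sketch of a record splitter that takes the separator as a parameter. It is not the Commons CSV implementation; the names (SeparatorConfigSketch, splitRecord) are made up for this example, and it only handles a single record without embedded line breaks. Quoting follows the RFC 4180 convention of double quotes, with a doubled quote standing for one literal quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Commons CSV parser: splits one record
// using a caller-supplied separator character.
final class SeparatorConfigSketch {

    static List<String> splitRecord(String record, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (quoted) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote -> one literal quote
                        i++;
                    } else {
                        quoted = false;    // closing quote
                    }
                } else {
                    field.append(c);       // separators inside quotes are data
                }
            } else if (c == '"') {
                quoted = true;             // opening quote
            } else if (c == separator) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field of the record
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitRecord("a;b", ';'));        // prints [a, b]
        System.out.println(splitRecord("a;b", ','));        // prints [a;b]
        System.out.println(splitRecord("\"1,5\",x", ',')); // prints [1,5, x]
    }
}
```

The same input yields different fields depending only on the separator passed in, which is the kind of configurability I would expect from the framework.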

> Excel strategy uses wrong separator
> -----------------------------------
>
>                 Key: SANDBOX-263
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-263
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: CSV
>            Reporter: Gunnar Wagenknecht
>
> The Excel strategy is defined as follows.
> {code}
>     public static CSVStrategy EXCEL_STRATEGY = new CSVStrategy(',', '"', COMMENTS_DISABLED, ESCAPE_DISABLED,
>                                                                false, false, false, false);
> {code}
> However, when I do a "Save as" in Excel the separator used is actually 
> {{';'}}. Thus, parsing the CSV file as suggested in the JavaDoc of 
> {{CSVParser}} fails.
> {code}
> String[][] data =
>    (new CSVParser(new StringReader("a;b\nc;d"), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
> {code}
> Simple test to reproduce:
> {code}
> import java.io.IOException;
> import java.io.StringReader;
> import org.apache.commons.csv.CSVParser;
> import org.apache.commons.csv.CSVStrategy;
> public class CSVExcelStrategyBug {
>       public static void main(final String[] args) {
>               try {
>                       System.out.println("Using ;");
>                       parse("a;b\nc;d");
>                       System.out.println();
>                       System.out.println("Using ,");
>                       parse("a,b\nc,d");
>               } catch (final IOException e) {
>                       e.printStackTrace();
>               }
>       }
>       private static void parse(final String input) throws IOException {
>               final String[][] data =
>                       (new CSVParser(new StringReader(input), CSVStrategy.EXCEL_STRATEGY)).getAllValues();
>               for (final String[] row : data) {
>                       System.out.print("[");
>                       for (final String cell : row) {
>                               System.out.print("(" + cell + ")");
>                       }
>                       System.out.println("]");
>               }
>       }
> }
> {code}
> Actual output:
> {noformat}
> Using ;
> [(a;b)]
> [(c;d)]
> Using ,
> [(a)(b)]
> [(c)(d)]
> {noformat}
> Expected output:
> {noformat}
> Using ;
> [(a)(b)]
> [(c)(d)]
> Using ,
> [(a,b)]
> [(c,d)]
> {noformat}
