[jira] [Commented] (CSV-227) first column always quoting when multilingual language, when not on second column
[ https://issues.apache.org/jira/browse/CSV-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940104#comment-16940104 ]

Gary D. Gregory commented on CSV-227:
-------------------------------------

A PR on GitHub with tests would help ;)

> first column always quoting when multilingual language, when not on second column
> ---------------------------------------------------------------------------------
>
>                 Key: CSV-227
>                 URL: https://issues.apache.org/jira/browse/CSV-227
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.5
>            Reporter: Jisun, Shin
>            Priority: Major
>
> When the data includes multilingual characters (UTF-8 encoding),
> CSVPrinter always quotes only the first column, not the other columns.
>
> {code:java}
> // example code
> CSVFormat format = CSVFormat.DEFAULT.withQuoteMode(QuoteMode.MINIMAL);
> CSVPrinter printer = new CSVPrinter(System.out, format);
> List<String[]> temp = new ArrayList<>();
> temp.add(new String[] { "ㅁㅎㄷㄹ", "ㅁㅎㄷㄹ", "", "test2" });
> temp.add(new String[] { "한글3", "hello3", "3한글3", "test3" });
> temp.add(new String[] { "", "hello4", "", "test4" });
> for (String[] temp1 : temp) {
>     printer.printRecord(temp1);
> }
> printer.close();
> {code}
>
> result =>
> "ㅁㅎㄷㄹ",ㅁㅎㄷㄹ,,test2
> "한글3",hello3,3한글3,test3
> "",hello4,,test4
>
> I found the code.
> Multilingual characters are above 0x7E, so the first field of a record
> is always quoted when it starts with a multilingual character.
>
> {code:java}
> // CSVFormat.class
> ...
> char c = value.charAt(pos);
>
> // RFC4180 (https://tools.ietf.org/html/rfc4180) TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
> if (newRecord && (c < 0x20 || c > 0x21 && c < 0x23 || c > 0x2B && c < 0x2D || c > 0x7E)) {
>     quote = true;
> } else if (c <= COMMENT) {
> ...{code}
>
> Would you fix this bug?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
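The first-character condition quoted in the issue description can be tried in isolation. The following is a minimal sketch, not the library's code: `FirstCharQuoteCheck` and `firstCharForcesQuote` are hypothetical names, the condition is copied from the quoted snippet, and it is an assumption here that `newRecord` is true only for the first field of a record. The strings use Unicode escapes so the sketch does not depend on the source file encoding.

```java
class FirstCharQuoteCheck {

    // Hypothetical helper: mirrors only the first-character condition from the
    // snippet quoted in the issue, not the rest of the printer's quoting logic.
    static boolean firstCharForcesQuote(boolean newRecord, char c) {
        // RFC 4180 TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
        return newRecord && (c < 0x20 || c > 0x21 && c < 0x23 || c > 0x2B && c < 0x2D || c > 0x7E);
    }

    public static void main(String[] args) {
        // Second record from the issue: "한글3", "hello3", "3한글3", "test3"
        String[] record = { "\uD55C\uAE003", "hello3", "3\uD55C\uAE003", "test3" };
        for (int i = 0; i < record.length; i++) {
            boolean newRecord = (i == 0); // assumption: true only for the first field
            char c = record[i].charAt(0);
            System.out.printf("field %d starts with U+%04X, forces quote: %b%n",
                    i, (int) c, firstCharForcesQuote(newRecord, c));
        }
    }
}
```

Under these assumptions only the first field reports `true`, because U+D55C is above 0x7E and `newRecord` is set; a field starting with the same character in any later column reports `false`, which matches the asymmetry described in the report.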
[jira] [Commented] (CSV-227) first column always quoting when multilingual language, when not on second column
[ https://issues.apache.org/jira/browse/CSV-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924858#comment-16924858 ]

Yuji Konishi commented on CSV-227:
----------------------------------

The multilingual columns were not quoted when the following code was executed.

{code:java}
@Test
public void testFoo() throws IOException {
    CSVFormat format = CSVFormat.DEFAULT.withQuoteMode(QuoteMode.MINIMAL);
    CSVPrinter printer = new CSVPrinter(System.out, format);
    List<String[]> temp = new ArrayList<>();
    temp.add(new String[] { "ㅁㅎㄷㄹ", "ㅁㅎㄷㄹ", "", "test2" });
    temp.add(new String[] { "한글3", "hello3", "3한글3", "test3" });
    temp.add(new String[] { "", "hello4", "", "test4" });
    for (String[] temp1 : temp) {
        printer.printRecord(temp1);
    }
    printer.close();
}
{code}

ㅁㅎㄷㄹ,ㅁㅎㄷㄹ,,test2
한글3,hello3,3한글3,test3
"",hello4,,test4

$ git log
commit 1a7c6140825bd7b3abe73c5dd732b090acc84b61 (HEAD -> master, origin/master, origin/HEAD)
[jira] [Commented] (CSV-227) first column always quoting when multilingual language, when not on second column
[ https://issues.apache.org/jira/browse/CSV-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867455#comment-16867455 ]

Daniel Cattlin commented on CSV-227:
------------------------------------

With the default "QuoteMode.MINIMAL" I've seen some pretty weird behaviour too. It's pretty easy to reproduce if you use 2 columns of data that contain the same values and do a side-by-side comparison.

Here is some example data that was output by the CSV writer with unexpected quoting. Notice that any row that starts with a unicode character gets the first field in that row quoted but not the second field, which is the same. This includes the escape character, which I find a bit odd. I also checked that any fields containing the delimiter are quoted just fine.

I saw a similar question on Stack Overflow: [https://stackoverflow.com/questions/36663273/unexpected-quoting-in-apache-commons-csv]

{code:java}
"[QElmqgucZ",[QElmqgucZ
"`K^bPRa\Xm",`K^bPRa\Xm
NJ[\LWwY`Z,NJ[\LWwY`Z
c[n`zOk]qv,c[n`zOk]qv
y[KIphm]Bk,y[KIphm]Bk
"\rin\toDOP",\rin\toDOP
McLbuXeP]a,McLbuXeP]a
"\x`U^BHnVj",\x`U^BHnVj
"_\MzHJA]RO",_\MzHJA]RO
XslXnTQOEc,XslXnTQOEc
"-UHlnX\hNu",-UHlnX\hNu
ObGYlN_`g`,ObGYlN_`g`
"[FazYv\vtd",[FazYv\vtd{code}
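For comparison, the rows that came out quoted in Daniel Cattlin's sample start with `[`, a backtick, `\`, `_` or `-`, all of which are plain ASCII. A small sketch, assuming only the TEXTDATA ranges given in the RFC 4180 comment of the issue description (`TextDataCheck` and `isTextData` are hypothetical names, not part of Commons CSV), checks where those characters fall:

```java
class TextDataCheck {

    // Hypothetical helper: true if c lies in the RFC 4180 TEXTDATA ranges
    // %x20-21 / %x23-2B / %x2D-7E cited in the issue description.
    static boolean isTextData(char c) {
        return (c >= 0x20 && c <= 0x21) || (c >= 0x23 && c <= 0x2B) || (c >= 0x2D && c <= 0x7E);
    }

    public static void main(String[] args) {
        // First characters of the rows whose first field was quoted in the sample.
        char[] quotedFirstChars = { '[', '`', '\\', '_', '-' };
        for (char c : quotedFirstChars) {
            System.out.printf("0x%02X in TEXTDATA: %b%n", (int) c, isTextData(c));
        }
    }
}
```

All five characters fall inside TEXTDATA, so the first-character condition quoted in the issue description would not by itself explain this output. It may have been produced by a different version of the check; that is a guess, since the comment does not say which release was used.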
[jira] [Commented] (CSV-227) first column always quoting when multilingual language, when not on second column
[ https://issues.apache.org/jira/browse/CSV-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813984#comment-16813984 ]

Jisun, Shin commented on CSV-227:
---------------------------------

Sorry, I did not explain it well enough.

I want to strip the quotes, so I set "QuoteMode.MINIMAL".

From the second column onward it works, but the first column doesn't.
[jira] [Commented] (CSV-227) first column always quoting when multilingual language, when not on second column
[ https://issues.apache.org/jira/browse/CSV-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730462#comment-16730462 ]

Amit Chaurasia commented on CSV-227:
------------------------------------

[~Trichotomy], it works as you expect when you use QuoteMode.ALL:

CSVFormat format = CSVFormat.DEFAULT.withQuoteMode(QuoteMode.ALL);