[https://issues.apache.org/jira/browse/FLINK-36627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Hector Miuler Malpica Gallegos updated FLINK-36627:
---------------------------------------------------
Description:
I get an error when reading a CSV file encoded in ISO-8859. The error is the following:
{noformat}
Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x41 (at char #1247, byte #1246): check content encoding, does not look like UTF-8
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.reportInvalidOther(UTF8Reader.java:520)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.reportDeferredInvalid(UTF8Reader.java:531)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.read(UTF8Reader.java:177)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder.loadMore(CsvDecoder.java:458)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder._nextUnquotedString(CsvDecoder.java:782)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.impl.CsvDecoder.nextString(CsvDecoder.java:732)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntry(CsvParser.java:963)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvParser.nextFieldName(CsvParser.java:763)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:321)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:283)
	at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.MappingIterator.next(MappingIterator.java:199)
	... 11 more
{noformat}
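The stack trace shows Jackson's CSV module decoding the raw file bytes with its `UTF8Reader`. Byte 0x41 is ASCII `A`, and an "invalid middle byte" appears to mean that a byte which starts a multi-byte UTF-8 sequence was not followed by a continuation byte (0x80 to 0xBF). In ISO-8859-1 a letter such as `Ñ` is the single byte 0xD1, which UTF-8 treats as the first byte of a two-byte sequence, so a word like `ESPAÑA` triggers exactly this. The sketch below reproduces the check with the JDK's strict UTF-8 decoder; the sample word and the function name `firstMalformedUtf8Index` are hypothetical, since the real bytes around offset 1246 of the file are not shown.

```kotlin
import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.CodingErrorAction
import java.nio.charset.StandardCharsets

/**
 * Strictly decodes [bytes] as UTF-8 and returns the index of the first
 * malformed byte, or -1 if the whole array is valid UTF-8.
 * (Hypothetical helper for illustration, not part of Flink or Jackson.)
 */
fun firstMalformedUtf8Index(bytes: ByteArray): Int {
    val decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
    val input = ByteBuffer.wrap(bytes)
    // UTF-8 never yields more chars than input bytes, so this cannot overflow.
    val output = CharBuffer.allocate(bytes.size)
    val result = decoder.decode(input, output, true)
    // On a malformed sequence, the input position is left at its first byte.
    return if (result.isError) input.position() else -1
}
```

For example, `firstMalformedUtf8Index("ESPA\u00D1A".toByteArray(StandardCharsets.ISO_8859_1))` returns 4, the position of 0xD1; the byte right after it is 0x41, the `A` named in the exception. The same word encoded as UTF-8 returns -1.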
My code is the following:
{code:kotlin}
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.connector.file.src.FileSource
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.csv.CsvReaderFormat
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

// Empresa is my POJO for one CSV row (its definition is not included here).
fun main() {
    val env = StreamExecutionEnvironment.createLocalEnvironment()
    val csvFormat = CsvReaderFormat.forPojo(Empresa::class.java)
    val csvSource = FileSource
        .forRecordStreamFormat(csvFormat, Path("/miuler/PadronRUC_202410.csv"))
        .build()
    val empresaStreamSource = env.fromSource(csvSource, WatermarkStrategy.noWatermarks(), "CSV Source")
    empresaStreamSource.print()
    env.execute("Load CSV")
}
{code}
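`CsvReaderFormat.forPojo` takes only the target class, so the code above has no way to tell the reader which charset to use. One workaround is to convert the file to UTF-8 once, before handing it to `FileSource`. A minimal sketch, assuming the file is ISO-8859-1 (Latin-1); if it is actually ISO-8859-15 or windows-1252, pass that charset instead. `transcodeToUtf8` is a hypothetical helper, and it takes `java.nio.file.Path`, not the `org.apache.flink.core.fs.Path` used by `FileSource`.

```kotlin
import java.nio.charset.Charset
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.nio.file.Path

/**
 * Copies [source], decoded with [sourceCharset], into [target] encoded as UTF-8.
 * Returns the number of characters copied. (Hypothetical helper.)
 */
fun transcodeToUtf8(
    source: Path,
    target: Path,
    sourceCharset: Charset = StandardCharsets.ISO_8859_1 // assumption: Latin-1 input
): Long =
    Files.newBufferedReader(source, sourceCharset).use { reader ->
        // newBufferedWriter creates the target file, or truncates it if it exists.
        Files.newBufferedWriter(target, StandardCharsets.UTF_8).use { writer ->
            // kotlin.io extension: streams the text in buffered chunks.
            reader.copyTo(writer)
        }
    }
```

For example, `transcodeToUtf8(Paths.get("/miuler/PadronRUC_202410.csv"), Paths.get("/miuler/PadronRUC_202410.utf8.csv"))` (the output name is hypothetical), then point `FileSource.forRecordStreamFormat` at the new file. This costs one extra pass over the file but leaves the Flink job unchanged. Reading with Flink's `TextLineInputFormat`, whose constructor accepts a charset name, and parsing each line separately is another option, though it breaks on quoted fields that contain line breaks.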
My dependencies:
{code:kotlin}
val kotlinVersion = "1.20.0" // Flink version, used for all org.apache.flink artifacts below
dependencies {
    implementation("org.apache.flink:flink-shaded-jackson:2.15.3-19.0")
    implementation("org.apache.flink:flink-core:$kotlinVersion")
    implementation("org.apache.flink:flink-runtime:$kotlinVersion")
    implementation("org.apache.flink:flink-runtime-web:$kotlinVersion")
    implementation("org.apache.flink:flink-clients:$kotlinVersion")
    implementation("org.apache.flink:flink-streaming-java:$kotlinVersion")
    implementation("org.apache.flink:flink-csv:$kotlinVersion")
    implementation("org.apache.flink:flink-connector-base:$kotlinVersion")
    implementation("org.apache.flink:flink-connector-files:$kotlinVersion")
}
{code}
> Failure to process a CSV file in Flink due to a character encoding mismatch:
> the file is in ISO-8859 and the application expects UTF-8.
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-36627
> URL: https://issues.apache.org/jira/browse/FLINK-36627
> Project: Flink
> Issue Type: Bug
> Reporter: Hector Miuler Malpica Gallegos
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)